My previous post had too many variables, so here is a simpler sample that anyone can reproduce.
Get this package:
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/tensorrt
I modified the script for debugging purposes as follows:
diff tftrt_sample.py tftrt_sample.py.org
92d91
< print( datetime.datetime.now(), " getResnet50" )
111d109
< print( datetime.datetime.now(), " getFP32" )
121d118
< print( datetime.datetime.now(), " getFP16" )
146c143
< print(datetime.datetime.now(), "Starting execution")
---
> tf.logging.info("Starting execution")
172c169
< print(datetime.datetime.now(), " Starting Warmup cycle")
---
> tf.logging.info("Starting Warmup cycle")
203c200
< print(datetime.datetime.now(), "Warmup done. Starting real timing")
---
> tf.logging.info("Warmup done. Starting real timing")
267,268c264
< print(datetime.datetime.now(), " Starting")
<
---
> print("Starting at",datetime.datetime.now())
I also removed the --INT8 option from run_all.sh
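For anyone repeating this, the debug prints could also report the time elapsed since the previous stage, which saves subtracting timestamps by hand later. This is only a minimal sketch: StageTimer and stamp are hypothetical names of mine, not part of tftrt_sample.py.

```python
import datetime
import time


class StageTimer:
    """Prints a wall-clock timestamp plus the seconds elapsed since the previous stage."""

    def __init__(self):
        # monotonic() is not affected by system clock adjustments
        self._last = time.monotonic()

    def stamp(self, label):
        now = time.monotonic()
        elapsed = now - self._last
        self._last = now
        # flush so the line reaches the log immediately when stdout is redirected
        print("{} {} (+{:.1f}s)".format(datetime.datetime.now(), label, elapsed),
              flush=True)
        return elapsed


timer = StageTimer()
timer.stamp("getResnet50")
timer.stamp("Starting execution")
```

Each print(datetime.datetime.now(), ...) line in the diff above would then become a timer.stamp(...) call on one shared timer object.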
I then ran $ run_all > TX2.log and the log contains:
Namespace(FP16=True, FP32=True, INT8=False, batch_size=4, dump_diff=False, native=True, num_loops=10, topN=5, update_graphdef=False, with_timeline=False, workspace_size=2048)
2019-04-13 23:26:59.405493 Starting
2019-04-13 23:27:06.691047 getResnet50
2019-04-13 23:27:08.879296 Starting execution
2019-04-13 23:27:12.120849 Starting Warmup cycle
2019-04-13 23:27:38.190371 Warmup done. Starting real timing
iter 0 0.1170225191116333
iter 1 0.11706938743591308
iter 2 0.11715104579925537
iter 3 0.11713536262512207
iter 4 0.11703117370605469
iter 5 0.11687781810760497
iter 6 0.11692732810974121
iter 7 0.11688094139099121
iter 8 0.11711055755615235
iter 9 0.11685168743133545
Comparison= True
images/s : 34.2 +/- 0.0, s/batch: 0.11701 +/- 0.00011
RES, Native, 4, 34.19, 0.03, 0.11701, 0.00011
2019-04-13 23:28:39.120928 getFP32
2019-04-13 23:28:39.122388 getResnet50
2019-04-14 00:05:18.587516 Starting execution
2019-04-14 00:39:11.500612 Starting Warmup cycle
2019-04-14 00:39:55.384542 Warmup done. Starting real timing
iter 0 0.06356308937072754
iter 1 0.06371050834655761
iter 2 0.06345504283905029
iter 3 0.06329115867614746
iter 4 0.06343845844268799
iter 5 0.06320501804351807
iter 6 0.06346035480499268
iter 7 0.0631892728805542
iter 8 0.06757570266723632
iter 9 0.06330945014953614
Comparison= True
images/s : 62.7 +/- 1.2, s/batch: 0.06382 +/- 0.00126
RES, TRT-FP32, 4, 62.68, 1.18, 0.06382, 0.00126
2019-04-14 00:41:19.378257 getFP16
2019-04-14 00:41:19.380426 getResnet50
2019-04-14 00:59:41.581313 Starting execution
2019-04-14 01:32:10.168278 Starting Warmup cycle
2019-04-14 01:32:42.214924 Warmup done. Starting real timing
iter 0 0.03612914085388184
iter 1 0.03567664623260498
iter 2 0.03541929721832275
iter 3 0.03596384525299072
iter 4 0.03592778205871582
iter 5 0.035593876838684084
iter 6 0.0354670524597168
iter 7 0.03562225341796875
iter 8 0.03560783863067627
iter 9 0.035287847518920896
Comparison= True
images/s : 112.1 +/- 0.8, s/batch: 0.03567 +/- 0.00025
RES, TRT-FP16, 4, 112.14, 0.78, 0.03567, 0.00025
Done timing 2019-04-14 01:33:36.285660
native ['bow tie, bow-tie, bowtie', 'cornet, horn, trumpet, trump', 'military uniform', 'sweatshirt', 'bulletproof vest']
FP32 ['bow tie, bow-tie, bowtie', 'cornet, horn, trumpet, trump', 'military uniform', 'sweatshirt', 'bulletproof vest']
FP16 ['bow tie, bow-tie, bowtie', 'cornet, horn, trumpet, trump', 'military uniform', 'sweatshirt', 'bulletproof vest']
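This snippet is the issue: why does it take so long? On the TX2, roughly 37 minutes pass between getResnet50 and Starting execution, and roughly another 34 minutes pass before the warmup cycle starts.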
2019-04-13 23:28:39.120928 getFP32
2019-04-13 23:28:39.122388 getResnet50
2019-04-14 00:05:18.587516 Starting execution
2019-04-14 00:39:11.500612 Starting Warmup cycle
2019-04-14 00:39:55.384542 Warmup done. Starting real timing
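To make the gaps explicit, the per-stage durations can be computed directly from the timestamps. The script below is a small sketch that assumes the log format shown above (date, time, then a label); parse_stages and stage_durations are helper names I made up for illustration.

```python
from datetime import datetime

# The TX2 FP32 snippet from the log, copied verbatim.
LOG = """\
2019-04-13 23:28:39.120928 getFP32
2019-04-13 23:28:39.122388 getResnet50
2019-04-14 00:05:18.587516 Starting execution
2019-04-14 00:39:11.500612 Starting Warmup cycle
2019-04-14 00:39:55.384542 Warmup done. Starting real timing
"""


def parse_stages(text):
    """Return (timestamp, label) pairs for lines that start with a timestamp."""
    stages = []
    for line in text.splitlines():
        parts = line.split(None, 2)  # date, time, rest of line
        if len(parts) < 3:
            continue
        try:
            ts = datetime.strptime(parts[0] + " " + parts[1],
                                   "%Y-%m-%d %H:%M:%S.%f")
        except ValueError:
            continue  # not a timestamped line, e.g. "iter 0 0.117..."
        stages.append((ts, parts[2].strip()))
    return stages


def stage_durations(stages):
    """Seconds between each stage marker and the next one."""
    return [(label, (next_ts - ts).total_seconds())
            for (ts, label), (next_ts, _) in zip(stages, stages[1:])]


for label, seconds in stage_durations(parse_stages(LOG)):
    print("{:<22s} {:9.1f} s ({:5.1f} min)".format(label, seconds, seconds / 60))
```

Passing it the whole TX2.log instead of the snippet should work too, since lines that do not start with a timestamp are skipped.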
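On Xavier the same script runs much quicker. For example, the FP16 stage there needs about 3 minutes from getResnet50 to Starting execution, versus about 18 minutes for the same stage on the TX2. Here is the end of the Xavier FP32 run and the start of its FP16 run: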
RES, TRT-FP32, 4, 160.46, 0.48, 0.02493, 0.00008
2019-04-13 23:18:35.111292 getFP16
2019-04-13 23:18:35.111525 getResnet50
2019-04-13 23:21:33.759369 Starting execution
2019-04-13 23:22:10.313379 Starting Warmup cycle
2019-04-13 23:22:11.314511 Warmup done. Starting real timing
Does anyone have any thoughts?