Jetson Xavier benchmarks mismatch

Hi,

I am using the Jetson AGX Xavier with the latest JetPack 4.1.1 (TensorRT 5.0).
I was trying to reproduce the benchmark results posted on this site:
[url]https://developer.nvidia.com/embedded/jetson-agx-xavier-dl-inference-benchmarks[/url]
and found a gap between the published results and mine.

Can you guide me on how to get the same results?


My only interest is the ResNet-50 graph with batch size 8.

The published results show:
LATENCY (ms) = 11.2 for 15W Mode
LATENCY (ms) = 6.2 for MAX-N Mode

I assume they used this command:
./trtexec --avgRuns=100 --deploy=resnet50.prototxt --int8 --batch=8 --iterations=10000 --output=prob --useSpinWait

Which is for GPU only with INT8 precision.
(Using the DLA with FP16 is 3x slower than GPU only with FP16.)

Please see my ./trtexec output using the same command (except --iterations=10):

(15W mode)

avgRuns: 1000
deploy: /home/nvidia/Networks/ResNet-50/deploy.prototxt
int8
batch: 8
iterations: 10
output: prob
useSpinWait
Input “data”: 3x224x224
Output “prob”: 1000x1x1

name=data, bindingIndex=0, buffers.size()=2
name=prob, bindingIndex=1, buffers.size()=2
Average over 1000 runs is 14.3147 ms (host walltime is 14.3454 ms, 99% percentile time is 14.3826).
Average over 1000 runs is 14.2869 ms (host walltime is 14.3124 ms, 99% percentile time is 14.3984).
Average over 1000 runs is 14.2821 ms (host walltime is 14.308 ms, 99% percentile time is 14.3534).

(MAX-N Mode)

avgRuns: 100
deploy: /home/nvidia/Networks/ResNet-50/deploy.prototxt
int8
batch: 8
iterations: 10
output: prob
useSpinWait
Input “data”: 3x224x224
Output “prob”: 1000x1x1

name=data, bindingIndex=0, buffers.size()=2
name=prob, bindingIndex=1, buffers.size()=2
Average over 100 runs is 9.6837 ms (host walltime is 9.69914 ms, 99% percentile time is 33.8719).
Average over 100 runs is 7.48239 ms (host walltime is 7.49908 ms, 99% percentile time is 8.92989).
Average over 100 runs is 7.49587 ms (host walltime is 7.50919 ms, 99% percentile time is 8.79376).
Average over 100 runs is 7.47715 ms (host walltime is 7.49505 ms, 99% percentile time is 8.53834).

Any idea why there is a gap between the published benchmarks and mine?

Moving to Jetson AGX Xavier devtalk for support coverage.

Hi tavorbental, as mentioned on the page, the benchmark results report the cumulative performance from the concurrent use of GPU (INT8) and two DLAs (FP16). You can launch three instances of trtexec simultaneously, with one instance running per device, as seen in the example commands here:

Jetson Benchmarks | NVIDIA Developer
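
For anyone who wants to script the three concurrent instances, here is a minimal Python sketch. It is not the command set from the benchmark page. The flag spellings (--fp16, --useDLACore, --allowGPUFallback) are inferred from the option names trtexec echoes in the logs in this thread, so check ./trtexec --help on your JetPack version. The prototxt path, the batch value and the helper names build_commands and launch_all are placeholders of mine.

```python
import subprocess


def build_commands(deploy, batch, trtexec="./trtexec"):
    """Return one trtexec argument list per device: GPU, DLA core 0, DLA core 1."""
    common = [
        trtexec,
        "--avgRuns=100",
        f"--deploy={deploy}",
        f"--batch={batch}",
        "--iterations=10000",
        "--output=prob",
        "--useSpinWait",
    ]
    # GPU instance runs in INT8.
    gpu = common + ["--int8"]
    # Each DLA instance runs in FP16 and may fall back to the GPU
    # for layers the DLA does not support.
    dlas = [
        common + ["--fp16", f"--useDLACore={core}", "--allowGPUFallback"]
        for core in (0, 1)
    ]
    return [gpu] + dlas


def launch_all(commands):
    """Start every command at once, then wait for all of them to finish."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


# Placeholder model path and batch size; nothing is launched here.
commands = build_commands("resnet50.prototxt", batch=8)
for cmd in commands:
    print(" ".join(cmd))
```

Calling launch_all(commands) on the device would start all three processes together. Their output interleaves on the same terminal, so redirecting each process to its own log file is probably more practical.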

Hi dusty,

Thank you for the answer,
but can you please be more specific about how you accumulate the performance?

When I run the trtexec instances simultaneously (MAX-N mode), I get:

int8
batch: 6
iterations: 10
output: prob
useSpinWait
Input “data”: 3x224x224
Output “prob”: 1000x1x1

Average over 1000 runs is 6.90604 ms (host walltime is 6.99801 ms, 99% percentile time is 8.89805).

fp16
batch: 1
iterations: 10
output: prob
useSpinWait
useDLACore: 0
allowGPUFallback
Input “data”: 3x224x224
Output “prob”: 1000x1x1

Average over 1000 runs is 7.66789 ms (host walltime is 8.55767 ms, 99% percentile time is 8.84243).

fp16
batch: 1
iterations: 10
output: prob
useSpinWait
useDLACore: 1
allowGPUFallback
Input “data”: 3x224x224
Output “prob”: 1000x1x1

Average over 1000 runs is 7.65092 ms (host walltime is 8.47453 ms, 99% percentile time is 8.77722).

So I had batch=6 for the GPU and batch=1 for each of the two DLAs.
Because they all run simultaneously, I think the slowest one is what counts.
So it seems reasonable to me that ResNet-50 with batch=8 will take 7.65092 ms.

Only when I run the GPU alone with int8 and batch=6 am I able to get the same performance as the published result:

int8
batch: 6
iterations: 10
output: prob
useSpinWait
Input “data”: 3x224x224
Output “prob”: 1000x1x1

Average over 1000 runs is 6.23396 ms (host walltime is 6.27553 ms, 99% percentile time is 8.75008).

What did I miss?

Thanks,
Bental

Hi Bental, run all 3 devices with batch size 8, as if there is an asynchronous queue of images coming in that need to be processed. The idea is to measure the sustained throughput.

Calculate the images per second from each trtexec instance by taking (1000 / latency in ms) × batchSize. Then add these 3 figures together to get the cumulative images per second processed by the system.

To get the average latency of the system, take (1000 / cumulative images per second) × batchSize.
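
As a worked example of the two formulas above, here is a small Python sketch. The three latencies are made-up placeholders (not measurements from this thread), and the function names are mine.

```python
def images_per_second(latency_ms, batch_size):
    """Throughput of one trtexec instance: (1000 / latency) x batchSize."""
    return 1000.0 / latency_ms * batch_size


def system_figures(latencies_ms, batch_size):
    """Return (cumulative images/sec, average system latency in ms)."""
    total_ips = sum(images_per_second(l, batch_size) for l in latencies_ms)
    latency_ms = 1000.0 / total_ips * batch_size
    return total_ips, latency_ms


# Hypothetical per-instance latencies at batch size 8: GPU, DLA0, DLA1.
total_ips, latency_ms = system_figures([8.0, 40.0, 40.0], batch_size=8)
print(f"cumulative throughput: {total_ips:.1f} images/sec")
print(f"average system latency: {latency_ms:.3f} ms")
```

With these placeholder inputs the GPU contributes 1000 images/sec and each DLA 200, so the combined latency comes out lower than the GPU's own 8 ms even though each DLA instance is much slower on its own.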

Hi, we’ve been running a similar benchmark and to get reasonable results we had to perform measurements by using a larger number of images. Have a look at our report for computer vision scenario: Benchmark of common AI accelerators: NVIDIA GPU vs. Intel Movidius
Anyway, let’s get in touch, we specialize in NVIDIA architectures.

Hello,

I am reproducing the numbers for ResNet-50.
My question is:

How do I pick a single latency value from the whole trtexec output?
I mean, which value should I consider?

- The average of all latencies? ((115.195 + 116.495 + … + 80.447) / number of iterations)
- The last latency value among the 10000 iterations? (80.4473 ms)
- Or the lowest latency value among the 10000 iterations? (80.4473 ms)

Average over 100 runs is 115.195 ms (host walltime is 117.162 ms, 99% percentile time is 123.003).
Average over 100 runs is 116.495 ms (host walltime is 118.372 ms, 99% percentile time is 121.337).
Average over 100 runs is 116.474 ms (host walltime is 118.403 ms, 99% percentile time is 120.689).
Average over 100 runs is 116.56 ms (host walltime is 118.522 ms, 99% percentile time is 121.085).
Average over 100 runs is 116.542 ms (host walltime is 118.433 ms, 99% percentile time is 121.146).
Average over 100 runs is 116.459 ms (host walltime is 118.423 ms, 99% percentile time is 120.822).
Average over 100 runs is 117.279 ms (host walltime is 119.162 ms, 99% percentile time is 133.677).
Average over 100 runs is 94.3713 ms (host walltime is 95.3969 ms, 99% percentile time is 122.374).
Average over 100 runs is 89.473 ms (host walltime is 90.1063 ms, 99% percentile time is 107.742).
Average over 100 runs is 87.3919 ms (host walltime is 88.4426 ms, 99% percentile time is 106.499).
Average over 100 runs is 88.3755 ms (host walltime is 89.69 ms, 99% percentile time is 100.572).
Average over 100 runs is 87.3318 ms (host walltime is 88.034 ms, 99% percentile time is 102.486).
Average over 100 runs is 86.0923 ms (host walltime is 87.2495 ms, 99% percentile time is 92.6218).
Average over 100 runs is 84.7149 ms (host walltime is 85.9822 ms, 99% percentile time is 97.3916).
Average over 100 runs is 81.712 ms (host walltime is 82.9645 ms, 99% percentile time is 83.3956).
Average over 100 runs is 85.1401 ms (host walltime is 86.3638 ms, 99% percentile time is 95.7614).
Average over 100 runs is 87.6545 ms (host walltime is 88.9401 ms, 99% percentile time is 95.2939).
Average over 100 runs is 88.959 ms (host walltime is 90.2963 ms, 99% percentile time is 100.425).
Average over 100 runs is 85.7611 ms (host walltime is 87.0619 ms, 99% percentile time is 96.8633).
Average over 100 runs is 80.4473 ms (host walltime is 81.6053 ms, 99% percentile time is 91.5625).

One more surprising observation: for ResNet-50 and GoogLeNet, the DLAs have almost the same latency whether running concurrently (GPU+DLA0+DLA1) or alone (DLA0).
For MobileNet and MobileNet-SSD, however, latency increases in concurrent runs, which is logical. Has anybody observed something similar? Is there a reason behind it?

Many thanks in advance…

Hi,

The GPU is known to have a longer launch time at the beginning.
So we recommend using the average value without the first few runs.
For example, the average from the 6th value to the last one will be good.
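
A minimal Python sketch of that averaging, assuming the older log format "Average over N runs is X ms" shown earlier in this thread. The regular expression, the function name and the warmup parameter are my own. The sample text is the four MAX-N lines from the first post, so only the first value is dropped here instead of the first five.

```python
import re

# Matches the per-group average printed by the older trtexec log format.
AVERAGE_LINE = re.compile(r"Average over \d+ runs is ([0-9.]+) ms")


def steady_state_mean(log_text, warmup=5):
    """Mean of the reported averages after dropping the first `warmup` of them."""
    values = [float(m.group(1)) for m in AVERAGE_LINE.finditer(log_text)]
    steady = values[warmup:]
    if not steady:
        raise ValueError("no averages left after dropping the warm-up values")
    return sum(steady) / len(steady)


sample_log = """
Average over 100 runs is 9.6837 ms (host walltime is 9.69914 ms, 99% percentile time is 33.8719).
Average over 100 runs is 7.48239 ms (host walltime is 7.49908 ms, 99% percentile time is 8.92989).
Average over 100 runs is 7.49587 ms (host walltime is 7.50919 ms, 99% percentile time is 8.79376).
Average over 100 runs is 7.47715 ms (host walltime is 7.49505 ms, 99% percentile time is 8.53834).
"""

with_warmup = steady_state_mean(sample_log, warmup=0)
without_warmup = steady_state_mean(sample_log, warmup=1)
print(f"all values: {with_warmup:.4f} ms, first value dropped: {without_warmup:.4f} ms")
```

On a full run with --iterations=10000 there are enough reported averages to keep the default warmup=5.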

For the second question, some operations in MobileNet are not supported by the DLA.
So those layers fall back to the GPU, which introduces some latency.

Thanks.

Should we use Host latency or GPU Compute time when comparing against the NVIDIA website? Can you highlight which one NVIDIA uses as the latency in the equation?

[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 9.33357 ms - Host latency: 9.49573 ms (end to end 9.75729 ms, enqueue 0.771269 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 7.75525 ms - Host latency: 7.91821 ms (end to end 8.06902 ms, enqueue 0.665713 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.11953 ms - Host latency: 8.29716 ms (end to end 8.34821 ms, enqueue 0.599188 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.67792 ms - Host latency: 8.84181 ms (end to end 8.92145 ms, enqueue 0.620007 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 9.09347 ms - Host latency: 9.26399 ms (end to end 9.34272 ms, enqueue 0.674097 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 9.2568 ms - Host latency: 9.4375 ms (end to end 9.53371 ms, enqueue 0.672026 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.57769 ms - Host latency: 8.74689 ms (end to end 8.82151 ms, enqueue 0.633691 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.50127 ms - Host latency: 8.67333 ms (end to end 8.75274 ms, enqueue 0.65248 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.63877 ms - Host latency: 8.81456 ms (end to end 8.90126 ms, enqueue 0.702876 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.71312 ms - Host latency: 8.89045 ms (end to end 9.0246 ms, enqueue 0.685059 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 7.94486 ms - Host latency: 8.12813 ms (end to end 8.21245 ms, enqueue 0.674316 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 7.77395 ms - Host latency: 7.95902 ms (end to end 7.99351 ms, enqueue 0.625469 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 7.88592 ms - Host latency: 8.07253 ms (end to end 8.10584 ms, enqueue 0.674346 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.78973 ms - Host latency: 8.99992 ms (end to end 9.0258 ms, enqueue 0.684346 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.57253 ms - Host latency: 8.77654 ms (end to end 8.80773 ms, enqueue 0.64541 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.68471 ms - Host latency: 8.89101 ms (end to end 8.92079 ms, enqueue 0.642725 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.60036 ms - Host latency: 8.80706 ms (end to end 8.83261 ms, enqueue 0.647256 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.47596 ms - Host latency: 8.67953 ms (end to end 8.71638 ms, enqueue 0.650225 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.67571 ms - Host latency: 8.88292 ms (end to end 8.91492 ms, enqueue 0.631416 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.63535 ms - Host latency: 8.84291 ms (end to end 8.89926 ms, enqueue 0.648496 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.5765 ms - Host latency: 8.78148 ms (end to end 8.80338 ms, enqueue 0.661523 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.69139 ms - Host latency: 8.89865 ms (end to end 8.94328 ms, enqueue 0.666582 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.6159 ms - Host latency: 8.82232 ms (end to end 8.85398 ms, enqueue 0.628594 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.68561 ms - Host latency: 8.89582 ms (end to end 8.92158 ms, enqueue 0.626816 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.64994 ms - Host latency: 8.85719 ms (end to end 8.89605 ms, enqueue 0.672793 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.74006 ms - Host latency: 8.9502 ms (end to end 8.9707 ms, enqueue 0.650391 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.66104 ms - Host latency: 8.87182 ms (end to end 8.89398 ms, enqueue 0.637109 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.61973 ms - Host latency: 8.83105 ms (end to end 8.87371 ms, enqueue 0.630117 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.7499 ms - Host latency: 8.95922 ms (end to end 8.98853 ms, enqueue 0.677441 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.80172 ms - Host latency: 9.01305 ms (end to end 9.04193 ms, enqueue 0.647207 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.73859 ms - Host latency: 8.95027 ms (end to end 8.9826 ms, enqueue 0.652871 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.78955 ms - Host latency: 9.0009 ms (end to end 9.03297 ms, enqueue 0.6725 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.83809 ms - Host latency: 9.04889 ms (end to end 9.084 ms, enqueue 0.656719 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.7985 ms - Host latency: 9.01031 ms (end to end 9.03568 ms, enqueue 0.660605 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.72455 ms - Host latency: 8.93684 ms (end to end 8.96062 ms, enqueue 0.678691 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 9.46176 ms - Host latency: 9.69186 ms (end to end 9.71023 ms, enqueue 0.698242 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 9.37133 ms - Host latency: 9.59941 ms (end to end 9.62773 ms, enqueue 0.651348 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 9.42059 ms - Host latency: 9.64734 ms (end to end 9.68957 ms, enqueue 0.711836 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 9.49023 ms - Host latency: 9.7184 ms (end to end 9.74086 ms, enqueue 0.696328 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 9.49898 ms - Host latency: 9.72859 ms (end to end 9.76164 ms, enqueue 0.637617 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 9.5834 ms - Host latency: 9.81402 ms (end to end 9.84582 ms, enqueue 0.734922 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 9.40363 ms - Host latency: 9.63164 ms (end to end 9.66906 ms, enqueue 0.666914 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 9.60965 ms - Host latency: 9.83996 ms (end to end 9.87727 ms, enqueue 0.668906 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 9.52805 ms - Host latency: 9.75617 ms (end to end 9.78383 ms, enqueue 0.688086 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 9.20418 ms - Host latency: 9.41672 ms (end to end 9.45926 ms, enqueue 0.685859 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.4818 ms - Host latency: 8.68152 ms (end to end 8.74734 ms, enqueue 0.663594 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.47594 ms - Host latency: 8.6757 ms (end to end 8.72547 ms, enqueue 0.64207 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.56469 ms - Host latency: 8.74938 ms (end to end 8.7841 ms, enqueue 0.691445 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.13559 ms - Host latency: 8.30418 ms (end to end 8.32059 ms, enqueue 0.871445 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.14871 ms - Host latency: 8.31844 ms (end to end 8.33004 ms, enqueue 0.877187 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.16281 ms - Host latency: 8.33254 ms (end to end 8.34617 ms, enqueue 0.891602 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.02547 ms - Host latency: 8.19352 ms (end to end 8.20547 ms, enqueue 0.944258 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.0927 ms - Host latency: 8.2607 ms (end to end 8.27289 ms, enqueue 0.859844 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.12102 ms - Host latency: 8.28895 ms (end to end 8.30758 ms, enqueue 0.904844 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.19523 ms - Host latency: 8.3607 ms (end to end 8.3766 ms, enqueue 0.895312 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.08953 ms - Host latency: 8.25535 ms (end to end 8.26551 ms, enqueue 0.886133 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.02008 ms - Host latency: 8.18461 ms (end to end 8.20219 ms, enqueue 0.887773 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.08355 ms - Host latency: 8.2482 ms (end to end 8.27184 ms, enqueue 0.834063 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 7.9857 ms - Host latency: 8.14992 ms (end to end 8.16871 ms, enqueue 0.856797 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.19918 ms - Host latency: 8.3641 ms (end to end 8.38414 ms, enqueue 0.857734 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.13555 ms - Host latency: 8.2932 ms (end to end 8.3398 ms, enqueue 0.755625 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.26937 ms - Host latency: 8.42551 ms (end to end 8.47734 ms, enqueue 0.759219 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.70559 ms - Host latency: 8.87527 ms (end to end 8.9043 ms, enqueue 0.837539 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.1634 ms - Host latency: 8.32754 ms (end to end 8.34813 ms, enqueue 0.813633 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.17625 ms - Host latency: 8.34074 ms (end to end 8.3718 ms, enqueue 0.860117 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.11848 ms - Host latency: 8.28055 ms (end to end 8.3173 ms, enqueue 0.817773 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 7.98367 ms - Host latency: 8.14824 ms (end to end 8.16082 ms, enqueue 0.898945 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.33008 ms - Host latency: 8.49824 ms (end to end 8.52687 ms, enqueue 0.863281 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 7.82094 ms - Host latency: 7.97488 ms (end to end 8.01293 ms, enqueue 0.793047 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 7.55594 ms - Host latency: 7.71176 ms (end to end 7.72293 ms, enqueue 0.897539 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.05793 ms - Host latency: 8.22414 ms (end to end 8.23867 ms, enqueue 0.870938 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.01602 ms - Host latency: 8.18379 ms (end to end 8.19996 ms, enqueue 0.916055 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 7.83637 ms - Host latency: 7.99883 ms (end to end 8.01313 ms, enqueue 0.931016 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 7.73207 ms - Host latency: 7.88773 ms (end to end 7.89844 ms, enqueue 0.855586 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 7.79383 ms - Host latency: 7.95219 ms (end to end 7.96996 ms, enqueue 0.873984 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 7.84797 ms - Host latency: 8.00992 ms (end to end 8.0357 ms, enqueue 0.872891 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 7.48164 ms - Host latency: 7.63508 ms (end to end 7.64719 ms, enqueue 0.876094 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.24594 ms - Host latency: 8.41281 ms (end to end 8.43914 ms, enqueue 0.965469 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 7.80094 ms - Host latency: 7.95445 ms (end to end 8.00039 ms, enqueue 0.857188 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.13891 ms - Host latency: 8.30141 ms (end to end 8.32758 ms, enqueue 0.826563 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.28484 ms - Host latency: 8.44719 ms (end to end 8.48773 ms, enqueue 0.757812 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.08539 ms - Host latency: 8.24031 ms (end to end 8.28531 ms, enqueue 0.721406 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 7.44867 ms - Host latency: 7.59984 ms (end to end 7.62352 ms, enqueue 0.830078 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 7.9725 ms - Host latency: 8.12664 ms (end to end 8.14602 ms, enqueue 0.696719 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.16 ms - Host latency: 8.32539 ms (end to end 8.33797 ms, enqueue 0.836172 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.31859 ms - Host latency: 8.48594 ms (end to end 8.51492 ms, enqueue 0.832812 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 7.32758 ms - Host latency: 7.47781 ms (end to end 7.50031 ms, enqueue 0.872891 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.23727 ms - Host latency: 8.40172 ms (end to end 8.42711 ms, enqueue 0.846016 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.20641 ms - Host latency: 8.36578 ms (end to end 8.40961 ms, enqueue 0.738047 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 7.85203 ms - Host latency: 8.00539 ms (end to end 8.04656 ms, enqueue 0.723203 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 7.83727 ms - Host latency: 7.99086 ms (end to end 8.01375 ms, enqueue 0.723672 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.3943 ms - Host latency: 8.55953 ms (end to end 8.58352 ms, enqueue 0.757969 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 7.58969 ms - Host latency: 7.74039 ms (end to end 7.77 ms, enqueue 0.799453 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.55516 ms - Host latency: 8.72523 ms (end to end 8.76688 ms, enqueue 0.740391 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 7.88031 ms - Host latency: 8.03359 ms (end to end 8.09367 ms, enqueue 0.726016 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.02133 ms - Host latency: 8.17594 ms (end to end 8.22453 ms, enqueue 0.755234 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 7.66148 ms - Host latency: 7.81281 ms (end to end 7.83414 ms, enqueue 0.789141 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 8.2293 ms - Host latency: 8.3943 ms (end to end 8.43273 ms, enqueue 0.808047 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 7.87906 ms - Host latency: 8.04117 ms (end to end 8.05414 ms, enqueue 0.842578 ms)
[04/05/2021-12:36:30] [I] Average on 100 runs - GPU latency: 7.72289 ms - Host latency: 7.88188 ms (end to end 7.89523 ms, enqueue 0.857656 ms)
[04/05/2021-12:36:30] [I] Host Latency
[04/05/2021-12:36:30] [I] min: 6.76562 ms (end to end 6.84555 ms)
[04/05/2021-12:36:30] [I] max: 47.05 ms (end to end 47.1611 ms)
[04/05/2021-12:36:30] [I] mean: 8.56727 ms (end to end 8.60459 ms)
[04/05/2021-12:36:30] [I] median: 8.40625 ms (end to end 8.42188 ms)
[04/05/2021-12:36:30] [I] percentile: 11.5508 ms at 99% (end to end 11.6953 ms at 99%)
[04/05/2021-12:36:30] [I] throughput: 929.695 qps
[04/05/2021-12:36:30] [I] walltime: 86.0497 s
[04/05/2021-12:36:30] [I] Enqueue Time
[04/05/2021-12:36:30] [I] min: 0.420166 ms
[04/05/2021-12:36:30] [I] max: 13.7656 ms
[04/05/2021-12:36:30] [I] median: 0.738281 ms
[04/05/2021-12:36:30] [I] GPU Compute
[04/05/2021-12:36:30] [I] min: 6.60791 ms
[04/05/2021-12:36:30] [I] max: 46.8982 ms
[04/05/2021-12:36:30] [I] mean: 8.38616 ms
[04/05/2021-12:36:30] [I] median: 8.23438 ms
[04/05/2021-12:36:30] [I] percentile: 11.3398 ms at 99%
[04/05/2021-12:36:30] [I] total compute time: 83.8616 s
&&&& PASSED TensorRT.trtexec # ./trtexec --avgRuns=100 --deploy=/usr/src/tensorrt/data/resnet50/ResNet50_N2.prototxt --int8 --batch=8 --iterations=10000 --output=prob
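
To show how the summary figures in this log relate to each other, here is a short Python sketch using only numbers copied from the output above. It does not settle which figure NVIDIA uses on the benchmark page. It only shows that the reported throughput is consistent with batch × iterations / walltime, and what the images-per-second formula from earlier in the thread gives for each latency. The variable names are mine.

```python
# Values copied from the trtexec summary above.
batch = 8
iterations = 10000
walltime_s = 86.0497
reported_throughput_qps = 929.695
mean_host_latency_ms = 8.56727
mean_gpu_compute_ms = 8.38616

# Throughput derived from the total wall time of the run.
throughput_from_walltime = batch * iterations / walltime_s

# Images per second using (1000 / latency) x batchSize for each latency figure.
ips_from_host_latency = 1000.0 / mean_host_latency_ms * batch
ips_from_gpu_compute = 1000.0 / mean_gpu_compute_ms * batch

print(f"batch * iterations / walltime: {throughput_from_walltime:.3f}")
print(f"reported throughput:           {reported_throughput_qps:.3f}")
print(f"from mean host latency:        {ips_from_host_latency:.1f} images/sec")
print(f"from mean GPU compute:         {ips_from_gpu_compute:.1f} images/sec")
```

The walltime-based figure is the lowest of the three, presumably because wall time also includes overhead between inferences, so the choice of latency shifts the result by a few percent.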