Slow inference using TensorRT sampleFasterRCNN, 320 ms/frame

Hi everyone

I finally managed to run the TensorRT Faster R-CNN sample (sampleFasterRCNN) provided in /usr/src/tensorrt on a Jetson TX2. However, the execution time comes out at around 320 ms per frame, which is far slower than the ~20 FPS I was hoping for.

I also tried running jetson_clocks.sh to see if it helps, but the inference time per frame stays at 320 ms, well short of the expected 10-20 FPS.

To measure the time taken, I am using the following:

// run inference and report wall-clock time in milliseconds
// (needs #include <chrono> and #include <iostream> at the top of the file)
auto time_start = std::chrono::steady_clock::now();
doInference(*context, data, imInfo, bboxPreds, clsProbs, rois, N);
auto time_end = std::chrono::steady_clock::now();
std::cout << "Total time = "
          << std::chrono::duration_cast<std::chrono::milliseconds>(time_end - time_start).count()
          << " ms" << std::endl;

It would be great if someone could tell me what needs to be done to reach even 10 FPS with TensorRT on this example.
FYI: I installed JetPack 3.1.

Hi,

Please use our DetectNet sample to get a 10 FPS object detection pipeline:
[url]https://github.com/dusty-nv/jetson-inference#locating-object-coordinates-using-detectnet[/url]

The purpose of sampleFasterRCNN is to demonstrate the plugin API implementation.
Please use DetectNet for better performance.

Thanks.

Can we fine-tune the provided FasterRCNN sample to achieve a better FPS?

Hi,

1. In sampleFasterRCNN, please note that batchSize is set to 2 by default, so each timed doInference() call processes two images.

2. The doInference() function performs memory allocation, host-to-device buffer copies, inference, device-to-host buffer copies, and memory release on every call.
Usually, you only need to do the memory allocation/release once at initialization.
With a zero-copy pipeline, e.g. the MMAPI samples, you don't need to transfer data between host and device at all.

So, try setting batchSize=1, moving allocation/release out of the per-frame path (see the sketch below), and optimizing the pipeline with the zero-copy samples in MMAPI.
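To make the buffer-reuse point concrete, here is a minimal sketch rather than the actual sample code. It assumes sampleFasterRCNN's five bindings (two inputs, three outputs); the helper names allocateBindings/inferFrame/releaseBindings, the byte-size array, and the single input/output copy shown per frame are illustrative placeholders.

// Minimal sketch (assumptions noted above): allocate device buffers and the CUDA stream
// once at startup, do only copies + enqueue() per frame, and free everything at shutdown.
#include <NvInfer.h>
#include <cuda_runtime_api.h>

static const int kBatchSize  = 1;   // the sample uses N = 2 by default
static const int kNbBindings = 5;   // sampleFasterRCNN: 2 inputs + 3 outputs

struct Bindings
{
    void* buffers[kNbBindings] = {nullptr};
    cudaStream_t stream = nullptr;
};

// One-time allocation at initialization, instead of inside every doInference() call.
void allocateBindings(Bindings& b, const size_t bytesPerBinding[kNbBindings])
{
    for (int i = 0; i < kNbBindings; ++i)
        cudaMalloc(&b.buffers[i], kBatchSize * bytesPerBinding[i]);
    cudaStreamCreate(&b.stream);
}

// Per-frame work: only host<->device copies and inference, no allocation or release.
// Only one input and one output are shown here; the real sample copies data and im_info
// in, and bbox_pred, cls_prob, and rois out.
void inferFrame(nvinfer1::IExecutionContext& context, Bindings& b,
                int inputIndex, const float* hostInput, size_t inputBytes,
                int outputIndex, float* hostOutput, size_t outputBytes)
{
    cudaMemcpyAsync(b.buffers[inputIndex], hostInput, kBatchSize * inputBytes,
                    cudaMemcpyHostToDevice, b.stream);
    context.enqueue(kBatchSize, b.buffers, b.stream, nullptr);
    cudaMemcpyAsync(hostOutput, b.buffers[outputIndex], kBatchSize * outputBytes,
                    cudaMemcpyDeviceToHost, b.stream);
    cudaStreamSynchronize(b.stream);
}

// One-time release at shutdown.
void releaseBindings(Bindings& b)
{
    for (int i = 0; i < kNbBindings; ++i)
        cudaFree(b.buffers[i]);
    cudaStreamDestroy(b.stream);
}

The binding indices would come from engine.getBindingIndex() with the blob names already defined in the sample, so the per-frame cost reduces to the copies and the enqueue() call itself.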
Thanks.