Hi everyone
I finally managed to run the TensorRT fasterRCNN sample on Jetson TX2, as provided in /usr/src/tensorrt. However, execution time per frame comes to around 320 ms (~3 FPS), which is far from the ~20 FPS I was hoping for.
I also tried running jetson_clocks.sh to see if it improves anything, but there is no improvement.
To measure the time taken, I am using the following:
// run inference
auto time_start = std::chrono::steady_clock::now();
doInference(*context, data, imInfo, bboxPreds, clsProbs, rois, N);
auto time_end = std::chrono::steady_clock::now();
std::cout << "Total time = "
          << std::chrono::duration_cast<std::chrono::milliseconds>(time_end - time_start).count()
          << " ms" << std::endl;
It would be great if someone could tell me what needs to be done to achieve even 10 FPS with TensorRT on this example.
FYI: I installed JetPack 3.1.
Hi,
Please use our DetectNet sample to get a ~10 FPS object detection pipeline:
[url]https://github.com/dusty-nv/jetson-inference#locating-object-coordinates-using-detectnet[/url]
The purpose of sampleFasterRCNN is to demonstrate the plugin API implementation.
Please use DetectNet for better performance.
Thanks.
Can we fine-tune the provided FasterRCNN to achieve a better FPS?
Hi,
1. In sampleFasterRCNN, please note that batchSize is set to 2 by default.
2. The doInference() function performs memory allocation, host-to-device buffer copies, inference, device-to-host buffer copies, and memory release on every call.
Usually, you only need to perform memory allocation/release once, at initialization.
With a zero-copy pipeline, e.g. the MMAPI samples, you don't need to transfer data between host and device at all.
So, try setting batchSize=1 and optimize the pipeline following the zero-copy sample in MMAPI.
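The two optimizations above can be sketched roughly as follows. This is only an outline under assumptions: setupBuffers/teardownBuffers are hypothetical helper names, the actual buffer sizes and binding pointers come from the engine in the real sample, and it relies on cudaHostAlloc with cudaHostAllocMapped giving zero-copy mapped memory on Tegra's unified memory:

```cuda
#include <cuda_runtime.h>

// CPU-visible input buffer and its GPU alias (zero-copy on Tegra).
void* hostData;
void* deviceData;

// One-time setup: do this once at initialization, not inside doInference().
void setupBuffers(size_t inputBytes)
{
    // Allow host memory to be mapped into the device address space.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    // Pinned, mapped allocation: CPU writes land directly in memory the
    // GPU can read, so no per-frame cudaMemcpy is needed.
    cudaHostAlloc(&hostData, inputBytes, cudaHostAllocMapped);
    cudaHostGetDevicePointer(&deviceData, hostData, 0);
}

// Per frame: fill hostData with the preprocessed image, put deviceData
// into the engine's binding array, then run inference with batch size 1:
//   context->execute(1, buffers);   // TensorRT IExecutionContext

// One-time teardown at shutdown.
void teardownBuffers()
{
    cudaFreeHost(hostData);
}
```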
Thanks.