Hi Martin,
For option 1, there are actually two ways to deploy on the Jetson platform. See https://devtalk.nvidia.com/default/topic/1065558/transfer-learning-toolkit/trt-engine-deployment/ for more info. One is to use the etlt model directly. The other is to use a TRT engine, but in that case you need a separate version of tlt-converter to generate the engine. See the TLT doc:
“For the Jetson platform, the tlt-converter is available to download in the dev zone here. Once the tlt-converter is downloaded, please follow the instructions mentioned below to generate a TensorRT engine.”
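For reference, a tlt-converter invocation on Jetson typically looks like the sketch below. The file names, the $NGC_KEY variable, and the output node names are placeholders; the correct output nodes for your network are listed in the TLT docs and in the tlt-export log.

```shell
# Sketch only -- all paths, the key, and node names are placeholders.
#   -k  encoding key used when the model was exported
#   -o  comma-separated output node names of the network
#   -d  input dimensions as C,H,W
#   -t  engine precision (fp32 or fp16)
#   -e  path of the engine to generate
./tlt-converter frcnn_model.etlt \
    -k "$NGC_KEY" \
    -o <output_node_names> \
    -d 3,576,1024 \
    -t fp16 \
    -e frcnn_model.fp16.engine
```

Run this on the Jetson itself, so the engine is built for that device's TensorRT version and GPU.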
For option 2, see the TLT doc: on a dGPU platform, yes, you can use the generated TRT engine in DeepStream.
“For deployment platforms with an x86 based CPU and discrete GPUs, the tlt-converter is distributed within the TLT docker. Therefore, it is suggested to use the docker to generate the engine. However, this requires that the user adhere to the same minor version of TensorRT as distributed with the docker. The TLT docker includes TensorRT version 5.1.5. In order to use the engine with a different minor version of TensorRT, it would be best to copy over the converter from /opt/nvidia/tools/tlt-converter to the target machine and follow the instructions mentioned below to run it and generate a TensorRT engine.”
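The copy step from the quoted doc can be done with docker cp; the container name here is a placeholder:

```shell
# Copy tlt-converter out of a running TLT container (name is a placeholder)
# so the engine can be built against the target machine's TensorRT version.
docker cp tlt_container:/opt/nvidia/tools/tlt-converter ./tlt-converter
chmod +x ./tlt-converter
./tlt-converter -h   # sanity check against the local TensorRT install
```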
Hi Martin,
There is a sample .h264 video inside the DeepStream SDK. You can try running with it.
As for using DeepStream with images instead of h264, that is really a DeepStream topic. I think you can search for guidance in the DeepStream documentation or forum.
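If it helps, the sample clips usually live under the SDK's samples directory; a typical run looks like this (the install path and the config file name are placeholders that depend on your DeepStream version):

```shell
# Sketch: run deepstream-app against the sample .h264 clip shipped with
# the SDK. Paths below are typical but version-dependent placeholders.
cd /opt/nvidia/deepstream/deepstream/samples
ls streams/                      # the sample .h264 clips live here
deepstream-app -c configs/deepstream-app/<your_app_config>.txt
```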
Can you please clarify if in the retrain spec, the pretrained model should be:
pretrained_model: “/workspace/tlt-experiments/data/faster_rcnn/model_2_pruned.tlt”
or
pretrained_weights: “/workspace/nvidia-tlt/data/faster_rcnn/resnet18.h5”
If it’s the second option, won’t retraining start from the initial pretrained model and not the trained + pruned model?
I did some retraining using model_2_pruned. When I run evaluation / inference on the retrained model, I notice that it is not faster than the originally trained model. So does pruning only reduce the model size, with no speed implications?
Also, faster_rcnn with ResNet18 is taking about 80-90 ms / frame on Tesla P100 with frame size of 1024 x 576. Isn’t this too slow?
I have a PyTorch version with ResNet101 which takes the same time / frame with frame size 1280 x 720 on the same GPU.
After pruning with tlt-prune, the number of trainable parameters and the tlt model size are reduced.
The prune ratio is reported in the log. The smaller the tlt model, the higher the FPS.
You can try different thresholds (“pth”) to get different tlt models and compare their FPS.
Note the trade-off between mAP and FPS.
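A sweep over thresholds can be sketched like this (paths and the key are placeholders; check tlt-prune -h for the exact flags in your TLT version):

```shell
# Sketch: prune the trained model at several thresholds; each run's log
# reports the pruning ratio. Paths and $NGC_KEY are placeholders.
for PTH in 0.3 0.5 0.7; do
  tlt-prune -pm model_2.tlt \
            -o model_2_pruned_pth${PTH} \
            -pth "$PTH" \
            -k "$NGC_KEY"
done
# Retrain each pruned model, then compare mAP and FPS to pick a trade-off.
```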
For your case, are you running on a Nano?
Also, compared to Faster R-CNN, TLT detectnet_v2 has a better combination of mAP and FPS.
I have a prune ratio of around 0.5, and the model size on disk has gone down accordingly. But the inference time is the same as before. Not sure why, any idea?
Right now my experiments are all on nvidia T4 on a Google Compute Engine instance, but a team-mate is working on deploying on DeepStream on the Jetson Nano. Hopefully it’ll work at a decent FPS.
I am very familiar with Faster R-CNN from previous projects, so I chose this one. But I can look at DetectNet if you say it is better for both speed and accuracy.
It does not make sense that “the inference time is the same as before”.
Can you elaborate on how you generated the etlt model, how you generated the TRT engine, and how you measured the inference time?
Commands and full logs are appreciated.
This inference time is on the google cloud instance, not a Jetson Nano. So I’m just using tlt-infer faster_rcnn -e spec_file.txt, with the spec file specifying the pruned and retrained model (tlt file). At this point, there is no need for an etlt model or a trt engine, right?
The tlt-infer command runs inference on input images.
It is not meant for measuring inference time.
An easier way to check inference time: for example, if you have already generated an fp16 etlt model and then an fp16 TRT engine, you can run it with the trtexec tool.
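A trtexec run against a serialized engine looks roughly like this (the engine file name is a placeholder, and exact flag spellings vary a little between TensorRT versions):

```shell
# Sketch: benchmark a serialized fp16 engine with trtexec. trtexec prints
# per-inference latency statistics; flag names depend on the TRT version.
trtexec --loadEngine=frcnn_model.fp16.engine --fp16
```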
Thanks a lot, I managed to export to etlt, then create an engine and run trtexec. I am getting 53 ms for ResNet50 in fp32, and 17 ms in fp16. That’s very good.
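For context, per-frame latency converts to throughput as fps = 1000 / latency_ms; a quick check of those numbers:

```shell
# Convert per-frame latency in milliseconds to frames per second.
fps_from_ms() {
  awk -v ms="$1" 'BEGIN { printf "%.1f\n", 1000.0 / ms }'
}

fps_from_ms 53   # fp32: ~18.9 FPS
fps_from_ms 17   # fp16: ~58.8 FPS
```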
However, a bit disappointed with recall and precision values. You’ve already recommended DetectNet. Do you have any benchmark AP numbers for a standard dataset like Coco or Pascal VOC? Even KITTI, just to know what to expect? Ideally, with a comparison of DetectNet and FasterRCNN.
Hi chandrachud,
Sorry, the pre-trained models below are not available yet. They are planned to be available in the next TLT release.
resnet:34, resnet:101, resnet:152
Different pruning ratios will result in different sizes of tlt models and TRT engines, and also different mAP and FPS. In addition, a different network or other training parameters in the training spec will affect the mAP. You can run some tests on a standard dataset on your side.