Hi Martin,
For option 1, there are actually two ways to deploy on the Jetson platform. See https://devtalk.nvidia.com/default/topic/1065558/transfer-learning-toolkit/trt-engine-deployment/ for more info. One is to use the etlt model directly. The other is to use a TRT engine, but in that case you need a separate version of tlt-converter to generate the engine. See the TLT doc:
“For the Jetson platform, the tlt-converter is available to download in the dev zone here. Once the tlt-converter is downloaded, please follow the instructions mentioned below to generate a TensorRT engine.”
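For reference, a tlt-converter invocation on Jetson typically looks like the sketch below. The file names, the $NGC_KEY variable, and the output node names are placeholders; the correct output nodes for your network are listed in the TLT docs and in the tlt-export log.

```shell
# Sketch only -- all paths, the key, and node names are placeholders.
#   -k  encoding key used when the model was exported
#   -o  comma-separated output node names of the network
#   -d  input dimensions as C,H,W
#   -t  engine precision (fp32 or fp16)
#   -e  path of the engine to generate
./tlt-converter frcnn_model.etlt \
    -k "$NGC_KEY" \
    -o <output_node_names> \
    -d 3,576,1024 \
    -t fp16 \
    -e frcnn_model.fp16.engine
```

Run this on the Jetson itself, so the engine is built for that device's TensorRT version and GPU.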
For option 2, see the TLT doc: on a dGPU platform, yes, you can use the generated TRT engine in DeepStream.
“For deployment platforms with an x86 based CPU and discrete GPUs, the tlt-converter is distributed within the TLT docker. Therefore, it is suggested to use the docker to generate the engine. However, this requires that the user adhere to the same minor version of TensorRT as distributed with the docker. The TLT docker includes TensorRT version 5.1.5. In order to use the engine with a different minor version of TensorRT, it would be best to copy over the converter from /opt/nvidia/tools/tlt-converter to the target machine and follow the instructions mentioned below to run it and generate a TensorRT engine.”
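The copy step from the quoted doc can be done with docker cp; the container name here is a placeholder:

```shell
# Copy tlt-converter out of a running TLT container (name is a placeholder)
# so the engine can be built against the target machine's TensorRT version.
docker cp tlt_container:/opt/nvidia/tools/tlt-converter ./tlt-converter
chmod +x ./tlt-converter
./tlt-converter -h   # sanity check against the local TensorRT install
```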
Hi Martin,
There is a sample .h264 video inside the DeepStream SDK. You can try running with it.
As for using DeepStream with images instead of h264, that is really a DeepStream topic. I think you can search for guidance in the DeepStream documentation or forum.
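If it helps, the sample clips usually live under the SDK's samples directory; a typical run looks like this (the install path and the config file name are placeholders that depend on your DeepStream version):

```shell
# Sketch: run deepstream-app against the sample .h264 clip shipped with
# the SDK. Paths below are typical but version-dependent placeholders.
cd /opt/nvidia/deepstream/deepstream/samples
ls streams/                      # the sample .h264 clips live here
deepstream-app -c configs/deepstream-app/<your_app_config>.txt
```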
Can you please clarify if in the retrain spec, the pretrained model should be:
pretrained_model: “/workspace/tlt-experiments/data/faster_rcnn/model_2_pruned.tlt”
or
pretrained_weights: “/workspace/nvidia-tlt/data/faster_rcnn/resnet18.h5”
If it’s the second option, won’t retraining start from the initial pretrained model and not the trained + pruned model?
I did some retraining using model_2_pruned. When I run evaluation / inference on the retrained model, I notice that it is not faster than the originally trained model. So does pruning only reduce the model size, with no speed implications?
Also, faster_rcnn with ResNet18 is taking about 80-90 ms / frame on Tesla P100 with frame size of 1024 x 576. Isn’t this too slow?
I have a PyTorch version with ResNet101 which takes the same time / frame with frame size 1280 x 720 on the same GPU.
After pruning with tlt-prune, the number of trainable parameters and the tlt model size are reduced.
The prune ratio is reported in the log. The smaller the tlt model, the higher the FPS.
You can try different thresholds (“pth”) to get different tlt models and compare their FPS.
Note the trade-off between mAP and FPS.
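A sweep over thresholds can be sketched like this (paths and the key are placeholders; check tlt-prune -h for the exact flags in your TLT version):

```shell
# Sketch: prune the trained model at several thresholds; each run's log
# reports the pruning ratio. Paths and $NGC_KEY are placeholders.
for PTH in 0.3 0.5 0.7; do
  tlt-prune -pm model_2.tlt \
            -o model_2_pruned_pth${PTH} \
            -pth "$PTH" \
            -k "$NGC_KEY"
done
# Retrain each pruned model, then compare mAP and FPS to pick a trade-off.
```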
For your case, are you running on a Nano?
Also, compared to Faster R-CNN, TLT detectnet_v2 has a better combination of mAP and FPS.
I have a prune ratio of around 0.5, and the model size on disk has gone down accordingly. But the inference time is the same as before. Not sure why, any idea?
Right now my experiments are all on nvidia T4 on a Google Compute Engine instance, but a team-mate is working on deploying on DeepStream on the Jetson Nano. Hopefully it’ll work at a decent FPS.
I am very familiar with Faster R-CNN from previous projects, so I chose this one. But I can look at DetectNet if you say it is better for both speed and accuracy.
It does not make sense that “the inference time is the same as before”.
Can you elaborate on how you generated the etlt model, how you generated the TRT engine, and how you measured the inference time?
Commands and full logs are appreciated.
This inference time is on the google cloud instance, not a Jetson Nano. So I’m just using tlt-infer faster_rcnn -e spec_file.txt, with the spec file specifying the pruned and retrained model (tlt file). At this point, there is no need for an etlt model or a trt engine, right?
The tlt-infer command runs inference on input images.
It is not meant for measuring inference time.
An easier way to check inference time: for example, if you have already generated an fp16 etlt model and then an fp16 TRT engine, you can run it with the trtexec tool.
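A trtexec run against a serialized engine looks roughly like this (the engine file name is a placeholder, and exact flag spellings vary a little between TensorRT versions):

```shell
# Sketch: benchmark a serialized fp16 engine with trtexec. trtexec prints
# per-inference latency statistics; flag names depend on the TRT version.
trtexec --loadEngine=frcnn_model.fp16.engine --fp16
```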
Thanks a lot, I managed to export to etlt, then create an engine and run trtexec. I am getting 53 ms for ResNet50 in fp32, and 17 ms in fp16. That’s very good.
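For context, per-frame latency converts to throughput as fps = 1000 / latency_ms; a quick check of those numbers:

```shell
# Convert per-frame latency in milliseconds to frames per second.
fps_from_ms() {
  awk -v ms="$1" 'BEGIN { printf "%.1f\n", 1000.0 / ms }'
}

fps_from_ms 53   # fp32: ~18.9 FPS
fps_from_ms 17   # fp16: ~58.8 FPS
```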
However, a bit disappointed with recall and precision values. You’ve already recommended DetectNet. Do you have any benchmark AP numbers for a standard dataset like Coco or Pascal VOC? Even KITTI, just to know what to expect? Ideally, with a comparison of DetectNet and FasterRCNN.
Hi chandrachud,
Sorry, the pre-trained models below are not available yet. They are planned to be available in the next TLT release.
resnet:34, resnet:101, resnet:152
Different pruning ratios will result in different sizes of tlt models and TRT engines, and also different mAP and FPS. In addition, a different network or other training parameters in the training spec will affect the mAP. You can run some tests on a standard dataset on your side.