How to do inference with a TLT Faster RCNN model?

Hello everyone,

I have trained a frcnn_resnet18 model with the Transfer Learning Toolkit, using the Docker image downloaded from NGC on my host machine.

I can do inference with the DeepStream custom app given in the IVA Getting Started Guide, and it seems to work well on the Nano.

My objective now is to run inference with TensorRT only. For this I use the TensorRT sample, which works well with Faster RCNN models trained with TensorFlow or Caffe (and optimized for TensorRT with the UFF parser).

Inference with an SSD model trained with TLT and converted to a TRT engine runs successfully, but it is not working for a Faster RCNN model: the inference runs, but the outputs of the network are weird, and the positions of the bounding boxes are always between 1 and 3.

Is the post-processing of a Faster RCNN model trained with TLT different?

Deepstream custom app: GitHub - NVIDIA-AI-IOT/deepstream_4.x_apps: deepstream 4.x samples to deploy TLT training models

TensorRT sample used: https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/sampleFasterRCNN
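
For reference, the standalone TensorRT part follows roughly the pattern below. This is only a simplified sketch, not my exact code: the engine file name "frcnn.engine" is a placeholder, the input buffer is not filled with a real image, and depending on the model extra custom plugin libraries may also have to be loaded.

#include <NvInfer.h>
#include <NvInferPlugin.h>
#include <cuda_runtime_api.h>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

// Minimal logger required by the TensorRT runtime.
class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    // Register the standard TensorRT plugins; depending on the TLT model,
    // additional custom plugin libraries may also have to be loaded.
    initLibNvInferPlugins(&gLogger, "");

    // Read the serialized engine from disk ("frcnn.engine" is a placeholder).
    std::ifstream file("frcnn.engine", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());

    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);
    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr);
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();

    // One device buffer per binding (inputs and outputs), sized from the
    // binding dimensions and assuming float data. Copying the preprocessed
    // image in and the results back out is omitted here.
    std::vector<void*> buffers(engine->getNbBindings(), nullptr);
    for (int i = 0; i < engine->getNbBindings(); ++i)
    {
        nvinfer1::Dims dims = engine->getBindingDimensions(i);
        size_t count = 1;
        for (int d = 0; d < dims.nbDims; ++d)
            count *= dims.d[d];
        cudaMalloc(&buffers[i], count * sizeof(float));
    }

    // Synchronous inference for a batch of one.
    context->execute(1, buffers.data());

    for (void* buf : buffers)
        cudaFree(buf);
    context->destroy();
    engine->destroy();
    runtime->destroy();
    return 0;
}

The SSD engine and the Faster RCNN engine go through the same skeleton; only the post-processing of the output bindings differs, which is where I suspect the problem is.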

Hi Steventel,
Is there any log for “the inference runs, but the outputs of the network are weird”?
For “the positions of the bounding boxes are always between 1 and 3”, what do you mean by “1” and “3”? Do you refer to the class id?

const std::string CLASSES[OUTPUT_CLS_SIZE]{"background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"};

Hi Morganh,

The problem is the position of the bounding boxes; the predicted class seems to be right, as does the confidence score. When I get the output image, I see small bounding boxes in the top-left corner of my image. Even though the positions are wrong, the right of the bounding box is always greater than the left, and the same holds for the bottom and the top of the bounding box. That is why I think the problem is in the post-processing of the bounding boxes, but maybe I'm wrong.

I have printed the output of the network after the post-processing (line 349 of the TensorRT sample: https://github.com/NVIDIA/TensorRT/blob/572d54f91791448c015e74a4f1d6923b77b79795/samples/opensource/sampleFasterRCNN/sampleFasterRCNN.cpp#L349):

[confidence score] class: [class id]
[the four bounding box coordinates]
0.999971 classe: 1
0.321713     0     1.50604     1.10493
0.999692 classe: 1
0.807946     0.528997     1.99202     1.64138
0.999317 classe: 1
0.778197     0.560918     1.92445     1.67343
0.999923 classe: 1
0.120525     0.320375     1.2985     1.43041
0.999951 classe: 1
0.340466     0.760247     1.50538     1.87404
0.999848 classe: 1
0.318116     0.635189     1.48855     1.74495
0.999972 classe: 1
0.728438     0.607852     1.88925     1.7232
0.999976 classe: 1
0.624839     0.653931     1.80533     1.76596
0.999318 classe: 1
0.59753     0.0701577     1.78314     1.18366
0.999658 classe: 1
0.664063     0.294367     1.8377     1.406
0.99945 classe: 1
0.874567     0.214179     2     1.33724
0.998457 classe: 1
0.777496     0.441411     1.94824     1.55648

Update:

I forgot to mention that I am not able to get the “im_info” binding from a Faster RCNN model trained with TLT.
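
To check this, I listed the bindings the deserialized engine exposes, with something like the sketch below (simplified; printBindings is just a helper name I made up, not code from the sample):

#include <NvInfer.h>
#include <iostream>

// Print every binding of a deserialized engine, to see which inputs and
// outputs the TLT FasterRCNN engine actually exposes (e.g. whether an
// "im_info" input is present at all).
void printBindings(const nvinfer1::ICudaEngine& engine)
{
    for (int i = 0; i < engine.getNbBindings(); ++i)
    {
        std::cout << i << ": " << engine.getBindingName(i)
                  << (engine.bindingIsInput(i) ? " (input)" : " (output)")
                  << std::endl;
    }
}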

Hi steventel,
In TLT the RoI coordinates are (y1, x1, y2, x2), while in Caffe, it is (x1, y1, x2, y2).
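
In other words, when you read the four values of one RoI from the output buffer, only the interpretation of the values changes, not the rest of the post-processing. A minimal illustration (the BBox struct and helper names are only for illustration, not code from the sample):

struct BBox { float x1, y1, x2, y2; };

// Caffe-style FasterRCNN RoI layout, which sampleFasterRCNN assumes: x1, y1, x2, y2.
// p points at the four floats of one RoI in the output buffer.
BBox readCaffeRoi(const float* p) { return {p[0], p[1], p[2], p[3]}; }

// TLT FasterRCNN RoI layout: y1, x1, y2, x2.
BBox readTltRoi(const float* p) { return {p[1], p[0], p[3], p[2]}; }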

Thanks for your answer, but does it make any difference to post-processing?

The main difference between TLT FasterRCNN and Caffe FasterRCNN post-processing is the RoI coordinate order, as mentioned above.

Hi Morganh,

Thanks for your answer. We now get the same results with our own C++ application and with the DeepStream sample.

However, we cannot get the same results as with tlt-infer (even with DeepStream). We use the same network with the same image size.

Here are some examples of detected boxes (drawn on a black image, since I cannot share the real image):

The good results with tlt-infer:

The results with deepstream or our C++ application:

Is it normal not to get the same results even with the DeepStream sample?

Hi steventel,
Sorry for the late reply. We're investigating the difference.
For the result with DeepStream or your C++ application, you were using a TRT engine to do inference, right?

I summarize your results as below. Please correct me if anything is wrong.

  1. tlt-infer + tlt model (good result)
  2. deepstream + trt fp16 engine (not good)
  3. your C++ application + trt fp16 engine (not good)
  4. deepstream + etlt model (unknown)

Could you check the result of above item 4? Thanks.

More ideas on possible reasons for your results:

1. Make sure the visualization confidence threshold, the NMS parameters, etc. are the same between tlt-infer and TRT inference.

2. tlt-infer uses the fp32 data type. What is the data type in your TRT inference? Is it fp16? A lower precision data type can give worse results.

3. What is the rate of detection mismatch? If the rate is high, I'm afraid there is something wrong in the inference code or the DeepStream configuration.
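
For item 3, one rough way to measure the mismatch rate is to match the two sets of boxes by IoU, for example as in the sketch below (the BBox struct, the helper names and the 0.5 threshold are only placeholders, not code from tlt-infer or the sample):

#include <algorithm>
#include <vector>

struct BBox { float x1, y1, x2, y2; };

// Intersection-over-union of two axis-aligned boxes.
float iou(const BBox& a, const BBox& b)
{
    float ix1 = std::max(a.x1, b.x1), iy1 = std::max(a.y1, b.y1);
    float ix2 = std::min(a.x2, b.x2), iy2 = std::min(a.y2, b.y2);
    float iw = std::max(0.0f, ix2 - ix1), ih = std::max(0.0f, iy2 - iy1);
    float inter = iw * ih;
    float areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
    float areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
    return inter / (areaA + areaB - inter);
}

// Fraction of reference detections (e.g. from tlt-infer) that have no
// sufficiently overlapping detection in the other set (e.g. from the TRT engine).
float mismatchRate(const std::vector<BBox>& reference,
                   const std::vector<BBox>& candidate,
                   float iouThreshold = 0.5f)
{
    if (reference.empty())
        return 0.0f;
    int missed = 0;
    for (const BBox& r : reference)
    {
        bool matched = false;
        for (const BBox& c : candidate)
            if (iou(r, c) >= iouThreshold) { matched = true; break; }
        if (!matched)
            ++missed;
    }
    return static_cast<float>(missed) / reference.size();
}

Running this on the detections above your confidence threshold, with the tlt-infer boxes as the reference set, gives a concrete number to compare.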

Hi Morganh,

I summarize my results as below:

  1. tlt-infer + tlt model → good result

  2. deepstream + trt fp16 or fp32 engine (generated automatically by deepstream from the etlt) → not good

  3. my C++ application + trt fp16 or fp32 engine → not good

  4. deepstream + etlt model → not good

  5. I’ve checked that the visualization confidence thresholds are the same (0.6), and the same for the NMS parameters.

  6. I have tested with fp32 and fp16 models and got similarly bad results.

  7. Yes, the rate seems to be high. For this reason, I’ve just sent you my TLT training folder with some images by private message, and also the DeepStream sample application used.

Thanks for your help

Thanks steventel for the information. It is helpful.
Our internal team is trying to find where the gap is.

Hi steventel,
Could you please update your latest result per our offline syncing? Thanks.

You mentioned the issue is gone after you changed to the TLT 1.0.1 docker.

Could you please share your TLT 1.0 training spec and TLT 1.0.1 training spec?

We identified a bug in the current FRCNN release where “pool_size_2x: True” is not properly handled.

We will fix this issue in next release.

Just a reminder: TLT 2.0_dp was released on May 1st.