TensorRT fails to build Faster R-CNN GIE model when using INT8

Hi,

I am working with TensorRT 2.1 right now; specifically, I am trying to quantize the Faster R-CNN model provided with the examples. I adopted the entropy INT8 calibrator and batch stream from sampleINT8 (I generated the batch data myself so it corresponds to the Faster R-CNN input). However, when trying to build the CUDA engine, I get the following error:

Begin parsing model...
End parsing model...
Begin building engine...
cudnnEngine.cpp (330) - Cuda Error in execute: 11
faster-rcnn_debug: cudnnBuilder2.cpp:798: nvinfer1::cudnn::Engine* nvinfer1::builder::buildEngine(nvinfer1::CudaEngineBuildConfig&, const nvinfer1::cudnn::HardwareContext&, const nvinfer1::Network&): Assertion `it != tensorScales.end()' failed.
Aborted (core dumped)

Unfortunately, these error messages are not very informative. Could you shed some light on what is going on there? What is tensorScales?

Is it possible to quantize a model that uses IPlugin (i.e. custom) layers?

I’ve asked our engineers to look at this… they may be able to provide better feedback.

You should be able to run INT8 optimizations on a network that uses IPlugin layers; however, currently only the non-plugin layers will be optimized to INT8. The output of the preceding layer will be converted to FP32 before being passed to the plugin layer, and the reverse conversion will be done after the plugin layer.
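
For context, here is a minimal sketch (not the exact sample code) of how the INT8 builder is typically configured in TensorRT 2.1, assuming `calibrator` is an Int8EntropyCalibrator like the one in sampleINT8; plugin layers in the parsed network simply stay in FP32:

#include "NvInfer.h"

// Minimal sketch: configure the TensorRT 2.1 builder for INT8 with a calibrator.
// The network may contain IPlugin layers; those are not converted to INT8.
nvinfer1::ICudaEngine* buildInt8Engine(nvinfer1::IBuilder* builder,
                                       nvinfer1::INetworkDefinition* network,
                                       nvinfer1::IInt8Calibrator* calibrator,
                                       int maxBatchSize)
{
    builder->setMaxBatchSize(maxBatchSize);
    builder->setMaxWorkspaceSize(1 << 25);
    builder->setInt8Mode(true);             // request INT8 kernels where supported
    builder->setInt8Calibrator(calibrator); // supplies calibration batches and scales
    return builder->buildCudaEngine(*network);
}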

Thank you! I can provide my code and the batch data for debugging purposes.

Hi.
“Assertion `it != tensorScales.end()’ failed” indicates that the TensorRT builder failed to find the scale for some tensor after reading the calibration table you generated with your batch data. As shown in sampleINT8.cpp, a “CalibrationTable” file is cached to disk once the calibration process finishes. The next time you run your model with INT8, the TensorRT builder will load the “CalibrationTable” (if the “readCache” flag passed to Int8EntropyCalibrator is true), so you don’t need to run the calibration over and over again. So I guess a line is missing from your calibration table, probably deleted accidentally.
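
For reference, the cache read/write hooks in sampleINT8’s Int8EntropyCalibrator look roughly like the sketch below (mCalibrationCache is assumed to be a std::vector<char> member and mReadCache is the flag mentioned above); returning nullptr from readCalibrationCache forces a fresh calibration run:

#include <algorithm>
#include <fstream>
#include <iterator>
#include <vector>

// Sketch of the calibration-cache hooks, modeled on sampleINT8's
// Int8EntropyCalibrator; the member names are assumptions.
const void* readCalibrationCache(size_t& length) /*override*/
{
    mCalibrationCache.clear();
    std::ifstream input("CalibrationTable", std::ios::binary);
    input >> std::noskipws;
    if (mReadCache && input.good())
        std::copy(std::istream_iterator<char>(input), std::istream_iterator<char>(),
                  std::back_inserter(mCalibrationCache));
    length = mCalibrationCache.size();
    return length ? mCalibrationCache.data() : nullptr; // nullptr => recalibrate
}

void writeCalibrationCache(const void* cache, size_t length) /*override*/
{
    std::ofstream output("CalibrationTable", std::ios::binary);
    output.write(reinterpret_cast<const char*>(cache), length);
}
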
Here’s what my calibration table for the Faster R-CNN sample looks like:
1
conv2_1: 41e5412c
conv2_2: 4244967a
rpn_cls_prob_reshape: 3c010a14
conv5_1: 3fee34d9
pool2: 426f8ecd
bbox_pred: 3c58fe39
rpn_cls_prob: 3c010a14
fc6: 3dc55582
conv1_2: 41b3533e
im_info: 407c07af
conv5_3: 3e9fc861
conv4_2: 412a62a8
conv3_3: 4231c606
rpn_cls_score_reshape: 3e0323eb
pool3: 4231c606
conv3_1: 42454e34
conv5_2: 3f2e24b5
cls_prob: 3c010a14
pool4: 40a72b3b
conv4_3: 40a72b3b
rpn_bbox_pred: 3c972c9d
conv1_1: 40570513
conv3_2: 422a8c9c
rpn_cls_score: 3e0323eb
conv4_1: 41b48585
data: 3f99411a
rpn/output: 3d37fa38
rois: 407b86a5
count: 2
cls_score: 3e8b776b
pool5: 3ebc089c
pool1: 41b44fa8
fc7: 3d19e7bf

Hi

How do you use TensorRT to run the Faster R-CNN detector? Can you give some suggestions?

thanks.

You can build the Faster R-CNN sample locally by simply running “make” under TensorRT-2.1.2/samples/sampleFasterRCNN, assuming you have CUDA and cuDNN installed. Then you need to download the weights (caffemodel) from the link given by the author of Faster R-CNN. Please follow the instructions in TensorRT-2.1.2/samples/sampleFasterRCNN/README.txt. Once you have the model and have added TensorRT-2.1.2/lib to your library path, you can run the sample with the images in TensorRT-2.1.2/data/faster-rcnn, or you can run the detector on your own images. Here’s how to convert other image formats to PPM format (you also need to change the sample code to specify the image you want to test):

https://devtalk.nvidia.com/default/topic/1014350/gpu-accelerated-libraries/tensorrt-sampleminist-somequestion-about-readpgmfile/?offset=3#5176864

Thank you for the reply!

Indeed, I found my CalibrationTable has only one record for some reason…

1
data: 3f99411a

When I remove the CalibrationTable file and re-run the executable to create a new one, I see numerous “Cuda Error in execute: 11” errors:

Begin parsing model...
End parsing model...
Begin building engine...
cudnnEngine.cpp (330) - Cuda Error in execute: 11
cudnnEngine.cpp (330) - Cuda Error in execute: 11
cudnnEngine.cpp (330) - Cuda Error in execute: 11
cudnnEngine.cpp (330) - Cuda Error in execute: 11
cudnnEngine.cpp (330) - Cuda Error in execute: 11
cudnnEngine.cpp (330) - Cuda Error in execute: 11
cudnnEngine.cpp (330) - Cuda Error in execute: 11
cudnnEngine.cpp (330) - Cuda Error in execute: 11
cudnnEngine.cpp (330) - Cuda Error in execute: 11
cudnnEngine.cpp (330) - Cuda Error in execute: 11
faster-rcnn: cudnnBuilder2.cpp:798: nvinfer1::cudnn::Engine* nvinfer1::builder::buildEngine(nvinfer1::CudaEngineBuildConfig&, const nvinfer1::cudnn::HardwareContext&, const nvinfer1::Network&): Assertion `it != tensorScales.end()' failed.
Aborted (core dumped)

The number of errors corresponds to the number of calibration batches, so I suppose the calibration records are not created because each batch fails to be processed.

Can you tell me what error 11 means?

Hi,

“11” means “cudaErrorInvalidValue” which indicates that one or more of the parameters passed to the API call is not within an acceptable range of values.
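
If you just want to decode a numeric CUDA error yourself, a small snippet like this (plain CUDA runtime API, nothing TensorRT-specific) prints its name and description:

#include <cstdio>
#include <cuda_runtime_api.h>

int main()
{
    // Translate a numeric CUDA error code into its name and description.
    cudaError_t err = static_cast<cudaError_t>(11); // 11 == cudaErrorInvalidValue
    std::printf("%s: %s\n", cudaGetErrorName(err), cudaGetErrorString(err));
    return 0;
}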

In fact, here the TensorRT engine is trying to copy your input bindings. Faster R-CNN has 2 inputs: data and im_info. Please make sure you

  • bind them to the buffers
  • successfully allocate the buffers
  • and copy the input data to the buffers
as indicated in sampleFasterRCNN.cpp (within the “doInference” function, lines 195-242).

Please also note that im_info’s N dimension is equal to the batch size; you should provide the image info for each image within a batch.
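
For illustration, here is a rough sketch (not the exact sample code) of allocating and filling both input bindings; CHECK is the error-checking macro from the sample, and dataHost, imInfoHost, and dataVolume are placeholders for your own host buffers and the per-image input volume:

#include <cassert>
#include <cuda_runtime_api.h>
#include "NvInfer.h"

void bindInputs(nvinfer1::ICudaEngine& engine, void* buffers[],
                const float* dataHost, const float* imInfoHost,
                int batchSize, int dataVolume)
{
    // Look up the binding slots by the input blob names used in the prototxt.
    int dataIndex   = engine.getBindingIndex("data");
    int imInfoIndex = engine.getBindingIndex("im_info");
    assert(dataIndex >= 0 && imInfoIndex >= 0);

    // Allocate device memory for both inputs and copy the host data over.
    CHECK(cudaMalloc(&buffers[dataIndex],   batchSize * dataVolume * sizeof(float)));
    CHECK(cudaMalloc(&buffers[imInfoIndex], batchSize * 3 * sizeof(float)));
    CHECK(cudaMemcpy(buffers[dataIndex], dataHost,
                     batchSize * dataVolume * sizeof(float), cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(buffers[imInfoIndex], imInfoHost,
                     batchSize * 3 * sizeof(float), cudaMemcpyHostToDevice));
}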

Hi FanYE,

How do I find the directory where TensorRT has been installed?

Thanks.

Hi,

I assume that you did a Debian install. You can also do a tar file install, in which case you can choose the directory where you want to place the package.

Hi FanYE,

thank you for the useful hint! I noticed that my calibrator only provides input for ‘data’ but not for ‘im_info’. By adding this:

CHECK(cudaMalloc(&mImInfoInput, batchSize * 3 * sizeof(float)));

float *imInfo = new float[batchSize * 3];
for (int i = 0; i < batchSize; i++) {
    imInfo[i * 3] = 375.f;     // num of rows
    imInfo[i * 3 + 1] = 500.f; // num of columns
    imInfo[i * 3 + 2] = 1.f;   // image scale
}

CHECK(cudaMemcpy(mImInfoInput, imInfo, batchSize * 3 * sizeof(float), cudaMemcpyHostToDevice));
delete[] imInfo;

....

bool getBatch(void *bindings[], const char *names[], int nbBindings) override {
    if (!mStream.next()) return false;

    CHECK(cudaMemcpy(mDataInput, mStream.getBatch(), mInputCount * sizeof(float), cudaMemcpyHostToDevice));
    assert(!strcmp(names[0], "data"));
    assert(!strcmp(names[1], "im_info"));
    bindings[0] = mDataInput;
    bindings[1] = mImInfoInput;
    return true;
}

I managed to advance further with building the engine. Now my calibration table looks like this:

1
conv2_1: 41f0dd3e
conv2_2: 424cdb8f
rpn_cls_prob_reshape: 3c010a14
conv5_1: 3ff64fde
pool2: 426cdfde
bbox_pred: 3c6982d2
rpn_cls_prob: 3c010a14
fc6: 3dcd9807
conv1_2: 418b7870
im_info: 407c07af
conv5_3: 3eb8f7d3
conv4_2: 411f4097
conv3_3: 42220c2c
rpn_cls_score_reshape: 3dce83b0
pool3: 422b6ad9
conv3_1: 42453a13
conv5_2: 3f43b5ba
cls_prob: 3c010a14
pool4: 40a5b66e
conv4_3: 409a1add
rpn_bbox_pred: 3ca4020f
conv1_1: 40559583
conv3_2: 421dd9c9
rpn_cls_score: 3dce83b0
conv4_1: 419dc337
data: 3f99411a
rpn/output: 3d44d8de
rois: 407b86a5
count: 2
cls_score: 3ea9f0cb
pool5: 3ee02f66
pool1: 41973147
fc7: 3d43e3f0

Now I am getting this error while building the engine:

Begin building engine...
faster-rcnn_debug: cudnnBuilderWeightConverters.cpp:118: float nvinfer1::builder::makeFullyConnectedInt8Weights(nvinfer1::FullyConnectedParameters&, const nvinfer1::cudnn::EngineTensor&, const nvinfer1::cudnn::EngineTensor&, nvinfer1::CpuMemoryGroup&, bool): Assertion `in.region->getDimensions() == in.extent' failed.

I would greatly appreciate further help.

Hi,

Glad to see the progress you made in generating the complete calibration table. The new error that you encountered is a known bug. We are actively working on it, and it will be fixed in an upcoming release. Please stay tuned!

Hi FanYE,

I have downloaded the tar file, but I have no idea how to run it. Can you do me a favor?

Thanks.

Hi,

  1. Ensure that you have the dependencies satisfied. TensorRT 2.1 requires CUDA 7.5 or 8.0 and cuDNN 6.0.20
  2. Choose where you want to install. This tar file will install everything into a directory called TensorRT-2.1. This directory will have subdirectories like lib, include, src, etc…
  3. Unpack the tar file: tar xf TensorRT-2.1.2.x86_64.cuda-8.0.tar.bz2 -C <install dir>
    Note: there is a known issue with the symlinks. They will point to the aarch64 files included for cross-development purposes; please update them to point to the correct directory with:
    ln -sf targets/x86_64-linux-gnu/bin
    ln -sf targets/x86_64-linux-gnu/lib
    ln -sf targets/x86_64-linux-gnu/samples
  4. You can build the Faster R-CNN sample locally by simply running “make” under TensorRT-2.1.2/samples/sampleFasterRCNN, assuming you have CUDA and cuDNN installed. Then you need to download the weights (caffemodel) from the link given by the author of Faster R-CNN. Please follow the instructions in TensorRT-2.1.2/samples/sampleFasterRCNN/README.txt. Once you have the model and have added TensorRT-2.1.2/lib to your library path, you can run the sample with the images in TensorRT-2.1.2/data/faster-rcnn, or you can run the detector on your own images. Here’s how to convert other image formats to PPM format (you also need to change the sample code to specify the image you want to test):

https://devtalk.nvidia.com/default/topic/1014350/gpu-accelerated-libraries/tensorrt-sampleminist-somequestion-about-readpgmfile/?offset=3#5176864

Hi FanYE,

I have two questions:
1. How do I do a forward pass using CPU mode in TensorRT?
2. How do I change “im_info” so that it can support PVANet instead of VGG16?

Thanks.

Hi,

1. How do I do a forward pass using CPU mode in TensorRT?

TensorRT is meant to be used with a GPU for production inference. We do not have such a CPU mode in TensorRT.

2. How do I change “im_info” so that it can support PVANet instead of VGG16?

In this example, im_info is specific to the Faster R-CNN sample. If you want to deploy another network, you need to create your own sample: feed the protobuf that defines the network to TensorRT, implement the layers that do not exist in TensorRT (neither among the native layers nor in the plugin layer library), and then recalibrate the weights and activations in order to run inference with INT8.
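
A rough sketch of the kind of caffe-parser plugin factory involved is shown below; MyProposalPlugin and the layer name “proposal” are hypothetical placeholders for whatever unsupported layer your network contains:

#include <cassert>
#include <cstring>
#include <memory>
#include "NvCaffeParser.h"
#include "NvInfer.h"

// Sketch of a caffe-parser plugin factory; MyProposalPlugin is a hypothetical
// nvinfer1::IPlugin implementation for a layer TensorRT does not support natively.
class PluginFactory : public nvcaffeparser1::IPluginFactory
{
public:
    // Tell the parser which Caffe layers should be routed to plugins.
    bool isPlugin(const char* name) override { return !std::strcmp(name, "proposal"); }

    // Create the plugin object when the parser encounters such a layer.
    nvinfer1::IPlugin* createPlugin(const char* layerName, const nvinfer1::Weights* weights,
                                    int nbWeights) override
    {
        assert(isPlugin(layerName));
        mPlugin.reset(new MyProposalPlugin(weights, nbWeights));
        return mPlugin.get();
    }

private:
    std::unique_ptr<MyProposalPlugin> mPlugin;
};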

I don’t know how to generate the batch files for Faster R-CNN when using TensorRT INT8 inference. Can you give me some suggestions?

Please refer to the sampleINT8 example in the TensorRT package (samples/sampleINT8).
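
In case it helps: a hedged sketch of writing one calibration batch as a raw binary file, assuming the simple layout of a 4-int dims header (N, C, H, W) followed by the raw floats in NCHW order. Verify this against what BatchStream::update() in sampleINT8 actually reads and adapt your BatchStream accordingly:

#include <fstream>
#include <vector>

// Hedged sketch: dump one calibration batch as an <N,C,H,W> header plus raw floats.
void writeBatchFile(const char* filename, const std::vector<float>& data,
                    int n, int c, int h, int w)
{
    std::ofstream out(filename, std::ios::binary);
    int dims[4] = {n, c, h, w};
    out.write(reinterpret_cast<const char*>(dims), sizeof(dims));
    out.write(reinterpret_cast<const char*>(data.data()),
              static_cast<std::streamsize>(data.size() * sizeof(float)));
}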