How to build the object detection framework SSD with TensorRT on TX2?
Currently, I have built the object detection framework SSD with https://github.com/weiliu89/caffe/tree/ssd on the TX2, but the speed is only about 4 frames per second. Hence, I want to speed it up using TensorRT on the TX2.

#1
Posted 05/05/2017 04:02 AM   
Answer Accepted by Forum Admin
Hi,
Thanks for your question.

Please first check whether the layers used in your model are supported by TensorRT.

If they are, TensorRT can parse the Caffe format directly.
Here is the sample code: https://github.com/dusty-nv/jetson-inference
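As a quick sanity check before handing a model to the parser, you can scan the deploy prototxt for layer types. The sketch below is illustrative only: `unsupported_layers` is a hypothetical helper, and the `SUPPORTED` set is an assumed subset, not the authoritative TensorRT list (check the release notes for that).

```python
import re

# Illustrative subset of layer types the TensorRT Caffe parser handles;
# consult the TensorRT release notes for the authoritative list.
SUPPORTED = {"Convolution", "Deconvolution", "Pooling", "InnerProduct",
             "ReLU", "Softmax", "Concat", "LRN", "Power", "Scale", "Eltwise"}

def unsupported_layers(prototxt_text):
    """Return layer types in a deploy prototxt that are not in SUPPORTED."""
    types = re.findall(r'type:\s*"([^"]+)"', prototxt_text)
    return sorted({t for t in types if t not in SUPPORTED})

# SSD uses several layers that need custom handling:
print(unsupported_layers('''
layer { name: "conv1" type: "Convolution" }
layer { name: "prior" type: "PriorBox" }
layer { name: "flat" type: "Flatten" }
'''))  # → ['Flatten', 'PriorBox']
```

Any type this flags is a candidate for a custom implementation via the plugin mechanism discussed later in the thread.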

#2
Posted 05/05/2017 05:25 AM   
AastaLLL said:
Hi,
Thanks for your question.

Please first check if layers used in your model is supported by tensorRT.

If yes, TensorRT can support caffe format.
Here is the sample code: https://github.com/dusty-nv/jetson-inference


Hi AastaLLL,
I have installed JetPack 3.1 on my Jetson TX2. Is it possible to implement the layers used in the SSD framework, such as PriorBox, Normalize, Concat, Permute, and Flatten? If so, can you give a specific example?

#3
Posted 08/01/2017 01:45 AM   
Hi,

Please check sampleFasterRCNN, samplePlugin for details.

Located at /usr/src/tensorrt/

#4
Posted 08/01/2017 02:33 AM   
AastaLLL said:
Hi,

Please check sampleFasterRCNN, samplePlugin for details.

Located at /usr/src/tensorrt/

Hi, AastaLLL
An error occurred when I tested my prototxt file, which includes a Deconvolution layer, with TensorRT 2.1. The error output in my terminal:
Begin parsing model...
Caffe Parser: groups are not supported for deconvolutions
error parsing layer type Deconvolution index 139
End parsing model...
Segmentation fault (core dumped)

When I checked TensorRT2-1-User-Guide.pdf, I saw that TensorRT 2.1 supports this layer:
Deconvolution
The Deconvolution layer implements a deconvolution, with or without bias.

How can I solve this problem? Could you give me some suggestions?

#5
Posted 08/04/2017 07:07 AM   
Hi,

Could you share your deconvolution definition?
TensorRT doesn't support:
1. deconv kernel size != stride
2. group property

Thanks.

#6
Posted 08/07/2017 01:54 AM   
AastaLLL said:
Hi,

Could you share your deconvolution definition?
TensorRT doesn't support:
1. deconv kernel size != stride
2. group property

Thanks.

Hi,
This is my deconvolution definition.
layer {
  name: "upsample"
  type: "Deconvolution"
  bottom: "inc4e"
  top: "upsample"
  param { lr_mult: 0 decay_mult: 0 }
  convolution_param {
    num_output: 256
    kernel_size: 4
    stride: 2
    pad: 1
    group: 256
    weight_filler { type: "bilinear" }
    bias_term: false
  }
}


It includes the group property. How can I solve the problem?
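Since `group` equals `num_output` here and the filler is bilinear, each channel is upsampled independently with a fixed kernel, i.e. the layer is plain per-channel 2x bilinear upsampling. The NumPy sketch below (illustrative, not TensorRT code; `bilinear_kernel` and `upsample2x` are hypothetical names) shows the operation you would reimplement as a custom layer or run outside the engine:

```python
import numpy as np

def bilinear_kernel(k):
    # Caffe's "bilinear" weight filler for a k x k deconvolution kernel.
    factor = (k + 1) // 2
    center = factor - 1 if k % 2 == 1 else factor - 0.5
    oy, ox = np.ogrid[:k, :k]
    return (1 - abs(oy - center) / factor) * (1 - abs(ox - center) / factor)

def upsample2x(x):
    """Per-channel 2x upsampling equivalent to the grouped Deconvolution
    above (kernel_size=4, stride=2, pad=1, group=num_output)."""
    c, h, w = x.shape
    k, s, p = 4, 2, 1
    wk = bilinear_kernel(k)
    # Full transposed-convolution output, then crop by the padding.
    full = np.zeros((c, (h - 1) * s + k, (w - 1) * s + k))
    for i in range(h):
        for j in range(w):
            full[:, i*s:i*s+k, j*s:j*s+k] += x[:, i, j, None, None] * wk
    return full[:, p:p + h*s, p:p + w*s]

x = np.random.rand(256, 10, 10)
print(upsample2x(x).shape)  # → (256, 20, 20)
```

Because the weights are fixed (lr_mult is 0), no learned parameters are lost by moving this step out of the TensorRT engine.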

#7
Posted 08/07/2017 08:10 AM   
Hi,

Deconvolution layer of TensorRT doesn't support:
1. kernel size != stride
2. group property

These two features are in our next release plan.
Currently, we have a custom-layer API that allows users to implement unsupported layers on their own.

Thanks and sorry for the inconvenience.

#8
Posted 08/08/2017 02:06 AM   
Hi,

We have written a face-recognition sample to demonstrate the TensorRT 2.1 Plugin API.
Please check this GitHub for more details:

https://github.com/AastaNV/Face-Recognition

#9
Posted 08/25/2017 05:38 AM   
AastaLLL said:
Hi,

We have written a face-recognition sample to demonstrate TensorRT2.1 Plugin API.
Please check this GitHub for more details:

https://github.com/AastaNV/Face-Recognition


Hi,
I'm very happy you let me know about this new example right away! I tested the demo on the Jetson TX2.
Some errors appeared:
./face-recognition 
Building and running a GPU inference engine for /home/nvidia/Face-Recognition/data/deploy.prototxt, N=1...
[gstreamer] initialized gstreamer, version 1.8.3.0
[gstreamer] gstreamer decoder pipeline string:
nvcamerasrc fpsRange="30.0 30.0" ! video/x-raw(memory:NVMM), width=(int)1280, height=(int)720, format=(string)NV12 ! nvvidconv flip-method=2 ! video/x-raw ! appsink name=mysink
successfully initialized video device
width: 1280
height: 720
depth: 12 (bpp)

Bindings after deserializing:
Binding 0 (data): Input.
Binding 1 (coverage_fd): Output.
Binding 2 (bboxes_fd): Output.
Binding 3 (count_fd): Output.
Binding 4 (bbox_fr): Output.
Binding 5 (bbox_id): Output.
Binding 6 (softmax_fr): Output.
Binding 7 (label): Output.
loaded image /home/nvidia/Face-Recognition/data/fontmapA.png (256 x 512) 2097152 bytes
[cuda] cudaAllocMapped 2097152 bytes, CPU 0x102a00000 GPU 0x102a00000
[cuda] cudaAllocMapped 8192 bytes, CPU 0x102c00000 GPU 0x102c00000
default X screen 0: 1920 x 1080
[OpenGL] glDisplay display window initialized
[OpenGL] creating 1280x720 texture
[gstreamer] gstreamer transitioning pipeline to GST_STATE_PLAYING

Available Sensor modes :
2592 x 1944 FR=30.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
2592 x 1458 FR=30.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
1280 x 720 FR=120.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
[gstreamer] gstreamer changed state from NULL to READY ==> mysink
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter1
[gstreamer] gstreamer changed state from NULL to READY ==> nvvconv0
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter0
[gstreamer] gstreamer changed state from NULL to READY ==> nvcamerasrc0
[gstreamer] gstreamer changed state from NULL to READY ==> pipeline0
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter1
[gstreamer] gstreamer changed state from READY to PAUSED ==> nvvconv0
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter0
[gstreamer] gstreamer stream status CREATE ==> src
[gstreamer] gstreamer changed state from READY to PAUSED ==> nvcamerasrc0
[gstreamer] gstreamer changed state from READY to PAUSED ==> pipeline0
[gstreamer] gstreamer msg new-clock ==> pipeline0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter1
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> nvvconv0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> nvcamerasrc0

NvCameraSrc: Trying To Set Default Camera Resolution. Selected sensorModeIndex = 1 WxH = 2592x1458 FrameRate = 30.000000 ...

[gstreamer] gstreamer stream status ENTER ==> src
[gstreamer] gstreamer msg stream-start ==> pipeline0
Allocate memory: input blob
Allocate memory: coverage
Allocate memory: box
Allocate memory: count
Allocate memory: selected bbox
Allocate memory: selected index
Allocate memory: softmax
Allocate memory: label
[gstreamer] gstreamer decoder onPreroll
[cuda] cudaAllocMapped 1382400 bytes, CPU 0x103200000 GPU 0x103200000
[cuda] cudaAllocMapped 1382400 bytes, CPU 0x103400000 GPU 0x103400000
[cuda] cudaAllocMapped 1382400 bytes, CPU 0x103600000 GPU 0x103600000
[cuda] cudaAllocMapped 1382400 bytes, CPU 0x103800000 GPU 0x103800000
[cuda] cudaAllocMapped 1382400 bytes, CPU 0x103a00000 GPU 0x103a00000
[cuda] cudaAllocMapped 1382400 bytes, CPU 0x103c00000 GPU 0x103c00000
[cuda] cudaAllocMapped 1382400 bytes, CPU 0x103e00000 GPU 0x103e00000
[cuda] cudaAllocMapped 1382400 bytes, CPU 0x104000000 GPU 0x104000000
[cuda] cudaAllocMapped 1382400 bytes, CPU 0x104200000 GPU 0x104200000
[cuda] cudaAllocMapped 1382400 bytes, CPU 0x104400000 GPU 0x104400000
[cuda] cudaAllocMapped 1382400 bytes, CPU 0x104600000 GPU 0x104600000
[cuda] cudaAllocMapped 1382400 bytes, CPU 0x104800000 GPU 0x104800000
[cuda] cudaAllocMapped 1382400 bytes, CPU 0x104a00000 GPU 0x104a00000
[cuda] cudaAllocMapped 1382400 bytes, CPU 0x104c00000 GPU 0x104c00000
[cuda] cudaAllocMapped 1382400 bytes, CPU 0x104e00000 GPU 0x104e00000
[cuda] cudaAllocMapped 1382400 bytes, CPU 0x105000000 GPU 0x105000000
[cuda] gstreamer camera -- allocated 16 ringbuffers, 1382400 bytes each
[gstreamer] gstreamer changed state from READY to PAUSED ==> mysink
[gstreamer] gstreamer msg async-done ==> pipeline0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> mysink
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> pipeline0
[cuda] gstreamer camera -- allocated 16 RGBA ringbuffers
ROI: 0 0 0 0
0 bounding boxes detected
[cuda] registered 14745600 byte openGL texture for interop access (1280x720)
ROI: 0 0 0 0
0 bounding boxes detected
ROI: 0 0 0 0
0 bounding boxes detected
pass 0 to trt
ROI: 259 248 96 139
Cuda failure: unspecified launch failure at line 328
Aborted (core dumped)


The crash (Aborted (core dumped)) occurs whenever a human face is detected. How can I solve it? Could you give me some suggestions? Thank you very much in advance!

#10
Posted 08/25/2017 07:54 AM   
Hi,

Thanks for your feedback.

There was a bug in handling the image boundary.
We have already fixed it. Please recheck it.

https://github.com/AastaNV/Face-Recognition/commit/7bdab40c4b54ce3b6410ddb32a8c198768824789


Thanks for your feedback, and sorry for the inconvenience.
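For anyone hitting a similar crash: the failure mode is a detected box that partially leaves the frame, so the crop kernel reads out of bounds. A clamp along these lines avoids it (a sketch of the idea only, with a hypothetical `clamp_roi` helper, not the actual patch in the commit):

```python
def clamp_roi(x, y, w, h, img_w, img_h):
    # Keep the ROI fully inside the frame so downstream crop/resize
    # kernels never index outside the image buffer.
    x = max(0, min(int(x), img_w - 1))
    y = max(0, min(int(y), img_h - 1))
    w = max(1, min(int(w), img_w - x))
    h = max(1, min(int(h), img_h - y))
    return x, y, w, h

# The ROI from the log above, inside a 1280x720 frame, is unchanged:
print(clamp_roi(259, 248, 96, 139, 1280, 720))   # → (259, 248, 96, 139)
# A box hanging off the bottom-right corner gets shrunk to fit:
print(clamp_roi(1230, 700, 96, 139, 1280, 720))  # → (1230, 700, 50, 20)
```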

#11
Posted 08/28/2017 02:43 AM   
I know this face-recognition example is meant to demonstrate the plug-in API functionality, but I'm curious about the model and dataset used to train it.

Is the model and the dataset publicly available? It works great at detecting my face at around 15fps, but of course my face isn't in the data so it misidentifies me as a bunch of different celebrities. So I'd love to be able to retrain it with additional data to see how accurate it would be.

It also seems to be some kind of merged model which I'm not familiar with.

#12
Posted 08/28/2017 08:18 PM   
Hi,

We generated this model by combining DetectNet and GoogleNet; both can be found in DIGITS (https://github.com/NVIDIA/DIGITS).

Here are our steps:
1. Train DetectNet (detection) with the FDDB database
2. Train GoogleNet (classification) with the VGG_Face database
3. Run this script to generate a merged prototxt: https://github.com/AastaNV/Face-Recognition/blob/master/script/rename_model.py
4. Randomly generate a caffemodel for the new prototxt (DIGITS is a useful tool)
5. Replace the weights of No. 4 with the weights from No. 1 and No. 2 via this script: https://github.com/AastaNV/Face-Recognition/blob/master/script/merge_model.py
6. Add the plugin layers to the prototxt (e.g. bboxMerge)

For your use case, you can train your classification network with GoogleNet.
Then use the merge script to overwrite the FR weights.
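The renaming in step 3 amounts to suffixing every layer and blob name so the two networks can live in one prototxt without collisions. A simplified sketch of that idea (the real rename_model.py in the repository handles more cases, e.g. the input blobs; `rename_layers` here is a hypothetical name):

```python
import re

def rename_layers(prototxt_text, suffix):
    # Append a suffix to every name/bottom/top field so two networks can
    # be concatenated into one prototxt without blob-name collisions.
    pattern = r'\b(name|bottom|top):\s*"([^"]+)"'
    return re.sub(pattern,
                  lambda m: '%s: "%s%s"' % (m.group(1), m.group(2), suffix),
                  prototxt_text)

snippet = 'layer { name: "conv1" type: "Convolution" bottom: "data" top: "conv1" }'
print(rename_layers(snippet, "_fd"))
# → layer { name: "conv1_fd" type: "Convolution" bottom: "data_fd" top: "conv1_fd" }
```

Note that only `name`, `bottom`, and `top` are touched; `type` fields keep their original values.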

#13
Posted 08/29/2017 01:57 AM   
Thanks.

I didn't realize merging DetectNet and GoogleNet was even possible. A few months ago I wrote a DualNet program that combined DetectNet and AlexNet, but I pipelined them: I trained each one separately and ran inference separately.

I used DetectNet to detect Playing Cards, and then I sent the Region of Interest to an AlexNet model that determined which card it was.

Doing it this way will allow me to combine these two networks into one if I'm understanding it correctly.

#14
Posted 08/29/2017 03:19 AM   
For steps 1 and 2, I trained the networks using the datasets I have for the playing cards.

The script seemed to work fine: it renamed the DetectNet layers to end in _fd and the GoogleNet classification layers to end in _fr.

But I ran into a DIGITS error on step 4.

Using the merged prototxt, it gave the error "Layer 'deploy_transform_fd' references bottom 'data_fd' at the TRAIN stage however this blob is not included at that stage."

The input layer is still named "data", as it didn't get changed. But there are two input layers named data: the one for DetectNet (mine is 1280x944x3) and the one for GoogleNet (224x224).

Is there an example prototxt for step 4?

The prototxt that's included seems to be for step 6, where it has a dataRoi layer as the input to the GoogleNet classification.

#15
Posted 08/30/2017 12:10 AM   