Merging object detection and object classification
Hi all,

I am trying to merge object detection and classification into a single network instead of calling the two inferences separately. I came across this thread: https://devtalk.nvidia.com/default/topic/1007313/jetson-tx2/how-to-build-the-objection-detection-framework-ssd-with-tensorrt-on-tx2-/2

I have some questions regarding it:

To merge DetectNet and GoogleNet, step 3 says to remove the data declaration from the classification network. Do I remove it before or after merging the prototxt files, and where exactly in the file is the declaration I should remove?


Thanks in advance!

#1
Posted 01/02/2018 03:01 AM   
DetectNet already classifies the objects it detects. Why do you want to add GoogleNet?

A very rough approximation of how DetectNet works is to say that it runs GoogleNet-like classification as a convolutional step over a coarse grid, and simultaneously outputs the predicted corners of the objects it classifies.

If you want to build a hierarchical model, such that DetectNet detects "car" and GoogleNet classifies "1967 Camaro," the better way to do this is to take the bounding boxes that come out of DetectNet, extract those boxes as small object images from the input image, and run each of those small images through a GoogleNet trained on smaller images. You don't do this by merging the prototxt files; you do it by writing code that loads both models, knows how to extract and scale the data inside the bounding boxes, and forwards it to the next network (presumably in some pipelined loop).
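The extract-and-scale step described above can be sketched in plain C++. Everything here is illustrative (the function name and packed-RGB layout are assumptions, not from any actual API): given a frame and a detected box, crop the box and resample it to the classifier's fixed input size with nearest-neighbour sampling.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: crop a bounding box out of a packed RGB image and
// scale it (nearest-neighbour) to the classifier's fixed input size.
std::vector<unsigned char> cropAndResize(const std::vector<unsigned char>& img,
                                         int imgW, int imgH,
                                         int x0, int y0, int boxW, int boxH,
                                         int outW, int outH)
{
    std::vector<unsigned char> out(static_cast<std::size_t>(outW) * outH * 3);
    for (int y = 0; y < outH; ++y) {
        for (int x = 0; x < outW; ++x) {
            // Map each output pixel back into the detected box.
            const int srcX = x0 + x * boxW / outW;
            const int srcY = y0 + y * boxH / outH;
            for (int c = 0; c < 3; ++c)
                out[(y * outW + x) * 3 + c] =
                    img[(srcY * imgW + srcX) * 3 + c];
        }
    }
    return out;
}
```

In a real pipeline the resized crop would then be mean-subtracted and handed to the second network's input binding; on a Jetson you would do this resampling with CUDA rather than on the CPU.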

#2
Posted 01/02/2018 03:10 AM   
Hi,

Check more information here:
https://devtalk.nvidia.com/default/topic/1023699/jetson-tx2/questions-about-face-recongnition/post/5209485/#5209485

Thanks.

#3
Posted 01/02/2018 05:32 AM   
Hi AastaLLL,

I followed the link you provided; however, when I try to build a new classification model I keep getting error code -11. Here are my settings for the classification model:

Training epochs: 1
Snapshot interval: 1
Validation interval: 1

Solver type: SGD
Base Learning Rate: 0.01
Policy: Step Down
Step Size: 33
Gamma: 0.1

Subtract Mean: Image

For the custom network I used the one I generated in step 4.
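For reference, the DIGITS settings above correspond roughly to a Caffe solver definition like the following. This is only a sketch: DIGITS derives max_iter, stepsize, snapshot, and test_interval from the dataset size and epoch count, so the literal iteration values here are placeholders.

```
# Approximate solver.prototxt equivalent of the DIGITS settings above.
type: "SGD"            # Solver type: SGD
base_lr: 0.01          # Base Learning Rate: 0.01
lr_policy: "step"      # Policy: Step Down
gamma: 0.1             # Gamma: 0.1
stepsize: 33           # placeholder: DIGITS converts "Step Size 33" (a
                       # percentage of training) into an iteration count
max_iter: 100          # placeholder: 1 epoch, depends on dataset size
snapshot: 100          # Snapshot interval: 1 epoch
test_interval: 100     # Validation interval: 1 epoch
```

The "Subtract Mean: Image" option is applied on the data layer side rather than in the solver.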

For error code -11, below is the message that I got:
Test net output #19998: prob_fr = 0.000746771
Test net output #19999: prob_fr = 0.0013324
Test net output #20000: prob_fr = 0.000699792
Test net output #20001: prob_fr = 0.000730556
Test net output #20002: prob_fr = 0.000465725
Test net output #20003: prob_fr = 0.00140473
Test net output #20004: prob_fr = 0.000972217
Test net output #20005: prob_fr = 0.0011883
Test net output #20006: prob_fr = 0.00168505
Test net output #20007: prob_fr = 0.00119044
Test net output #20008: prob_fr = 0.000624847
Test net output #20009: prob_fr = 0.000463807
Test net output #20010: prob_fr = 0.0017488
Test net output #20011: prob_fr = 0.00053642
Test net output #20012: prob_fr = 0.000488077
Test net output #20013: prob_fr = 0.00112253
Test net output #20014: prob_fr = 0.000426881
Test net output #20015: prob_fr = 0.00155486
Optimization Done.
Optimization Done.


Thanks!

#4
Posted 01/02/2018 06:32 AM   
Hi AastaLLL,

I managed to solve that error and was able to run the program. It detects and displays the object fine with the dog example, but when I switch to my own model (to detect a watch), it crashes as soon as it detects the object.

Is there any way to debug this?

Below is the error:

HERE HERE HERE: 0x3891280
ROI: 0 0 0 0
0 bounding boxes detected

HERE HERE HERE: 0x3891280
ROI: 0 0 0 0
0 bounding boxes detected

HERE HERE HERE: 0x3891280
pass 0 to trt
150.984 102.734 246.594 204.188
ROI: 151 103 96 101
ID=0, label=833
1 bounding boxes detected
bounding box 0 (402.625000, 154.101562) (657.583374, 306.281250) w=254.958374 h=152.179688

HERE HERE HERE: 0x3891280
Segmentation fault (core dumped)



Thanks!

#5
Posted 01/03/2018 02:48 AM   
Hi,

The network size is hardcoded because the Plugin API has no parameter support.
You may need to modify the size here for your custom model:
https://github.com/AastaNV/Face-Recognition/blob/master/pluginImplement.cpp#L253
https://github.com/AastaNV/Face-Recognition/blob/master/pluginImplement.cpp#L288

Thanks.

#6
Posted 01/03/2018 07:50 AM   
Hi,

Can I ask why the classification network is trained at 224x224 but we need to change it back to 640x640 in the end?

My custom model follows the same parameters as the dog example, but it still crashes when it detects a watch. I trained the classification model at 224x224, but in step4.prototxt I changed the input to 3,480,480 because my detection model was trained at 480x480. With the input set to 3,480,480 I get this error message:

nvidia@tegra-ubuntu:~/Face-Recognition-master/build/aarch64/bin$ ./face-recognition 
Building and running a GPU inference engine for /home/nvidia/Desktop/Merge_example/Tank/step4.prototxt, N=1...
[gstreamer] initialized gstreamer, version 1.8.3.0
[gstreamer] gstreamer decoder pipeline string:
nvcamerasrc fpsRange="30.0 30.0" ! video/x-raw(memory:NVMM), width=(int)768, height=(int)576, format=(string)NV12 ! nvvidconv flip-method=2 ! video/x-raw ! appsink name=mysink
successfully initialized video device
width: 768
height: 576
depth: 12 (bpp)

loss3/classifier_fr: kernel weights has count 1024000 but 82944000 was expected
face-recognition: /home/nvidia/Face-Recognition-master/tensorNet.cpp:34: void TensorNet::caffeToTRTModel(const string&, const string&, const std::vector<std::__cxx11::basic_string<char> >&, unsigned int): Assertion `engine' failed.
Aborted (core dumped)




Thanks!

#7
Posted 01/03/2018 09:28 AM   
Hi,

The input size of DetectNet is 640x360 and the input size of GoogleNet is 224x224.
That's why there is an ROI resize layer that scales each ROI region to 224x224 via CUDA.

From your log, something went wrong when merging the detection and classification models.
Please recheck your procedure against the information in the following topics:
https://devtalk.nvidia.com/default/topic/1023699/jetson-tx2/questions-about-face-recongnition/
https://devtalk.nvidia.com/default/topic/1007313/jetson-tx2/how-to-build-the-objection-detection-framework-ssd-with-tensorrt-on-tx2-/

Thanks.

#8
Posted 01/04/2018 08:40 AM   
Hi AastaLLL,

Thanks for the input. I followed it and recompiled, but I still get the same error whenever I change the size to 480x480 in pluginImplement.cpp. For training the new classification model, is it supposed to look like this? (I attached the photo below.)

Can I check one last thing with you? Does face-recognition support an external camera, e.g. an IP or V4L2 camera? The onboard camera seems to work fine, but when I switch to an IP/V4L2 camera it keeps printing this:
failed to capture frame
failed to convert from NV12 to RGBA
[cuda] cudaPreImageNetMean((float4*)imgRGBA, camera->GetWidth(), camera->GetHeight(), data, dimsData.w(), dimsData.h(), make_float3(127.0f, 127.0f, 127.0f))
[cuda] invalid device pointer (error 17) (hex 0x11)
[cuda] /home/nvidia/Face-Recognition/face-recognition/face-recognition.cpp:223
cudaPreImageNetMean failed

OR
[cuda]   registered 7077888 byte openGL texture for interop access (768x576)
Segmentation fault (core dumped)



I tried the detectnet-camera and imagenet-camera samples, and both seem to work fine with the IP/V4L2 camera.

Thanks!

#9
Posted 01/05/2018 02:54 AM   
Answer Accepted by Forum Admin
Hi,

The face-recognition sample only supports the onboard camera.
If you are looking for a sample that supports a V4L2 camera, please check our jetson-inference samples here:
https://github.com/dusty-nv/jetson-inference

Thanks.

#10
Posted 01/08/2018 07:48 AM   