Can NVIDIA TensorRT accelerate SSD (Single Shot Detector)?

Arleyzhang · August 9, 2017, 6:25am

I have run SSD on my Jetson TX2, but the maximum speed was only 3.5 FPS, which is too slow. Is there any way to speed it up?
I want to use NVIDIA TensorRT to accelerate SSD, and I have installed TensorRT 2.1 on the Jetson TX2, but I don’t know where to start.
Is there any documentation about this?

I would appreciate any help. Thanks!

AastaLLL · August 9, 2017, 7:04am

Hi,

Try to maximize TX2 performance first:

sudo ~/jetson_clocks.sh
sudo nvpmodel -m 0

We have tested SSD on TX2 and got around 8-9 fps.

We recommend porting SSD to TensorRT for better performance.
Benchmark of Caffe vs. TensorRT:
https://devblogs.nvidia.com/parallelforall/jetpack-doubles-jetson-tx1-deep-learning-inference/

Please note that SSD contains some layers that TensorRT does not support. You will need to implement these layers yourself.
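
One way to see which layers would need a custom implementation is to list the layer types in the SSD deploy prototxt and compare them with the set your TensorRT version handles. The sketch below is only an illustration: the function names are hypothetical, and the ASSUMED_SUPPORTED set is an assumption (it does not come from this thread) that should be checked against the documentation for the installed TensorRT release.

```python
import re

# ASSUMPTION: layer types handled natively by the installed TensorRT version.
# Verify this set against the TensorRT documentation before relying on it.
ASSUMED_SUPPORTED = {
    "Convolution", "ReLU", "Pooling", "Softmax", "Concat",
    "InnerProduct", "LRN", "Scale", "Eltwise", "Deconvolution",
}


def layer_types(prototxt_text):
    """Return the type of each top-level layer, in file order.

    Only `type:` fields directly inside a top-level block are counted, so
    nested fields such as weight_filler { type: "xavier" } are ignored.
    Layers written entirely on one line are not handled by this sketch.
    """
    types = []
    depth = 0
    for line in prototxt_text.splitlines():
        stripped = line.split("#", 1)[0].strip()  # drop comments
        match = re.match(r'type\s*:\s*"([^"]+)"', stripped)
        if match and depth == 1:
            types.append(match.group(1))
        depth += stripped.count("{") - stripped.count("}")
    return types


def unsupported_layers(prototxt_text, supported=ASSUMED_SUPPORTED):
    """Return the sorted layer types that are not in `supported`."""
    return sorted(set(layer_types(prototxt_text)) - set(supported))
```

Calling unsupported_layers() on the contents of the deploy prototxt gives a first list of layers to look at when planning the port.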

Arleyzhang · August 9, 2017, 7:16am

Thanks very much!
I had not run the commands you mentioned.
I will try them!

luopengfei · August 9, 2017, 7:53am

Hi
I also want to speed up SSD on the Jetson TX2 using TensorRT 2.1, but I don’t know how to do it. Can we discuss it?
Thanks!

Arleyzhang · August 9, 2017, 8:53am

Sure!
But I’m afraid I may disappoint you: I’m a beginner in this area. Even so, I’m working hard.

I have searched the internet for how to accelerate SSD using TensorRT, but I didn’t find any working solution.

As AastaLLL said, “SSD contains some TensorRT non-supported layer. It will require you to implement these layers.” Implementing these layers may be complex, and so far I don’t know how to do it.

As for TensorRT, there is a guide here:
https://github.com/dusty-nv/jetson-inference

References:
https://devtalk.nvidia.com/default/topic/971970/real-time-object-detection-on-jetson/?offset=1
https://devblogs.nvidia.com/parallelforall/deploying-deep-learning-nvidia-tensorrt/
NVIDIA Documentation Center | NVIDIA Developer
https://devtalk.nvidia.com/default/topic/1004468/how-to-start-tensorrt-on-tx1-/

Arleyzhang · August 9, 2017, 9:10am

Sorry, I have now run these two commands, but the speed is still 3.5 FPS. Is there anything wrong with what I did?

When should I run these commands: before compiling caffe-ssd, or before testing SSD?

Thanks!

AastaLLL · August 10, 2017, 7:10am

Hi,

We shared the SSD build steps here:
https://devtalk.nvidia.com/default/topic/1021356/jetson-tx2/caffe-ssd-on-tx2-cudnn_status_internal_error/post/5198822/#5198822

For how to implement a plugin layer in TensorRT, you can find more details here:
NVIDIA Documentation Center | NVIDIA Developer

Arleyzhang · August 10, 2017, 7:14am

Hi, AastaLLL
Could you share your Makefile.config?
When I compile caffe-ssd, the “make runtest” step fails with the error below, but it does not affect the object detection test.

F0423 22:59:13.380147 8200 test_bbox_util.cpp:279] Check failed: out_bbox.xmax() == 50. (50 vs. 50)
*** Check failure stack trace: ***
@ 0x7f7bc2a718 google::LogMessage::Fail()
@ 0x7f7bc2c614 google::LogMessage::SendToLog()
@ 0x7f7bc2a290 google::LogMessage::Flush()
@ 0x7f7bc2ceb4 google::LogMessageFatal::~LogMessageFatal()
@ 0x5be528 caffe::CPUBBoxUtilTest_TestOutputBBox_Test::TestBody()
@ 0xa544bc testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0xa4c764 testing::Test::Run()
@ 0xa4c8a0 testing::TestInfo::Run()
@ 0xa4c960 testing::TestCase::Run()
@ 0xa4dac0 testing::internal::UnitTestImpl::RunAllTests()
@ 0xa4ddd4 testing::UnitTest::Run()
@ 0x53b568 main
@ 0x7f7a3158a0 __libc_start_main

Arleyzhang · August 10, 2017, 10:17am

Thank you very much!
I’ll try it again.

Arleyzhang · August 14, 2017, 2:47am

Hi, AastaLLL,
Sorry to bother you again.

I built caffe-ssd according to your instructions, but I still get the same error and the speed is still 3.5 FPS.
I don’t know why.

I flashed the TX2 with JetPack 3.0, then used JetPack 3.1 to replace cuDNN, CUDA, and TensorRT with the latest versions; the other components are still from JetPack 3.0.
Could this be the cause of the error or of the low speed?

Do you think I need to reflash the TX2 and install all components with JetPack 3.1?

Thanks!

AastaLLL · August 15, 2017, 2:07am

Hi,

JetPack 3.1 uses TensorRT 2.1 and cuDNN v6, which is 2x faster than JetPack 3.0.
We measured this with the TensorRT engine, so we are not sure how much acceleration Caffe gets.

I got 8-9 FPS with the ssd_pascal_video.py script. Which example are you using?
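
When comparing numbers such as 3.5 FPS and 8-9 FPS, it helps if everyone measures the same way. The following is a minimal sketch of a timing helper, not the method used by ssd_pascal_video.py: measure_fps and run_once are hypothetical names, and run_once stands for whatever callable performs one full inference on one frame in your setup.

```python
from timeit import default_timer


def measure_fps(run_once, warmup=5, iterations=50):
    """Return the average frames per second of `run_once`.

    `run_once` is a hypothetical callable that processes exactly one frame.
    Warm-up calls are excluded so one-time setup cost does not skew the result.
    """
    for _ in range(warmup):
        run_once()

    start = default_timer()
    for _ in range(iterations):
        run_once()
    elapsed = default_timer() - start

    if elapsed <= 0:
        # Timer resolution too coarse for this workload.
        return float("inf")
    return iterations / elapsed
```

Running it once before and once after changing the clock settings or the JetPack version gives directly comparable numbers.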

Arleyzhang · August 15, 2017, 3:05am

Thanks, AastaLLL

Do you mean you got 8-9 FPS with the TensorRT engine? I’m sorry, I thought you got that speed without TensorRT.

But I found someone who got 8 FPS without the TensorRT engine on a Jetson TX1; see this:
https://myurasov.github.io/2016/11/27/ssd-tx1.html
So I am confused.

When I build SSD according to your instructions, I always get the error I posted in #8, but I can still run the example.

I tested the examples “ssd_pascal_video.py” and “ssd_pascal_webcam.py”, and neither exceeded 4 FPS without the TensorRT engine. Is this normal performance?

By the way, could you please provide sample code for SSD using TensorRT? I tried to deploy SSD with TensorRT according to the TensorRT User Guide you supplied, but I haven’t had any success so far.

Thanks!

AastaLLL · August 16, 2017, 2:04am

Hi,

Sorry for the confusion.

We got 8-9 FPS with the following settings:

Jetpack3.1
SSD Caffe branch
ssd_pascal_video.py

I just wanted to explain that JetPack 3.1 can give you a 2x acceleration, but we tested this with TensorRT.
It looks like Caffe also gets a 2x acceleration with JetPack 3.1. (Cool!)

So please re-flash your device with JetPack 3.1, and you should get 8-9 FPS.
Please note that the packages in JetPack 3.1 are for branch rel-28.1.
Installing them on top of JetPack 3.0, which is rel-27.1, may cause errors.
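
To check which release a board is actually running before installing packages, one option is to read the L4T release string. This is a rough sketch under assumptions that do not come from this thread: that the release is recorded in /etc/nv_tegra_release, and that the first line has the form "# R28 (release), REVISION: 1.0, ...". The function name parse_l4t_release is hypothetical.

```python
import os
import re

# ASSUMPTION: path of the L4T release file on the board.
RELEASE_FILE = "/etc/nv_tegra_release"


def parse_l4t_release(text):
    """Return (major, minor) from an L4T release string, or None.

    Assumes text such as "# R28 (release), REVISION: 1.0, ...",
    which would be reported as (28, 1), i.e. rel-28.1.
    """
    match = re.search(r"R(\d+)\s*\(release\),\s*REVISION:\s*(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__" and os.path.exists(RELEASE_FILE):
    with open(RELEASE_FILE) as handle:
        release = parse_l4t_release(handle.read())
    if release == (28, 1):
        print("L4T rel-28.1: matches the JetPack 3.1 packages")
    else:
        print("L4T release is %s: JetPack 3.1 packages may not match" % (release,))
```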

For SSD with TensorRT, there are some non-supported layers.
You need to implement these layers with the custom layer API.

Thanks.

Arleyzhang · August 17, 2017, 1:25am

Thanks, this is a big help!

I will reflash the TX2 with JetPack 3.1.
As for TensorRT, I didn’t look into it carefully enough; I’ll study it again.

Thanks again!

dongsheng_wang · September 11, 2017, 7:21am

Hi Arleyzhang,
I hit the same “make runtest” error as you.
Could you tell me how you solved it?

AastaLLL · September 11, 2017, 7:34am

Hi,

Please check the installation steps here:
[url]https://devtalk.nvidia.com/default/topic/1021356/jetson-tx2/caffe-ssd-on-tx2-cudnn_status_internal_error/post/5198822/#5198822[/url]

Thanks.

Arleyzhang · September 11, 2017, 11:26am

Hello, dongsheng_wang.
I didn’t do anything. This error didn’t affect the object detection test,
and it didn’t affect the use of Caffe either.
I don’t know how to solve it. The source file for this error is “test_bbox_util.cpp”, which was written by SSD’s author. I think looking at the source code might help solve it.
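
A note on the “(50 vs. 50)” message: one possible reading, which nobody in this thread has confirmed, is that the test compares floating-point values for exact equality, so a tiny rounding difference fails the check even though both values print as 50. The sketch below only illustrates that general effect; compare_values is a hypothetical helper and is not part of caffe-ssd.

```python
def compare_values(actual, expected, rel_tol=1e-6):
    """Compare two floats three ways.

    exact_equal   : strict equality, as an exact check would test it
    printed_equal : whether both print identically at 6 significant digits
    close         : whether they agree within a relative tolerance
    """
    scale = max(abs(actual), abs(expected))
    return {
        "exact_equal": actual == expected,
        "printed_equal": ("%g" % actual) == ("%g" % expected),
        "close": abs(actual - expected) <= rel_tol * scale,
    }


# A value that differs from 50 only by rounding noise.
result = compare_values(50.0 + 1e-13, 50.0)
print(result)
```

Here the two values are not exactly equal, yet they print the same and agree within the tolerance. If that is what happens in the unit test, it would explain why detection itself still works.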

dongsheng_wang · September 12, 2017, 1:49am

Hi, Arleyzhang

Thanks for your reply.
I will try looking into it.

hnlyxacj · October 31, 2017, 6:30am

#8

I have the same problem now. Can anyone suggest a reliable way to solve it? Thank you very much.

hnlyxacj · November 1, 2017, 2:39am

How did you reach 8-9 FPS? I only get about 5 FPS, and it is unstable.