Low GPU Usage with Tensorflow Inference on Jetson Tx2
I created a realtime object detection pipeline (https://github.com/GustavZ/realtime_object_detection) for inference, based on Google's well-known Object Detection API.

It runs on the Jetson TX2 at around 5 fps using SSD_Mobilenet, which is the smallest and fastest network provided by Google's API, and I am not happy with this performance.

Of course I prepared the Jetson with:
sudo nvpmodel -m 0
sudo ./jetson_clocks.sh


While the script is running, sudo ./tegrastats gives me the following output:
RAM 7565/7851MB (lfb 5x4MB) CPU [46%@2025,20%@2035,12%@2034,44%@2029,45%@2031,45%@2028] EMC_FREQ 5%@1866 GR3D_FREQ 6%@1300 APE 150 MTS fg 0% bg 0% BCPU@34.5C MCPU@34.5C GPU@40.5C PLL@34.5C AO@32C Tboard@29C Tdiode@32.25C PMIC@100C thermal@33.7C VDD_IN 6342/4735 VDD_CPU 2063/1405 VDD_GPU 1069/368 VDD_SOC 992/934 VDD_WIFI 19/42 VDD_DDR 1514/1316


It seems that nearly all of the RAM is used (7565 of 7851 MB), which is good.
CPU usage is only between roughly 10% and 50% per core, which I would say is not optimal, right?
But the biggest problem is that GPU usage (GR3D_FREQ) is only at 6%.
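
For anyone who wants to log these numbers while the detection script runs, here is a minimal sketch that pulls the RAM, per-core CPU, and GPU (GR3D_FREQ) figures out of one tegrastats line. The function name `parse_tegrastats` is made up for illustration, and the regular expressions only assume the field layout of the line posted above; other L4T releases may print the fields differently.

```python
import re


def parse_tegrastats(line):
    """Extract RAM usage, per-core CPU load and GPU load from one tegrastats line.

    Assumes the field layout seen in the output above; fields that are
    not found come back as None (or an empty list for the CPU loads).
    """
    ram = re.search(r"RAM (\d+)/(\d+)MB", line)
    cpu = re.search(r"CPU \[([^\]]*)\]", line)
    gpu = re.search(r"GR3D_FREQ (\d+)%", line)
    # Each core entry looks like "46%@2025": load in percent, then clock in MHz.
    cpu_loads = [int(p) for p in re.findall(r"(\d+)%@", cpu.group(1))] if cpu else []
    return {
        "ram_used_mb": int(ram.group(1)) if ram else None,
        "ram_total_mb": int(ram.group(2)) if ram else None,
        "cpu_loads_pct": cpu_loads,
        "gpu_load_pct": int(gpu.group(1)) if gpu else None,
    }


# First fields of the line from the post above.
sample = (
    "RAM 7565/7851MB (lfb 5x4MB) "
    "CPU [46%@2025,20%@2035,12%@2034,44%@2029,45%@2031,45%@2028] "
    "EMC_FREQ 5%@1866 GR3D_FREQ 6%@1300 APE 150"
)
stats = parse_tegrastats(sample)
print(stats)
```

Feeding it each line of tegrastats output while the pipeline runs would show whether the GPU load ever rises above the single-digit range.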

Does anybody know how I can increase the GPU usage? This can't be the end of the story.
I am sure the Jetson TX2 can go way faster than 5 fps.

I am very thankful for any hints on increasing GPU usage when using TensorFlow for inference in realtime object detection.

#1
Posted 01/03/2018 04:59 PM   
Hi,

Here are two suggestions for your use case:
1. Could you try a default TensorFlow sample to check whether the GPU utilization is higher?
2. Please try our TensorRT engine, which has been optimized for the Jetson platform.
https://developer.nvidia.com/embedded/downloads?#?search=jetpack%203.2

We also have a detection sample to demonstrate the power of TX2:
https://github.com/dusty-nv/jetson-inference#locating-object-coordinates-using-detectnet

Thanks.

#2
Posted 01/04/2018 06:44 AM   
@AastaLLL I read all the NVIDIA blog posts about TensorRT and I think I now understand what it is.

But I still don't know how to apply it to my needs:

I want to run the TensorFlow Object Detection API with SSD Mobilenet, using a webcam as input and TensorRT as the accelerator. At this point I am not interested in training a new network; I just want to do inference and compare speed/performance on the Jetson for different pre-trained networks.

Is there any tutorial, blog post, or other explanation available on how to do this?
I know dusty-nv's GitHub repository, but unfortunately it does not cover this topic.

Any help is highly appreciated!

#3
Posted 01/04/2018 12:00 PM   
Hi,

Information you need to know about TensorRT on Jetson:
1. The flow for a TensorFlow-based user is TensorFlow -> UFF -> TensorRT

2. TensorFlow -> UFF requires the TensorRT Python API, which is only available on x86-based machines

3. The flow for a Jetson user should look like this:
(1). Convert the TensorFlow model to UFF format on an x86-based machine with the Python interface
(2). Create the TensorRT engine from the UFF file on the Jetson with the C++ interface


Here is sample information:
1. Convert TensorFlow to UFF model:
/usr/local/lib/python2.7/dist-packages/tensorrt/examples/tf_to_trt/

2. Create TensorRT engine from UFF:
/usr/src/tensorrt/samples/sampleUffMNIST/

3. Camera -> TensorRT sample:
https://github.com/dusty-nv/jetson-inference

Thanks
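
Step (1) of the workflow above could look roughly like the sketch below. It assumes the `uff` Python module that is distributed with TensorRT for x86 and its `from_tensorflow_frozen_model` converter; the wrapper name `convert_frozen_graph_to_uff` and the file and node names in the usage comment are placeholders, not taken from the NVIDIA sample. Going by later posts in this thread, the conversion may well stop on graphs that contain operations TensorRT does not support.

```python
def convert_frozen_graph_to_uff(frozen_pb_path, output_nodes, uff_path):
    """Convert a frozen TensorFlow graph to a UFF file (meant to run on the x86 host).

    frozen_pb_path: path to a frozen GraphDef (.pb) file
    output_nodes:   list of output node names in that graph
    uff_path:       where the UFF file should be written
    """
    if not output_nodes:
        raise ValueError("at least one output node name is required")
    # Imported lazily so this file can still be loaded on machines
    # (such as the Jetson) where the uff module is not installed.
    import uff

    uff.from_tensorflow_frozen_model(
        frozen_pb_path,
        output_nodes,
        output_filename=uff_path,
    )
    return uff_path


# Usage on the x86 host (hypothetical file and node names):
# convert_frozen_graph_to_uff("frozen_graph.pb", ["output_node"], "model.uff")
```

The resulting UFF file is then copied to the Jetson, where step (2) builds the engine through the C++ interface as in sampleUffMNIST.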

#4
Posted 01/05/2018 08:06 AM   
@AastaLLL

Thank you for your reply, that helps!

One question still remains: what exactly is the TensorRT engine, and what does it do?
What do I do with it once it's created?

How do I run inference with it using a video stream as input?

I read dusty's tutorial, but it does not explain how to do this with your own TensorFlow model. Running the detectnet sample does not help me understand how to use TensorRT.


EDIT:
TensorRT fails to install on my x86 host machine.
It is the same problem as in this thread: https://devtalk.nvidia.com/default/topic/1027618/libnvinfer-has-unmet-dependencies/?offset=3#5230819

#5
Posted 01/05/2018 10:09 AM   
Hi,

In dusty's tutorial, we demonstrate how to use TensorRT with the Caffe framework.
The main idea for TensorFlow users is similar. Please check comment #4 for the workflow:
https://devtalk.nvidia.com/default/topic/1028234/jetson-tx2/low-gpu-usage-with-tensorflow-inference-on-jetson-tx2/post/5230788/#5230788

Here is our sample for TensorFlow to TensorRT (in the x86 package):
/usr/local/lib/python2.7/dist-packages/tensorrt/examples/tf_to_trt/

For the installation issue, please remember to download the TensorRT x86 package from this page:
https://developer.nvidia.com/nvidia-tensorrt-download

Our installation guide is here for your reference:
http://developer2.download.nvidia.com/compute/machine-learning/tensorrt/secure/3.0/ga/TensorRT-Installation-Guide.pdf?R0QAC5_xDYgfbUhL0TwvvINAUahC5hs_beCg0SxyWnTdz1fNIWMrDNcPpHVHVwMP3timKhduxXdGVVQjEWjsbbu9bntddTb0QWBf3njt5rHa-txpNwjI9BqhymDMJ9nYqadMqQVj7Yh0uPMWkd8rjdyERdDIficnn3RPdiVyk-EjW3pyR3WjiXThvIZkuw

Thanks.

#6
Posted 01/08/2018 10:40 AM   
Hi,

The installation of TensorRT on the x86 machine has now succeeded, although it is located at usr/lib/... rather than usr/local/lib/... , but I guess that doesn't make any difference.

To convert the TF model to UFF and later create the TensorRT engine, is it necessary to go through the sample code and adapt it to one's own needs?
Is there no tutorial or step-by-step guide on how to do this?

It seems a bit difficult to adapt everything without any introduction to how TensorRT is used (again, I now understand what it is in theory, but that does not help me apply it :) )

#7
Posted 01/08/2018 01:38 PM   
Hi,

Please use the TensorRT Python API to convert a TensorFlow model into UFF.
Here is a sample for your reference:
/usr/local/lib/python2.7/dist-packages/tensorrt/examples/tf_to_trt/

Thanks.

#8
Posted 01/12/2018 06:09 AM   
Unfortunately, TensorRT does not support the nodes used in SSD Mobilenet.

#9
Posted 01/12/2018 09:01 AM   
gustavvz said: Unfortunately TensorRT does not support/include the Nodes used in SSD Mobilenet.


Which nodes specifically is it missing?

I am just beginning on this track as well, and it seems like the performance boost from TensorRT's optimization would go a long way on the Jetson TX2. But if it doesn't support the main object detection network designed for small boards, I will have to abandon TensorRT.

#10
Posted 01/14/2018 03:05 AM   
Hey tabor473,

For example the whole postprocessing, which is BatchMultiClassNonMaxSuppression for SSD Mobilenet.
I did not look at all the nodes and layers, but for now it is too much work to get it running on TensorRT.

But if you try your luck, or write the custom nodes that are needed, it would be very nice if you let me know and shared your experience!

#11
Posted 01/15/2018 08:59 AM   
@gustavvz Does this mean the model cannot run with TensorRT at all, or did you succeed in running it with TensorRT?

#12
Posted 01/15/2018 01:30 PM   
Hi,

Not all TensorFlow operations are supported by TensorRT.
Please check our documentation for details.
It is located at /usr/share/doc/tensorrt/
>> 1.1. TensorRT Layers

Thanks.
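
To see quickly which operations in a graph would need custom work, one can list the op types of all nodes and compare them against the supported layers from that document. The sketch below takes plain (node name, op type) pairs; with TensorFlow 1.x these could be read from `graph_def.node`, where each entry has `.name` and `.op`. The set `ASSUMED_SUPPORTED_OPS` and the example node names are placeholders for illustration only, not the real TensorRT support list, so fill the set in from the documentation above.

```python
from collections import Counter

# ASSUMPTION: placeholder set for illustration only. Replace it with the
# layers listed in the TensorRT documentation for your installed version.
ASSUMED_SUPPORTED_OPS = {"Const", "Placeholder", "Conv2D", "BiasAdd", "Relu", "Identity"}


def find_unsupported_ops(nodes, supported_ops):
    """Count how often each op type outside supported_ops occurs.

    nodes: iterable of (node_name, op_type) pairs
    """
    return Counter(op for _name, op in nodes if op not in supported_ops)


# Hypothetical nodes, standing in for the contents of a real frozen graph.
example_nodes = [
    ("image_tensor", "Placeholder"),
    ("conv0", "Conv2D"),
    ("conv0/relu", "Relu"),
    ("nms_a", "NonMaxSuppression"),
    ("nms_b", "NonMaxSuppression"),
    ("where", "Where"),
]

unsupported = find_unsupported_ops(example_nodes, ASSUMED_SUPPORTED_OPS)
for op, count in sorted(unsupported.items()):
    print(op, count)
```

A long list of unsupported op types is a sign that the model needs custom layers or a split between TensorRT and TensorFlow before it can be converted.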

#13
Posted 01/22/2018 07:44 AM   