TensorFlow not using GPU on Jetson TX2

Hi everyone,

I am currently running a regression TensorFlow model on the Jetson TX2, but the frame rate is very low.

I sped up the Jetson with:

sudo nvpmodel -m 0
sudo ./jetson_clocks.sh

When running the model, tegrastats gives me this:

RAM 1971/7851MB (lfb 812x4MB) cpu [1%@2035,0%@2035,100%@2035,0%@2035,0%@2036,1%@2034] EMC 2%@1866 APE 150 GR3D 0%@1300

I am using Python 3 and TensorFlow from: installTensorFlowJetsonTX/tensorflow-1.3.0-cp35-cp35m-linux_aarch64.whl at master · jetsonhacks/installTensorFlowJetsonTX · GitHub

I installed everything with JetPack 3.1.

I am assuming that there are bugs in the CUDA-to-TensorFlow link in this wheel.

I understand that TensorRT will increase the FPS, but I can't find examples of TensorRT, and the main issue is that TensorFlow is not using the GPU on the Jetson.

So am I doing something wrong? Should I build TensorFlow from scratch? Should I install CUDA or cuDNN? Are there any hidden dependencies for TensorFlow? How can I increase performance in terms of FPS? Where can I find examples for TensorRT?

Thank you in advance!

Hi,

Could you share the following logs with us?

1. The TensorFlow log when a session is initialized.
2. The device placement of your model:
TensorFlow Core

Thanks.

Hi,

I got this from the session:

2018-01-30 15:54:20.118792: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2018-01-30 15:54:20.118938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 1.11GiB
2018-01-30 15:54:20.119016: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2018-01-30 15:54:20.119055: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2018-01-30 15:54:20.119092: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)

And I got this from the device placement code at the link:

Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0
2018-01-30 15:58:18.358564: I tensorflow/core/common_runtime/direct_session.cc:300] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0

MatMul: (MatMul): /job:localhost/replica:0/task:0/gpu:0
2018-01-30 15:58:18.365397: I tensorflow/core/common_runtime/simple_placer.cc:872] MatMul: (MatMul)/job:localhost/replica:0/task:0/gpu:0
b: (Const): /job:localhost/replica:0/task:0/gpu:0
2018-01-30 15:58:18.365489: I tensorflow/core/common_runtime/simple_placer.cc:872] b: (Const)/job:localhost/replica:0/task:0/gpu:0
a: (Const): /job:localhost/replica:0/task:0/gpu:0
2018-01-30 15:58:18.365535: I tensorflow/core/common_runtime/simple_placer.cc:872] a: (Const)/job:localhost/replica:0/task:0/gpu:0

Available devices are:

[ /job:localhost/replica:0/task:0/cpu:0, /job:localhost/replica:0/task:0/gpu:0 ]
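A device list like the one above can be printed with, e.g. (`device_lib` is an internal but commonly used TensorFlow module):

```python
from tensorflow.python.client import device_lib

# Enumerate every device TensorFlow can see on this machine; on a
# Jetson with a working GPU build this should include a GPU entry
# alongside the CPU.
devices = device_lib.list_local_devices()
for d in devices:
    print(d.name, d.device_type)
```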

Also, if you have any suggestions on where to find TensorFlow-to-TensorRT conversion examples, please post the links.

Thanks.

Hi,

1. Could you profile your TF model with this approach and share the results with us:

2. Converting a TF model to TensorRT requires the Python API. The Python API is only available in the x86 Linux environment, and a sample can be found at '/usr/local/lib/python2.7/dist-packages/tensorrt/examples/tf_to_trt/'.

Thanks.

Hi,

  1. I am using Keras with a TensorFlow backend for my model, so converting it to pure TensorFlow will take time.

    But I used the program from this link to get the profiles:
    https://gist.github.com/ikhlestov/54a894a7e5c06dd536dc0b7f6c5acd04#file-02_example_with_placeholders_and_for_loop-py

    The profiles are in:
    https://github.com/RameshKamath/TFprofile/tree/master/files

    I checked tegrastats while running the model; it only used one CPU core.

  2. I didn't find TensorRT in '/usr/local/lib/python2.7/dist-packages/tensorrt'.

Thanks.

Hi,

1. Please convert the .json to an image for better visualization.

2. The Python API is only available in our TensorRT x86 package.

Thanks.

Hi,

I have taken screenshots of the .json and posted them at:

https://github.com/RameshKamath/TFprofile/tree/master/images

Thanks.

Hi,

From your profiling data, SoftmaxCrossEntropyWithLogits takes a long time.
Do you need to calculate accuracy at run-time, or can it be replaced with a standard softmax?

Thanks.

Hi,

  • As I said in the 5th post: https://devtalk.nvidia.com/default/topic/1029308/jetson-tx2/tensorflow-not-using-gpu-in-jetson-tx2/post/5236592/#5236592

    I am not using my own model. To get the profile, I am using the (complicated) example code from:
    https://towardsdatascience.com/howto-profile-tensorflow-1a49fb18073d
    It's simple code, and the time doesn't matter.

  • The problem is that TensorFlow is not running the model on the GPU of the Jetson TX2.
  • I think I found the cause of the problem.

    I got this when running the session.

    2018-01-30 15:54:20.118792: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
    2018-01-30 15:54:20.118938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
    name: NVIDIA Tegra X2
    major: 6 minor: 2 memoryClockRate (GHz) 1.3005
    pciBusID 0000:00:00.0
    Total memory: 7.67GiB
    Free memory: 1.11GiB
    2018-01-30 15:54:20.119016: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
    2018-01-30 15:54:20.119055: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
    2018-01-30 15:54:20.119092: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
    

    When running the model with everything on high settings, tegrastats gives me this:

    RAM 1971/7851MB (lfb 812x4MB) cpu [1%@2035,0%@2035,100%@2035,0%@2035,0%@2036,1%@2034] EMC 2%@1866 APE 150 GR3D 0%@1300
    

    Note the "Creating TensorFlow device (/gpu:0)" in:

    2018-01-30 15:54:20.119092: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
    

    And the Jetson devices are:

    [ /job:localhost/replica:0/task:0/cpu:0, /job:localhost/replica:0/task:0/gpu:0 ]
    

    So I think TensorFlow is not accessing the correct GPU device on the Jetson. I'll try to build TensorFlow from source.

Thanks.

Hi,

Could you share more information about your investigation?
We are also checking this issue but are not clear about comment #9.

From the log messages, it looks like TensorFlow correctly finds the Jetson device and launches the session on it.
Could you explain more about your concern?

Thanks.

Hi,

My concern was that when I ran a model on the Jetson, it didn't run on the GPU; instead it ran on the CPU.
I gave the tegrastats output from running the model in comment #1.
The issue is that it was initialized on the GPU in the session, as shown in comment #3.

I was using Keras to write my code, with TensorFlow running as the backend.

Now I have written my model directly in TensorFlow and found that Keras was causing the issue by running the model on the CPU even though it was initialized with the GPU.
TensorFlow on the Jetson also used the CPU for smaller programs (I tried smaller programs when the Keras code didn't work, to see whether the problem was with Keras or TensorFlow).

Thank you for answering my comments.

Thanks for your feedback.

It looks like you have already found the root cause.
Is there anything else we can help with?