TensorFlow not using GPU on Drive PX2

I’ve built TensorFlow from source on my Drive PX2 (CUDA 9.2, cuDNN 7.1.2). I’m trying to run a MobileNet network in inference mode on the PX2, and I’m seeing an average of 12 seconds per inference, whereas I expect it to run within a few hundred milliseconds.
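
For reference, the measurement loop in question looks roughly like this; a minimal sketch using the TF 1.x session API, where the graph file and tensor names are hypothetical stand-ins for the real model, and the first run is excluded because it carries one-time setup cost:

import time
import numpy as np
import tensorflow as tf

# Load a frozen inference graph (path and tensor names are hypothetical).
graph_def = tf.GraphDef()
with tf.gfile.GFile('mobilenet_frozen.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name='')

image = np.random.rand(1, 224, 224, 3).astype(np.float32)

with tf.Session(graph=graph) as sess:
    inp = graph.get_tensor_by_name('input:0')
    out = graph.get_tensor_by_name('predictions:0')

    sess.run(out, {inp: image})  # warm-up run; includes one-time setup cost

    start = time.time()
    for _ in range(20):
        sess.run(out, {inp: image})
    print('avg inference: %.1f ms' % ((time.time() - start) / 20 * 1000))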

Based on the logs, I see that the GPUs are being recognized:

2019-01-24 11:53:56.615435: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1490] Adding visible gpu devices: 0, 1
2019-01-24 11:53:56.615583: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-24 11:53:56.615629: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977]      0 1 
2019-01-24 11:53:56.615668: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0:   N N 
2019-01-24 11:53:56.615702: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 1:   N N 
2019-01-24 11:53:56.615811: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3401 MB memory) -> physical GPU (device: 0, name: DRIVE PX 2 AutoChauffeur, pci bus id: 0000:04:00.0, compute capability: 6.1)
2019-01-24 11:53:56.616331: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 1796 MB memory) -> physical GPU (device: 1, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)

But tegrastats shows 0% utilization on the GPUs even when I sample at a very high frequency.
I also used nvpmodel to set mode 0, which I believe is the highest-performance clock setting.

Also, I don’t understand why the device interconnect matrix lists the GPUs as ‘N N’, but right after that it says TensorFlow devices were created on each GPU.
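
To rule tegrastats out entirely, TensorFlow can also be asked directly what devices it sees; a minimal sketch against the TF 1.x API that was current at the time:

from tensorflow.python.client import device_lib

# List every device TensorFlow can create, together with its description
# (device name, memory size, compute capability).
for dev in device_lib.list_local_devices():
    print(dev.name, '->', dev.physical_device_desc)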

Here’s what I can suggest from my experience with the very same device:

Firstly, don’t expect to see any load on the dGPU with tegrastats right now. There’s a known bug, described at: https://devtalk.nvidia.com/default/topic/1036238/general/can-t-detect-the-dgpu-utilization-through-tegrastat/1

Secondly, for testing, try forcing the graph onto the iGPU, e.g. with

export CUDA_VISIBLE_DEVICES=1
export TF_MIN_GPU_MULTIPROCESSOR_COUNT=2

and then check tegrastats.
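
The same experiment can also be driven from inside TensorFlow; a minimal TF 1.x sketch that pins a throwaway op to the GPU and logs the actual placement:

import tensorflow as tf

# With CUDA_VISIBLE_DEVICES=1 exported as above, the iGPU is the only
# visible CUDA device and shows up inside TensorFlow as GPU:0.
with tf.device('/device:GPU:0'):
    a = tf.random_normal([1024, 1024])
    b = tf.matmul(a, a)

# log_device_placement prints the device each op is assigned to.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    sess.run(b)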

You can also try using nvprof to profile whatever code actually runs on the GPU, if any.

I tried both of those options but didn’t see any change in the output from tegrastats. I saw the tegrastats bug, but I’m not sure whether it has been resolved. And based on my observations, if my code were running on the GPU, I would see much faster inference times with TensorFlow.
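
One way to separate “TensorFlow sees the GPU” from “kernels actually execute on it” is a standalone timing smoke test, independent of the MobileNet graph; a minimal TF 1.x sketch: if the GPU run is not clearly faster than the CPU run for a matrix of this size, the kernels are not really executing on the device.

import time
import tensorflow as tf

def time_matmul(device, n=2048, iters=10):
    """Time an n x n matmul pinned to the given device."""
    tf.reset_default_graph()
    with tf.device(device):
        a = tf.random_normal([n, n])
        c = tf.matmul(a, a)
    with tf.Session() as sess:
        sess.run(c)  # warm-up: kernel selection, memory allocation
        start = time.time()
        for _ in range(iters):
            sess.run(c)
        return (time.time() - start) / iters

print('CPU: %.1f ms' % (time_matmul('/device:CPU:0') * 1000))
print('GPU: %.1f ms' % (time_matmul('/device:GPU:0') * 1000))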

Hi,

Did you build TensorFlow with the correct GPU architecture?
For the PX2, you should build with compute capabilities 6.1 and 6.2.

Please specify a list of comma-separated Cuda compute capabilities you want to build with. 
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus. 
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2]: 6.1,6.2

Thanks.

Yes, I specified 6.1 and 6.2 for the compute capabilities. My full configuration during the build was as follows:

Please specify the location of python. [Default is /usr/bin/python3]:
 
 
Found possible Python library paths:
  /opt/ros/kinetic/lib/python2.7/dist-packages
  /usr/lib/python3/dist-packages
  /usr/local/lib/python3.5/dist-packages
Please input the desired Python library path to use.  Default is [/opt/ros/kinetic/lib/python2.7/dist-packages]
/usr/lib/python3/dist-packages
Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: y
jemalloc as malloc support will be enabled for TensorFlow.
 
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
No Google Cloud Platform support will be enabled for TensorFlow.
 
Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.
 
Do you wish to build TensorFlow with Amazon AWS Platform support? [Y/n]: n
No Amazon AWS Platform support will be enabled for TensorFlow.
 
Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: n
No Apache Kafka Platform support will be enabled for TensorFlow.
 
Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
No XLA JIT support will be enabled for TensorFlow.
 
Do you wish to build TensorFlow with GDR support? [y/N]: n
No GDR support will be enabled for TensorFlow.
 
Do you wish to build TensorFlow with VERBS support? [y/N]: n
No VERBS support will be enabled for TensorFlow.
 
Do you wish to build TensorFlow with nGraph support? [y/N]: n
No nGraph support will be enabled for TensorFlow.
 
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.
 
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
 
Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 9.0]: 9.2
 
 
Please specify the location where CUDA 9.2 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
 
 
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7.1.2
 
 
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/lib/aarch64-linux-gnu/
 
 
Do you wish to build TensorFlow with TensorRT support? [y/N]: Y
TensorRT support will be enabled for TensorFlow.
 
Please specify the location where TensorRT library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/lib/aarch64-linux-gnu/
 
Please specify the NCCL version you want to use. If NCCL 2.2 is not installed, then you can use version 1.3 that can be fetched automatically but it may have worse performance with multiple GPUs. [Default is 2.2]: 1.3
 
 
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,7.0]: 6.1,6.2
 
 
Do you want to use clang as CUDA compiler? [y/N]:
nvcc will be used as CUDA compiler.
 
 
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
 
 
Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.
 
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
 
 
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.
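
After installing the wheel produced by this configuration, the binary itself can be sanity-checked for CUDA support; a quick check using the TF 1.x test helpers:

import tensorflow as tf

# True only if this TensorFlow binary was compiled with CUDA support.
print('Built with CUDA:', tf.test.is_built_with_cuda())

# True only if a CUDA device can actually be initialized at runtime.
print('GPU available:', tf.test.is_gpu_available())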

Hi,

To check whether your program is actually running on the GPU, could you profile it with nvprof first?

nvprof -o [output] python [app].py

If the resulting profile shows no CUDA kernel launches, the model is executing entirely on the CPU.

Thanks.