Available: TensorFlow 1.5 for Jetson TX2

Hi guys,

Thank you for all the good information that you make available to us here.

I just wanted to share with you that I successfully built and installed TensorFlow 1.5 on the Jetson TX2.
I have made the wheel file for installing it publicly available at: https://github.com/JesperChristensen89/TensorFlow-Jetson-TX2 (pre-built wheel files for installing TensorFlow on Jetson TX2).

I have tested the installation with CUDA 8 and cuDNN 6 and have successfully deployed SSD models from the TensorFlow Object Detection API in a Jupyter Notebook environment.

jesp-hc,
Thanks for sharing your effort with the Jetson community.

Thanks for sharing.

We have released CUDA 9.0 in JetPack 3.2 DP.
Could you also try building TensorFlow 1.5 with JetPack 3.2 DP?

Thanks.

I tested TensorFlow 1.5 along with CUDA 9.0 and cuDNN 7.0 from JetPack 3.2 DP and can confirm that it works as well.
I am still not able to run any of the Faster R-CNN models; see: Faster R-CNN: too many resources requested for launch - Jetson TX2 - NVIDIA Developer Forums.
However, smaller models such as SSD work perfectly fine.

Hi,

Let’s track this issue in topic 1028798.

Thanks.

Hi Jesper & AastaLLL

I’ve built TensorFlow 1.5 on JetPack 3.2 with the following combinations:

Bazel 0.8.0 / Bazel 0.9.0
GCC 4.8.5 / 5.4.0
CUDA 9.0

My experience on the TX2 is that stability (it doesn’t always start) and inference performance aren’t that great.
Could you please share some of the details of your build so I can reproduce it on JetPack 3.2?

I’m on JetPack 3.2 because performance/stability was roughly the same on 3.1.
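For reference, my own builds follow the generic TensorFlow build-from-source sequence; the sketch below is what I run on the TX2 itself (paths and the final wheel name are examples, not exact):

```shell
# Generic TF 1.5 build-from-source sequence, run on the TX2 itself.
cd ~/tensorflow
git checkout v1.5.0
./configure                                   # interactive; CUDA 9.0, cuDNN 7, compute 6.2
bazel build --config=opt --config=cuda \
    //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
sudo pip install /tmp/tensorflow_pkg/tensorflow-1.5.0-*.whl
```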

Best regards,
Kalevi

Hi Kalevi,

Could you please expand a bit on your issues and what information you are looking for?

Have you tried my build of TF 1.5 on JetPack 3.2?

Best,
Jesper

Hi Jesper,

I tried your wheel, but as it’s built for JetPack 3.1 and CUDA 8, it can’t load the CUDA 8 libraries (which aren’t present on JetPack 3.2).

I’m looking for some of your specific build choices, and I assume you built this on the Jetson TX2 itself?

If I look at the build-from-source instructions on tensorflow.org, the table at the very end of the description page says that it has been tested with:

tensorflow_gpu-1.5.0, GPU, Python: 2.7, 3.3-3.6, GCC 4.8, Bazel 0.8.0, cuDNN: 7, CUDA: 9

I’m looking for the above information and also your ./configure choices.

Below you can see my configuration choices.

My problem is that when running either InceptionV3 or ssd_mobilenet, the launch time can be minutes and inference performance resembles inference on an i7 CPU.
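One thing I still want to rule out is the power model: if I understand the JetPack tooling correctly, the TX2 should be forced to maximum clocks before benchmarking, roughly like this:

```shell
# Force the TX2 into its maximum-performance profile (JetPack 3.x).
sudo nvpmodel -m 0        # MAXN profile: all CPU cores on, highest clocks allowed
sudo ~/jetson_clocks.sh   # pin CPU/GPU/EMC clocks to their maximums
```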

Best regards,
Kalevi

– ./configure –

Please input the desired Python library path to use. Default is [/usr/local/lib/python2.7/dist-packages]

Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]:
jemalloc as malloc support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
No Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
No Amazon S3 File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]:
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with GDR support? [y/N]:
No GDR support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]:
No VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]:
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]:

Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]:

Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:

Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: CUDA GPUs - Compute Capability | NVIDIA Developer.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2]

Do you want to use clang as CUDA compiler? [y/N]:
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:

Do you wish to build TensorFlow with MPI support? [y/N]:
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option “–config=opt” is specified [Default is -march=native]:

Add “–config=mkl” to your bazel command to build with MKL support.
Please note that MKL on MacOS or windows is still not supported.
If you would like to use a local MKL instead of downloading, please set the environment variable “TF_MKL_ROOT” every time before build.

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.
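For reproducibility, I believe the same answers can be pre-seeded through environment variables before running ./configure (variable names as I read them from the TF 1.x configure script; the important one on the TX2 is the compute capability, 6.2):

```shell
# Non-interactive ./configure answers for a TX2 build (assumed variable names).
export TF_NEED_CUDA=1
export TF_CUDA_VERSION=9.0
export CUDA_TOOLKIT_PATH=/usr/local/cuda
export TF_CUDNN_VERSION=7
export CUDNN_INSTALL_PATH=/usr/local/cuda
export TF_CUDA_COMPUTE_CAPABILITIES=6.2   # Tegra X2 GPU is compute capability 6.2
./configure
```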

Hi,

Here is some data on what I describe as slow to launch. This was built with CUDA 9, GCC 5, TF 1.5, Bazel 0.9.1, on a freshly booted TX2.

First run 1m30s
Second run 7s
Third run 3s

Best regards,
Kalevi

nvidia@tegra-ubuntu:~$ cat hellotf.py
import tensorflow as tf

hello = tf.constant("helloe world")
sess = tf.Session()
print (sess.run(hello))


nvidia@tegra-ubuntu:~$ time python hellotf.py
2018-02-06 14:37:58.414481: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:881] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2018-02-06 14:37:58.414604: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.66GiB freeMemory: 6.12GiB
2018-02-06 14:37:58.414676: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) → (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2018-02-06 14:39:28.770466: I tensorflow/core/common_runtime/gpu/gpu_device.cc:859] Could not identify NUMA node of /job:localhost/replica:0/task:0/device:GPU:0, defaulting to 0. Your kernel may not have been built with NUMA support.
helloe world

real 1m34.633s
user 1m30.380s
sys 0m2.104s

nvidia@tegra-ubuntu:~$ time python hellotf.py
2018-02-06 14:39:45.115259: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:881] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2018-02-06 14:39:45.115373: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.66GiB freeMemory: 4.90GiB
2018-02-06 14:39:45.115427: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) → (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2018-02-06 14:39:51.184749: I tensorflow/core/common_runtime/gpu/gpu_device.cc:859] Could not identify NUMA node of /job:localhost/replica:0/task:0/device:GPU:0, defaulting to 0. Your kernel may not have been built with NUMA support.
helloe world

real 0m8.195s
user 0m7.452s
sys 0m0.612s
nvidia@tegra-ubuntu:~$ time python hellotf.py
2018-02-06 14:39:59.755967: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:881] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2018-02-06 14:39:59.756095: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.66GiB freeMemory: 4.90GiB
2018-02-06 14:39:59.756155: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) → (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2018-02-06 14:40:01.071648: I tensorflow/core/common_runtime/gpu/gpu_device.cc:859] Could not identify NUMA node of /job:localhost/replica:0/task:0/device:GPU:0, defaulting to 0. Your kernel may not have been built with NUMA support.
helloe world

real 0m3.425s
user 0m2.756s
sys 0m0.532s
nvidia@tegra-ubuntu:~$

Hi, kallud32qg

Thanks for your feedback.

Could you also run the #9 experiment on a desktop GPU?
This will help figure out whether the unstable issue comes from TensorFlow or from the TX2.

Thanks.

Hi AastaLLL,

If I were to run on a desktop GPU the test similar to the one in post #9, I would take a pre-built Python wheel, which would be called e.g. tensorflow_gpu-1.4.0rc0-cp27-none-linux_x86_64.whl (please note the x86_64 in the naming convention).

The reason for building a Python wheel myself is that there are no pre-built wheels with GPU support for ARM application processors such as the one in the TX2. This is the reason Jesper released his wheel.

As mentioned earlier, I build TensorFlow on the board itself. I don’t cross-compile on a PC, so I would have to cross-compile on the TX2 for x86 to run the test.

I’m more than happy to provide detailed build steps if someone wants to replicate the build.

Best regards,
Kalevi

Hi,

There is an official release of TensorFlow + GPU for x86 Linux machines.
You can install the TensorFlow package via apt-get directly.

We want to check whether the unstable issue comes from the TF implementation or is Jetson-only.
Please help by running the experiment in a desktop environment to narrow down the root cause.

Thanks.

Hi AastaLLL

Thanks for your link. I’m aware of it; I assume the x86+GPU build gets used a lot and works.

I don’t have an NVIDIA GPU desktop PC to try. However, as you can see from my earlier post, feel free to save the following into a file and run it:

import tensorflow as tf

hello = tf.constant("helloe world")
sess = tf.Session()
print (sess.run(hello))

Here is a heavier example that is very slow to start (the Inception model I referred to):

git clone https://github.com/tensorflow/models

cd models/tutorials/image/imagenet
python classify_image.py

(note that on the first run it downloads the model into /tmp)

On my board this fails most of the time. The hello-world above isn’t really relevant, except that there seems to be a memory "leak" and strange behaviour where consecutive runs improve the startup time.
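One knob that might be relevant here (an assumption on my part, not something I have verified on the TX2) is TensorFlow’s GPU memory allocation. The TX2’s GPU shares physical memory with the CPU, so letting TF allocate on demand instead of pre-allocating most of free memory could change the behaviour between runs:

```python
import tensorflow as tf  # TF 1.x API, matching the versions in this thread

# Allocate GPU memory on demand rather than grabbing most of it up front.
# On the TX2 the GPU and CPU share the same 8 GB, so a large pre-allocation
# can starve the rest of the system.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# Alternatively, hard-cap TF at a fraction of the (shared) memory:
# config.gpu_options.per_process_gpu_memory_fraction = 0.5

sess = tf.Session(config=config)
print(sess.run(tf.constant("hello world")))
```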

Best regards,
Kalevi

Hi,

TensorFlow will generate some CUDA PTX code at startup,
so the first launch may take a long time.
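If it is the PTX JIT compilation, the compiled kernels should be cached on disk between runs; the cache can be tuned with the standard CUDA environment variables (whether the default cache size is the limiting factor on TX2 is only a guess):

```shell
# CUDA JIT compute-cache settings (documented in the CUDA programming guide).
export CUDA_CACHE_DISABLE=0                      # ensure JIT caching is enabled
export CUDA_CACHE_PATH="$HOME/.nv/ComputeCache"  # default cache location
export CUDA_CACHE_MAXSIZE=2147483648             # enlarge the cache (bytes)
```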

Thanks.

Hi Aasta,

Would you mind running the commands below twice, so I have an idea of what constitutes normal?
Thanks in advance,
Kalevi


git clone https://github.com/tensorflow/models
cd models/tutorials/image/imagenet

time python classify_image.py

Here are my two runs using a GTX 1050 Ti board under x86 Ubuntu:

chuang@chijen-All-Series:~/ai/models/tutorials/image/imagenet$ time python classify_image.py

Downloading inception-2015-12-05.tgz 100.0%
Successfully downloaded inception-2015-12-05.tgz 88931400 bytes.
2018-02-15 10:11:25.927997: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-02-15 10:11:26.577493: W tensorflow/core/framework/op_def_util.cc:334] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.89107)
indri, indris, Indri indri, Indri brevicaudatus (score = 0.00779)
lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00296)
custard apple (score = 0.00147)
earthstar (score = 0.00117)

real 0m23.560s
user 0m3.988s
sys 0m2.500s
chuang@chijen-All-Series:~/ai/models/tutorials/image/imagenet$ time python classify_image.py
2018-02-15 10:14:55.869540: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-02-15 10:14:56.273840: W tensorflow/core/framework/op_def_util.cc:334] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.89107)
indri, indris, Indri indri, Indri brevicaudatus (score = 0.00779)
lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00296)
custard apple (score = 0.00147)
earthstar (score = 0.00117)

real 0m2.810s
user 0m3.332s
sys 0m1.324s
chuang@chijen-All-Series:~/ai/models/tutorials/image/imagenet$

This conforms to what AastaLLL described.

Thanks Chijen!

Best regards,
Kalevi

Hi,

This might be a noobish question, but will the wheel file work for a Jetson TX1 board as well?

Thanks

Hi,

You can try the wheel included here:
https://github.com/peterlee0127/tensorflow-nvJetson

Thanks.