Hi, I'm running TensorFlow with Python 3.5 on a TX2, but it seems unstable. It runs normally only the first time I launch my Python script; after that I get messages like the ones below and it gets stuck.
2017-12-12 06:02:47.064075: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2017-12-12 06:02:47.064203: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 4.18GiB
2017-12-12 06:02:47.064255: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2017-12-12 06:02:47.064279: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y
2017-12-12 06:02:47.064310: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
2017-12-12 06:04:09.279612: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2017-12-12 06:04:09.279745: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 4.33GiB
2017-12-12 06:04:09.279795: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2017-12-12 06:04:09.279830: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y
2017-12-12 06:04:09.279868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
The GPU information shows twice; it should show only once when running normally.
I rebooted the TX2 just now and got an error message like this:
2017-12-12 06:21:32.375742: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2017-12-12 06:21:32.375870: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 5.09GiB
2017-12-12 06:21:32.375923: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2017-12-12 06:21:32.376007: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y
2017-12-12 06:21:32.376039: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
2017-12-12 06:22:14.858684: E tensorflow/stream_executor/cuda/cuda_driver.cc:1068] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_FAILED
2017-12-12 06:22:14.858769: E tensorflow/stream_executor/cuda/cuda_timer.cc:54] Internal: error destroying CUDA event in context 0xaedda10: CUDA_ERROR_LAUNCH_FAILED
2017-12-12 06:22:14.858799: E tensorflow/stream_executor/cuda/cuda_timer.cc:59] Internal: error destroying CUDA event in context 0xaedda10: CUDA_ERROR_LAUNCH_FAILED
2017-12-12 06:22:14.858956: F tensorflow/stream_executor/cuda/cuda_dnn.cc:2045] failed to enqueue convolution on stream: CUDNN_STATUS_EXECUTION_FAILED
2017-12-12 06:23:02.713872: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2017-12-12 06:23:02.713999: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 3.72GiB
2017-12-12 06:23:02.714054: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2017-12-12 06:23:02.714079: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y
2017-12-12 06:23:02.714105: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
Hi,
Which TensorFlow build do you use?
Usually, we use this public build:
We can launch TensorFlow correctly with JetPack3.1.
Could you also give it a try?
Thanks.
I built my TensorFlow according to https://syed-ahmed.gitbooks.io/nvidia-jetson-tx2-recipes/content/first-question.html and it runs OK.
python3
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> print(tf.__version__)
1.3.0
>>>
Hi there, my problem remains even though TensorFlow seems to work fine. Can you help me?
Hi,
Could you try TensorFlow 1.3.0 or the wheel shared in comment #4?
Based on this issue, the CUDA_ERROR_LAUNCH_FAILED error was gone after upgrading the environment to TensorFlow 1.3.0 and cuDNN v6.
Thanks.
Hi,
CUDA_ERROR_LAUNCH_FAILED usually comes from an incorrect CUDA version/driver or GPU architecture.
Here is another public TensorFlow build for Python 3.5:
Could you reflash TX2 with JetPack3.1 and give this wheel a try?
Thanks.
Yes, I did flash my TX2 with JetPack 3.1, and I just uninstalled and reinstalled TensorFlow as you recommended, but the error remains the same. Thank you for your help.
Hi,
Thanks for your feedback.
We will check this issue and get back to you later.
Hi,
We can run TensorFlow correctly with Python 3.5:
nvidia@tegra-ubuntu:/media/nvidia/NVIDIA$ python3
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
2017-12-15 03:22:31.509179: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2017-12-15 03:22:31.509304: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 369.01MiB
2017-12-15 03:22:31.509358: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2017-12-15 03:22:31.509383: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y
2017-12-15 03:22:31.509406: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
>>> print(sess.run(hello))
b'Hello, TensorFlow!'
Here are our steps:
1. Flash TX2 with JetPack3.1
2. Upgrade to cuDNN v7 via this package
3. Install TensorFlow
$ sudo apt-get install -y python3-pip python3-dev
$ pip3 install tensorflow-1.3.0-cp35-cp35m-linux_aarch64.whl
Could you follow our steps and check if the issue remains?
If yes, please help to test a CUDA sample for GPU functionality.
$ /usr/local/cuda-8.0/bin/cuda-install-samples-8.0.sh .
$ cd NVIDIA_CUDA-8.0_Samples/0_Simple/vectorAdd
$ make && ./vectorAdd
Thanks, and please let us know the results.
Updating CUDA via the tar file does not seem to work; the output of the check is:
sudo dpkg -l | grep TensorRT
[sudo] password for nvidia:
ii libnvinfer-dev 3.0.2-1+cuda8.0 arm64 TensorRT development libraries and headers
ii libnvinfer3 3.0.2-1+cuda8.0 arm64 TensorRT runtime libraries
ii tensorrt-2.1.2 3.0.2-1+cuda8.0 arm64 Meta package of TensorRT
Would installing from the deb file work better?
sudo dpkg -l | grep TensorRT
ii libnvinfer-dev 4.0.0-1+cuda8.0 arm64 TensorRT development libraries and headers
ii libnvinfer-samples 4.0.0-1+cuda8.0 arm64 TensorRT samples and documentation
ii libnvinfer3 3.0.2-1+cuda8.0 arm64 TensorRT runtime libraries
ii libnvinfer4 4.0.0-1+cuda8.0 arm64 TensorRT runtime libraries
ii tensorrt 3.0.0-1+cuda8.0 arm64 Meta package of TensorRT
ii tensorrt-2.1.2 3.0.2-1+cuda8.0 arm64 Meta package of TensorRT
Copying the test files:
/usr/local/cuda-8.0/bin/cuda-install-samples-8.0.sh .
Copying samples to ./NVIDIA_CUDA-8.0_Samples now...
Finished copying samples.
Running test code:
make && ./vectorAdd
/usr/local/cuda-8.0/bin/nvcc -ccbin g++ -I../../common/inc -m64 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_62,code=compute_62 -o vectorAdd.o -c vectorAdd.cu
/usr/local/cuda-8.0/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_62,code=compute_62 -o vectorAdd vectorAdd.o
mkdir -p ../../bin/aarch64/linux/release
cp vectorAdd ../../bin/aarch64/linux/release
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
The TensorFlow test script runs well, but you may notice output like "Total memory: 7.67GiB
Free memory: 369.01MiB". I ran my inference script and the problem remains:
2017-12-15 05:55:47.193361: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2017-12-15 05:55:47.193494: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 3.67GiB
2017-12-15 05:55:47.193548: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2017-12-15 05:55:47.193576: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y
2017-12-15 05:55:47.193603: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
2017-12-15 05:57:12.935098: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2017-12-15 05:57:12.935343: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 3.82GiB
2017-12-15 05:57:12.935439: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2017-12-15 05:57:12.935483: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y
2017-12-15 05:57:12.935531: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
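As an aside on those low "Free memory" readings: the TX2's 8 GiB is shared between the CPU and the GPU, and a TF 1.x session by default tries to grab nearly all free GPU memory at startup. The session can be told to allocate on demand instead; a minimal sketch using the standard TF 1.x `ConfigProto` options (not specific to this script, and untested on a TX2):

```python
import tensorflow as tf

# Allocate GPU memory on demand rather than claiming it all up front,
# which matters on the TX2 where CPU and GPU share the same 8 GiB.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# Alternatively, cap the allocation at a fixed fraction of GPU memory:
# config.gpu_options.per_process_gpu_memory_fraction = 0.5

sess = tf.Session(config=config)
```

This is session configuration only; it does not change what the model computes, just how aggressively memory is reserved.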
I'm testing a Faster R-CNN (ResNet v2) inference script; it runs fine on a small PC box (Intel i5, 4 GB RAM, no GPU) at 44 seconds per image (600x800).
I checked my script's running state: every time I launch my script message_server.py, two processes appear:
ps -aux | grep python
nvidia 2945 39.6 2.0 1800792 165368 pts/7 Sl+ 06:23 0:07 python3 message_server.py
nvidia 3021 95.5 8.2 1713920 662340 pts/7 R+ 06:23 0:13 python3 message_server.py
nvidia 3034 0.0 0.0 5560 604 pts/2 S+ 06:23 0:00 grep --color=auto message_server.py
I tested another script, test_tensorflow.py, whose content is:
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
and only one process appears.
So could the problem be caused by two processes competing for the GPU? But how did this happen?
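One plausible explanation for the two identical `ps` entries is that the inference script (or a library it uses) forks a worker process: a forked child inherits the parent's command line, so `ps` lists both under the same script name, and if both create a TensorFlow session they will indeed compete for GPU memory. A minimal, hypothetical illustration of the forking behavior (not the actual message_server.py code):

```python
import multiprocessing
import os

def spawn_demo():
    """Fork one worker and return (parent_pid, child_pid)."""
    parent = os.getpid()
    with multiprocessing.Pool(processes=1) as pool:
        # The worker is a separate process, but it inherits the parent's
        # command line, so `ps` would show two entries for one script.
        child = pool.apply(os.getpid)
    return parent, child

if __name__ == "__main__":
    parent, child = spawn_demo()
    print(parent != child)  # two distinct processes, one script name
```

Checking whether the inference script uses `multiprocessing`, `os.fork`, or a data-loading library that spawns workers would confirm or rule this out.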
After upgrading to cuDNNv7, it works with Python 3.5 for me. Thanks AastaLLL.
Sorry, my problem remains, but updating to cuDNN v7 works according to the reply from garrett.floft. I'll close this topic.
Actually, I ran into a situation where TensorFlow on the TX1 gets stuck and runs very slowly with both Python 3.5 and Python 2.7. My TX1 has R28.1.
Does anyone know how to update to cudnn v7?
I fixed my TensorFlow hang by rebuilding the Tegra 28.1 kernel and creating a swap file.
And the NUMA warning does not really affect things.
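For anyone looking for the swap-file step mentioned above, here is a typical sketch using standard Linux tools (the 8G size and the /swapfile path are assumptions; adjust to your storage):

```shell
# Allocate an 8 GiB swap file (size is an assumption; pick what fits)
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Verify the swap is active:
swapon --show
# Make it persistent across reboots:
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
```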