TensorFlow 1.6 not working with JetPack 3.2
I installed TensorFlow 1.6 with JetPack 3.2 as outlined here: https://gist.github.com/vellamike/7c26158c93e89ef155c1cc953bbba956

However, when I try to run the following trivial example:

import tensorflow as tf
# Creates a graph.
print('a')
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
print('b')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
print('c')
c = tf.matmul(a, b)
print('d')

# Creates a session. To log device placement, pass
# config=tf.ConfigProto(log_device_placement=True) to tf.Session().
print('e')
sess = tf.Session()
# Runs the op.
print('f')
print(sess.run(c))

I get the following error over and over again:

2016-05-06 05:43:48.865368: E tensorflow/stream_executor/cuda/cuda_driver.cc:967] failed to alloc 1048576 bytes on host: CUDA_ERROR_UNKNOWN
2016-05-06 05:43:48.865480: W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 1048576
2016-05-06 05:43:48.865508: E tensorflow/stream_executor/cuda/cuda_driver.cc:967] failed to alloc 943872 bytes on host: CUDA_ERROR_UNKNOWN
2016-05-06 05:43:48.865532: W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 943872
2016-05-06 05:43:48.865568: E tensorflow/stream_executor/cuda/cuda_driver.cc:967] failed to alloc 849664 bytes on host: CUDA_ERROR_UNKNOWN
2016-05-06 05:43:48.865619: W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 849664
2016-05-06 05:43:48.865644: E tensorflow/stream_executor/cuda/cuda_driver.cc:967] failed to alloc 764928 bytes on host: CUDA_ERROR_UNKNOWN
2016-05-06 05:43:48.865667: W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 764928
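The requested sizes in this log follow a regular pattern: each one is 90% of the previous request, rounded up to a multiple of 256 bytes (1048576, 943872, 849664, 764928). That looks like an allocator backing off and retrying after each failed pinned-memory allocation. The sketch below reproduces the sequence under that assumption; the 0.9 factor and the 256-byte rounding are inferred from the numbers in the log, not taken from the TensorFlow 1.6 source, and `backoff_sizes` is a hypothetical helper.

```python
from typing import List


def backoff_sizes(start: int, steps: int, factor_num: int = 9,
                  factor_den: int = 10, align: int = 256) -> List[int]:
    """Return `steps` request sizes, starting at `start`.

    Assumption (inferred from the log, not from TensorFlow's source):
    each retry asks for factor_num/factor_den of the previous size,
    rounded up to a multiple of `align` bytes.
    """
    if steps <= 0:
        return []
    sizes = [start]
    for _ in range(steps - 1):
        scaled = sizes[-1] * factor_num
        # Ceiling division by (factor_den * align) in integer arithmetic,
        # then scale back up so the result is a multiple of `align`.
        blocks = -(-scaled // (factor_den * align))
        sizes.append(blocks * align)
    return sizes


if __name__ == "__main__":
    # Starting size taken from the first line of the log above.
    print(backoff_sizes(1048576, 4))  # prints [1048576, 943872, 849664, 764928]
```

This only explains the pattern of sizes being requested; it says nothing about why the underlying host allocation returns CUDA_ERROR_UNKNOWN in the first place.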

#1
Posted 02/07/2018 01:47 PM   
Hi,

Thanks for your feedback.
We are checking this issue internally and will update you later.

#2
Posted 02/08/2018 03:21 AM   
Thank you @AastaLLL. It will be interesting to see whether you are able to reproduce the problem internally.

If you are able to get TF 1.6 running with JetPack 3.2, could you provide detailed instructions on how you did it? Also, if possible, could you provide a Python wheel? I am using Python 3.

#3
Posted 02/08/2018 09:14 AM   
Hi,

We can reproduce this error internally.
We suspect a permissions issue, since we can launch a TF session successfully before rebooting.

Our script and pip wheel (Python 2 only) can be found here:

https://github.com/AastaNV/JEP/tree/master/script/TensorFlow_1.6

We will update you later.

Thanks.

#4
Posted 02/09/2018 09:22 AM   
@AastaLLL - thank you for your support. Please let me know when you have found a resolution to this issue.

Mike

#5
Posted 02/09/2018 09:31 AM   
Hi,

We have tested TF 1.6rc1 on JetPack 3.1 and it works correctly. (The previous test used TF 1.6rc0.)

Could you help check TF 1.6rc1 on JetPack 3.2 DP?
You can build it with this script:

https://github.com/AastaNV/JEP/tree/master/script/TensorFlow_1.6

Thanks.

#6
Posted 02/13/2018 09:28 AM   
Hello all,

I built the 1.6rc0 release with JetPack 3.2 yesterday and also had problems like the ones above when running a network.

Thanks for providing the scripts updated for the latest JetPack version!

I just rebuilt everything from the latest 1.6rc1 branch, and it works perfectly with JetPack 3.2 DP.

#7
Posted 02/13/2018 04:05 PM   
It seems I was a bit too quick to celebrate. The same error sometimes reappears; it's intermittent and very difficult to pinpoint.

At first I got it after a reboot, and later after attempting to free up memory to fit a larger model.

I could run my tests, but the stability is abysmal.

#8
Posted 02/13/2018 04:48 PM   
Dear @AastaLLL, I am afraid I agree with @Hallon: the script (tf1.6_install_wheel.sh) does not work correctly. When I first ran the script, it worked. However, after a reboot, when running even a very simple operation such as the following:

>>> import tensorflow as tf
>>> a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
>>> b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
>>> c = tf.matmul(a,b)
>>> sess = tf.Session()
2016-05-06 05:45:36.431471: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] ARM64 does not support NUMA - returning NUMA node zero
2016-05-06 05:45:36.431593: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1208] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 5.56GiB
2016-05-06 05:45:36.431662: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1308] Adding visible gpu devices: 0
2016-05-06 05:45:37.126531: I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5208 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
>>> sess.run(c)


I get errors which look like this:

2016-05-06 05:46:07.460227: W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 2304
2016-05-06 05:46:07.460244: E tensorflow/stream_executor/cuda/cuda_driver.cc:967] failed to alloc 2304 bytes on host: CUDA_ERROR_UNKNOWN
2016-05-06 05:46:07.460261: W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 2304
2016-05-06 05:46:07.460279: E tensorflow/stream_executor/cuda/cuda_driver.cc:967] failed to alloc 2304 bytes on host: CUDA_ERROR_UNKNOWN
2016-05-06 05:46:07.460295: W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 2304
2016-05-06 05:46:07.460313: E tensorflow/stream_executor/cuda/cuda_driver.cc:967] failed to alloc 2304 bytes on host: CUDA_ERROR_UNKNOWN
2016-05-06 05:46:07.460336: W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 2304
2016-05-06 05:46:07.460354: E tensorflow/stream_executor/cuda/cuda_driver.cc:967] failed to alloc 2304 bytes on host: CUDA_ERROR_UNKNOWN
2016-05-06 05:46:07.460371: W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 2304
2016-05-06 05:46:07.460388: E tensorflow/stream_executor/cuda/cuda_driver.cc:967] failed to alloc 2304 bytes on host: CUDA_ERROR_UNKNOWN
2016-05-06 05:46:07.460442: W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 2304
2016-05-06 05:46:07.460463: E tensorflow/stream_executor/cuda/cuda_driver.cc:967] failed to alloc 2304 bytes on host: CUDA_ERROR_UNKNOWN
2016-05-06 05:46:07.460480: W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 2304
2016-05-06 05:46:07.460497: E tensorflow/stream_executor/cuda/cuda_driver.cc:967] failed to alloc 2304 bytes on host: CUDA_ERROR_UNKNOWN
2016-05-06 05:46:07.460514: W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 2304
2016-05-06 05:46:07.460531: E tensorflow/stream_executor/cuda/cuda_driver.cc:967] failed to alloc 2304 bytes on host: CUDA_ERROR_UNKNOWN
2016-05-06 05:46:07.460547: W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 2304
2016-05-06 05:46:07.460565: E tensorflow/stream_executor/cuda/cuda_driver.cc:967] failed to alloc 2304 bytes on host: CUDA_ERROR_UNKNOWN
2016-05-06 05:46:07.460581: W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 2304
2016-05-06 05:46:07.460599: E tensorflow/stream_executor/cuda/cuda_driver.cc:967] failed to alloc 2304 bytes on host: CUDA_ERROR_UNKNOWN
2016-05-06 05:46:07.460615: W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 2304
2016-05-06 05:46:07.460633: E tensorflow/stream_executor/cuda/cuda_driver.cc:967] failed to alloc 2304 bytes on host: CUDA_ERROR_UNKNOWN
2016-05-06 05:46:07.460649: W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 2304
2016-05-06 05:46:07.460667: E tensorflow/stream_executor/cuda/cuda_driver.cc:967] failed to alloc 2304 bytes on host: CUDA_ERROR_UNKNOWN
2016-05-06 05:46:07.460683: W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 2304
2016-05-06 05:46:07.460701: E tensorflow/stream_executor/cuda/cuda_driver.cc:967] failed to alloc 2304 bytes on host: CUDA_ERROR_UNKNOWN
2016-05-06 05:46:07.460717: W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 2304
2016-05-06 05:46:07.460735: E tensorflow/stream_executor/cuda/cuda_driver.cc:967] failed to alloc 2304 bytes on host: CUDA_ERROR_UNKNOWN
2016-05-06 05:46:07.460751: W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 2304


Can you please reboot the Jetson where you ran your install script and run my code to verify whether you observe the same issue?
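When an error like this is intermittent, it can help to summarize a captured log before comparing runs (for example, before and after a reboot). The following is a minimal sketch that counts the `failed to alloc N bytes on host` errors by requested size and by CUDA error code. It relies only on the line format shown in the logs in this thread; the function name `summarize_alloc_failures` and the sample lines in the demo are illustrative assumptions.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Matches the E-level lines from cuda_driver.cc shown in the logs in this thread.
FAILED_ALLOC = re.compile(r"failed to alloc (\d+) bytes on host: (\w+)")


def summarize_alloc_failures(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count host-allocation failures per requested size and per error code."""
    by_size: Counter = Counter()
    by_error: Counter = Counter()
    for line in lines:
        match = FAILED_ALLOC.search(line)
        if match:
            by_size[int(match.group(1))] += 1
            by_error[match.group(2)] += 1
    return by_size, by_error


if __name__ == "__main__":
    # Two lines copied from the log above; only the E line is counted.
    sample = [
        "2016-05-06 05:46:07.460244: E tensorflow/stream_executor/cuda/"
        "cuda_driver.cc:967] failed to alloc 2304 bytes on host: CUDA_ERROR_UNKNOWN",
        "2016-05-06 05:46:07.460261: W ./tensorflow/core/common_runtime/gpu/"
        "pool_allocator.h:195] could not allocate pinned host memory of size: 2304",
    ]
    sizes, errors = summarize_alloc_failures(sample)
    print(dict(sizes), dict(errors))
```

To summarize a whole run, pass an open file instead of the sample list, e.g. `summarize_alloc_failures(open("tf_log.txt"))` for a hypothetical file capturing TensorFlow's stderr. Comparing the two logs in this thread this way would show the first run backing off through shrinking sizes while the second fails repeatedly at a constant 2304 bytes.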
