However, when I try to run the following trivial example:
import tensorflow as tf
# Creates a graph.
print('a')
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
print('b')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
print('c')
c = tf.matmul(a, b)
print('d')
# Creates a session with log_device_placement set to True.
print('e')
sess = tf.Session()  # config=tf.ConfigProto(log_device_placement=True)
# Run the op.
print('f')
print(sess.run(c))
I get the following error over and over again:
2016-05-06 05:43:48.865368: E tensorflow/stream_executor/cuda/cuda_driver.cc:967] failed to alloc 1048576 bytes on host: CUDA_ERROR_UNKNOWN
2016-05-06 05:43:48.865480: W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 1048576
2016-05-06 05:43:48.865508: E tensorflow/stream_executor/cuda/cuda_driver.cc:967] failed to alloc 943872 bytes on host: CUDA_ERROR_UNKNOWN
2016-05-06 05:43:48.865532: W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 943872
2016-05-06 05:43:48.865568: E tensorflow/stream_executor/cuda/cuda_driver.cc:967] failed to alloc 849664 bytes on host: CUDA_ERROR_UNKNOWN
2016-05-06 05:43:48.865619: W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 849664
2016-05-06 05:43:48.865644: E tensorflow/stream_executor/cuda/cuda_driver.cc:967] failed to alloc 764928 bytes on host: CUDA_ERROR_UNKNOWN
2016-05-06 05:43:48.865667: W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 764928
Thank you @AastaLLL - it will be interesting to see whether you are able to reproduce the problem internally.
If you are able to get TF 1.6 to run with JetPack 3.2, can you provide detailed instructions on how you did it? Also, if possible, could you provide a Python wheel? I am using Python 3.
Dear @AastaLLL, I am afraid I agree with @Hallon - the script (tf1.6_install_wheel.sh) does not work correctly. When I first ran the script it worked. However, after a reboot, running even a very simple operation such as the following:
>>> import tensorflow as tf
>>> a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
>>> b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
>>> c = tf.matmul(a, b)
>>> sess = tf.Session()
2016-05-06 05:45:36.431471: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] ARM64 does not support NUMA - returning NUMA node zero
2016-05-06 05:45:36.431593: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1208] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 5.56GiB
2016-05-06 05:45:36.431662: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1308] Adding visible gpu devices: 0
2016-05-06 05:45:37.126531: I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5208 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
>>> sess.run(c)
I get errors which look like this:
2016-05-06 05:46:07.460227: W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 2304
2016-05-06 05:46:07.460244: E tensorflow/stream_executor/cuda/cuda_driver.cc:967] failed to alloc 2304 bytes on host: CUDA_ERROR_UNKNOWN
2016-05-06 05:46:07.460261: W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 2304
2016-05-06 05:46:07.460279: E tensorflow/stream_executor/cuda/cuda_driver.cc:967] failed to alloc 2304 bytes on host: CUDA_ERROR_UNKNOWN
(the same pair of W/E lines repeats many more times)
Can you please reboot the Jetson where you ran your install script and try my code, to verify that you observe the same issue as I do?
@AastaLLL, thank you for confirming that you are able to reproduce the error. It's interesting that it is CUDA 9.0 related. I look forward to a resolution.
@AastaLLL, this appears to fix the problem! My understanding is that the allow_growth option grows the GPU memory allocation dynamically, as needed, rather than mapping all available GPU memory to the TensorFlow process up front. What was the original cause of the problem, and why does this solve it?
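For anyone landing here later, a minimal sketch of the workaround as I understand it, using the TF 1.x session API:

```python
import tensorflow as tf

# Ask TensorFlow to allocate GPU memory on demand instead of
# reserving all free memory up front. On a shared-memory iGPU
# such as the TX2, the large up-front reservation is what fails.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

sess = tf.Session(config=config)
```

With this config the earlier matmul example runs for me without the CUDA_ERROR_UNKNOWN spam.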
Per the TensorRT documentation:
------
by default it will try to allocate all the available GPU memory.
------
On a fresh boot the amount of free memory is very high (about 6.2 GB).
In an iGPU environment such a large allocation will generally fail, since the host and GPU share the same physical memory.
The workaround restricts the amount of memory allocated, so the allocation succeeds.
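An alternative along the same lines, assuming the TF 1.x API, is to cap the fraction of GPU memory the process may claim instead of growing on demand (the 0.4 value here is illustrative, not tuned):

```python
import tensorflow as tf

# Cap the process at ~40% of total GPU memory so the initial
# allocation stays within what the shared host/GPU memory can
# actually provide on the TX2.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4

sess = tf.Session(config=config)
```

Either option avoids the default allocate-everything behavior that triggers the failure on boards where CPU and GPU share one memory pool.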