Faster R-CNN: too many resources requested for launch

I have successfully run SSD MobileNet in TensorFlow 1.3 on the Jetson TX2 at ~5 FPS - it would really benefit from being run in TensorRT!
However, when trying to run Faster R-CNN from TensorFlow’s Object Detection API I get this “too many resources requested” error.
Has anybody overcome this?
I have no problems running on my GTX 1060 6 GB GPU, but it will not run on the 8 GB Jetson TX2.

Hi,

We also found low GPU utilization in this API.
Have you profiled it with tegrastats? Could you share your results with us?

sudo ~/tegrastats

For TensorFlow on Jetson, it’s recommended to monitor the memory status of the TF session:

2018-01-17 03:27:06.178308: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 4.49GiB
2018-01-17 03:27:06.178396: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)

Check the freeMemory status. If the freeMemory is abnormal, try to reboot your device.
Thanks.

@AastaLLL, when running Faster R-CNN the session returns:

2018-01-17 08:34:59.716369: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] ARM64 does not support NUMA - returning NUMA node zero
2018-01-17 08:34:59.716582: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Found device 0 with properties: 
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 4.55GiB
2018-01-17 08:34:59.716646: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1300] Adding visible gpu device 0
2018-01-17 08:35:00.744659: I tensorflow/core/common_runtime/gpu/gpu_device.cc:987] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4016 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)

so effectively I have the same amount of memory available as when I run on my GTX 1060 6 GB. I have tested on the TX2 with both TensorFlow 1.3 and 1.5.

The tegrastats output when running SSD MobileNet in TensorFlow is shown below:

RAM 7021/7851MB (lfb 5x4MB) cpu [64%@2035,97%@2034,29%@2036,60%@2033,65%@2035,62%@2036] EMC 14%@1866 APE 150 GR3D 3%@1300
RAM 7022/7851MB (lfb 5x4MB) cpu [58%@2006,99%@2034,30%@2034,56%@2005,59%@2008,58%@2006] EMC 13%@1866 APE 150 GR3D 5%@1300
RAM 7022/7851MB (lfb 5x4MB) cpu [62%@2036,68%@2034,68%@2035,61%@2030,57%@2035,61%@2035] EMC 13%@1866 APE 150 GR3D 8%@1300
RAM 7022/7851MB (lfb 5x4MB) cpu [64%@2034,34%@2034,98%@2036,59%@2035,58%@2035,58%@2034] EMC 13%@1866 APE 150 GR3D 7%@1300
RAM 7021/7851MB (lfb 5x4MB) cpu [67%@2034,32%@2035,98%@2035,62%@2035,62%@2034,61%@2034] EMC 13%@1866 APE 150 GR3D 6%@1300
RAM 7022/7851MB (lfb 5x4MB) cpu [60%@2035,43%@2035,76%@2035,61%@2035,58%@2035,60%@2035] EMC 13%@1866 APE 150 GR3D 6%@1300
RAM 7023/7851MB (lfb 5x4MB) cpu [58%@2013,30%@2035,98%@2034,60%@2009,56%@2010,59%@2008] EMC 13%@1866 APE 150 GR3D 10%@1300
RAM 7023/7851MB (lfb 5x4MB) cpu [59%@2015,60%@2034,64%@2035,64%@2034,61%@2035,60%@2034] EMC 14%@1866 APE 150 GR3D 5%@1300

I have maxed out the device clocks in preparation for executing the model.
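(On JetPack 3.x, maxing the clocks is typically done with something like the two commands below; the exact script location may vary between releases.)

sudo nvpmodel -m 0        # select the MAXN / maximum-performance power profile
sudo ~/jetson_clocks.sh   # pin CPU, GPU and EMC clocks to their maximum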

I did some more testing today and can supplement the above with the following dumps from trying to run Faster R-CNN in TensorFlow on the Jetson TX2.

Jupyter Notebook terminal output:

2018-01-17 16:16:19.584106: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] ARM64 does not support NUMA - returning NUMA node zero
2018-01-17 16:16:19.584261: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Found device 0 with properties: 
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 4.97GiB
2018-01-17 16:16:19.584312: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1300] Adding visible gpu device 0
2018-01-17 16:16:20.824479: I tensorflow/core/common_runtime/gpu/gpu_device.cc:987] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4437 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2018-01-17 16:17:09.816477: E tensorflow/stream_executor/cuda/cuda_driver.cc:1080] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_FAILED
2018-01-17 16:17:09.816703: E tensorflow/stream_executor/cuda/cuda_timer.cc:54] Internal: error destroying CUDA event in context 0x7f001959b0: CUDA_ERROR_LAUNCH_FAILED
2018-01-17 16:17:09.816771: E tensorflow/stream_executor/cuda/cuda_timer.cc:59] Internal: error destroying CUDA event in context 0x7f001959b0: CUDA_ERROR_LAUNCH_FAILED
2018-01-17 16:17:09.816912: E tensorflow/stream_executor/cuda/cuda_dnn.cc:2456] failed to enqueue convolution on stream: CUDNN_STATUS_EXECUTION_FAILED
2018-01-17 16:17:10.174651: E tensorflow/stream_executor/event.cc:33] error destroying CUDA event in context 0x7f001959b0: CUDA_ERROR_LAUNCH_FAILED
2018-01-17 16:17:10.174772: E tensorflow/stream_executor/event.cc:33] error destroying CUDA event in context 0x7f001959b0: CUDA_ERROR_LAUNCH_FAILED
2018-01-17 16:17:10.174806: E tensorflow/stream_executor/event.cc:33] error destroying CUDA event in context 0x7f001959b0: CUDA_ERROR_LAUNCH_FAILED
2018-01-17 16:17:10.174836: E tensorflow/stream_executor/event.cc:33] error destroying CUDA event in context 0x7f001959b0: CUDA_ERROR_LAUNCH_FAILED
2018-01-17 16:17:10.174865: E tensorflow/stream_executor/event.cc:33] error destroying CUDA event in context 0x7f001959b0: CUDA_ERROR_LAUNCH_FAILED

Error dump from printout inside the notebook:

Exception in thread Thread-4:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "<ipython-input-5-a51933cd03d8>", line 19, in worker
    im, t_elapsed = detect_objects(frame_rgb, sess, detection_graph)
  File "<ipython-input-4-6c8da66803e2>", line 19, in detect_objects
    feed_dict={image_tensor: image_np_expanded})
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1128, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1344, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1363, in _do_call
    raise type(e)(node_def, op, message)
InternalError: cuDNN launch failure : input shape([1,64,138,256]) filter shape([3,3,64,192])
	 [[Node: FirstStageFeatureExtractor/InceptionV2/InceptionV2/Conv2d_2c_3x3/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](FirstStageFeatureExtractor/InceptionV2/InceptionV2/Conv2d_2b_1x1/Relu, FirstStageFeatureExtractor/InceptionV2/Conv2d_2c_3x3/weights/read/_47__cf__53)]]
	 [[Node: BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/SortByField/Equal/_883 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_10508...ield/Equal", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopBatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/non_max_suppression/iou_threshold/_1)]]

Caused by op u'FirstStageFeatureExtractor/InceptionV2/InceptionV2/Conv2d_2c_3x3/Conv2D', defined at:
  File "/usr/lib/python2.7/threading.py", line 774, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "<ipython-input-5-a51933cd03d8>", line 10, in worker
    tf.import_graph_def(od_graph_def, name='')
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 316, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/importer.py", line 548, in import_graph_def
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3176, in create_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1617, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InternalError (see above for traceback): cuDNN launch failure : input shape([1,64,138,256]) filter shape([3,3,64,192])
	 [[Node: FirstStageFeatureExtractor/InceptionV2/InceptionV2/Conv2d_2c_3x3/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](FirstStageFeatureExtractor/InceptionV2/InceptionV2/Conv2d_2b_1x1/Relu, FirstStageFeatureExtractor/InceptionV2/Conv2d_2c_3x3/weights/read/_47__cf__53)]]
	 [[Node: BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/SortByField/Equal/_883 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_10508...ield/Equal", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopBatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/non_max_suppression/iou_threshold/_1)]]

Output of tegrastats at the point of error:

RAM 3151/7851MB (lfb 915x4MB) cpu [2%@345,100%@2034,99%@2034,1%@348,3%@348,6%@349] EMC 5%@1866 APE 150 GR3D 0%@114
RAM 3151/7851MB (lfb 915x4MB) cpu [0%@345,100%@1981,100%@1988,3%@348,5%@348,4%@349] EMC 5%@1866 APE 150 GR3D 0%@114
RAM 3152/7851MB (lfb 915x4MB) cpu [2%@345,100%@2021,100%@2021,4%@348,5%@348,2%@349] EMC 5%@1866 APE 150 GR3D 0%@114
RAM 3152/7851MB (lfb 915x4MB) cpu [2%@345,100%@2035,100%@2034,3%@349,4%@348,2%@348] EMC 5%@1866 APE 150 GR3D 0%@114
RAM 3152/7851MB (lfb 915x4MB) cpu [1%@345,100%@2016,100%@2019,2%@345,1%@349,3%@348] EMC 5%@1866 APE 150 GR3D 0%@114
RAM 3181/7851MB (lfb 898x4MB) cpu [21%@806,100%@2021,56%@2024,8%@499,10%@500,3%@500] EMC 5%@1866 APE 150 GR3D 24%@114
RAM 3210/7851MB (lfb 887x4MB) cpu [8%@345,100%@2018,32%@2026,7%@345,24%@345,13%@349] EMC 5%@1866 APE 150 GR3D 99%@114
RAM 3327/7851MB (lfb 838x4MB) cpu [2%@1573,100%@1987,31%@1992,35%@1574,13%@1575,5%@1573] EMC 5%@1866 APE 150 GR3D 8%@114
RAM 3578/7851MB (lfb 758x4MB) cpu [19%@1806,100%@2080,0%@2035,7%@2035,2%@2035,56%@1727] EMC 5%@1866 APE 150 GR3D 10%@114
RAM 3732/7851MB (lfb 715x4MB) cpu [2%@345,100%@2034,83%@2035,5%@348,21%@345,2%@346] EMC 7%@1866 APE 150 GR3D 99%@624
RAM 3732/7851MB (lfb 715x4MB) cpu [94%@2036,100%@2035,97%@2034,87%@1987,13%@2035,1%@2035] EMC 4%@1866 APE 150 GR3D 43%@1032
RAM 3659/7851MB (lfb 727x4MB) cpu [2%@653,81%@2022,20%@2027,28%@652,2%@655,4%@655] EMC 3%@1866 APE 150 GR3D 0%@114
RAM 3661/7851MB (lfb 727x4MB) cpu [1%@345,100%@2033,0%@2035,1%@346,2%@348,3%@349] EMC 3%@1866 APE 150 GR3D 0%@114
RAM 3661/7851MB (lfb 727x4MB) cpu [2%@345,100%@2035,0%@2034,0%@348,3%@348,0%@348] EMC 2%@1866 APE 150 GR3D 0%@114
RAM 3661/7851MB (lfb 727x4MB) cpu [3%@345,100%@2034,0%@2035,1%@348,1%@348,3%@348] EMC 2%@1866 APE 150 GR3D 0%@114
RAM 3661/7851MB (lfb 727x4MB) cpu [2%@345,100%@2034,0%@2034,2%@348,4%@348,1%@348] EMC 2%@1866 APE 150 GR3D 0%@114
RAM 3661/7851MB (lfb 727x4MB) cpu [4%@345,100%@1988,0%@1987,2%@346,2%@345,1%@345] EMC 2%@1866 APE 150 GR3D 9%@114
RAM 3661/7851MB (lfb 727x4MB) cpu [4%@345,100%@2026,0%@2026,1%@347,0%@348,3%@348] EMC 2%@1866 APE 150 GR3D 0%@114
RAM 3661/7851MB (lfb 727x4MB) cpu [8%@345,100%@2024,0%@2028,5%@345,8%@345,1%@345] EMC 2%@1866 APE 150 GR3D 0%@114

As you can see, the RAM is nowhere near full in this case.

@AastaLLL any updates here?

Hi,

Could you check whether this issue comes from limited resources via the following two experiments (a usage sketch follows the two snippets):

1. Limit the GPU memory usage

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.7
session = tf.Session(config=config, ...)

2. Trace the memory consumption via tegrastats

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)
sudo ~/tegrastats
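
For reference, a minimal sketch of where these options plug into the inference session (detection_graph is an assumed placeholder for the imported frozen model, and the tensor names follow the standard Object Detection API convention):

import tensorflow as tf

# Experiment 1: hard cap on the GPU memory the process may claim.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.7
# Experiment 2 (alternative): grow the allocation on demand instead.
# config.gpu_options.allow_growth = True

with detection_graph.as_default():
    sess = tf.Session(graph=detection_graph, config=config)
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
    detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
    # ... run sess.run(..., feed_dict={image_tensor: image_np_expanded}) as before,
    # while watching sudo ~/tegrastats in a second terminal.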

Thanks

Hi AastaLLL,

Running the model http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_v2_coco_2017_11_08.tar.gz without limiting resources:

2018-01-22 08:49:31.074043: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Found device 0 with properties: 
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 3.20GiB
2018-01-22 08:49:31.074101: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1300] Adding visible gpu device 0
2018-01-22 08:49:31.756800: I tensorflow/core/common_runtime/gpu/gpu_device.cc:987] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2705 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2018-01-22 08:49:55.624323: E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2018-01-22 08:49:55.624461: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2018-01-22 08:49:55.624497: F tensorflow/core/kernels/conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)

Tegrastats:

RAM 6469/7851MB (lfb 5x4MB) cpu [51%@1981,off,off,8%@1980,56%@1980,100%@1983] EMC 2%@1600 APE 150 GR3D 0%@114
RAM 6471/7851MB (lfb 5x4MB) cpu [54%@2035,off,off,8%@2034,50%@2034,100%@2035] EMC 2%@1600 APE 150 GR3D 0%@114
RAM 6475/7851MB (lfb 5x4MB) cpu [9%@2049,off,off,10%@2034,100%@2035,100%@2033] EMC 2%@1600 APE 150 GR3D 0%@216
RAM 6475/7851MB (lfb 5x4MB) cpu [7%@2036,off,off,6%@2034,100%@2035,100%@2036] EMC 2%@1600 APE 150 GR3D 26%@114
RAM 6475/7851MB (lfb 5x4MB) cpu [5%@1982,off,off,3%@1981,100%@1982,100%@1981] EMC 2%@1600 APE 150 GR3D 0%@114

The RAM is not filled up, and the same error therefore also occurs when limiting resources.

If I try a larger model http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_coco_2017_11_08.tar.gz without restricting memory I get the error presented in #4 (https://devtalk.nvidia.com/default/topic/1028798/jetson-tx2/faster-r-cnn-too-many-resources-requested-for-launch/post/5233403/#5233403).

I have the same problem. I can run an SSD model, but it is not satisfactory and I need a Faster R-CNN model.
I installed JetPack 3.2 RC and compiled the latest TensorFlow master from last week.

If I don’t limit the GPU usage and only set allow_growth:

RAM 7142/7852MB (lfb 17x4MB) SWAP 46/16384MB (cached 3MB) CPU [6%@2022,95%@2035,0%@2012,9%@2022,10%@2019,8%@2022] EMC_FREQ 6%@1866 GR3D_FREQ 0%@1300
RAM 7638/7852MB (lfb 13x4MB) SWAP 68/16384MB (cached 8MB) CPU [12%@2021,15%@2035,28%@2035,17%@2015,20%@2019,14%@2017] EMC_FREQ 4%@1866 GR3D_FREQ 0%@1300
RAM 7738/7852MB (lfb 13x4MB) SWAP 68/16384MB (cached 7MB) CPU [20%@2013,47%@2034,16%@2035,15%@2013,14%@2015,11%@2019] EMC_FREQ 9%@1866 GR3D_FREQ 98%@1300
RAM 7732/7852MB (lfb 13x4MB) SWAP 80/16384MB (cached 12MB) CPU [9%@2023,75%@2034,0%@2035,6%@2023,13%@2025,18%@2019] EMC_FREQ 18%@1866 GR3D_FREQ 99%@1300
RAM 7724/7852MB (lfb 10x4MB) SWAP 80/16384MB (cached 5MB) CPU [17%@2036,68%@2035,0%@2034,5%@2036,10%@2035,13%@2036] EMC_FREQ 18%@1866 GR3D_FREQ 99%@1300
RAM 7722/7852MB (lfb 10x4MB) SWAP 93/16384MB (cached 15MB) CPU [13%@2032,88%@2035,0%@2034,2%@2033,9%@2019,3%@2032] EMC_FREQ 22%@1866 GR3D_FREQ 99%@1300
RAM 7713/7852MB (lfb 10x4MB) SWAP 93/16384MB (cached 7MB) CPU [7%@2014,0%@2035,0%@2034,1%@2030,13%@2020,8%@2022] EMC_FREQ 19%@1866 GR3D_FREQ 0%@1300
RAM 6835/7852MB (lfb 63x4MB) SWAP 93/16384MB (cached 7MB) CPU [4%@2021,0%@2034,0%@2034,10%@2017,8%@2019,12%@2020] EMC_FREQ 9%@1866 GR3D_FREQ 0%@1300
tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
tensorflow/core/framework/op_kernel.cc:1198] Internal: WhereOp: Could not launch cub::DeviceSelect::Flagged to copy indices out, status: too many resources requested for launch

Limiting the GPU usage has the same result. The only difference is that I don’t get the CUDA_ERROR_OUT_OF_MEMORY error.

It seems that all my memory is indeed consumed, but I don’t understand why running the model on a single image would need so many resources.
I have trained my own model using faster_rcnn_resnet101_coco.config and the pre-trained model that comes with it.

For information, this is using C++; if I try using Python 3 I only get:

tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:650] failed to record completion event; therefore, failed to create inter-stream dependency
tensorflow/stream_executor/event.cc:40] could not create CUDA event: CUDA_ERROR_UNKNOWN

Also, I can no longer export frozen graphs since I did the JetPack 3.2 update.
The tensorflow-models export_inference_graph.py script spams the following when I try:

tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 2304
tensorflow/stream_executor/cuda/cuda_driver.cc:967] failed to alloc 2304 bytes on host: CUDA_ERROR_UNKNOWN

Hi,

What batch size are you using for inference?
If it’s more than 1, could you set it to 1 and give it a try?
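
For illustration, batch size 1 means feeding a single image per sess.run call - a minimal sketch, with image_np assumed to be one HxWx3 frame:

import numpy as np

image_np_expanded = np.expand_dims(image_np, axis=0)  # shape (1, H, W, 3) -> batch size 1
# output = sess.run(fetches, feed_dict={image_tensor: image_np_expanded})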

Thanks.

Hi AastaLLL,
Batch size is 1.

Hi AastaLLL,

Post #7 was a TensorFlow error - I have now fixed this and am back to the original error as described in my first few posts.
This happens regardless of which Faster R-CNN model is used.

Hi,

Could you share which Faster R-CNN model you use?
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md

Have you checked whether there is an error with faster_rcnn_inception_v2_coco?

Thanks.

That is the one I have mainly tried my luck with: http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_v2_coco_2017_11_08.tar.gz
and I have also tested: http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_coco_2017_11_08.tar.gz
The same error applies to both.

On my side I have tested with the matching training configurations:
faster_rcnn_inception_v2_coco
faster_rcnn_resnet101_coco
ssd_mobilenet_v1_coco
ssd_inception_v2_coco

The training was done in the cloud; I only want to load a frozen graph on the Jetson, but it always fails at the first frame I want to analyse.

Only the SSD models can be used; all faster_* models fail with “too many resources requested for launch”.

It’s the same with CUDA 8/9, JetPack 3.1/3.2, and TensorFlow 1.4/1.5.

I could try converting my frozen graph to TensorRT, but I have no idea how to do that and could not find resources on making it compatible.

I really wonder whether Faster R-CNN is possible on the Jetson at all.

@jimmy: you are having the same issues as I am. In order to have AastaLLL focus on the issue of running Faster R-CNN models from TensorFlow, let’s not get into a discussion on how to convert the graphs to TensorRT. That would require rewriting all unsupported layers as plugins for TensorRT.

@AastaLLL: We continue to seek help on how to run Faster R-CNN by overcoming the “too many resources requested” problem.

Hi,

We have tested the faster_rcnn_inception_resnet_v2_atrous_coco_2017_11_08 model.
This model is too large to fit on the Jetson.

RAM 7749/7844MB (lfb 1x512kB) CPU ...
RAM 7767/7844MB (lfb 1x512kB) CPU ... <- reaches the maximum
RAM 5626/7844MB (lfb 31x4MB) CPU ...

Have you successfully run this model on a desktop GPU with TF in GPU mode?

Thanks.

Yes, I have tested this with a GTX 1060 6 GB and it runs without problems.
Could you try testing faster_rcnn_inception_v2_coco? It should be somewhat smaller and it still causes problems on the Jetson.
There must be something a bit odd here. Another thing I tried was running the Jetson without the graphical user interface, leaving about 6 GB of free memory - even then it failed. When testing on my GTX 1060 I effectively have 4.7 GB of free memory, and that does not cause problems.

Hi,

Could you check whether there is also an OOM error in CPU mode?

If both CPU mode and GPU mode hit the OOM error, the model is too large to run on the Jetson.
If the error only occurs in GPU mode, there might be something incorrect in the GPU implementation.

It’s hard to compare the TX2 to a desktop use case, since we don’t know the exact memory placement of TF.
A desktop usually has 6 GB of GPU memory plus 8 GB of CPU memory, but the TX2 only has 8 GB of memory in total.

Thanks.

@AastaLLL, I did not have success forcing TensorFlow to use only the CPU on the Jetson platform. Do you know how to do this?
I tried everything that was suggested for PCs, and it works on a PC, but on the Jetson it keeps finding and using the Tegra GPU.

Hi,

You can add tf.device('/cpu:0') before importing the model:

with tf.device('/cpu:0'):
    tf.import_graph_def(od_graph_def, name='')
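
For completeness, a minimal sketch of the full CPU-only flow (PATH_TO_FROZEN_GRAPH is an assumed placeholder, and the standard Object Detection API loading pattern is assumed); hiding the GPU from the session via device_count is an additional option on top of the device pin:

import tensorflow as tf

detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
        od_graph_def.ParseFromString(fid.read())
    with tf.device('/cpu:0'):                       # pin all imported ops to the CPU
        tf.import_graph_def(od_graph_def, name='')

config = tf.ConfigProto(device_count={'GPU': 0})    # do not register the Tegra GPU at all
sess = tf.Session(graph=detection_graph, config=config)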

Thanks.