RuntimeError: CUDA runtime error

Hi, I'm still stuck running the deep learning model. Please help me.

=================================================================
my question

Hi, I want to run a deep learning model on the Jetson TX2.

I installed PyTorch without problems, along with CUDA 9.0 and cuDNN 7.0
(e.g. I verified PyTorch with `import torch` and CUDA with `nvcc --version`).
But when I try to run the ERFNet code, I get stuck:

“RuntimeError : cuda runtime error(7) : too many resources requested for launch at /home/nvidia/pytorch/aten/src/THCUNN/im2col.h”

Please help me!

=================================================================
your reply

Hi,

This error usually occurs when the system runs out of memory.

Could you reboot the TX2 and try again?
The installation may be holding some memory resources that a restart would release.

Thanks.

=================================================================
question again: still stuck

Thank you for your reply.

Even after rebooting, I still cannot run the model.

$ free
              total     used     free   shared  buff/cache  available
Mem:        8032548   939124  6500083    28992      593340    6982456
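For reference, `free` reports these figures in KiB. A quick conversion (a minimal sketch, assuming the KiB units above) confirms that plenty of memory was still available, which is consistent with the cause being something other than memory pressure:

```python
# Convert the KiB figures reported by `free` into GiB to sanity-check
# whether memory pressure could explain the launch failure.
KIB_PER_GIB = 1024 * 1024

total_kib = 8032548      # "total" column from the free output above
available_kib = 6982456  # "available" column from the free output above

print(f"total:     {total_kib / KIB_PER_GIB:.2f} GiB")      # ~7.66 GiB
print(f"available: {available_kib / KIB_PER_GIB:.2f} GiB")  # ~6.66 GiB
```

With roughly 6.7 GiB of the TX2's 8 GiB still available, an out-of-memory condition is unlikely to be the culprit here.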

I really need your help.

=================================================================
reply again

Hi,

We have redirected topic 1032040 here since it is a duplicate.

Another common cause is that your model is too large to run on the Jetson.
Have you tried deploying it in an x86 environment?
If so, could you share the status report from nvidia-smi?

Could you also monitor the TX2 GPU status with tegrastats and share the output with us?

sudo ./tegrastats

Thanks.

=================================================================
update: fixed

This problem has been fixed, if you're still interested.

The issue is that CUDA 9.0 attempts to allocate more registers to each thread than the launch configuration allows. It can be fixed by setting a launch bound on the CUDA kernels in im2col.h. You should be able to simply pull the latest PyTorch version, re-install it, and it will work.
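To see why a launch bound helps, here is a back-of-the-envelope sketch in Python. The 65536-register figure is the documented per-block register limit for compute capability 6.x GPUs (the TX2's GPU is sm_62); the per-thread register counts below are illustrative assumptions, not values taken from the actual im2col kernel:

```python
# A CUDA launch fails with cudaErrorLaunchOutOfResources (error 7) when the
# block's total register demand exceeds the per-block register file.
REGS_PER_BLOCK = 65536    # 32-bit register limit per block on sm_62 (TX2)
THREADS_PER_BLOCK = 1024  # an assumed launch configuration for illustration

def launch_fits(regs_per_thread, threads=THREADS_PER_BLOCK):
    """Return True if the requested launch fits within the register file."""
    return regs_per_thread * threads <= REGS_PER_BLOCK

# Without a bound, the CUDA 9.0 compiler may pick, say, 70 regs/thread:
print(launch_fits(70))  # False: 70 * 1024 = 71680 > 65536
# __launch_bounds__(1024) caps allocation at 65536 // 1024 = 64 regs/thread:
print(launch_fits(64))  # True: 64 * 1024 = 65536 <= 65536
```

In other words, annotating the kernel with `__launch_bounds__` tells the compiler the maximum block size it must support, so it limits per-thread register usage accordingly and the launch no longer over-subscribes the register file.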

For more details, look at this thread:

=================================================================
another question

Hi @singularity7, I ran into a similar issue when running a PyTorch model on the TX2. Could you give me some advice?

THCudaCheck FAIL file=/home/nvidia/Documents/pytorch/aten/src/THC/THCTensorSort.cu line=62 error=7 : too many resources requested for launch
This happens when I run the Mask R-CNN (project link: https://github.com/facebookresearch/maskrcnn-benchmark) inference demo on the NVIDIA Jetson TX2; it raises runtime error (7). By the way, I also tested the YOLOv3 network on the TX2, and it works well.

  • PyTorch: torch-1.1.0a0+7c66ad7
  • PyTorch installed: from source
  • Python version: 3.5
  • CUDA/cuDNN version: CUDA 9.0 and cuDNN 7.1.5 (stock from TX2 JetPack 3.3)

error log:

THCudaCheck FAIL file=/home/nvidia/Documents/pytorch/aten/src/THC/THCTensorSort.cu line=62 error=7 : too many resources requested for launch
Traceback (most recent call last):
  File "demo.py", line 88, in <module>
    predictions = coco_demo.run_on_opencv_image(image)
  File "/home/nvidia/Documents/maskrcnn-benchmark-master/demo/predictor.py", line 93, in run_on_opencv_image
    predictions = self.compute_prediction(image)
  File "/home/nvidia/Documents/maskrcnn-benchmark-master/demo/predictor.py", line 124, in compute_prediction
    predictions = self.model(image_list)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 492, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nvidia/Documents/maskrcnn-benchmark-master/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 50, in forward
    proposals, proposal_losses = self.rpn(images, features, targets)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 492, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nvidia/Documents/maskrcnn-benchmark-master/maskrcnn_benchmark/modeling/rpn/rpn.py", line 96, in forward
    return self._forward_test(anchors, objectness, rpn_box_regression)
  File "/home/nvidia/Documents/maskrcnn-benchmark-master/maskrcnn_benchmark/modeling/rpn/rpn.py", line 122, in _forward_test
    boxes = self.box_selector_test(anchors, objectness, rpn_box_regression)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 492, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nvidia/Documents/maskrcnn-benchmark-master/maskrcnn_benchmark/modeling/rpn/inference.py", line 138, in forward
    sampled_boxes.append(self.forward_for_single_feature_map(a, o, b))
  File "/home/nvidia/Documents/maskrcnn-benchmark-master/maskrcnn_benchmark/modeling/rpn/inference.py", line 93, in forward_for_single_feature_map
    objectness, topk_idx = objectness.topk(pre_nms_top_n, dim=1, sorted=True)
RuntimeError: cuda runtime error (7) : too many resources requested for launch at /home/nvidia/Documents/pytorch/aten/src/THC/THCTensorSort.cu:62

=================================================================
reply

The solution for comment #4 can be found in the following topic:
[url]https://devtalk.nvidia.com/default/topic/1047497/jetson-tx2/runtime-error-7-too-many-resources-requested-for-launch-at-pytorch-aten-src-thc-thctensorsort-cu-/post/5316121/#5316121[/url]

Thanks.