TensorRT 2.1 OutOfMemory Error in buildSingleLayer

mjones · August 22, 2017, 7:44pm

Hey Nvidia,

I’m looking for some help with TensorRT.

I get the following error when passing a deploy (.prototxt) file to the giexec tool. I hit the same error in my own code, where TensorRT is integrated through its API.

Internal error: could not find any implementation for node `global_pool`, try increasing the workspace size with IBuilder::setMaxWorkspaceSize()
cudnnBuilder2.cpp (586) - OutOfMemory Error in buildSingleLayer

In net.prototxt, global_pool is defined as follows:

layer {
  name: "global_pool"
  type: "Pooling"
  bottom: "concat"
  top: "global_pool"
  pooling_param {
    pool: AVE
    kernel_size: 32
    stride: 32
    pad: 0
  }
}

What I’ve done to address this error:

Increased the workspace size (call shown below). I tried every n from 1 to 20 and every N from 1 to 20, keeping in mind that the GPU has a limited amount of memory available for temporary operations.

builder->setMaxWorkspaceSize(n << N);
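
As a reading aid for the call above: `n << N` evaluates to n × 2^N bytes, so with both n and N limited to the range 1 to 20, the largest workspace requested is 20 × 2^20 bytes (20 MiB). The sketch below does that arithmetic in `std::size_t`, because a plain `int` shift overflows once the result passes 2 GiB. `workspaceBytes` is a hypothetical helper name used only for illustration; it is not part of TensorRT.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a TensorRT function): the byte count that
// `n << shift` stands for, computed in std::size_t rather than int.
std::size_t workspaceBytes(std::size_t n, unsigned shift) {
    // Shifting by the full bit width of the type or more is undefined behaviour.
    assert(shift < sizeof(std::size_t) * 8);
    return n << shift;
}
```

For instance, `workspaceBytes(20, 20)` is 20971520 bytes (20 MiB) and `workspaceBytes(1, 30)` is 1073741824 bytes (1 GiB). The result could then be handed to `builder->setMaxWorkspaceSize(...)`, assuming the device actually has that much free memory.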

Ran the identical project files on a laptop and a desktop with TensorRT 2.1 installed; inference works perfectly there.

I’m not sure where to go from here. Looking for some advice.

Thanks,
Matthew J

Honey_Patouceul · August 22, 2017, 8:52pm

I'm not at all sure it will help, but if you have an SD card or another disk, you could add swap and give it a try.

AastaLLL · August 23, 2017, 5:46am

Hi,

Could you check how much memory TensorRT uses on the desktop version?
Thanks.

mjones · August 23, 2017, 1:53pm

Hey AastaLLL,

I should add that the code is the same on both the Tegra and the Desktop. That being said, the GPU memory usage on the Desktop does not exceed 340 MiB according to nvidia-smi.

I should also add that I’m trying to run the network in 16-bit floating point. Everything works as expected in 32-bit floating point. Both 16-bit and 32-bit modes work fine on the Desktop/Laptop; only 32-bit works on the Tegra. I forgot to mention this important point in my original post.

Thanks,
MJ

AastaLLL · August 24, 2017, 3:11am

Hi,

What is your batch size? Could you lower the batch size, and try it again?

xiaoyang · November 21, 2017, 8:25am

Hi mjones,
Did you solve this issue? I ran into the same problem.

AastaLLL · November 22, 2017, 3:45am

Hi,

Here are two suggestions for this issue:

1. Decrease batch size

2. Increase workspace size
Please check this page for more information:
NVIDIA Documentation Center | NVIDIA Developer

Thanks.

jd_ruan · December 6, 2017, 8:18pm

I hit the same issue when building an FP16 model for a Tesla P100. Are you sure this is related to workspace size? Why would building the FP16 model need more memory than building its FP32 counterpart?

I kept increasing the workspace, up to 16 GB, and then I hit gieCudaMalloc failures:

Total Activation Memory: 17213442048
resources.cpp (57) - Cuda Error in gieCudaMalloc: 2

AastaLLL · December 7, 2017, 5:48am

Hi,

Could you share your model file?
We want to reproduce this issue on our side and give a further suggestion.

Thanks.

jd_ruan · December 7, 2017, 4:44pm

It was a bug in my code. I was building the FP16 model by calling the C++ IBuilder::setHalf2Mode(true), but I was still declaring the weights as DataType::DT_kFLOAT somewhere in my code. After I converted the weights to FP16 and set their type to DT_kHALF, the model built successfully.
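
For anyone hitting the same thing, here is a minimal sketch of the weight conversion step, assuming the weights start out as a `std::vector<float>`. It is not the code from my project and it does not call TensorRT at all: `floatToHalfBits` and `convertWeightsToHalf` are hypothetical names, and the conversion is a plain IEEE 754 binary32 to binary16 routine with round-to-nearest-even. The resulting 16-bit buffer is what would then be handed to the builder with the half data type instead of the float one.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: convert one float to IEEE 754 half-precision bits.
std::uint16_t floatToHalfBits(float value) {
    static_assert(sizeof(float) == 4, "expects 32-bit IEEE 754 floats");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));

    const std::uint32_t sign = (bits >> 16) & 0x8000u;
    const std::uint32_t exponent = (bits >> 23) & 0xFFu;
    std::uint32_t mantissa = bits & 0x7FFFFFu;

    if (exponent == 0xFFu) {  // Inf or NaN: keep NaN-ness with a quiet bit
        return static_cast<std::uint16_t>(sign | 0x7C00u | (mantissa ? 0x200u : 0u));
    }
    const int halfExp = static_cast<int>(exponent) - 127 + 15;  // re-bias
    if (halfExp >= 31) {  // too large for half: saturate to infinity
        return static_cast<std::uint16_t>(sign | 0x7C00u);
    }
    if (halfExp <= 0) {  // result is a half subnormal or zero
        if (halfExp < -10) {
            return static_cast<std::uint16_t>(sign);  // rounds to signed zero
        }
        mantissa |= 0x800000u;           // restore the implicit leading 1
        const int shift = 14 - halfExp;  // between 14 and 24
        assert(shift >= 14 && shift <= 24);
        std::uint32_t half = mantissa >> shift;
        const std::uint32_t rem = mantissa & ((1u << shift) - 1u);
        const std::uint32_t halfway = 1u << (shift - 1);
        if (rem > halfway || (rem == halfway && (half & 1u))) {
            ++half;  // round to nearest, ties to even
        }
        return static_cast<std::uint16_t>(sign | half);
    }
    std::uint32_t half = (static_cast<std::uint32_t>(halfExp) << 10) | (mantissa >> 13);
    const std::uint32_t rem = mantissa & 0x1FFFu;
    if (rem > 0x1000u || (rem == 0x1000u && (half & 1u))) {
        ++half;  // a carry into the exponent field is the correct result
    }
    return static_cast<std::uint16_t>(sign | half);
}

// Hypothetical helper: convert a whole FP32 weight buffer to FP16 bits.
std::vector<std::uint16_t> convertWeightsToHalf(const std::vector<float>& weights) {
    std::vector<std::uint16_t> half;
    half.reserve(weights.size());
    for (float w : weights) {
        half.push_back(floatToHalfBits(w));
    }
    return half;
}
```

The point of the sketch is that the buffer contents and the declared data type have to agree: FP32 bytes labelled as FP32, or FP16 bytes labelled as FP16, never a mix of the two.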

We should fix the error message.

AastaLLL · December 8, 2017, 6:29am

Thanks for your feedback.