Memory leak in OpenCL under Linux when the number of kernel calls is huge
Hi folks!

As I do not see an OpenCL-specific forum, I had to post this here. As the title says, I am facing a memory leak in OpenCL that is really hard to find. I am running Monte Carlo simulations of magnetic particles and I have to make many kernel calls. Everything goes fine until the number of kernel calls gets close to 10^9. At this point the memory consumption goes from around 70 MB to more than 4 GB and the kernel kills the executable. Error code -6, which corresponds to CL_OUT_OF_HOST_MEMORY, appears on the screen. If the simulation parameters are such that the total number of calls is not this high, the code runs fine and the results are correct. The code is a little big, but I can post parts of it here if it helps. The following message appears in the dmesg output when the memory leak occurs:

NVRM: Xid (0000:02:00): 31, Ch 00000001, engmask 00000101, intr 10000000

My system specs are the following:

Operating system: Arch Linux
Kernel version: 3.7.4
Nvidia driver: 313.18

What does this NVRM error mean? What can I do to get a clue about where the problem is?

Thanks in advance,

Wellington
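
One low-tech way to get a clue about where the memory grows is to log the resident memory of the process every so many iterations of the simulation loop and see which phase makes it climb. The following is only a minimal sketch, assuming Linux (it reads `/proc/self/status`); the helper names `resident_kb` and `log_memory` are made up for illustration and are not part of OpenCL or of the code discussed in this thread.

```cpp
#include <cstdio>
#include <fstream>
#include <sstream>
#include <string>

// Returns the resident set size (VmRSS) of the current process in kB,
// or -1 if /proc/self/status cannot be read or has no VmRSS line.
// Linux only.
long resident_kb() {
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line)) {
        if (line.compare(0, 6, "VmRSS:") == 0) {
            std::istringstream fields(line.substr(6));
            long kb = -1;
            fields >> kb;  // the line looks like "VmRSS:     12345 kB"
            return kb;
        }
    }
    return -1;
}

// Hypothetical helper: call once per loop iteration. It prints the
// current RSS every `period` iterations (period must be > 0).
void log_memory(unsigned long long iteration, unsigned long long period) {
    if (iteration % period == 0) {
        std::printf("iteration %llu: RSS = %ld kB\n", iteration, resident_kb());
    }
}
```

Calling something like `log_memory(i, 1000000)` inside the loop that enqueues the kernels would show whether the growth is steady or starts suddenly near a particular call count.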

#1
Posted 02/06/2013 03:09 PM   
There is something I forgot to say in the previous post. I do not use events when I submit a kernel. The kernel is enqueued this way:

a.global_size = {p.N_first_reduction};
a.local_size = {p.local_size};
p.queue.enqueueNDRangeKernel(p.E_int, a.kernel_offset, a.global_size, a.local_size);


This call sits inside a for loop that runs it many times.

#2
Posted 02/06/2013 03:28 PM   
I ran across a similar problem recently. Do you remember to release the cl_event after each kernel call?

#3
Posted 02/07/2013 08:41 PM   
Is this necessary even when not using events? I am aware that when events are used and not released a memory leak occurs... But I do not use them... This is driving me nuts... I tried enqueueing the kernel with

p.queue.enqueueNDRangeKernel(p.E_int, a.kernel_offset, a.global_size, a.local_size, NULL, NULL);


But nothing has changed...

Is it possible that a cl_event leaks memory even when it is not used? I am using the C++ wrapper.

#4
Posted 02/08/2013 01:41 PM   
My understanding is that if you pass in NULL as the event, it should be OK. Only when you pass in a cl_event does the runtime allocate some (maybe not so small) memory for bookkeeping, and it does not release that memory until you explicitly tell it to.

#5
Posted 02/08/2013 02:31 PM   
As I have posted in the NVIDIA Linux driver forum (https://devtalk.nvidia.com/default/topic/529147/linux/memory-leak-in-opencl/), when I run the simulation with MSI enabled and thermal monitor disabled in nvidia-settings, there is a tremendous reduction in the occurrence of the memory leak. When it does happen, it takes longer to appear, and sometimes there is no leak at all.

It seems to me that there is a bug in the NVIDIA driver, but as they do not care about OpenCL this is probably never going to be solved...

#6
Posted 02/19/2013 12:43 PM   
Hey, did you make any further progress on this problem? I am facing the exact same issue. I've also posted on the Khronos forum (http://www.khronos.org/message_boards/showthread.php/9152-Does-Nvidia-API-grow-an-unbounded-command-queue-unless-clFinish%28%29-is-called) but so far there is no easy solution.

I am running a similar style of simulation (neurons instead of particles). Mine also dies at around 2.5*10^8 kernel calls. I've tried it on all of the other APIs and it does not have this issue, strongly suggesting that the issue is with Nvidia rather than my code.

So far, I've tried adding calls to clFinish() every 10^5 kernel calls: no change.
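
The periodic synchronization described above can be sketched roughly as follows. This is only an illustration of the loop structure, not the actual simulation code: the helper name `run_with_periodic_sync` and its parameters are made up, `enqueue` stands in for whatever launches the kernel, and `sync` stands in for a call such as clFinish() on the command queue. Passing them as callables keeps the sketch free of any OpenCL dependency.

```cpp
#include <cstddef>

// Hypothetical helper: runs `enqueue(i)` for i = 1..total_calls and calls
// `sync()` after every `sync_every` launches (sync_every must be > 0),
// plus once at the end if the last batch was incomplete.
// Returns the number of times `sync()` was called.
template <typename Enqueue, typename Sync>
std::size_t run_with_periodic_sync(std::size_t total_calls,
                                   std::size_t sync_every,
                                   Enqueue enqueue,
                                   Sync sync) {
    std::size_t syncs = 0;
    for (std::size_t i = 1; i <= total_calls; ++i) {
        enqueue(i);                   // e.g. enqueue one kernel launch
        if (i % sync_every == 0) {    // a full batch has been queued
            sync();                   // e.g. block until the queue drains
            ++syncs;
        }
    }
    if (total_calls % sync_every != 0) {
        sync();                       // drain the final partial batch
        ++syncs;
    }
    return syncs;
}
```

With `sync_every` set to 100000 this matches the "clFinish() every 10^5 kernel calls" experiment, which in my case made no difference to the leak.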

I've just launched some sims where I actually request the cl_event from my buffer reads/writes and kernel launches, even though I use blocking reads/writes throughout, and then release() those events at every step. We'll see tomorrow what kind of result I get from that.

One final thing to note, which I'm not sure you experienced, is that I did not have this problem before I switched to using mapped memory buffers. I previously used normal read/write buffers and had no memory issues. I switched to mapped buffers because they use pinned memory, and I noticed a serious speed improvement, but since then I have run into this issue.

Please let me know if you found out any alternative approaches.

Thanks,
Dave.

#7
Posted 10/21/2013 11:52 PM   