As I do not see an OpenCL-specific forum, I had to post this here… As the title says, I am facing a memory leak in OpenCL that is really hard to track down… I am running Monte Carlo simulations of magnetic particles, which require many kernel calls. Everything goes fine until the number of kernel calls gets close to 10^9. At that point the memory consumption jumps from around 70 MB to more than 4 GB and the Linux kernel kills the executable. On screen appears the error code -6, corresponding to CL_OUT_OF_HOST_MEMORY. If the simulation parameters are such that the total number of calls stays below that, the code runs fine, everything correct… The code is a little big, but I can post parts of it here if it helps. When the memory leak occurs, the following message appears in the dmesg output:
Is this necessary even when not using events? I am aware that a memory leak occurs when events are used and not released… But I do not use them… This is driving me nuts… I tried enqueueing the kernel with
My understanding is that if you pass NULL as the event argument, it should be OK. It is only when you pass in a cl_event that the runtime allocates some (maybe not so small) memory for bookkeeping and does not release it until you tell it to explicitly.
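To make the distinction concrete, here is a minimal sketch of the two enqueue patterns (not anyone's actual code from this thread; `queue`, `kernel`, and the function names are illustrative, and it assumes a valid OpenCL context):

```c
#include <CL/cl.h>

/* Pattern 1: pass NULL for the event argument. No event object is
   created, so the runtime should keep no per-call bookkeeping. */
cl_int enqueue_without_event(cl_command_queue queue, cl_kernel kernel,
                             size_t global_size)
{
    return clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                                  &global_size, NULL, 0, NULL, NULL);
}

/* Pattern 2: request an event. It must be released explicitly, or
   host memory grows with every enqueue. */
cl_int enqueue_with_event(cl_command_queue queue, cl_kernel kernel,
                          size_t global_size)
{
    cl_event ev;
    cl_int err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                                        &global_size, NULL, 0, NULL, &ev);
    if (err == CL_SUCCESS)
        clReleaseEvent(ev);  /* skipping this leaks host memory per call */
    return err;
}
```

If the leak appears even with pattern 1 (event argument NULL), the retained memory is being held by the runtime itself, not by unreleased events in the application.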
As I have posted in the NVIDIA Linux driver forum ("Memory leak in OpenCL - Linux - NVIDIA Developer Forums"), when I run the simulation with MSI enabled and the thermal monitor disabled in nvidia-settings, there is a tremendous reduction in the occurrence of the memory leak. When it does happen it takes much longer to appear, and sometimes there is no leak at all.
It seems to me that there is a bug in the NVIDIA driver, but as they do not care much about OpenCL, this is probably never going to be fixed…
Hey, did you make any further progress on this problem? I am facing the exact same issue. I’ve also posted on the Khronos forum but so far no easy solution.
I am running a similar style of simulation (neurons instead of particles). Mine also dies at around 2.5×10^8 kernel calls. I have tried it with all of the other APIs and none of them have this issue, which strongly suggests the problem lies with the NVIDIA driver rather than my code.
So far, I’ve tried adding calls to clFinish() every 10^5 kernel calls: no change.
I’ve just launched some sims where I actually request the cl_event from my buffer reads/writes and kernel launches (even though I use blocking reads/writes throughout), and then release those events at every step. We’ll see tomorrow what kind of result I get from that.
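Combining the two workarounds described above (periodic clFinish() plus request-and-release of every event), the inner loop would look roughly like this. This is a sketch under stated assumptions: `queue`, `kernel`, and `run_steps` are illustrative names, not the actual simulation code, and the 10^5 interval matches the clFinish() cadence mentioned above:

```c
#include <CL/cl.h>

/* Illustrative simulation loop: request the event for every enqueue,
   release it immediately, and drain the queue periodically. */
void run_steps(cl_command_queue queue, cl_kernel kernel,
               size_t global_size, long n_steps)
{
    for (long i = 0; i < n_steps; ++i) {
        cl_event ev;
        if (clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size,
                                   NULL, 0, NULL, &ev) != CL_SUCCESS)
            break;
        clReleaseEvent(ev);      /* free per-call bookkeeping right away */
        if (i % 100000 == 0)
            clFinish(queue);     /* drain queued commands every 10^5 calls */
    }
    clFinish(queue);             /* wait for the remaining work */
}
```

If the driver leaks even with this pattern, the accumulation is happening inside the runtime, beyond what the application can release.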
One final thing to note, which I’m not sure if you experienced: I did not have this problem before I switched to using mapped memory buffers. I previously used normal read/write buffers and had no memory issues, but I switched to mapped buffers since they use pinned memory and I noticed a serious speed improvement. Since then, though, I’ve run into this issue.
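One thing worth double-checking with mapped buffers: every clEnqueueMapBuffer must be balanced by a clEnqueueUnmapMemObject, or the host-side mappings accumulate. A minimal sketch of the balanced pattern (names are illustrative; `queue` and `buffer` are assumed valid):

```c
#include <CL/cl.h>
#include <string.h>

/* Copy a buffer's contents to host memory via a blocking map,
   unmapping immediately afterwards. */
cl_int update_host_copy(cl_command_queue queue, cl_mem buffer,
                        size_t nbytes, void *host_dst)
{
    cl_int err;
    void *p = clEnqueueMapBuffer(queue, buffer, CL_TRUE, CL_MAP_READ,
                                 0, nbytes, 0, NULL, NULL, &err);
    if (err != CL_SUCCESS)
        return err;
    memcpy(host_dst, p, nbytes);
    /* An unbalanced map (missing this call) is a classic host-memory
       leak with pinned buffers. */
    return clEnqueueUnmapMemObject(queue, buffer, p, 0, NULL, NULL);
}
```

If your maps and unmaps are already paired one-to-one and the memory still grows, that again points at the driver rather than the application.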
Please let me know if you find any alternative approaches.