Multiple iterations of a single-task kernel

Hi everyone.

I am having a very weird problem. I have converted my parallel kernels into one sequential (single-task) kernel, modified the host code accordingly, and cross-checked the code. It does work. I clear the buffers before each iteration, but after a certain number of iterations the kernel fails with this error: CL_INVALID_COMMAND_QUEUE. As far as I know this means the kernel has failed, but that doesn’t make sense, because it works for the earlier iterations and then fails on later ones.
To work around this, I re-initialize all the OpenCL variables (command queue, context and so on) every 50 iterations, and now it gets through all the iterations.
I am running the code on my NVIDIA GPU. What could be causing this problem? I do release the buffers and re-initialize them… Also, if I run it on the CPU it fails randomly…

One more thing: to run it on the GPU I have to change all clReleaseMemObject calls to clRetainMemObject, otherwise it sometimes throws that error again after fewer iterations. On the CPU it is the opposite: I need to switch all clRetainMemObject calls to clReleaseMemObject to make it work, but now it’s not working.
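For context on the retain/release swap: a buffer starts with a reference count of 1, clRetainMemObject increments it, and clReleaseMemObject decrements it, with the object freed once the count reaches zero. The sketch below is only a toy model of that counting in plain C, not the real OpenCL API (toy_mem, toy_create, toy_retain, toy_release and toy_run are all made-up names). It shows why replacing release with retain can appear to help: nothing is ever freed, so a bad access elsewhere may stay hidden, at the cost of leaking one buffer per iteration.

```c
#include <stdlib.h>

/* Toy model of a memory object's reference count (NOT the real OpenCL API). */
typedef struct { int refcount; } toy_mem;

int toy_live_objects = 0;            /* objects created but not yet freed */

toy_mem *toy_create(void)            /* models clCreateBuffer: count starts at 1 */
{
    toy_mem *m = malloc(sizeof *m);
    if (!m) return NULL;
    m->refcount = 1;
    ++toy_live_objects;
    return m;
}

void toy_retain(toy_mem *m)          /* models clRetainMemObject */
{
    ++m->refcount;
}

void toy_release(toy_mem *m)         /* models clReleaseMemObject */
{
    if (--m->refcount == 0) {        /* last reference gone: free the object */
        free(m);
        --toy_live_objects;
    }
}

/* Each iteration creates one buffer, then either releases it (balanced)
   or retains it (the swap described above). Returns objects still alive. */
int toy_run(int iters, int use_retain)
{
    toy_live_objects = 0;
    for (int i = 0; i < iters; ++i) {
        toy_mem *m = toy_create();
        if (!m) break;
        if (use_retain)
            toy_retain(m);           /* count goes to 2, object is never freed */
        else
            toy_release(m);          /* count goes to 0, object is freed */
    }
    return toy_live_objects;
}
```

In this model toy_run(50, 0) leaves no objects alive, while toy_run(50, 1) leaves all 50 allocated.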

Sometimes CL_INVALID_COMMAND_QUEUE is a result of hitting the WDDM TDR (Timeout Detection and Recovery) timeout (google that). I have no idea whether that is what you are running into.

Thanks for the reply. The kernel I am running doesn’t take more than 1 second to execute, so I don’t think that’s the problem. I have also had kernels that ran for a long time and they didn’t crash either, so I really don’t think that’s the issue. I checked the timeout delay in the Nsight app and it’s 2 seconds by default, so I guess that’s not an issue.
Also, on the failing iteration the kernel crashes as soon as it starts. If I re-create all the OpenCL variables during execution after a certain number of iterations, it continues to execute, so I can’t really understand the problem.

Recreating all the OpenCL variables is exactly what you would have to do to mask a WDDM TDR timeout.

An invalid command queue error, assuming you haven’t done something unexpected like destroying the command queue, would AFAIK only come about if the underlying compute context got corrupted somehow.

You may need to work pretty hard at reducing the test case until you can identify something critical that makes the problem appear or go away.
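One way to start narrowing it down is to log the status code of every host call together with the iteration number. A minimal sketch, assuming the numeric values from the Khronos CL/cl.h header; cl_error_name and cl_check are hypothetical helper names, and the block uses plain int so it compiles without the OpenCL headers:

```c
#include <stdio.h>

/* Hypothetical helper: symbolic name for a few OpenCL status codes.
   The numeric values are the ones defined in the Khronos CL/cl.h header. */
const char *cl_error_name(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -36: return "CL_INVALID_COMMAND_QUEUE";
    case -38: return "CL_INVALID_MEM_OBJECT";
    default:  return "unlisted OpenCL error";
    }
}

/* Hypothetical helper: report a failing call with its iteration number.
   Returns 1 if the call succeeded, 0 otherwise. */
int cl_check(int err, const char *call, int iteration)
{
    if (err == 0)
        return 1;
    fprintf(stderr, "iteration %d: %s failed with %s (%d)\n",
            iteration, call, cl_error_name(err), err);
    return 0;
}
```

In the host loop this could wrap every call, for example `if (!cl_check(clFinish(queue), "clFinish", i)) break;` right after the kernel enqueue (queue and i being whatever the host code calls its command queue and loop counter). Waiting on the queue after each enqueue makes the error show up at the iteration and call that triggered it rather than on some later call.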

OK, so I have disabled the timeout altogether and I still get the same behavior.
So it is not related to that, since the kernel does not take long to execute.

Maybe you should debug the random failure on the CPU. Presumably that has nothing to do with the GPU.

How can I debug the kernel? I can’t figure out the right way to do it. I have installed the Intel OpenCL SDK and I have Visual Studio 2015.

I have found the problem… It was an out-of-bounds pointer access, but you can’t see that unless you debug (with the Intel OpenCL CPU debugger) and find the exact point where it fails. It was working on the GPU maybe because of the type of memory allocation or something else.
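For anyone hitting the same thing without a kernel debugger at hand: because a single-task kernel is sequential, its body can often be copied into a plain C function on the host and run with bounds-checked buffer accesses. The following is only a sketch of that idea. The kernel shown is a made-up example with a deliberate off-by-one, not the kernel from this thread, and checked_buf, at and kernel_body are hypothetical names.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a __global float buffer with its length attached. */
typedef struct {
    float *data;
    size_t len;
    long   first_bad;    /* first out-of-range index seen, or -1 if none */
} checked_buf;

static float dummy_cell; /* out-of-range accesses are redirected here */

/* Bounds-checked access: reports the first bad index instead of touching
   memory outside the buffer. */
float *at(checked_buf *b, long i, const char *name)
{
    if (i < 0 || (size_t)i >= b->len) {
        if (b->first_bad < 0) {
            b->first_bad = i;
            fprintf(stderr, "out of range: %s[%ld], len=%zu\n", name, i, b->len);
        }
        dummy_cell = 0.0f;
        return &dummy_cell;
    }
    return &b->data[i];
}

/* Made-up single-task kernel body: out[i] = in[i] + in[i + 1].
   The read of in[i + 1] runs past the end of the buffer when i == n - 1. */
void kernel_body(checked_buf *in, checked_buf *out, long n)
{
    for (long i = 0; i < n; ++i)
        *at(out, i, "out") = *at(in, i, "in") + *at(in, i + 1, "in");
}
```

Called with two 4-element buffers and n = 4, the input buffer's first_bad ends up as 4 and the offending access is printed, which is the kind of bug that can go unnoticed on one device and crash on another.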