I’m having an intermittent problem with kernels hanging indefinitely. My code
calls several kernels sequentially in a big loop, with a few async memory copies
interleaved. In outline:
for (long t = 0; t < 10000000; t++) {
    kernel_a( ..., stream[0]);
    kernel_b( ..., stream[0]);
    ...
    cuMemcpyDtoHAsync( ..., stream[0]);   /* D->H copy on stream[0] */
    kernel_g( ..., stream[1]);            /* one kernel on a second stream */
    cuStreamSynchronize(stream[0]);       /* wait for stream[0] before CPU work */
    some_cpu_work();
    cuMemcpyHtoDAsync( ..., stream[0]);
    cuCtxSynchronize();                   /* full device sync each iteration */
}
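The snippet above elides error checking; since an asynchronous launch failure is
often reported only by a later call, it's worth wrapping every driver call. A
minimal sketch (CHECK is my own hypothetical macro, and h_buf/d_buf/nbytes are
placeholder names, not from my real code):

#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>

/* Hypothetical helper: print the raw CUresult and abort on any failure.
   (Printing the numeric code, since cuGetErrorName only appeared in CUDA 6.) */
#define CHECK(call) do {                                        \
        CUresult err_ = (call);                                 \
        if (err_ != CUDA_SUCCESS) {                             \
            fprintf(stderr, "CUDA error %d at %s:%d\n",         \
                    (int)err_, __FILE__, __LINE__);             \
            exit(EXIT_FAILURE);                                 \
        }                                                       \
    } while (0)

/* e.g. inside the loop: */
CHECK(cuMemcpyDtoHAsync(h_buf, d_buf, nbytes, stream[0]));
CHECK(cuStreamSynchronize(stream[0]));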
The longest of these kernels takes around 10 ms to execute. The code will run for
hours (several hundred thousand kernel launches) but eventually hangs. I have the
floating-point precision defined in a macro so that I can change it as needed:
#define PREC double
PREC *x;
x = (PREC *)malloc(20 * sizeof(PREC));
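The kernels use the same macro, so switching precision is a one-line change.
Roughly like this (a trimmed sketch; the real kernel bodies are more involved
than this placeholder):

__global__ void kernel_a(PREC *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] *= (PREC)2.0;   /* placeholder body, not the real computation */
}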
The problem only occurs using ‘double’. Other than these hangs, the code
runs as expected and gives sensible results. If I am running the job on a GPU
with a display attached, I get error 702 when it hangs:
CUDA error 702: CUDA_ERROR_LAUNCH_TIMEOUT
This indicates that the device kernel took too long to execute. This can
only occur if timeouts are enabled - see the device attribute
CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT for more information. The
context cannot be used (and must be destroyed similar to
CUDA_ERROR_LAUNCH_FAILED). All existing device memory allocations from
this context are invalid and must be reconstructed if the program is to
continue using CUDA.
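For reference, the attribute mentioned there can be queried directly; a sketch
reusing the CHECK macro from above, with dev an already-created CUdevice handle:

int timeout_enabled = 0;
CHECK(cuDeviceGetAttribute(&timeout_enabled,
                           CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT, dev));
printf("kernel exec timeout enabled: %d\n", timeout_enabled);  /* 1 = watchdog applies */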
If timeouts are not enabled (i.e., on a compute-only GPU), I instead get no
error, but the code still hangs. I've tested this with driver versions 304 and
310, CUDA 5, and both Debian and Arch Linux on a few different machines, all
with 3 GB GTX 580 GPUs. The memory usage is around 700 MB.
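One troubleshooting idea I've considered is replacing the blocking
cuStreamSynchronize with a polling loop, so the host can at least report which
iteration wedged. A sketch (the 60-second limit is an arbitrary choice of mine;
needs <time.h> and <unistd.h>):

/* Poll the stream instead of blocking, so a hang becomes observable. */
time_t start = time(NULL);
CUresult q;
while ((q = cuStreamQuery(stream[0])) == CUDA_ERROR_NOT_READY) {
    if (time(NULL) - start > 60) {
        fprintf(stderr, "iteration %ld: stream[0] still busy after 60 s\n", t);
        break;
    }
    usleep(1000);   /* back off 1 ms between polls */
}
if (q != CUDA_SUCCESS && q != CUDA_ERROR_NOT_READY)
    fprintf(stderr, "cuStreamQuery returned error %d\n", (int)q);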
Can anyone suggest what sort of problems can cause a kernel to hang like this?
Beyond that, I'm completely stuck on how to troubleshoot it.