Is there some kind of built in timeout period in CUDA that prevents you from calling long-running kernels?
I’ve written a function that does a large amount of processing in a loop. I compiled it with both __device__ and __host__ qualifiers so that I can test it from a CUDA kernel as well as on the CPU (the only difference is whether I pass a pointer to device memory or a pointer to host memory). I’ve tested the function and it works properly, but if I increase the number of processing iterations too high, the device version blanks the screen and the kernel fails with an “unknown error”.
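For context, a minimal sketch of the setup described above (the function name and the processing body are placeholders, not the poster’s actual code): one function built for both host and device, called directly on the CPU or wrapped in a kernel.

```cuda
#include <cuda_runtime.h>

// Hypothetical reconstruction: the same function compiles for host and
// device, so identical logic can be tested on the CPU and on the GPU.
__host__ __device__ void process(float *data, int n, int iterations)
{
    for (int it = 0; it < iterations; ++it)   // long-running outer loop
        for (int i = 0; i < n; ++i)
            data[i] = data[i] * 0.5f + 1.0f;  // stand-in for the real work
}

// Device-side wrapper: pass a pointer to device memory here, a pointer
// to host memory when calling process() directly from the CPU.
__global__ void processKernel(float *d_data, int n, int iterations)
{
    process(d_data, n, iterations);
}
```

If `iterations` is large enough that a single launch outlives the driver’s watchdog, the kernel is killed and subsequent CUDA calls report an error.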
You can check if there’s a run time limit on kernels using the deviceQuery executable in the SDK.
Here’s an example for my setup:
CUDA Device Query (Runtime API) version (CUDART static linking)
There is 1 device supporting CUDA
Device 0: “Quadro FX 1600M”
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 1
Total amount of global memory: 536150016 bytes
Number of multiprocessors: 4
Number of cores: 32
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 0.55 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: No
Compute mode: Default (multiple host threads can use this device simultaneously)
Test PASSED
Press ENTER to exit…
PS: If you happen to own a GeForce 9800 GX2 (or maybe a GTX 295), I believe the second GPU does not have a run time limit on kernels.
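You can also query the run time limit programmatically rather than reading deviceQuery output; the `kernelExecTimeoutEnabled` field of `cudaDeviceProp` in the runtime API reports the same flag. A short sketch (checking device 0; adjust the index for multi-GPU systems):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // query device 0
    printf("Run time limit on kernels: %s\n",
           prop.kernelExecTimeoutEnabled ? "Yes" : "No");
    return 0;
}
```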
There is a watchdog timer in the NVIDIA driver that prevents kernels from monopolizing the GPU for more than a fixed amount of time (5-10 seconds, depending on the OS) when that GPU is also driving a display. The solution is to use a dedicated GPU for CUDA or, on Linux, not to run an active display on the card.
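When a dedicated GPU isn’t an option, a common workaround is to split the long-running loop across many short kernel launches so that no single launch exceeds the watchdog window. A sketch, assuming the per-element processing from earlier in the thread (the kernel name, launch shape, and chunk sizes here are illustrative):

```cuda
#include <cuda_runtime.h>

// Each launch performs only a slice of the total iterations.
__global__ void processChunk(float *d_data, int n, int itersPerLaunch)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    for (int it = 0; it < itersPerLaunch; ++it)
        d_data[i] = d_data[i] * 0.5f + 1.0f;  // stand-in for the real work
}

// Host side: many small launches instead of one huge one, synchronizing
// between launches so the display driver gets the GPU back periodically.
void runInChunks(float *d_data, int n, int totalIters, int itersPerLaunch)
{
    dim3 block(256);
    dim3 grid((n + block.x - 1) / block.x);
    for (int done = 0; done < totalIters; done += itersPerLaunch) {
        processChunk<<<grid, block>>>(d_data, n, itersPerLaunch);
        cudaDeviceSynchronize();
    }
}
```

Tune `itersPerLaunch` so each launch stays comfortably under the watchdog limit; the per-launch overhead is usually small compared to a multi-second kernel.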
Watchdog\Display\BreakPointDelay = 3 (30 sec) (note that setting this to a higher number also has no effect)
And also note…
Run time limit on kernels: No
…but I’m STILL getting either “the launch timed out and was terminated” or “unknown error” (it randomly gives one of those two messages every time). I have not been able to get to 8 seconds; it usually fails at 6.5-7.5 seconds.
Thanks so much for this post. I’m running Windows 7, so I tried just adding HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers\TdrLevel = 0 to the registry (it wasn’t already there), then immediately checked the CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT attribute on my GPU, and voilà, it returned zero (NO timeout)!! I didn’t even have to reboot!!
So then I changed TdrLevel to 3 in the registry and checked the CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT attribute again, and sure enough, it was non-zero (time limit reinstated).
So now I’m thinking I’ll just leave it on (TdrLevel = 3), and let my CUDA program turn it off whenever it needs to use the GPU. Great news !! Thanks again…
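The attribute check described above can be done through the driver API; a minimal sketch using `cuDeviceGetAttribute` (device 0, error checking omitted for brevity):

```cuda
#include <cstdio>
#include <cuda.h>

int main()
{
    CUdevice dev;
    int timeoutEnabled = 0;

    cuInit(0);
    cuDeviceGet(&dev, 0);   // first CUDA device
    cuDeviceGetAttribute(&timeoutEnabled,
                         CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT, dev);

    // 0 means no watchdog; non-zero means a run time limit is in effect.
    printf("Kernel exec timeout enabled: %d\n", timeoutEnabled);
    return 0;
}
```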