CUDA Timeout?
Is there some kind of built in timeout period in CUDA that prevents you from calling long-running kernels?

I've written a function that does a large amount of processing in a loop. I compiled it with both the __device__ and __host__ qualifiers so that I can test it from a CUDA kernel as well as on the CPU (the only difference is that I pass a pointer to device memory instead of a pointer to host memory). I've tested the function and it works properly, but if I set the number of processing iterations too high, the device version makes the screen go black and the kernel fails with "unknown error".

#1
Posted 06/27/2009 09:13 PM   
You can check if there's a run time limit on kernels using the deviceQuery executable in the SDK.
Here's an example for my setup:

CUDA Device Query (Runtime API) version (CUDART static linking)
There is 1 device supporting CUDA

Device 0: "Quadro FX 1600M"
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 1
Total amount of global memory: 536150016 bytes
Number of multiprocessors: 4
Number of cores: 32
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 0.55 GHz
Concurrent copy and execution: Yes
[b]Run time limit on kernels: Yes[/b]
Integrated: No
Support host page-locked memory mapping: No
Compute mode: Default (multiple host threads can use this device simultaneously)

Test PASSED

Press ENTER to exit...


PS: If you happen to own a GF9800GX2 (or maybe a GTX295), I believe the second GPU does not have a run time limit on kernels

N.
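
For anyone who would rather check this from their own program than run deviceQuery, here is a minimal sketch using the runtime API. It assumes a toolkit recent enough to provide cudaDeviceGetAttribute and cudaDevAttrKernelExecTimeout (neither is mentioned in this thread), and I believe this is the same flag that deviceQuery prints as "Run time limit on kernels".

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        // Fetch the device name so the output resembles the deviceQuery listing.
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
            return 1;
        }

        // Non-zero means the driver reports a run time limit on kernels.
        int timeoutEnabled = 0;
        err = cudaDeviceGetAttribute(&timeoutEnabled, cudaDevAttrKernelExecTimeout, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaDeviceGetAttribute failed: %s\n", cudaGetErrorString(err));
            return 1;
        }

        std::printf("Device %d: \"%s\"\n", dev, prop.name);
        std::printf("  Run time limit on kernels: %s\n", timeoutEnabled ? "Yes" : "No");
    }
    return 0;
}
```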

#2
Posted 06/27/2009 09:36 PM   
There is a watchdog timer in the NVIDIA driver which prevents kernels from monopolizing the GPU for more than a fixed amount of time (5-10 seconds depending on the OS) when that GPU is also driving a display. The solution is to use a dedicated GPU for CUDA or, in the case of Linux, not to run an active display on the card.
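
If the machine has more than one GPU, a program can look for one that reports no run time limit and use it for the long kernel. The following is only a sketch of that idea: findDeviceWithoutTimeout is a hypothetical helper name, and it assumes the runtime API's cudaDevAttrKernelExecTimeout attribute reflects the watchdog described above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: index of the first device that reports no
// run time limit on kernels, or -1 if every device has one.
static int findDeviceWithoutTimeout() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        return -1;
    }
    for (int dev = 0; dev < count; ++dev) {
        int timeoutEnabled = 0;
        cudaError_t err =
            cudaDeviceGetAttribute(&timeoutEnabled, cudaDevAttrKernelExecTimeout, dev);
        if (err == cudaSuccess && timeoutEnabled == 0) {
            return dev;
        }
    }
    return -1;
}

int main() {
    int dev = findDeviceWithoutTimeout();
    if (dev < 0) {
        std::printf("Every device reports a run time limit; long kernels may be terminated.\n");
        return 1;
    }

    // Make this device current for all later CUDA calls on this thread.
    cudaError_t err = cudaSetDevice(dev);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaSetDevice failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("Using device %d, which reports no run time limit on kernels.\n", dev);
    return 0;
}
```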

#3
Posted 06/27/2009 09:38 PM   
[quote name='avidday' post='558639' date='Jun 27 2009, 05:38 PM']There is a watchdog timer in the NVIDIA driver which prevents kernels from monopolizing the GPU for more than a fixed amount of time (5-10 seconds depending on the OS) when that GPU is also driving a display. The solution is to use a dedicated GPU for CUDA or, in the case of Linux, not to run an active display on the card.[/quote]

I checked the file that Nico said, and it says I do [b]not[/b] have a runtime limit on kernels.

After poking around a bit, I found this:

"Disabling the Watchdog Timer While Testing Display Drivers"
[url="http://msdn.microsoft.com/en-us/library/ms797877.aspx"]http://msdn.microsoft.com/en-us/library/ms797877.aspx[/url]

I tried both registry keys, but I'm still getting an error saying the kernel timed out.

#4
Posted 06/28/2009 05:40 AM   
If you're using Vista, disable TDR.

[url="http://www.microsoft.com/whdc/device/display/wddm_timeout.mspx"]http://www.microsoft.com/whdc/device/displ...dm_timeout.mspx[/url]

#5
Posted 06/28/2009 05:44 AM   
[quote name='tmurray' post='558750' date='Jun 28 2009, 01:44 AM']If you're using Vista, disable TDR.

[url="http://www.microsoft.com/whdc/device/display/wddm_timeout.mspx"]http://www.microsoft.com/whdc/device/displ...dm_timeout.mspx[/url][/quote]

I am using Vista. Thanks for pointing out this one.

I found ANOTHER timeout as well, this one for the DirectDraw frame-locked buffer:
[url="http://technet.microsoft.com/en-us/library/cc976062.aspx"]http://technet.microsoft.com/en-us/library/cc976062.aspx[/url]

[url="http://msdn.microsoft.com/en-us/library/ms797877.aspx"]http://msdn.microsoft.com/en-us/library/ms797877.aspx[/url]
[url="http://www.microsoft.com/whdc/device/display/wddm_timeout.mspx"]http://www.microsoft.com/whdc/device/displ...dm_timeout.mspx[/url]

So currently all the keys I have set are:

GraphicsDrivers\TdrDelay = 16 sec
GraphicsDrivers\TdrDdiDelay = 16 sec
GraphicsDrivers\DCI\Timeout = 15 sec
Watchdog\Display\BreakPointDelay = 3 (30 sec) (note that setting this to a higher number also has no effect)

And also note..
Run time limit on kernels: No

..but I'm STILL getting the "the launch timed out and was terminated" or "unknown error" (it randomly gives one of those two messages every time). I have not been able to get to 8 seconds. This happens usually at 6.5 - 7.5 seconds
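
To tell the two messages apart in code, the error returned after the kernel finishes can be compared against cudaErrorLaunchTimeout, whose error string is "the launch timed out and was terminated". This is a rough sketch, not the code from this thread: busyKernel is a hypothetical stand-in for the real long-running function, the iteration count is an arbitrary placeholder, and cudaDeviceSynchronize is the newer name for what was cudaThreadSynchronize in older toolkits.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the long-running processing loop.
__global__ void busyKernel(float *data, int iterations) {
    float x = data[threadIdx.x];
    for (int i = 0; i < iterations; ++i) {
        x = x * 1.0000001f + 1.0f;
    }
    data[threadIdx.x] = x;
}

int main() {
    const int threads = 32;
    float *d_data = nullptr;

    cudaError_t err = cudaMalloc(&d_data, threads * sizeof(float));
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    err = cudaMemset(d_data, 0, threads * sizeof(float));
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemset failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Raise the iteration count to make the kernel run longer.
    busyKernel<<<1, threads>>>(d_data, 1000000);

    // Launch-configuration errors show up immediately; errors during
    // execution (including a watchdog kill) show up at the synchronize.
    err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }

    if (err == cudaErrorLaunchTimeout) {
        std::printf("Kernel was terminated by the timeout: %s\n", cudaGetErrorString(err));
    } else if (err != cudaSuccess) {
        std::printf("Kernel failed for another reason: %s\n", cudaGetErrorString(err));
    } else {
        std::printf("Kernel completed.\n");
    }

    cudaFree(d_data);  // may itself fail if the context was lost; ignored here
    return err == cudaSuccess ? 0 : 1;
}
```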

#6
Posted 06/28/2009 01:40 PM   
Does anybody know how to achieve this under OS X?

Thanks.

nodag

#7
Posted 12/18/2009 09:23 AM   
[quote name='yahastu' date='28 June 2009 - 07:40 AM' timestamp='1246196456' post='558856']

..[intro redacted for brevity]..

So currently all the keys I have set are:

GraphicsDrivers\TdrDelay = 16 sec
GraphicsDrivers\TdrDdiDelay = 16 sec
GraphicsDrivers\DCI\Timeout = 15 sec
Watchdog\Display\BreakPointDelay = 3 (30 sec) (note that setting this to a higher number also has no effect)

And also note..
Run time limit on kernels: No

..but I'm STILL getting the "the launch timed out and was terminated" or "unknown error" (it randomly gives one of those two messages every time). I have not been able to get to 8 seconds. This happens usually at 6.5 - 7.5 seconds
[/quote]

Thanks so much for this post. I'm running Windows 7, so I tried just adding HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers\TdrLevel = 0 to the Registry (it wasn't already there), then immediately checking the CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT attribute on my GPU, and voila - it returned zero (NO timeout) !! I didn't even have to reboot !!

So then I changed TdrLevel = 3 in the Registry, and checked the CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT attribute again, and sure enough, it was non-zero (Time limit re-instated).

So now I'm thinking I'll just leave it on (TdrLevel = 3), and let my CUDA program turn it off whenever it needs to use the GPU. Great news !! Thanks again..
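
In case it helps anyone repeat the check, this is roughly how the attribute can be read through the driver API. It is a minimal sketch that assumes device 0 is the GPU of interest and that the program is linked against the driver library (for example with -lcuda); it only reads the attribute and does not touch the registry.

```cuda
#include <cstdio>
#include <cuda.h>

int main() {
    // The driver API must be initialized before any other call.
    if (cuInit(0) != CUDA_SUCCESS) {
        std::fprintf(stderr, "cuInit failed\n");
        return 1;
    }

    CUdevice dev;
    if (cuDeviceGet(&dev, 0) != CUDA_SUCCESS) {  // assumption: device 0
        std::fprintf(stderr, "cuDeviceGet failed\n");
        return 1;
    }

    // Zero means no run time limit; non-zero means the limit is active.
    int timeoutEnabled = 0;
    if (cuDeviceGetAttribute(&timeoutEnabled,
                             CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT,
                             dev) != CUDA_SUCCESS) {
        std::fprintf(stderr, "cuDeviceGetAttribute failed\n");
        return 1;
    }

    std::printf("CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT = %d (%s)\n",
                timeoutEnabled,
                timeoutEnabled ? "time limit active" : "no time limit");
    return 0;
}
```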

#8
Posted 12/19/2011 09:42 AM   