CUDA Kernel Execution Timeout on GeForce
Trying to turn off the kernel timeout on a GTX 480 for compute

Hello,

Is there any reliable way to set up a GeForce card as a “compute only” device on Windows 7? All I’m trying to do is switch off the Kernel Execution Timeout property on my card. It is not being used for display, but it still produces timeouts once the kernel execution time reaches a certain limit. I am working on 64-bit Windows 7 and have tried changing the TdrDelay registry value, to no avail. Is there no good way to switch off the timeout property on a GeForce card? Is this feature only available on the Tesla cards?

I also tried installing the Tesla driver by tweaking the NVWD.info, but without success. It seems strange that the GTX 480 is advertised as CUDA capable but is in fact quite restricted in its CUDA functionality. Is this by NVIDIA’s design, or am I missing something simple? Please help.

Thanks in advance, Joe

TdrDelay = 0 disables all timeouts on Win7/Vista.

Thanks for the reply. I did try this out, but increasing either the size of the loop inside the kernel or the total number of threads (work size) still resulted in a brief blackout followed by a “Display stopped responding and has recovered” message. As long as I keep the loop size and number of threads within some bound, everything works just fine. Do you think this issue is related to the GeForce driver still applying the ‘display’ treatment? My current setup is: display card: GT8800, secondary (CUDA) card: GTX 480.

Should I change my primary display card to something non-Nvidia to avoid this issue?

Thanks again

Er, wait, it’s TdrLevel = 0, not TdrDelay. You can set TdrDelay = 60 to get a longer timeout if you want (which is often useful because you can’t kill an app that is running an infinite CUDA kernel if you have TDR disabled).
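
For anyone who would rather import these settings than type them into regedit, here is a minimal sketch that renders them as .reg text. The helper name `reg_fragment` is made up for illustration, and the key path is the GraphicsDrivers key quoted further down in this thread. Python is used only because the task is plain text generation; nothing here touches the registry itself.

```python
# Sketch: render a .reg fragment for the TDR values discussed above.
# reg_fragment is a hypothetical helper, not part of any NVIDIA or Microsoft tool.

GRAPHICS_DRIVERS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
)


def reg_fragment(key: str, values: dict) -> str:
    """Return .reg text that sets each name in `values` to a DWORD under `key`."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, number in values.items():
        if not 0 <= number <= 0xFFFFFFFF:
            raise ValueError(f"{name} does not fit in a DWORD: {number}")
        # .reg files spell DWORDs as eight hexadecimal digits.
        lines.append(f'"{name}"=dword:{number:08x}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # TdrLevel = 0 turns detection off; TdrDelay = 60 is the longer timeout.
    print(reg_fragment(GRAPHICS_DRIVERS_KEY, {"TdrLevel": 0, "TdrDelay": 60}))
```

Save the printed text to a .reg file, import it, and reboot. In practice you would probably pick one of the two settings rather than both, since the delay has no effect once detection is switched off.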

Yes, yes, I tried both of those. In fact TdrLevel does not exist in the registry on W7 by default, and adding it doesn’t change anything. I’m beginning to think there may be some other problem, perhaps memory related. The odd thing, though, is that the size of the arrays I’m passing to the device does not change. What does change is the number of threads potentially vying for the same global memory space, and the kernel runtime. Splitting the task into blocks and running the kernel repeatedly also causes the same issue, which is extremely baffling, since it implies that kernel runtime may not be at fault. Is there a way to manually mop up all the thread-related memory after a kernel has executed?

Today I discovered cudaThreadSynchronize(), and that did the trick. The idea is to make the host wait for each kernel to finish before launching the next one in the loop. So at the very least, I can work around the kernel memory constraints in a reliable manner.
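
The pattern can be sketched roughly as follows. This is only the host-side control flow, written in Python so it runs without a GPU: `launch` and `synchronize` are hypothetical callbacks standing in for the real kernel launch and for cudaThreadSynchronize() (which later CUDA releases deprecate in favor of cudaDeviceSynchronize()). Kernel launches return immediately, so without the wait the host queues every chunk back to back; one possible explanation for the timeouts is that the driver then treats the queued work as a single long submission.

```python
def chunk_ranges(total_items: int, chunk_size: int):
    """Yield (offset, count) pairs that cover range(total_items) in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, total_items, chunk_size):
        yield offset, min(chunk_size, total_items - offset)


def run_in_chunks(total_items, chunk_size, launch, synchronize):
    """Launch one chunk at a time, waiting for each before queuing the next."""
    launches = 0
    for offset, count in chunk_ranges(total_items, chunk_size):
        launch(offset, count)  # stand-in for kernel<<<grid, block>>>(..., offset, count)
        synchronize()          # stand-in for cudaThreadSynchronize()
        launches += 1
    return launches


if __name__ == "__main__":
    log = []
    run_in_chunks(
        10, 4,
        launch=lambda offset, count: log.append(("launch", offset, count)),
        synchronize=lambda: log.append(("sync",)),
    )
    print(log)
```

The chunk size is the knob to tune: each chunk has to finish well inside the timeout, so it needs to be small enough for the slowest case you expect.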

I have a similar problem when running Folding@Home: my GTX 260 crashes with the same “Stopped responding and was restarted” message. I have also tried adding the TdrLevel=0 and TdrDelay=60 registry DWORDs, but to no avail.

I cannot find that key on my W7 system (W7 Professional). Can anyone help me? I would like to play with the timeout a bit.

TIA!

Yes, basically you need to create two DWORD values using regedit. Firstly, TdrDelay, set to 10, or 60, or whatever timeout you want:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]

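"TdrDelay"=dword:00000060

Note that .reg files store DWORD values in hexadecimal, so this sets TdrDelay to 0x60 = 96 seconds rather than the default 2 seconds; for exactly 60 seconds, use dword:0000003c.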

Secondly, Timeout has to be created and set in concert with TdrDelay:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers\DCI]

"Timeout"=dword:00000060

Having created both of these values in the specified paths, you need to reboot the machine. The timeout issue should now be fixed.
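
If you want to double-check what a .reg line will actually set before importing it, a small decoder is enough. This is just a sketch: `decode_dwords` is a made-up helper, and it only understands the simple `"Name"=dword:xxxxxxxx` lines shown above. It also tolerates the curly quotes that forum software tends to substitute for straight ones.

```python
import re

# Matches lines such as "TdrDelay"=dword:00000060
DWORD_LINE = re.compile(r'^"(?P<name>[^"]+)"=dword:(?P<hex>[0-9a-fA-F]{1,8})$')


def decode_dwords(reg_text: str) -> dict:
    """Map each DWORD value name found in .reg text to its decimal value."""
    decoded = {}
    for raw in reg_text.splitlines():
        # Normalize curly quotes to the straight quotes that .reg files require.
        line = raw.strip().replace("\u201c", '"').replace("\u201d", '"')
        match = DWORD_LINE.match(line)
        if match:
            # The digits after "dword:" are hexadecimal.
            decoded[match.group("name")] = int(match.group("hex"), 16)
    return decoded


if __name__ == "__main__":
    sample = '"TdrDelay"=dword:00000060\n"Timeout"=dword:0000003c\n'
    for name, seconds in decode_dwords(sample).items():
        print(f"{name} = {seconds} seconds")
```

Lines that are not DWORD assignments, such as the bracketed key paths, are simply skipped.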

As an alternative, synchronizing between kernel launches (for example with cudaThreadSynchronize()) can alleviate this issue. Reducing the amount of work per kernel launch, and with it the execution time, can also do the trick.
