CUDA Kernel Execution Timeout on GeForce
Trying to turn off the kernel timeout on a GTX 480 for compute
Hello,

Is there any reliable way to set up a GeForce card as a "compute only" device on Windows 7? All I'm trying to do is switch off the Kernel Execution Timeout property on my card. It is not being used for display, but it still produces timeouts once the kernel execution time reaches a certain limit. I am working on Windows 7 64-bit and have tried changing the TdrDelay registry value, to no avail. Is there no good way to switch off the timeout property on a GeForce card? Is this feature only available on Tesla cards?

I also tried installing the Tesla driver by tweaking NVWD.info, but without success. It seems strange that the GTX 480 is advertised as CUDA-capable but is in fact quite restricted in its CUDA functionality. Is this by NVIDIA's design, or am I missing something simple? Please help.

Thanks in advance, Joe

#1
Posted 08/07/2010 07:54 PM   
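You can check from the CUDA runtime whether the driver applies the watchdog to a given card. The runtime reports this in the `kernelExecTimeoutEnabled` field of `cudaDeviceProp`, and `tccDriver` indicates the compute-only TCC driver. The minimal sketch below assumes a toolkit that exposes both fields. `classify_watchdog` and `report_watchdog_status` are hypothetical names. Under the Windows 7 WDDM driver model the timeout generally applies to a GeForce card whether or not a monitor is attached. Only boards running the TCC driver (Tesla boards at the time) are exempt.

```cuda
// Sketch: report whether the display watchdog applies to each CUDA device.
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical classification of the two relevant cudaDeviceProp flags.
enum WatchdogState { kWatchdogActive, kTccNoWatchdog, kNoWatchdog };

WatchdogState classify_watchdog(int kernelExecTimeoutEnabled, int tccDriver)
{
    if (kernelExecTimeoutEnabled) return kWatchdogActive;  // long kernels can be reset
    if (tccDriver) return kTccNoWatchdog;                  // compute-only TCC driver
    return kNoWatchdog;
}

// Prints one line per device; returns the number of devices (0 on error).
int report_watchdog_status()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 0;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        WatchdogState s = classify_watchdog(prop.kernelExecTimeoutEnabled, prop.tccDriver);
        std::printf("device %d (%s): %s\n", dev, prop.name,
                    s == kWatchdogActive ? "kernel execution timeout enabled" :
                    s == kTccNoWatchdog  ? "TCC driver, no timeout" :
                                           "no timeout reported");
    }
    return count;
}
```

Calling `report_watchdog_status()` from `main()` prints one line per device. This makes it easy to confirm which card CUDA is using and whether the timeout flag is set on it.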
TdrDelay = 0 disables all timeouts on Win7/Vista.

#3
Posted 08/07/2010 09:35 PM   
[quote name='tmurray' post='1100573' date='Aug 7 2010, 10:35 PM']TdrDelay = 0 disables all timeouts on Win7/Vista.[/quote]

Thanks for the reply. I did try this, and still, increasing the size of the loop inside the kernel or the total number of threads (work size) resulted in a brief blackout followed by the "Display driver stopped responding and has recovered" message. Otherwise, as long as I keep the loop size and the number of threads within some bound, everything works just fine. Do you think this issue is related to the GeForce driver still giving the card the 'display' treatment? My current setup is: display card: 8800 GT; secondary (CUDA) card: GTX 480.
Should I change my primary display card to something non-NVIDIA to avoid this issue?

Thanks again

#5
Posted 08/08/2010 01:45 PM   
Er, wait, it's TdrLevel = 0, not TdrDelay. You can set TdrDelay = 60 to get a longer timeout if you want (which is often useful because you can't kill an app that is running an infinite CUDA kernel if you have TDR disabled).

#7
Posted 08/08/2010 06:06 PM   
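The values mentioned above live under `HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers`. They are not present by default, so they only show up there after being created as REG_DWORD values. Microsoft documents `TdrLevel` = 0 as "detection disabled" and 3 (recover on timeout) as the default. It documents `TdrDelay` as a timeout in seconds, with a default of 2. The change generally needs a reboot to take effect. The sketch below only reads the values back and reports the effective settings. It is Windows-only host code guarded by `_WIN32`, and the helper names are hypothetical.

```cuda
// Sketch: read back the TDR registry values. Windows host code; it can sit in a
// .cu file because nvcc passes host code to the host compiler. Link Advapi32.lib.
#include <cstdio>
#ifdef _WIN32
#include <windows.h>
#endif

// Documented defaults that apply when the values have not been created.
const unsigned long kDefaultTdrLevel = 3;  // TdrLevelRecover: recover on timeout
const unsigned long kDefaultTdrDelay = 2;  // seconds

// Hypothetical helpers: interpret a value that may or may not be present.
bool tdr_detection_disabled(bool present, unsigned long level)
{
    return (present ? level : kDefaultTdrLevel) == 0;  // 0 = TdrLevelOff
}

unsigned long tdr_effective_delay(bool present, unsigned long seconds)
{
    return present ? seconds : kDefaultTdrDelay;
}

#ifdef _WIN32
// Reads one REG_DWORD under the GraphicsDrivers key; returns false if it is absent.
bool read_tdr_dword(const wchar_t* name, unsigned long* out)
{
    DWORD value = 0, size = sizeof(value);
    LONG rc = RegGetValueW(HKEY_LOCAL_MACHINE,
                           L"System\\CurrentControlSet\\Control\\GraphicsDrivers",
                           name, RRF_RT_REG_DWORD, nullptr, &value, &size);
    if (rc != ERROR_SUCCESS) return false;
    *out = value;
    return true;
}

void print_tdr_settings()
{
    unsigned long level = 0, delay = 0;
    bool haveLevel = read_tdr_dword(L"TdrLevel", &level);
    bool haveDelay = read_tdr_dword(L"TdrDelay", &delay);
    std::printf("TdrLevel %s, detection %s\n", haveLevel ? "set" : "not set (default)",
                tdr_detection_disabled(haveLevel, level) ? "disabled" : "enabled");
    std::printf("TdrDelay %s, timeout %lu s\n", haveDelay ? "set" : "not set (default)",
                tdr_effective_delay(haveDelay, delay));
}
#endif
```

As noted above, a longer `TdrDelay` (e.g. 60) is often more practical than `TdrLevel` = 0. With detection disabled, a runaway kernel cannot be killed.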
[quote name='tmurray' post='1100929' date='Aug 8 2010, 07:06 PM']Er, wait, it's TdrLevel = 0, not TdrDelay. You can set TdrDelay = 60 to get a longer timeout if you want (which is often useful because you can't kill an app that is running an infinite CUDA kernel if you have TDR disabled).[/quote]

Yes, yes, I tried both of those. In fact, TdrLevel does not exist on W7, but adding it doesn't change anything. I'm beginning to think there may be some other problem, perhaps memory-related. The problem, though, is that the size of the arrays I'm passing to the device does not change. What does change is the number of threads potentially vying for the same global memory space, and the kernel runtime. Splitting the task into blocks and running the kernel repeatedly also causes the same issue, which is extremely baffling, since it implies that kernel runtime may not be at fault. Is there a way to manually mop up all the thread-related memory after execution?

#9
Posted 08/08/2010 11:13 PM   
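When a launch fails, the returned error can help separate a watchdog reset from a memory bug. The sketch below checks both the launch (`cudaGetLastError`) and its completion. For completion it uses `cudaDeviceSynchronize`, the newer name for `cudaThreadSynchronize`. It flags `cudaErrorLaunchTimeout` separately. Depending on the driver, a WDDM reset may also surface as a different error, so treat the classification as a hint only. For suspected out-of-bounds accesses, running under `cuda-memcheck` (`compute-sanitizer` in current toolkits) is more direct. Per-thread registers and local memory are released when a kernel finishes. Only buffers allocated with `cudaMalloc` need an explicit `cudaFree`. `classify_launch_error` and `check_kernel` are hypothetical names.

```cuda
// Sketch: check a kernel launch and its completion, and flag watchdog timeouts.
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical classification of the error observed after a launch.
enum LaunchOutcome { kLaunchOk, kWatchdogTimeout, kOtherFailure };

LaunchOutcome classify_launch_error(cudaError_t err)
{
    if (err == cudaSuccess) return kLaunchOk;
    if (err == cudaErrorLaunchTimeout) return kWatchdogTimeout;
    return kOtherFailure;  // e.g. an out-of-bounds access often shows up as a launch failure
}

// Call right after a launch; prints a diagnosis to stderr if anything went wrong.
LaunchOutcome check_kernel(const char* label)
{
    cudaError_t err = cudaGetLastError();                  // bad configuration / launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // errors raised while running
    LaunchOutcome outcome = classify_launch_error(err);
    if (outcome != kLaunchOk)
        std::fprintf(stderr, "%s: %s (%s)\n", label, cudaGetErrorString(err),
                     outcome == kWatchdogTimeout ? "watchdog timeout" : "not a timeout");
    return outcome;
}
```

Typical use is `myKernel<<<grid, block>>>(d_buf); check_kernel("myKernel");`, where `myKernel`, `grid`, `block` and `d_buf` are placeholders for the real launch.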
[quote name='Joe Fatmama' post='1101084' date='Aug 9 2010, 12:13 AM']Yes, yes, I tried both of those. In fact, TdrLevel does not exist on W7, but adding it doesn't change anything. I'm beginning to think there may be some other problem, perhaps memory-related. The problem, though, is that the size of the arrays I'm passing to the device does not change. What does change is the number of threads potentially vying for the same global memory space, and the kernel runtime. Splitting the task into blocks and running the kernel repeatedly also causes the same issue, which is extremely baffling, since it implies that kernel runtime may not be at fault. Is there a way to manually mop up all the thread-related memory after execution?[/quote]


Today I discovered cudaThreadSynchronize(). And that did the trick. The idea is to make sure the kernels in a loop do not overlap. So at the very least, I can work around the kernel memory constraints in a reliable manner.

#11
Posted 08/11/2010 12:00 AM   
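The workaround described above, splitting the job and synchronizing between launches, might look like the sketch below. `process_chunk`, `num_chunks` and `run_in_chunks` are hypothetical names, and the squaring is a placeholder for the real work. The sketch uses `cudaDeviceSynchronize()`, which replaced `cudaThreadSynchronize()` in later CUDA releases. Launches on the same stream already run one after another, so "overlap" may not be the actual mechanism. One plausible but unconfirmed explanation is that the WDDM driver batches several launches into one submission, and the synchronize call keeps each submission short. Each individual launch still has to finish within the TDR delay, so the chunk size should keep a single launch well below it.

```cuda
// Sketch: run a long job as a series of short launches, syncing after each one.
#include <cuda_runtime.h>

// Hypothetical kernel: processes elements [offset, offset + count) of data.
__global__ void process_chunk(float* data, int offset, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        float x = data[offset + i];
        data[offset + i] = x * x;  // placeholder for the real per-element work
    }
}

// Number of launches needed to cover n elements in chunks of chunkSize (> 0).
int num_chunks(int n, int chunkSize)
{
    return (n + chunkSize - 1) / chunkSize;
}

// Launches one chunk at a time and waits for it before queuing the next.
cudaError_t run_in_chunks(float* d_data, int n, int chunkSize)
{
    const int threads = 256;
    for (int c = 0; c < num_chunks(n, chunkSize); ++c) {
        int offset = c * chunkSize;
        int count = (n - offset < chunkSize) ? n - offset : chunkSize;
        int blocks = (count + threads - 1) / threads;
        process_chunk<<<blocks, threads>>>(d_data, offset, count);
        cudaError_t err = cudaGetLastError();
        if (err != cudaSuccess) return err;
        err = cudaDeviceSynchronize();  // cudaThreadSynchronize() in CUDA 3.x
        if (err != cudaSuccess) return err;
    }
    return cudaSuccess;
}
```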
[quote name='Joe Fatmama' post='1102230' date='Aug 10 2010, 05:00 PM']Today I discovered cudaThreadSynchronize(). And that did the trick. The idea is to make sure the kernels in a loop do not overlap. So at the very least, I can work around the kernel memory constraints in a reliable manner.[/quote]

I have a similar problem when running Folding@Home: my GTX 260 crashes with the same "Stopped responding and was restarted" message. I have also tried modifying (or adding) the TdrLevel=0 and TdrDelay=60 registry DWORDs, but to no avail.

#13
Posted 08/26/2010 09:11 PM   
[quote name='Razgriz' post='1109219' date='Aug 26 2010, 05:11 PM']I have a similar problem when running Folding@Home: my GTX 260 crashes with the same "Stopped responding and was restarted" message. I have also tried modifying (or adding) the TdrLevel=0 and TdrDelay=60 registry DWORDs, but to no avail.[/quote]

I cannot find that key on my W7 system. :confused: Can anyone help me? I'm on W7 Professional and would like to play with the timeout a bit.
TIA!

#15
Posted 10/07/2010 02:37 AM   