Is there any reliable way to set up a GeForce card as a "compute only" device on Windows 7? All I'm trying to do is switch off the Kernel Execution Timeout property on my card. It is not being used for display, but it is still producing timeouts once the kernel execution time reaches a certain limit. I am working on Windows 7 64-bit and have tried playing with the TdrDelay registry value, to no avail. Is there no good way to switch off the timeout property on a GeForce card? Is this feature only available on the Tesla cards?
I also tried installing the Tesla driver by tweaking NVWD.inf, but without success. It seems strange that the GTX 480 comes with claims of CUDA capability but is in fact quite restricted in its CUDA functionality. Is this by NVIDIA's design, or am I missing something simple? Please help.
Thanks for the reply. I did try this out, and still, stretching the size of the loop inside the kernel or the total number of threads (work size) resulted in a brief blackout followed by a "Display driver stopped responding and has recovered" message. Otherwise, as long as I keep the loop size and number of threads within some bound, everything works just fine. Do you think this issue is related to the GeForce driver still applying the 'display' treatment? My current setup is: display card: GT8800, secondary (CUDA) card: GTX 480.
Should I change my primary display card to something non-Nvidia to avoid this issue?
Er, wait, it’s TdrLevel = 0, not TdrDelay. You can set TdrDelay = 60 to get a longer timeout if you want (which is often useful because you can’t kill an app that is running an infinite CUDA kernel if you have TDR disabled).
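For reference, the TDR values discussed above live under the GraphicsDrivers key; a .reg file along these lines sets both (this is a sketch of the standard key locations, and a reboot is required for the change to take effect):

```reg
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
; TdrLevel = 0 disables timeout detection and recovery entirely.
; Alternatively, leave TDR enabled and raise TdrDelay (in seconds)
; so long kernels get more headroom; 0x3c = 60 seconds.
"TdrLevel"=dword:00000000
"TdrDelay"=dword:0000003c
```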
Yes, yes, I tried both of those. In fact TdrLevel does not exist on W7, and adding it doesn't change anything. I'm beginning to think there may be some other problem, perhaps memory related. The puzzle, though, is that the size of the arrays I'm passing to the device does not change. What does change is the number of threads potentially vying for the same global memory space, and the kernel runtime. Splitting up the task into blocks and running the kernel repeatedly also causes the same issue, which is extremely baffling, since it implies that kernel runtime may not be at fault. Is there a way to manually mop up all the thread-related memory following execution?
Today I discovered cudaThreadSynchronize(), and that did the trick. The idea is to make sure the kernels launched in a loop do not overlap. So at the very least, I can work around the kernel memory constraints in a reliable manner.
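A minimal sketch of that workaround, with a placeholder kernel and sizes (the kernel name and the work it does are made up for illustration; note that cudaThreadSynchronize() was later deprecated in favor of cudaDeviceSynchronize(), which behaves the same way here):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread does a small piece of the work.
__global__ void processChunk(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;  // placeholder work
}

int main()
{
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    // Split the job into several short launches instead of one long-running
    // kernel, and block on each launch so successive launches cannot overlap
    // (and each one stays under the Windows TDR watchdog limit).
    for (int pass = 0; pass < 10; ++pass) {
        processChunk<<<(n + 255) / 256, 256>>>(d_data, n);
        cudaThreadSynchronize();               // wait for this launch to finish
        cudaError_t err = cudaGetLastError();  // surface any launch/runtime error
        if (err != cudaSuccess) {
            fprintf(stderr, "pass %d failed: %s\n",
                    pass, cudaGetErrorString(err));
            break;
        }
    }
    cudaFree(d_data);
    return 0;
}
```

Without the synchronize, the runtime queues the launches asynchronously, so the error (and the blackout) can surface several iterations after the launch that actually caused it.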
I have a similar problem when running Folding@Home: my GTX 260 crashes with the same "stopped responding and was restarted" message. I have also tried modifying (or adding) the TdrLevel=0 and TdrDelay=60 registry DWORDs, but to no avail.