Recovering after CUDA crashes GPU
It's fairly common for the Windows display to freeze completely when a bug in one of my CUDA kernels crashes the GPU. Sometimes Windows is able to recover (the screen freezes for a second, then comes back with a message about the display driver failing), but more often than not the display just hangs permanently. I'm currently using a GeForce GTX 580 on Windows 7 with driver version 8.17.12.7081.

Is there any way to recover from this situation without rebooting, which makes the iteration time for debugging the kernel annoyingly long? The machine is clearly still running; it's just the display that has crashed. When I try to Remote Desktop to a box in this state I get a username/password prompt, but the remote login hangs when it tries to log in.

Is there some kind of non-GUI remote console I can use to log in and reset the display, as you would on a UNIX-based system? Or is there some way to make the driver more fault tolerant (some kind of slow debug mode where bad kernels are less likely to crash it)?

#1
Posted 07/27/2011 10:00 PM   
[quote name='griffin2000' date='27 July 2011 - 08:00 PM' timestamp='1311804055' post='1271386']
It's fairly common for the Windows display to freeze completely when a bug in one of my CUDA kernels crashes the GPU. Sometimes Windows is able to recover (the screen freezes for a second, then comes back with a message about the display driver failing), but more often than not the display just hangs permanently. I'm currently using a GeForce GTX 580 on Windows 7 with driver version 8.17.12.7081.

Is there any way to recover from this situation without rebooting, which makes the iteration time for debugging the kernel annoyingly long? The machine is clearly still running; it's just the display that has crashed. When I try to Remote Desktop to a box in this state I get a username/password prompt, but the remote login hangs when it tries to log in.

Is there some kind of non-GUI remote console I can use to log in and reset the display, as you would on a UNIX-based system? Or is there some way to make the driver more fault tolerant (some kind of slow debug mode where bad kernels are less likely to crash it)?
[/quote]

I guess that, on Windows, the easiest way is to install a second card and debug the code on that card. If a crash occurs, your display won't freeze.
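With two cards installed, the program still has to be pointed at the right one. Here is a minimal sketch of one way to do that with the CUDA runtime API, assuming the compute card is the one that reports no kernel run-time limit. The helper name `pickComputeDevice` is made up for illustration. On some Windows driver setups every card may report the limit as enabled, in which case the index would have to be chosen by hand from the printed list.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the index of the first device that reports
// no kernel run-time limit (watchdog), or -1 if there is no such device.
static int pickComputeDevice()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return -1;

    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        int watchdog = 0;
        if (cudaGetDeviceProperties(&prop, i) != cudaSuccess) continue;
        if (cudaDeviceGetAttribute(&watchdog, cudaDevAttrKernelExecTimeout, i)
                != cudaSuccess) continue;

        printf("device %d: %s, run-time limit %s\n",
               i, prop.name, watchdog ? "on" : "off");

        // Assumption: the card without a run-time limit is not driving
        // the desktop, so a bad kernel there should not freeze the screen.
        if (!watchdog) return i;
    }
    return -1;
}

int main()
{
    int dev = pickComputeDevice();
    if (dev < 0) {
        printf("no device without a run-time limit; pick an index by hand\n");
        return 1;
    }
    if (cudaSetDevice(dev) != cudaSuccess) return 1;
    printf("using device %d for compute\n", dev);
    return 0;
}
```

Calling `cudaSetDevice` before any other CUDA work keeps the context off the display card for the rest of the run.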

Anyday, anytime.

TLuisRS
Thales Luis Rodrigues Sabino
tluisrs@gmail.com

#2
Posted 07/28/2011 01:55 AM   
Hi, I have the same problem. Sometimes the screen freezes permanently, and once a blue screen appeared and the computer restarted. I have only one GTX 580, and I can debug on it remotely, but this problem is very annoying. Isn't there any way to force Windows (Windows Server 2008) not to use the GPU for the screen? Theoretically the OS could use only the CPU for its own purposes; then, if the GPU crashes, the screen wouldn't freeze. Does anyone know a solution to this problem (other than buying a second graphics card, which costs $500)?

#3
Posted 11/16/2011 10:18 AM   
Same problem: just the occasional random freeze, and I have to reboot. Annoying.

#4
Posted 11/17/2011 03:36 AM   
[quote name='Tsotne' date='16 November 2011 - 05:18 AM' timestamp='1321438687' post='1327505']...Does anyone know the solution of this problem? (Except buying second Graphics Card which costs $500)[/quote]
Your graphics card doesn't have to be anywhere near as powerful as your compute card.

Just a thought...

Intel Siler DX79SI Desktop Extreme | Intel Core i7-3820 Sandy Bridge-Extreme | DangerDen M6 and Koolance MVR-40s w/Black Ice Stealths | 32 GB Mushkin PC3-12800LV | NVIDIA GTX 660 Ti SLI | PNY GTX 470 | 24 GB RAMDisk (C:\Temp\Temp) | 120 GB Intel Cherryville SSDs (OS and UserData)| 530 GB Western Digital VelociRaptor SATA 2 RAID0 (C:\Games\) | 60 GB G2 SSDs (XP Pro and Linux) | 3 TB Western Digital USB-3 MyBook (Archive) | LG BP40NS20 USB ODD | LG IPS236 Monitor | LogiTech X-530 Speakers | Plantronics GameCom 780 Headphones | Cooler Master UCP 1100 | Cooler Master HAF XB | Windows 7 Pro x64 SP1

Stock is Extreme now

#5
Posted 11/17/2011 08:21 AM   
Why don't you use a debugger?

#6
Posted 11/17/2011 09:45 AM   
[quote name='jaafaman' date='17 November 2011 - 08:21 AM' timestamp='1321518086' post='1328104']
Your graphics card doesn't have to be anywhere near as powerful as your compute card.

Just a thought...
[/quote]
I didn't get it.
When I say buying a second video card, it means I have to buy the same card I already have, right? Doesn't NVIDIA SLI need both cards to be the same? Or is it possible to add another video card that is not as good as the GTX 580? Can they work together? I mean, just running the OS on one card and the computation on the GTX 580.

#7
Posted 11/17/2011 06:42 PM   
SLI configuration is by no means necessary to support CUDA operations. It's intended to provide increases in graphics processing power for the desktop.

The WDDM used by both Vista and Win7 allows discrete, independent cards of the same brand to operate as long as they use the same driver, and cards with drivers from separate brands are OK as well regardless. You could get a cheap, $40 GT 240 to handle your graphics for the most part, working your way up the line all the way to the GTX 580 until you hit as strong a card as you need and be just fine.

The GTX 580 you have can be moved to the secondary slot and still be used for display support on another monitor when not dedicated to GPGPU, and it'll help ease the pain of recovery if there's no monitor attached while it is computing and crashing during debug.

Shoot, some folks even use on-board graphics to display their output if the demand's not too high...

#8
Posted 11/17/2011 09:58 PM   
[quote name='jaafaman' date='17 November 2011 - 09:58 PM' timestamp='1321567086' post='1328484']
SLI configuration is by no means necessary to support CUDA operations. It's intended to provide increases in graphics processing power for the desktop.

The WDDM used by both Vista and Win7 allows discrete, independent cards of the same brand to operate as long as they use the same driver, and cards with drivers from separate brands are OK as well regardless. You could get a cheap, $40 GT 240 to handle your graphics for the most part, working your way up the line all the way to the GTX 580 until you hit as strong a card as you need and be just fine.

The GTX 580 you have can be moved to the secondary slot and still be used for display support on another monitor when not dedicated to GPGPU, and it'll help ease the pain of recovery if there's no monitor attached while it is computing and crashing during debug.

Shoot, some folks even use on-board graphics to display their output if the demand's not too high...
[/quote]

Though it seems like that would be a pain if you need to switch back to your main card for graphics-intensive activities. It seems like there should be some way to remotely log into a Windows box after the graphics card has crashed and restart it, or a way of running the driver in a fault-tolerant (but slow) mode that would safely recover from errors.
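Short of a fault-tolerant driver mode, one thing that can shorten the debug loop is checking for errors right after every launch, so a failure is reported at the kernel that caused it. The following is only a rough sketch, assuming the CUDA runtime API; `suspectKernel` and the `ok` helper are placeholder names. It will not stop a bad kernel from hanging the display, it only reports failures the runtime manages to return.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel standing in for the one being debugged.
__global__ void suspectKernel(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;  // bounds check avoids out-of-range writes
}

// Hypothetical helper: prints the CUDA error string and returns false
// when a runtime call did not succeed.
static bool ok(cudaError_t err, const char *what)
{
    if (err == cudaSuccess) return true;
    fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    return false;
}

int main()
{
    const int n = 1024;
    int *d_out = NULL;
    if (!ok(cudaMalloc(&d_out, n * sizeof(int)), "cudaMalloc")) return 1;

    suspectKernel<<<(n + 255) / 256, 256>>>(d_out, n);

    // cudaGetLastError reports launch problems; cudaDeviceSynchronize
    // waits for the kernel and reports errors raised while it ran.
    bool good = ok(cudaGetLastError(), "kernel launch")
             && ok(cudaDeviceSynchronize(), "kernel execution");

    if (good) {
        cudaFree(d_out);
    } else {
        // Tear down this process's context instead of reusing it.
        cudaDeviceReset();
    }
    return good ? 0 : 1;
}
```

If the driver does recover after a timeout, the failing call is at least identified here before anything else runs on a broken context.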

#9
Posted 11/17/2011 10:05 PM   
Granted, it's far from the most elegant of solutions and barely serves as a stop-gap. But the answer was only meant to address the concern about needing a high-priced card for independent video, not to solve the lack of remote capabilities should things go that way...


#10
Posted 11/18/2011 07:11 AM   
[quote name='jaafaman' date='17 November 2011 - 09:58 PM' timestamp='1321567086' post='1328484']
SLI configuration is by no means necessary to support CUDA operations. It's intended to provide increases in graphics processing power for the desktop.

The WDDM used by both Vista and Win7 allows discrete, independent cards of the same brand to operate as long as they use the same driver, and cards with drivers from separate brands are OK as well regardless. You could get a cheap, $40 GT 240 to handle your graphics for the most part, working your way up the line all the way to the GTX 580 until you hit as strong a card as you need and be just fine.

The GTX 580 you have can be moved to the secondary slot and still be used for display support on another monitor when not dedicated to GPGPU, and it'll help ease the pain of recovery if there's no monitor attached while it is computing and crashing during debug.

Shoot, some folks even use on-board graphics to display their output if the demand's not too high...
[/quote]
Thanks for your reply! I have an old 8800 GT video card, so I'll test your solution. I was told that only identical NVIDIA cards can be put together :(
Thank you once again :)

#11
Posted 11/18/2011 09:04 AM   