DIGITS: Machine restarts when I use AlexNet or LeNet

I am new to deep learning and have been reading quite a few papers recently (sorry if I posted this in the wrong section/forum). I have installed CUDA, cuDNN, DIGITS, etc. To get used to the DIGITS interface, I ran GoogLeNet on a classification problem, and everything was okay. But when I use LeNet or AlexNet, the entire system restarts. By restart I mean that the screen goes blank, as if dropping back to DOS, and the machine boots up again. I suspect the problem is that AlexNet and LeNet are deep networks and require more memory. Is that right? Can someone help me diagnose this problem, please? Or maybe I didn’t install some required software correctly?

By the way, I’m on Linux 16.04 (again, I’m quite new to this OS), CUDA 9, cuDNN 7, and NVIDIA driver 384.98.

I really appreciate your help.

• 32GB 2133MHz PC4-2133 ECC Registered DDR4 Memory (2 x 16GB)
• 240GB 2.5" SOLID STATE DRIVE
• 2TB Serial ATA 3 Hard Drive
• Serial ATA Multi-format DVD Writer/CD-RW Drive - Black
• Cyberlink Media Suite 10 (pre-installed)
• Integrated Intel Gigabit Ethernet Adapter
• NVIDIA GeForce GTX1080 8GB PCI Express Graphics with DVI, HDMI and 3x DP
• Integrated High Definition audio
• Viglen Soft Touch 105-key USB Windows 8 Keyboard - Black
• Viglen Optical Wheel Mouse (USB, black)
• Iiyama ProLite B2483HS 24" Multimedia LED Display with DVI, HDMI + H’Adj
  • 24" LED Display panel
  • 1,920 x 1,080 resolution
  • Built in Height Adjust stand
  • Integrated 2x 2W stereo speakers
  • Connections: VGA, DVI, HDMI
  • Contrast: 1,000:1 typical
  • Brightness: 300cd/m2
  • Response time: 2ms
  • 100mm VESA mount compliant
  • Dimensions (WxHxD): 565 x 390 x 520mm
• 600 Watt 80 PLUS Power Supply

Thanks,
eric

A machine rebooting spontaneously (without proper operating system shutdown) under load is typically an indication of insufficient power supply. This causes supply voltage to electronic components to drop under high current draw, triggering a reset of the hardware. Such a reset is a feature of the hardware.

I am a bit mystified because on paper, your power supply (600W 80PLUS) seems adequate assuming a single CPU plus a single GTX 1080. 100W for the (unspecified) CPU + 180W for the GTX 1080 + 30W for the rest of the system ~= 310W. Generally speaking, size the PSU so that the nominal wattage of all system components is less than 60% of the watt rating of the PSU (in your case, that would result in a limit of 360W).
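The estimate above can be written out as a small sketch. The wattages are the ones quoted in this reply; the CPU figure is a placeholder since the CPU model is unspecified, and the variable names are purely illustrative.

```python
# Nominal component draws in watts, taken from the estimate in this reply.
# The CPU value is an assumed placeholder (the actual CPU is unspecified).
components_w = {"cpu": 100, "gtx_1080": 180, "rest_of_system": 30}

psu_rating_w = 600         # 600W 80 PLUS supply from the spec list
headroom_fraction = 0.60   # rule of thumb: nominal load below 60% of PSU rating

total_load_w = sum(components_w.values())
load_limit_w = psu_rating_w * headroom_fraction
within_budget = total_load_w < load_limit_w

print(f"nominal load: {total_load_w}W, limit: {load_limit_w:.0f}W, ok: {within_budget}")
```

With these numbers the nominal load (310W) sits below the 360W limit, which is why the PSU looks adequate on paper.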

The GTX 1080 appears to have an 8-pin PCIe power connector. Is the corresponding cable plugged in properly? Make sure no 6-pin to 8-pin converter is being used in that cabling.

Hi njuffa,

Thanks for your reply. How do I check that though? I bought the PC and everything had been set up (hardware). I only did the OS and deep learning stuff.

When you said ‘Make sure no 6-pin to 8-pin converter is being used in that cabling.’, is it possible that there was an error when they assembled/installed the hardware in my PC?

Or could it be that some of my hardware drivers were not installed properly?

Thanks,
eric

Software issues could cause all kinds of problems, but not the sudden reboot of a system. The symptoms you describe are entirely consistent with a power supply issue, although remote diagnosis of computers is about as reliable as remote diagnosis by a car mechanic or medical doctor, i.e. no guarantees.

By all means try the latest driver, but I have never seen a GPU driver take down the entire system (well, not in recent years). Did the system have this problem right out of the box, or is this something that has started to happen recently? If the latter, did you install OS fixes for the Spectre/Meltdown security issues recently? I seem to recall reports of system instability on some systems because of such fixes (presumably because they apply microcode updates, i.e. are basically performing brain surgery on the CPU).

Did you buy the PC from a reputable integrator? If you are able to open the PC case without voiding your warranty, open it up and look at the PCIe auxiliary power cable that runs from the PSU to an edge of the GPU. Also, take a closer look at the PSU and note the make and model, then compare that to the specifications you can find online. If you can’t identify the PSU, GPU, or cable, take the system back to the vendor or have some other qualified local person look at it. If you reside in a country with a high prevalence of counterfeit electronic products in the market, do likewise (counterfeit PSUs are a thing from what I read; probably because copper is fairly expensive).

PCIe devices can only draw up to 75W through the PCIe slot. NVIDIA GPUs typically play it safe and don’t draw more than 40W to 50W through that interface. Additional power is provided by means of auxiliary PCIe power connectors, either 6-pin (rated for 75W) or 8-pin (rated for 150W). I haven’t actually seen a GTX 1080 up close, but from what I can find on the internet it seems to use a single 8-pin power connector. This caps total power at 75W + 150W = 225W, while the GTX 1080 is limited to 180W per NVIDIA’s specifications, so it is well within the cap.
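As a rough sketch of that cap calculation, using only the ratings quoted above (the function and constant names are made up for illustration):

```python
# Power ratings in watts as quoted in this reply.
PCIE_SLOT_W = 75
AUX_CONNECTOR_W = {"6-pin": 75, "8-pin": 150}

def board_power_cap(aux_connectors):
    """Maximum deliverable power: slot plus all auxiliary connectors."""
    return PCIE_SLOT_W + sum(AUX_CONNECTOR_W[c] for c in aux_connectors)

gtx_1080_limit_w = 180                 # NVIDIA's stated limit for the GTX 1080
cap_w = board_power_cap(["8-pin"])     # assumes a single 8-pin connector
margin_w = cap_w - gtx_1080_limit_w

print(f"cap: {cap_w}W, card limit: {gtx_1080_limit_w}W, margin: {margin_w}W")
```

The same helper can be reused for cards with other connector combinations, e.g. `board_power_cap(["6-pin", "8-pin"])`.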

Sometimes (though it should be rare these days) lower-end PSUs do not offer an 8-pin PCIe power connector, and instead offer only two 6-pin ones. In that case, one can buy 6-pin to 8-pin converters to mechanically solve the connectivity problem for a GPU that requires an 8-pin connector. I have done this myself a few times. However, this creates an imminent risk of overloading the portion of the PSU that drives that 6-pin output, and is absolutely to be avoided for robust operation.
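To see why a converter is risky, here is a back-of-the-envelope sketch using the figures quoted earlier in this reply (180W card limit, 40W to 50W through the slot, 75W rating for a 6-pin output). It assumes the card runs at its full limit, which is a worst case, and the helper name is hypothetical.

```python
SIX_PIN_RATING_W = 75     # rating of a 6-pin PCIe auxiliary output
CARD_LIMIT_W = 180        # GTX 1080 limit per NVIDIA's specifications

def aux_draw_w(board_power_w, slot_draw_w):
    """Power that must come through the auxiliary connector."""
    return board_power_w - slot_draw_w

# The slot typically supplies 40W to 50W; the rest goes through the cable.
overloads_w = {}
for slot_w in (40, 50):
    aux_w = aux_draw_w(CARD_LIMIT_W, slot_w)
    overloads_w[slot_w] = aux_w - SIX_PIN_RATING_W
    print(f"slot {slot_w}W -> aux {aux_w}W, over 6-pin rating by {overloads_w[slot_w]}W")
```

Under these assumptions a converted 6-pin output would be asked for 130W to 140W, well beyond its 75W rating.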

Hi njuffa,

Thanks for your reply again.

  1. ‘is this something that has started to happen recently?’ I got this PC late last year and did the OS and deep learning installation in December 2017. Once everything was set up, I started doing some deep learning work, and whenever I use the AlexNet or LeNet network the entire system shuts down without any warning.

  2. ‘Did you buy the PC from a reputable integrator?’ Well, I ordered it through the university. I gave the specifications to the university and they bought it. So I assume they bought it from a reputable company/supplier.

  3. ‘If you reside in a country with a high prevalence of counterfeit electronic products in the market’. I live in the UK.

I am going to take the machine to the IT support people at the university and see what they will say about it.

Thanks,
eric

That seems like an excellent course of action based on the additional information provided. It would be helpful for improving my pattern matching if you could let us know what they determine is the root cause of these sudden reboots.

Hi njuffa,

I will get back with some updates once they have finished their diagnosis.

Thanks,
eric