GTX 1080 crash: after crashing in Windows 10, a reboot is not enough and I must power off
I am not sure whether a Windows application is putting the graphics card into a bad state.
When I play some recent 3D games on Windows, the screen turns off and the graphics card fan runs loudly (at full speed). I have to use the remote console (the SSH server on Windows 10) to restart the system.
I have no problem at boot time: the screen and the fan are normal. But I run into a problem when the X server starts.
A reboot won't fix it; I have to power off and then do a cold start. Even a reboot from Linux won't work. I have to issue a poweroff (the power adapter stays connected).
Here is the bug report archive:


https://drive.google.com/open?id=1zOGiho0s9I3DtUEfY5ptxuMfio50abY2

#1
Posted 01/03/2018 10:50 AM   
Hi,

Have you tried a clean driver install via the driver installer? Custom install with clean install checked?

Also what is the wattage on your power supply?

-Josh

Home::MSI Z170I::Intel 6700k::16GB Corsair DDR4 3200mhz::Samsung 256gb M.2(OS)::Titan Xp:Asus ROG Swift PG278Q: Custom waterloop
Test rig::P9X79::i7-3820: Gskills 8gb::EVGA GTX 670SC SLI
Please send me a PM if I fail to keep up on replying in any specific thread
Opinions expressed here are my own and do not reflect the opinions of NVIDIA

#2
Posted 01/18/2018 06:47 PM   
Yes, I have tried uninstalling and doing a clean install again. It doesn't help.
My power supply is rated at 650 W, with a maximum of 700 W.

#3
Posted 01/21/2018 03:08 PM   
I found a way to reproduce the problem on Linux by running CUDA. I get the following messages:
[50100.142935] NVRM: GPU at PCI:0000:02:00: GPU-0f624448-93a1-9681-1224-3fe93d7e42f1
[50100.142947] NVRM: GPU Board Serial Number:
[50100.142953] NVRM: Xid (PCI:0000:02:00): 79, GPU has fallen off the bus.
[50100.142958] NVRM: GPU at 0000:02:00.0 has fallen off the bus.
[50100.142961] NVRM: GPU is on Board .
[50100.142976] NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.

I have uploaded the full nvidia-bug-report.sh output here: https://drive.google.com/open?id=18_jEHe9tF-c9AQ2AS_derw5oW1qX8Bjr
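
For anyone who wants to check their own kernel log for the same error, here is a minimal Python sketch that pulls Xid events out of dmesg-style text. The regular expression is an assumption based only on the line format quoted in this post, and the names `XID_PATTERN`, `find_xid_events`, and `sample` are hypothetical, not part of any NVIDIA tool.

```python
import re

# Assumed line format, taken from the log quoted above:
# [50100.142953] NVRM: Xid (PCI:0000:02:00): 79, GPU has fallen off the bus.
XID_PATTERN = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<code>\d+),\s*(?P<message>.*)"
)


def find_xid_events(log_text):
    """Return a list of (pci_address, xid_code, message) tuples found in log_text."""
    events = []
    for line in log_text.splitlines():
        match = XID_PATTERN.search(line)
        if match:
            events.append(
                (
                    match.group("pci"),
                    int(match.group("code")),
                    match.group("message").strip(),
                )
            )
    return events


# Sample input copied from the messages in this post.
sample = """\
[50100.142947] NVRM: GPU Board Serial Number:
[50100.142953] NVRM: Xid (PCI:0000:02:00): 79, GPU has fallen off the bus.
[50100.142958] NVRM: GPU at 0000:02:00.0 has fallen off the bus.
"""

for pci, code, message in find_xid_events(sample):
    print(f"Xid {code} on {pci}: {message}")
```

On a real system the same function could be fed the text produced by `dmesg`. Lines that do not match the assumed format are skipped.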

#4
Posted 02/28/2018 11:16 AM   
Hi ayaka, which CUDA app are you running? Which CUDA SDK version are you using? How long does it take to reproduce this issue? What is the GPU temperature when you see the Xid 79 error in dmesg? You can check it with nvidia-smi or nvidia-settings on Linux.

It looks like you have the issue on both Linux and Windows. Reading through the thread, this looks like a GPU hardware, system power supply, or thermal issue. Do you have another GPU of the same model that you can test with? Also, I think you are using a Supermicro X10DAL-i motherboard. Does the issue reproduce on any other system with the same GPU?

Thanks,
Sandip.

#5
Posted 03/06/2018 06:24 AM   
After the hardware crashed, I tried to run nvidia-smi, but it reported that the hardware is not available.
The CUDA version is 9.1, with the matching cuDNN version. The application I run can be found here:

https://github.com/BoyuanJiang/Age-Gender-Estimate-TF

It fails while training a model from TFRecords.
The only other card I have is a Quadro FX 380.

#6
Posted 03/06/2018 07:35 AM   
Hi ayaka, before running your app, you can start `nvidia-smi -l` in a loop in another terminal to check the temperature. Also, we have never used the Age-Gender-Estimate-TF app. It would be good if you could provide detailed, step-by-step instructions (how to compile and build it, the use case, and the model used) so we can reproduce the same issue in-house and investigate further.

Thanks,
Sandip.
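
As a rough sketch of the temperature logging suggested above, the following Python snippet runs `nvidia-smi` in loop mode with its CSV query options and appends each sample to a file, so the readings from just before the GPU drops out can be inspected afterwards. The choice of fields, the log file name `gpu_status.log`, and the helper names `build_command`, `parse_row`, and `log_gpu_status` are assumptions made for illustration.

```python
import subprocess

# Fields requested from nvidia-smi; this selection is an assumption, adjust as needed.
QUERY_FIELDS = ["timestamp", "temperature.gpu", "fan.speed", "power.draw"]


def build_command(interval_seconds=1):
    """Build an nvidia-smi command that prints one CSV row every interval_seconds."""
    return [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
        "-l",
        str(interval_seconds),
    ]


def parse_row(line):
    """Split one CSV row into a dict keyed by field name (values stay strings)."""
    values = [value.strip() for value in line.split(",")]
    return dict(zip(QUERY_FIELDS, values))


def log_gpu_status(path="gpu_status.log", interval_seconds=1):
    """Append every sample to path, flushing after each row.

    Runs until nvidia-smi exits or the process is interrupted.
    """
    command = build_command(interval_seconds)
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as proc:
        with open(path, "a") as log:
            for line in proc.stdout:
                log.write(line)
                log.flush()
```

Calling `log_gpu_status()` in one terminal before starting the training job in another would leave a timestamped record of temperature, fan speed, and power draw, and `parse_row` can be used to read the rows back.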

#7
Posted 03/06/2018 09:11 AM   
Our engineers think this sounds more likely to be a hardware problem (either a hardware defect or a configuration problem, such as an insufficient PSU) than a software problem. Please contact the GPU vendor for hardware support and test with another GPU of the same model.

Thanks,
Sandip.

#8
Posted 03/13/2018 12:24 PM   
Hi ayaka, Is this issue resolved for you?

Thanks,
Sandip.

#9
Posted 03/19/2018 10:03 AM   
I have bought a new power adapter along with a UPS. It didn't solve the problem.
I am still in contact with the vendor.

#10
Posted 03/21/2018 03:49 AM   
Thanks ayaka. Who is your GPU vendor? Please keep us posted.

Thanks,
Sandip.

#11
Posted 03/21/2018 03:57 AM   
ASUS, a computer vendor from Taiwan (Republic of China).

#12
Posted 03/21/2018 09:26 AM   
Are you running the NVIDIA sound drivers alongside the Realtek sound drivers? If so, remove the Realtek driver so they don't clash. It is a weird issue, but sometimes this resolves the crashing.

#13
Posted 8 hours ago   
By sound driver I mean audio driver :)

#14
Posted 8 hours ago   