Rock and hard place: 375.26 fails to suspend but required for CUDA 8.0

I’m running Ubuntu 16.04.1 LTS x64 (4.8.0-36-generic #36~16.04.1-Ubuntu SMP) and am unable to suspend to RAM with driver 375.26-0ubuntu1. Two GPUs, Nvidia GTX 670 (not in SLI).

I am triggering suspend to RAM via pm-suspend, systemctl suspend, or by selecting the suspend option in Unity before logging in. Regardless of how I do it, my screens always go blank except for a single text cursor that doesn’t blink. No input works any more, but all power is still on, and the CPU appears to be at 100% (based on wattage consumption). Nothing useful shows up in syslog (see below). The only thing I can do in this state is reset / power cycle. This is 100% reproducible, but I don’t know how to meaningfully debug this.

I uninstalled 375.26-0ubuntu1 and installed 367.57-0ubuntu0.16.04.1, and the suspend problems went away. Without any nvidia drivers installed and using the default Nouveau drivers, suspend also works. So I suspect this problem is specific to 375.

Unfortunately for me, I need CUDA 8.0, and the CUDA installer uninstalls 367 and installs 375. Workarounds or other suggestions much appreciated.


Here are some comparisons of /var/log/syslog on 367 and 375 when I issue the suspend command. The 375 log looks exactly like the 367 at the beginning, gets hung at some point, and the next entry is the restart of the system after I reset.

On 375: suspend at 07:16:50, then reset (logging restarts at 07:19:19)

Feb 12 07:16:50 home NetworkManager[1162]: <info>  [1486912610.2089] manager: sleep requested (sleeping: no  enabled: yes)
Feb 12 07:16:50 home NetworkManager[1162]: <info>  [1486912610.2089] manager: sleeping...
Feb 12 07:16:50 home NetworkManager[1162]: <info>  [1486912610.2090] manager: NetworkManager state is now ASLEEP
Feb 12 07:16:50 home whoopsie[1141]: [07:16:50] offline
Feb 12 07:16:50 home systemd[1]: Reached target Sleep.
Feb 12 07:16:50 home systemd[1]: Starting Suspend...
Feb 12 07:16:50 home systemd-sleep[6522]: Failed to connect to non-global ctrl_ifname: (nil)  error: No such file or directory
Feb 12 07:16:50 home systemd-sleep[6523]: /lib/systemd/system-sleep/wpasupplicant failed with error code 255.
Feb 12 07:16:50 home systemd-sleep[6522]: Suspending system.
Feb 12 07:19:19 home rsyslogd: [origin software="rsyslogd" swVersion="8.16.0" x-pid="1114" x-info="http://www.rsyslog.com"] start

On 367: suspend at 07:53:38, resume at 07:53:49

Feb 12 07:53:38 home NetworkManager[1184]: <info>  [1486914818.0894] manager: sleep requested (sleeping: no  enabled: yes)
Feb 12 07:53:38 home NetworkManager[1184]: <info>  [1486914818.0895] manager: sleeping...
Feb 12 07:53:38 home NetworkManager[1184]: <info>  [1486914818.0895] manager: NetworkManager state is now ASLEEP
Feb 12 07:53:38 home whoopsie[1107]: [07:53:38] offline
Feb 12 07:53:38 home systemd[1]: Reached target Sleep.
Feb 12 07:53:38 home systemd[1]: Starting Suspend...
Feb 12 07:53:38 home systemd-sleep[3689]: Failed to connect to non-global ctrl_ifname: (nil)  error: No such file or directory
Feb 12 07:53:38 home systemd-sleep[3690]: /lib/systemd/system-sleep/wpasupplicant failed with error code 255.
Feb 12 07:53:38 home systemd-sleep[3689]: Suspending system...
Feb 12 07:53:38 home kernel: [  303.626951] PM: Syncing filesystems ... done.
Feb 12 07:53:38 home kernel: [  303.652066] PM: Preparing system for sleep (mem)
Feb 12 07:53:49 home kernel: [  304.053097] Freezing user space processes ... (elapsed 0.001 seconds) done.
Feb 12 07:53:49 home kernel: [  304.054513] Double checking all user space processes after OOM killer disable... (elapsed 0.000 seconds) 
Feb 12 07:53:49 home kernel: [  304.054552] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
Feb 12 07:53:49 home kernel: [  304.055687] PM: Suspending system (mem)
Feb 12 07:53:49 home kernel: [  304.055713] Suspending console(s) (use no_console_suspend to debug)
Feb 12 07:53:49 home kernel: [  304.055917] sd 2:0:0:0: [sdc] Synchronizing SCSI cache
Feb 12 07:53:49 home kernel: [  304.055929] sd 1:0:0:0: [sdb] Synchronizing SCSI cache
Feb 12 07:53:49 home kernel: [  304.055930] sd 0:0:0:0: [sda] Synchronizing SCSI cache
Feb 12 07:53:49 home kernel: [  304.056005] sd 0:0:0:0: [sda] Stopping disk
Feb 12 07:53:49 home kernel: [  304.056040] sd 1:0:0:0: [sdb] Stopping disk
Feb 12 07:53:49 home kernel: [  304.056060] sd 2:0:0:0: [sdc] Stopping disk
Feb 12 07:53:49 home kernel: [  304.056279] serial 00:02: disabled
Feb 12 07:53:49 home kernel: [  304.056437] parport_pc 00:01: disabled
Feb 12 07:53:49 home kernel: [  304.057333] e1000e: EEE TX LPI TIMER: 00000011
Feb 12 07:53:49 home kernel: [  304.800028] PM: suspend of devices complete after 743.747 msecs
Feb 12 07:53:49 home kernel: [  304.800445] PM: late suspend of devices complete after 0.415 msecs
Feb 12 07:53:49 home kernel: [  304.800873] xhci_hcd 0000:07:00.0: System wakeup enabled by ACPI
Feb 12 07:53:49 home kernel: [  304.800965] e1000e 0000:00:1f.6: System wakeup enabled by ACPI
Feb 12 07:53:49 home kernel: [  304.801160] xhci_hcd 0000:00:14.0: System wakeup enabled by ACPI
Feb 12 07:53:49 home kernel: [  304.838257] PM: noirq suspend of devices complete after 37.785 msecs
Feb 12 07:53:49 home kernel: [  304.839088] ACPI: Preparing to enter system sleep state S3
Feb 12 07:53:49 home kernel: [  304.840563] PM: Saving platform NVS memory
Feb 12 07:53:49 home kernel: [  304.840652] Disabling non-boot CPUs ...
Feb 12 07:53:49 home kernel: [  304.841151] Broke affinity for irq 124
Feb 12 07:53:49 home kernel: [  304.841163] Broke affinity for irq 128
Feb 12 07:53:49 home kernel: [  304.841167] Broke affinity for irq 129
Feb 12 07:53:49 home kernel: [  304.841171] Broke affinity for irq 133
Feb 12 07:53:49 home kernel: [  304.842245] smpboot: CPU 1 is now offline
Feb 12 07:53:49 home kernel: [  304.859243] Broke affinity for irq 19
Feb 12 07:53:49 home kernel: [  304.859252] Broke affinity for irq 123
Feb 12 07:53:49 home kernel: [  304.859256] Broke affinity for irq 124
Feb 12 07:53:49 home kernel: [  304.859260] Broke affinity for irq 125
Feb 12 07:53:49 home kernel: [  304.859263] Broke affinity for irq 126
Feb 12 07:53:49 home kernel: [  304.859269] Broke affinity for irq 128
Feb 12 07:53:49 home kernel: [  304.859273] Broke affinity for irq 129
Feb 12 07:53:49 home kernel: [  304.859277] Broke affinity for irq 133
Feb 12 07:53:49 home kernel: [  304.859281] Broke affinity for irq 134
Feb 12 07:53:49 home kernel: [  304.860340] smpboot: CPU 2 is now offline
Feb 12 07:53:49 home kernel: [  304.882993] Broke affinity for irq 1
Feb 12 07:53:49 home kernel: [  304.882997] Broke affinity for irq 5
Feb 12 07:53:49 home kernel: [  304.883001] Broke affinity for irq 8
Feb 12 07:53:49 home kernel: [  304.883005] Broke affinity for irq 9
Feb 12 07:53:49 home kernel: [  304.883009] Broke affinity for irq 12
Feb 12 07:53:49 home kernel: [  304.883014] Broke affinity for irq 19
Feb 12 07:53:49 home kernel: [  304.883023] Broke affinity for irq 123
Feb 12 07:53:49 home kernel: [  304.883027] Broke affinity for irq 124
Feb 12 07:53:49 home kernel: [  304.883031] Broke affinity for irq 125
Feb 12 07:53:49 home kernel: [  304.883035] Broke affinity for irq 126
Feb 12 07:53:49 home kernel: [  304.883038] Broke affinity for irq 127
Feb 12 07:53:49 home kernel: [  304.883042] Broke affinity for irq 128
Feb 12 07:53:49 home kernel: [  304.883045] Broke affinity for irq 129
Feb 12 07:53:49 home kernel: [  304.883050] Broke affinity for irq 133
Feb 12 07:53:49 home kernel: [  304.883054] Broke affinity for irq 134
Feb 12 07:53:49 home kernel: [  304.884115] smpboot: CPU 3 is now offline
Feb 12 07:53:49 home kernel: [  304.901305] ACPI: Low-level resume complete
Feb 12 07:53:49 home kernel: [  304.901391] PM: Restoring platform NVS memory
Feb 12 07:53:49 home kernel: [  304.902107] Enabling non-boot CPUs ...
Feb 12 07:53:49 home kernel: [  304.925375] x86: Booting SMP configuration:
Feb 12 07:53:49 home kernel: [  304.925376] smpboot: Booting Node 0 Processor 1 APIC 0x2
Feb 12 07:53:49 home kernel: [  304.927812]  cache: parent cpu1 should not be sleeping
Feb 12 07:53:49 home kernel: [  304.927917] CPU1 is up
Feb 12 07:53:49 home kernel: [  304.981680] smpboot: Booting Node 0 Processor 2 APIC 0x4
Feb 12 07:53:49 home kernel: [  304.985164]  cache: parent cpu2 should not be sleeping
Feb 12 07:53:49 home kernel: [  304.985722] CPU2 is up
Feb 12 07:53:49 home kernel: [  305.025775] smpboot: Booting Node 0 Processor 3 APIC 0x6
Feb 12 07:53:49 home kernel: [  305.029269]  cache: parent cpu3 should not be sleeping
Feb 12 07:53:49 home kernel: [  305.029870] CPU3 is up
Feb 12 07:53:49 home kernel: [  305.038806] ACPI: Waking up from system sleep state S3
Feb 12 07:53:49 home kernel: [  305.065420] xhci_hcd 0000:00:14.0: System wakeup disabled by ACPI
Feb 12 07:53:49 home kernel: [  305.081548] xhci_hcd 0000:07:00.0: System wakeup disabled by ACPI
Feb 12 07:53:49 home kernel: [  305.081585] PM: noirq resume of devices complete after 37.290 msecs
Feb 12 07:53:49 home kernel: [  305.082703] PM: early resume of devices complete after 1.054 msecs
Feb 12 07:53:49 home kernel: [  305.082972] usb usb1: root hub lost power or was reset
Feb 12 07:53:49 home kernel: [  305.082975] usb usb2: root hub lost power or was reset
Feb 12 07:53:49 home kernel: [  305.083186] e1000e 0000:00:1f.6: System wakeup disabled by ACPI
Feb 12 07:53:49 home kernel: [  305.085439] parport_pc 00:01: activated
Feb 12 07:53:49 home kernel: [  305.087007] serial 00:02: activated
Feb 12 07:53:49 home kernel: [  305.087015] rtc_cmos 00:05: System wakeup disabled by ACPI
Feb 12 07:53:49 home kernel: [  305.087403] usb usb3: root hub lost power or was reset
Feb 12 07:53:49 home kernel: [  305.087405] usb usb4: root hub lost power or was reset
Feb 12 07:53:49 home kernel: [  305.158275] sd 1:0:0:0: [sdb] Starting disk
Feb 12 07:53:49 home kernel: [  305.158275] sd 0:0:0:0: [sda] Starting disk
Feb 12 07:53:49 home kernel: [  305.158284] sd 2:0:0:0: [sdc] Starting disk
Feb 12 07:53:49 home kernel: [  305.431047] usb 1-14: reset high-speed USB device number 2 using xhci_hcd
Feb 12 07:53:49 home kernel: [  305.532009] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Feb 12 07:53:49 home kernel: [  305.532040] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 12 07:53:49 home kernel: [  305.532053] ata5: SATA link down (SStatus 4 SControl 300)
Feb 12 07:53:49 home kernel: [  305.532066] ata6: SATA link down (SStatus 4 SControl 300)
Feb 12 07:53:49 home kernel: [  305.542027] ata1.00: supports DRM functions and may not be fully accessible
Feb 12 07:53:49 home kernel: [  305.556548] ata4.00: configured for UDMA/133
Feb 12 07:53:49 home kernel: [  305.563371] ata1.00: disabling queued TRIM support
Feb 12 07:53:49 home kernel: [  305.594373] ata1.00: supports DRM functions and may not be fully accessible
Feb 12 07:53:49 home kernel: [  305.619325] ata1.00: disabling queued TRIM support
Feb 12 07:53:49 home kernel: [  305.625961] ata1.00: configured for UDMA/133
Feb 12 07:53:49 home kernel: [  305.751158] firewire_core 0000:06:00.0: rediscovered device fw0
Feb 12 07:53:49 home kernel: [  306.110228] usb 1-14.1: reset full-speed USB device number 3 using xhci_hcd
Feb 12 07:53:49 home kernel: [  306.365858] usb 1-14.2: reset full-speed USB device number 8 using xhci_hcd
Feb 12 07:53:49 home kernel: [  306.695998] usb 1-14.1.2: reset full-speed USB device number 5 using xhci_hcd
Feb 12 07:53:49 home kernel: [  307.131629] PM: resume of devices complete after 2047.791 msecs
Feb 12 07:53:49 home kernel: [  307.131895] PM: Finishing wakeup.
Feb 12 07:53:49 home kernel: [  307.131896] Restarting tasks ... 
Feb 12 07:53:49 home kernel: [  307.132056] e1000e: enp0s31f6 NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx
Feb 12 07:53:49 home kernel: [  307.132058] e1000e 0000:00:1f.6 enp0s31f6: 10/100 speed: disabling TSO
Feb 12 07:53:49 home kernel: [  307.132227] pci_bus 0000:05: Allocating resources
Feb 12 07:53:49 home kernel: [  307.132238] pci 0000:04:00.0: bridge window [io  0x1000-0x0fff] to [bus 05] add_size 1000
Feb 12 07:53:49 home kernel: [  307.132239] pci 0000:04:00.0: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 05] add_size 200000 add_align 100000
Feb 12 07:53:49 home kernel: [  307.132240] pci 0000:04:00.0: bridge window [mem 0x00100000-0x000fffff] to [bus 05] add_size 200000 add_align 100000
Feb 12 07:53:49 home kernel: [  307.132241] pci 0000:04:00.0: res[14]=[mem 0x00100000-0x000fffff] res_to_dev_res add_size 200000 min_align 100000
Feb 12 07:53:49 home kernel: [  307.132241] pci 0000:04:00.0: res[14]=[mem 0x00100000-0x002fffff] res_to_dev_res add_size 200000 min_align 100000
Feb 12 07:53:49 home kernel: [  307.132242] pci 0000:04:00.0: res[15]=[mem 0x00100000-0x000fffff 64bit pref] res_to_dev_res add_size 200000 min_align 100000
Feb 12 07:53:49 home kernel: [  307.132243] pci 0000:04:00.0: res[15]=[mem 0x00100000-0x002fffff 64bit pref] res_to_dev_res add_size 200000 min_align 100000
Feb 12 07:53:49 home kernel: [  307.132243] pci 0000:04:00.0: res[13]=[io  0x1000-0x0fff] res_to_dev_res add_size 1000 min_align 1000
Feb 12 07:53:49 home kernel: [  307.132244] pci 0000:04:00.0: res[13]=[io  0x1000-0x1fff] res_to_dev_res add_size 1000 min_align 1000
Feb 12 07:53:49 home kernel: [  307.132245] pci 0000:04:00.0: BAR 14: no space for [mem size 0x00200000]
Feb 12 07:53:49 home kernel: [  307.132245] pci 0000:04:00.0: BAR 14: failed to assign [mem size 0x00200000]
Feb 12 07:53:49 home kernel: [  307.132246] pci 0000:04:00.0: BAR 15: no space for [mem size 0x00200000 64bit pref]
Feb 12 07:53:49 home kernel: [  307.132246] pci 0000:04:00.0: BAR 15: failed to assign [mem size 0x00200000 64bit pref]
Feb 12 07:53:49 home kernel: [  307.132247] pci 0000:04:00.0: BAR 13: no space for [io  size 0x1000]
Feb 12 07:53:49 home kernel: [  307.132247] pci 0000:04:00.0: BAR 13: failed to assign [io  size 0x1000]
Feb 12 07:53:49 home kernel: [  307.132248] pci 0000:04:00.0: BAR 14: no space for [mem size 0x00200000]
Feb 12 07:53:49 home kernel: [  307.132249] pci 0000:04:00.0: BAR 14: failed to assign [mem size 0x00200000]
Feb 12 07:53:49 home kernel: [  307.132249] pci 0000:04:00.0: BAR 15: no space for [mem size 0x00200000 64bit pref]
Feb 12 07:53:49 home kernel: [  307.132250] pci 0000:04:00.0: BAR 15: failed to assign [mem size 0x00200000 64bit pref]
Feb 12 07:53:49 home kernel: [  307.132250] pci 0000:04:00.0: BAR 13: no space for [io  size 0x1000]
Feb 12 07:53:49 home kernel: [  307.132251] pci 0000:04:00.0: BAR 13: failed to assign [io  size 0x1000]
Feb 12 07:53:49 home kernel: [  307.132251] pci 0000:04:00.0: PCI bridge to [bus 05]
Feb 12 07:53:49 home systemd[1983]: Time has been changed
Feb 12 07:53:49 home systemd[1]: Time has been changed
Feb 12 07:53:49 home systemd[1]: snapd.refresh.timer: Adding 4h 16min 45.744513s random time.
Feb 12 07:53:49 home systemd[1]: snapd.refresh.timer: Adding 4h 40min 32.511656s random time.
Feb 12 07:53:49 home systemd[1]: apt-daily.timer: Adding 4h 57min 811.661ms random time.
Feb 12 07:53:49 home acpid: client 1303[0:0] has disconnected
Feb 12 07:53:49 home systemd-sleep[3689]: System resumed.
....
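
One way to pin down where the two logs part ways is to strip everything that differs between runs (timestamps, PIDs, kernel uptime stamps) and compare what is left line by line. This is only a rough sketch: the function names are made up for illustration, the line format is assumed to match the excerpts above, and trailing dots are ignored because the two excerpts punctuate "Suspending system" differently.

```python
import re

# Assumed line shape: "Feb 12 07:16:50 home systemd-sleep[6522]: Suspending system."
SYSLOG_LINE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]{8}\s+\S+\s+"          # date, time, hostname
    r"(?P<tag>[^:\[\s]+)(?:\[\d+\])?:\s*"        # program name, optional [pid]
    r"(?P<msg>.*)$"
)
# Per-run stamps inside the message: "[  303.626951]", "[1486912610.2089]", "[07:16:50]"
STAMP = re.compile(r"\[\s*(?:\d+\.\d+|\d\d:\d\d:\d\d)\]")


def normalize(line):
    """Reduce a syslog line to 'program: message' without run-specific noise."""
    match = SYSLOG_LINE.match(line)
    if match is None:
        return None  # not a syslog line (blank line, "....", and so on)
    msg = STAMP.sub("", match.group("msg"))
    msg = " ".join(msg.split()).rstrip(". ")
    return "{}: {}".format(match.group("tag"), msg)


def divergence(good_lines, bad_lines):
    """Return (shared_count, last_shared_line, next_line_in_good_log)."""
    good = [n for n in map(normalize, good_lines) if n]
    bad = [n for n in map(normalize, bad_lines) if n]
    shared = 0
    for g, b in zip(good, bad):
        if g != b:
            break
        shared += 1
    last_shared = good[shared - 1] if shared else None
    next_good = good[shared] if shared < len(good) else None
    return shared, last_shared, next_good
```

Feeding it the 367 excerpt as good_lines and the 375 excerpt as bad_lines gives the last message both runs logged and the first message that only the working run reached. Keep in mind that a missing line in the 375 log may just mean it was never flushed to disk before the reset.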

I’ve used 378 with CUDA 8. I’m not sure 375 is actually required, but they set up the .deb package to install it. Once you have installed the CUDA package you can switch back to 378 just fine (or at least it worked for me).

How did you switch back to 378? Anytime I try to install CUDA (apt-get install cuda from the deb file downloaded) it tries to install nvidia-375 and uninstall nvidia-378. Uninstalling nvidia-375 uninstalls cuda. Installing nvidia-378 uninstalls nvidia-375, which uninstalls cuda. And trying to install both via apt-get install cuda nvidia-378 results in

Reading package lists... Done
Building dependency tree       
Reading state information... Done
nvidia-378 is already the newest version (378.09-0ubuntu0~gpu16.04.1).
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 cuda : Depends: cuda-8-0 (>= 8.0.61) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
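
To see which dependency chain pulls nvidia-375 in, a rough sketch like the following might help. It walks the output of apt-cache depends breadth-first and reports the first chain it finds from one package to another. The package names cuda and nvidia-375 come from this thread; the helper names are made up, the output layout of apt-cache depends is assumed to be the usual indented "Depends: name" lines, and virtual packages (shown in angle brackets) are simply skipped.

```python
import os
import re
import subprocess
from collections import deque

# Matches "  Depends: pkg", " |Depends: pkg" and "  PreDepends: pkg".
DEP_LINE = re.compile(r"^\s*\|?(?:Pre)?Depends:\s*(\S+)")


def apt_depends(package):
    """Return the raw text of `apt-cache depends <package>`, or "" on failure."""
    env = dict(os.environ, LC_ALL="C")  # keep the field names untranslated
    try:
        return subprocess.check_output(
            ["apt-cache", "depends", package],
            universal_newlines=True,
            stderr=subprocess.DEVNULL,
            env=env,
        )
    except (subprocess.CalledProcessError, OSError):
        return ""


def parse_depends(text):
    """Extract hard dependencies, skipping virtual packages like <name>."""
    deps = []
    for line in text.splitlines():
        match = DEP_LINE.match(line)
        if match and not match.group(1).startswith("<"):
            deps.append(match.group(1))
    return deps


def find_path(start, target, get_deps):
    """Breadth-first search for a dependency chain from start to target."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for dep in get_deps(path[-1]):
            if dep not in seen:
                seen.add(dep)
                queue.append(path + [dep])
    return None
```

Calling find_path("cuda", "nvidia-375", lambda p: parse_depends(apt_depends(p))) should return the chain as a list of package names, or None if no hard dependency path exists. If one link in that chain is a driver-only metapackage, installing the packages beside it rather than the top-level cuda package may be worth trying.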

Install CUDA as before. It will install 375. From there, go to the Additional Drivers panel (if you are on Ubuntu or Mint). You can switch back to 378 there and go about your business.

Doesn’t seem to work for me. I can select nvidia-378, but as soon as I hit Apply Changes the radio button jumps back to 375 and nothing happens. I suspect that the cuda package incompatibility described above causes this behavior.

Hmm, I wonder whether it would help to remove CUDA, reinstall it, and try the procedure again. I did a few installs before using the same technique without an issue. Are you on 16.04?

Yes, I’m on 16.04.1 LTS. I tried uninstalling cuda but no dice. If it matters, I’m selecting Linux > x86_64 > Ubuntu > 16.04 > deb (network), and the version of CUDA I have is 8.0.61-1. (I believe there was an earlier version of CUDA 8.0).

OK, I checked a few things. Today I was able to update to a newer version of the 378 driver (from 378.09 to 378.13) and keep the CUDA 8 libraries from the site (I installed them two days ago, so they are current).

So maybe this is helpful:

I’m using the graphics drivers PPA to install the latest Nvidia drivers, and I pull the latest kernels using the Ukuu tool. Currently I am running kernel 4.9.10. Everything works very well.

My configuration is:
Gigabyte z170 ud5
6700k
Gigabyte xTreme Gaming GTX1080

I am running Unity-based Ubuntu 16.04 (with above mentioned changes)
I don’t use developer preview or the unstable PPAs.

I verified that CUDA works by compiling a CUDA-accelerated ffmpeg and by using Blender with GPU rendering successfully.
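
A quicker sanity check than a full build is to confirm which driver version the kernel actually loaded after switching. Here is a minimal sketch, assuming the proprietary module is loaded and exposes /proc/driver/nvidia/version with an "NVRM version" line in which the version number directly follows the words "Kernel Module". The function names are just for illustration.

```python
import re

# Assumption: this procfs file exists while the proprietary nvidia module is loaded.
VERSION_FILE = "/proc/driver/nvidia/version"

# Picks up the dotted version number that follows "Kernel Module".
NVRM_VERSION = re.compile(r"Kernel Module\s+(\d+(?:\.\d+)+)")


def parse_driver_version(text):
    """Return the driver version string found in text, or None."""
    match = NVRM_VERSION.search(text)
    return match.group(1) if match else None


def loaded_driver_version(path=VERSION_FILE):
    """Return the loaded driver version, or None if the file cannot be read."""
    try:
        with open(path) as handle:
            return parse_driver_version(handle.read())
    except OSError:
        return None
```

If loaded_driver_version() returns None, the proprietary module is probably not loaded at all (for example, the system fell back to Nouveau). A value starting with 375 after selecting 378 would suggest the switch did not take effect.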