Extremely bad performance when waking up from hibernation on gtx1060

I just swapped my old GTX 750 for a new GTX 1060.
Before that all was good, but now there are issues. To reproduce:

  • Start the system normally
  • Let unigine valley free demo run for a couple of minutes
  • Put the system into hibernation
  • Wake up the system
  • Let unigine valley run a couple of minutes

After a while, the system starts to lag; even the mouse is jerky.
In the kernel logs, an unstoppable flood of the following messages appears, repeating in a loop (note the alternating pattern):

dic 26 09:16:47 slimer kernel: NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000001 000000a4 00010053 00000007 00000000
dic 26 09:16:47 slimer kernel: NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000001 000000c0 00010076 00000007 00000000
dic 26 09:16:47 slimer kernel: NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000001 000000a4 00010053 00000007 00000000
dic 26 09:16:47 slimer kernel: NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000001 000000c0 00010076 00000007 00000000
dic 26 09:16:47 slimer kernel: NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000001 000000a4 00010053 00000007 00000000
dic 26 09:16:47 slimer kernel: NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000001 000000c0 00010076 00000007 00000000
dic 26 09:16:47 slimer kernel: NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000001 000000a4 00010053 00000007 00000000
dic 26 09:16:47 slimer kernel: NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000001 000000c0 00010076 00000007 00000000
dic 26 09:16:47 slimer kernel: NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000001 000000a4 00010053 00000007 00000000
dic 26 09:16:47 slimer kernel: NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000001 000000c0 00010078 00000007 00000000
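
To make the pattern easier to see, one option is to count how often each distinct payload repeats. This is only a sketch: it assumes the log lines arrive on standard input in the format shown above, and the function name `xid56_summary` is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper (name chosen for illustration): reads kernel log lines
# on stdin and prints each distinct Xid 56 payload (the text after "CMDre ")
# together with the number of times it occurred, most frequent first.
xid56_summary() {
    grep -E 'NVRM: Xid \(PCI:[^)]*\): 56, CMDre ' \
        | sed -E 's/.*CMDre //' \
        | sort \
        | uniq -c \
        | sort -rn
}

# Usage (reading the kernel ring buffer may require root):
#   dmesg | xid56_summary
```

A small number of payloads with large counts would confirm that the same few messages are repeating rather than many different errors occurring.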

System is still alive, but not very responsive.
If you manage to kill the Xorg process and restart it, all seems fine again.

I tried with and without a framebuffer console, no changes.

http://wpage.unina.it/aorefice/sharevari/nvidia-bug-report.log.gz

What’s the make and model # of the monitor you are using and how did you connect it to the GTX 750 and then the GTX 1060?

Belinea Monitor connected through DVI-D.
How is that relevant?

Process of elimination. A number of display function issues involve *bad DisplayPort cables, which can now be ruled out in your case. BTW, DVI-D SL or DVI-D DL? Have you tried a different cable, or electrical contact cleaner on the one you have?

Does your Belinea monitor have any sort of deep sleep feature? If so, have you tried turning it off?

Re deep sleep:

Post #126
GTX 970/980 BIOS update for DisplayPort issues - Page 13
https://rog.asus.com/forum/showthread.php?59850-GTX-970-980-BIOS-update-for-DisplayPort-issues&p=568296&viewfull=1#post568296

*Notice regarding incompatibility of certain 3rd party DisplayPort video cables
http://www.necdisplay.com/documents/Miscellaneous/DisplayPort_Notice.pdf

How to Choose a DisplayPort Cable, and Not Get a Bad One! - DisplayPort
http://www.displayport.org/cables/how-to-choose-a-displayport-cable-and-not-get-a-bad-one/

Cables cannot be related to the hibernation/wakeup process, so please let’s exclude hardware issues; it is a driver issue.

An Xid 56 error usually, but not always, indicates a hardware issue.

Don’t expect a fix or a reply from NVIDIA any time soon.

We are tracking this issue under bug 200273112 : linux eu/oem: Desktop and Applications freeze once resume from suspend with Xid 56 with GTX-1060

Thanks, but please note that I’m not using UEFI.

Can I get the output of the dmesg command to see the boot logs of your system?

Sure, I’ll post it as soon as I get home.

I have confirmed we are able to repro this issue in legacy mode also.

“Great”, thanks for letting me know. Do you still need the dmesg output?

Hi sandipt, is there any progress on the issue?

Investigation is in progress…

@sandipt, I’m getting the Xid 56 error on resume from hibernate too, after upgrading from a 730GT to a GTX 1050Ti. It mostly happens when the system was hibernated for over an hour.

The system itself is very stable even when running something like Unigine Heaven-4.0 for hours.
The only issue is with hibernate. I am booting in legacy mode because I didn’t want to use UEFI.

…so do you still experience slowness on resume, with a jerky mouse too?
I have yet to try newer drivers; I’m stuck with 375.26, which has the problem, but I thought it was fixed, because in the changelogs I read:
“Fixed a bug that could cause a system hang when resuming from suspend with some GPUs.”

But if it is still not fixed, could someone shed some light on the issue? What’s the state of this bug, sandipt? It has been almost a year, and I just noticed that (on 375.26) it happens even when waking up from suspend to RAM.

I’m using legacy mode too. My CPU supports secure boot but I can’t risk migrating without doing so on a test machine first.

I don’t know, nor can I see any evidence in this thread, that the problem depends on the use of UEFI.
Also, I asked whether you experienced the very same:

Anyway, I just found that switching VT to a console and back to X solves the problem for the session, but it has to be done after the problem arises, so here is a hacky workaround that works for me.
It needs to run as root.
It checks the last line of the kernel buffer for Xid errors and, if that line matches the expected error, it switches VT and then back.
The screen will blank for a while, but after that the graphics become smooth again; it seems to be needed just once per resume.
Just start it at boot and forget it; it uses almost no CPU.

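#!/bin/bash
# Poll the kernel log; when its last line is the Xid 56 error, switch VT away and back.
while true; do
    # EDIT the NVRM string so that it matches what appears in your log.
    if dmesg | tail -n 1 | grep -qF "NVRM: Xid (PCI:0000:01:00): 56, CMDre"; then
        chvt 1  # EDIT as needed: a text console VT
        chvt 7  # EDIT as needed: the VT where X runs
        sleep 1
        # Leave a marker in the kernel log to show that the switch happened.
        echo koko_switch_vt_done > /dev/kmsg
        # If the slowness persists, give the user some time to shut down.
        sleep 60
    fi
    sleep 1
done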
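
The hard-coded PCI address in the match string is the part most likely to need editing on another machine. As a possible variation (a sketch only, not something tested on the affected hardware), the check could be factored into a function that accepts any PCI address. The name `is_xid56` is hypothetical, and the pattern assumes the log format shown earlier in this thread.

```shell
#!/bin/bash
# Hypothetical helper (not part of the script above): succeeds when the given
# log line is an NVRM Xid 56 report, whatever PCI address it mentions.
is_xid56() {
    # Assumed format: "NVRM: Xid (PCI:<address>): 56, ..."
    local re='NVRM: Xid \(PCI:[0-9a-fA-F:.]+\): 56,'
    [[ $1 =~ $re ]]
}

# Possible use inside the polling loop, replacing the dmesg | tail | grep test:
#   if is_xid56 "$(dmesg | tail -n 1)"; then
#       chvt 1; chvt 7
#   fi
```

Keeping the match in a function also makes it possible to try it against lines copied from a log without switching VTs or needing root.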

Experiencing the same, but with a GTX 980; switching VT does not help.
My thread: Linux suspend problem - Linux - NVIDIA Developer Forums

So I have to consider myself “lucky”.
I wonder where the NVIDIA devs are; this issue seems pretty critical to me, and more than a year has passed.

This is not the kind of support one would expect after paying several hundred euros.