940MX black screen and 100% Xorg CPU usage on system resume
Hi,

On the GeForce 940MX embedded in an Asus X705UQ, the system appears to hang during resume from S3 suspend. I am testing with driver version 387.34.

At this point the screen is black, but the computer is responsive over ssh.

There are no error messages shown in dmesg, but Xorg is using 100% CPU. Its backtrace at this point is:

[code]
#0  0x00007ff9d6029262 in ?? () from /usr/lib64/xorg/modules/drivers/nvidia_drv.so
#1  0x00007ff9d602ddf9 in ?? () from /usr/lib64/xorg/modules/drivers/nvidia_drv.so
#2  0x00007ff9d602d389 in ?? () from /usr/lib64/xorg/modules/drivers/nvidia_drv.so
#3  0x00007ff9d5fc2b21 in ?? () from /usr/lib64/xorg/modules/drivers/nvidia_drv.so
#4  0x00007ff9d5ffe230 in ?? () from /usr/lib64/xorg/modules/drivers/nvidia_drv.so
#5  0x00007ff9d5fcdba1 in ?? () from /usr/lib64/xorg/modules/drivers/nvidia_drv.so
#6  0x00007ff9d6536fd1 in ?? () from /usr/lib64/xorg/modules/drivers/nvidia_drv.so
#7  0x000000000265a070 in ?? ()
#8  0x0000000001e899f0 in ?? ()
#9  0x000000000272cdd0 in ?? ()
#10 0x000000000047f80e in CMapEnterVT ()
#11 0x000000000048a98c in xf86XVEnterVT ()
#12 0x0000000000477dd0 in xf86VTEnter ()
#13 0x000000000049cb98 in systemd_logind_vtenter ()
#14 0x000000000049ceb5 in message_filter ()
#15 0x00007ff9de1201ad in dbus_connection_dispatch () from /lib64/libdbus-1.so.3
#16 0x00007ff9de1205c8 in _dbus_connection_read_write_dispatch () from /lib64/libdbus-1.so.3
#17 0x0000000000496981 in socket_handler ()
#18 0x000000000059df41 in ospoll_wait ()
#19 0x0000000000596f9b in WaitForSomething ()
#20 0x0000000000435603 in Dispatch ()
#21 0x00000000004398a0 in dix_main ()
[/code]


nvidia-bug-report output: https://gist.github.com/dsd/aaf76c091c2658b13fd6656e055dae58

(this was captured over ssh while the system was in this hung state)
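For anyone wanting to reproduce this kind of capture, the backtrace of a hung Xorg can be grabbed from an ssh session roughly like this (a minimal sketch; it assumes gdb is installed and uses pidof to find the Xorg PID):

[code]
# attach to the spinning Xorg process, print its backtrace, then exit
sudo gdb -p "$(pidof Xorg)" -batch -ex 'bt'
[/code]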

Interestingly, if I switch VTs away from X before the S3 suspend, I can suspend from there, and it also resumes fine to that VT. However, upon then switching back to the X VT, the hang occurs and I cannot recover.

Please let me know how I can help further.

#1
Posted 12/27/2017 06:34 PM   
Looks like some ACPI problem. Try using the kernel parameters
acpi_osi=! acpi_osi="Windows 2009"
Please also run acpidump and attach the output to your post.
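A minimal sketch of how to apply those parameters and collect the dump, assuming a GRUB-based system (the exact file path and regeneration command vary by distro):

[code]
# /etc/default/grub -- keep your existing parameters in place of "..."
GRUB_CMDLINE_LINUX_DEFAULT="... acpi_osi=! acpi_osi=\"Windows 2009\""

# regenerate the GRUB config and reboot
sudo update-grub        # or: sudo grub2-mkconfig -o /boot/grub2/grub.cfg

# collect the ACPI tables for attachment
sudo acpidump > acpidump.out
[/code]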

#2
Posted 12/29/2017 12:29 AM   
Thanks for the suggestion. With those params, dmesg reports:

[code]
[    0.100982] ACPI: Disabled all _OSI OS vendors
[    0.100982] ACPI: Added _OSI(Module Device)
[    0.100982] ACPI: Added _OSI(Processor Device)
[    0.100982] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.100982] ACPI: Added _OSI(Processor Aggregator Device)
[    0.100982] ACPI: Added _OSI(Windows 2009)
[/code]


There is no change to the resume behaviour; the bug still reproduces exactly as described above.

acpidump: https://gist.github.com/dsd/1b8dfd188797dd5297408f4640052925

#3
Posted 12/29/2017 12:33 PM   
None of the usual suspects found in the acpidump.
See if this is reproducible by using bbswitch (a shell sketch of this sequence follows the steps):
Stop X
unload nvidia modules
load bbswitch
turn off nvidia gpu using bbswitch
turn on nvidia gpu using bbswitch
use cat /proc/acpi/bbswitch to see if it is really on
load nvidia modules
start X
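The sequence above as shell commands, for reference (a sketch: the display-manager unit name and the nvidia module set vary by distro and driver version, and bbswitch must be installed):

[code]
sudo systemctl stop display-manager                  # stop X
sudo modprobe -r nvidia_drm nvidia_modeset nvidia    # unload nvidia modules
sudo modprobe bbswitch
echo OFF | sudo tee /proc/acpi/bbswitch              # turn off nvidia gpu
echo ON  | sudo tee /proc/acpi/bbswitch              # turn it back on
cat /proc/acpi/bbswitch                              # should report e.g. "0000:01:00.0 ON"
sudo modprobe nvidia_drm                             # pulls in nvidia_modeset and nvidia
sudo systemctl start display-manager                 # start X
[/code]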

#4
Posted 12/30/2017 03:22 PM   
To get a bit more info, use the kernel parameter
acpi.aml_debug_output=1
I think you might be hit by this:
https://bugzilla.kernel.org/show_bug.cgi?id=156341
due to the following code in SSDT3:
[code]
While (\_SB.PCI0.LKS1 < 0x07)
{
    Sleep (One)
}
[/code]
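For checking one's own tables for this pattern, the usual ACPICA tooling works; a sketch (package names for acpidump/acpixtract/iasl vary by distro, and the lowercase table file name is what acpixtract typically emits):

[code]
sudo acpidump > acpidump.out   # dump all ACPI tables
acpixtract -a acpidump.out     # split into per-table .dat files
iasl -d ssdt3.dat              # disassemble to ssdt3.dsl
grep -n -A 3 'LKS1' ssdt3.dsl  # look for the busy-wait loop
[/code]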

#5
Posted 12/30/2017 03:53 PM   
[quote=""]Stop X unload nvidia modules load bbswitch turn off nvidia gpu using bbswitch turn on nvidia gpu using bbswitch use cat /proc/acpi/bbswitch to see if it is really on load nvidia modules start X[/quote] This works fine, X came up using nvidia again. I then did a suspend/resume and it froze again though. I took a look at https://bugzilla.kernel.org/show_bug.cgi?id=156341 With nouveau enabled: [code] root@endless:/sys/bus/pci/devices/0000:01:00.0/power# cat runtime_enabled enabled root@endless:/sys/bus/pci/devices/0000:01:00.0/power# cat runtime_status suspended root@endless:/sys/bus/pci/devices/0000:01:00.0/power# cat control auto root@endless:/sys/bus/pci/devices/0000:01:00.0/power# lspci 00:00.0 Host bridge: Intel Corporation Device 5904 (rev 02) 00:02.0 VGA compatible controller: Intel Corporation Device 5916 (rev 02) 00:04.0 Signal processing controller: Intel Corporation Skylake Processor Thermal Subsystem (rev 02) 00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21) 00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP Thermal subsystem (rev 21) 00:15.0 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 (rev 21) 00:15.1 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #1 (rev 21) 00:16.0 Communication controller: Intel Corporation Sunrise Point-LP CSME HECI #1 (rev 21) 00:17.0 SATA controller: Intel Corporation Sunrise Point-LP SATA Controller [AHCI mode] (rev 21) 00:1c.0 PCI bridge: Intel Corporation Device 9d10 (rev f1) 00:1c.4 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1) 00:1c.5 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #6 (rev f1) 00:1e.0 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO UART Controller #0 (rev 21) 00:1e.2 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO SPI Controller #0 (rev 21) 00:1f.0 ISA bridge: Intel Corporation Device 9d4e (rev 21) 00:1f.2 Memory controller: Intel Corporation Sunrise Point-LP PMC (rev 21) 00:1f.3 Audio device: Intel Corporation Device 9d71 (rev 21) 00:1f.4 SMBus: Intel Corporation Sunrise Point-LP SMBus (rev 21) 01:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 940MX] (rev a2) 02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15) 03:00.0 Network controller: Qualcomm Atheros QCA9377 802.11ac Wireless Network Adapter (rev 31) root@endless:/sys/bus/pci/devices/0000:01:00.0/power# cat runtime_status active root@endless:/sys/bus/pci/devices/0000:01:00.0/power# cat runtime_status suspending root@endless:/sys/bus/pci/devices/0000:01:00.0/power# cat runtime_status suspended [/code] So I do not seem to be facing that issue. Also nouveau can suspend/resume just fine. The issue only happens when using the official nvidia driver. I added acpi.aml_debug_output=1 and acpi.debug_layer=0x10000000 acpi.debug_level=0xffffffff but "dmesg | grep -i 'ACPI DEBUG'" output is empty, so it doesn't look like any debug statements are being hit here.
[quote]
Stop X
unload nvidia modules
load bbswitch
turn off nvidia gpu using bbswitch
turn on nvidia gpu using bbswitch
use cat /proc/acpi/bbswitch to see if it is really on
load nvidia modules
start X
[/quote]


This works fine; X came up using nvidia again. However, when I then did a suspend/resume, it froze again.

I took a look at https://bugzilla.kernel.org/show_bug.cgi?id=156341

With nouveau enabled:
[code]
root@endless:/sys/bus/pci/devices/0000:01:00.0/power# cat runtime_enabled
enabled
root@endless:/sys/bus/pci/devices/0000:01:00.0/power# cat runtime_status
suspended
root@endless:/sys/bus/pci/devices/0000:01:00.0/power# cat control
auto
root@endless:/sys/bus/pci/devices/0000:01:00.0/power# lspci
00:00.0 Host bridge: Intel Corporation Device 5904 (rev 02)
00:02.0 VGA compatible controller: Intel Corporation Device 5916 (rev 02)
00:04.0 Signal processing controller: Intel Corporation Skylake Processor Thermal Subsystem (rev 02)
00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21)
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP Thermal subsystem (rev 21)
00:15.0 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 (rev 21)
00:15.1 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #1 (rev 21)
00:16.0 Communication controller: Intel Corporation Sunrise Point-LP CSME HECI #1 (rev 21)
00:17.0 SATA controller: Intel Corporation Sunrise Point-LP SATA Controller [AHCI mode] (rev 21)
00:1c.0 PCI bridge: Intel Corporation Device 9d10 (rev f1)
00:1c.4 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1)
00:1c.5 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #6 (rev f1)
00:1e.0 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO UART Controller #0 (rev 21)
00:1e.2 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO SPI Controller #0 (rev 21)
00:1f.0 ISA bridge: Intel Corporation Device 9d4e (rev 21)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-LP PMC (rev 21)
00:1f.3 Audio device: Intel Corporation Device 9d71 (rev 21)
00:1f.4 SMBus: Intel Corporation Sunrise Point-LP SMBus (rev 21)
01:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 940MX] (rev a2)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
03:00.0 Network controller: Qualcomm Atheros QCA9377 802.11ac Wireless Network Adapter (rev 31)
root@endless:/sys/bus/pci/devices/0000:01:00.0/power# cat runtime_status
active
root@endless:/sys/bus/pci/devices/0000:01:00.0/power# cat runtime_status
suspending
root@endless:/sys/bus/pci/devices/0000:01:00.0/power# cat runtime_status
suspended
[/code]


So I do not seem to be facing that issue. nouveau can also suspend/resume just fine; the issue only happens when using the official nvidia driver.

I added acpi.aml_debug_output=1 and acpi.debug_layer=0x10000000 acpi.debug_level=0xffffffff but "dmesg | grep -i 'ACPI DEBUG'" output is empty, so it doesn't look like any debug statements are being hit here.
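For completeness, whether those debug parameters actually took effect can be checked at runtime through the acpi module's sysfs parameters (a sketch; treat the exact output formats as an assumption):

[code]
cat /sys/module/acpi/parameters/aml_debug_output   # Y when enabled
cat /sys/module/acpi/parameters/debug_layer        # table of layers and their state
cat /sys/module/acpi/parameters/debug_level        # table of levels and their state
dmesg | grep -i 'ACPI Debug'                       # AML Debug object output, if any
[/code]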

#6
Posted 01/01/2018 12:44 PM   
OK. So it is still open whether X or the kernel driver is hanging. Did you try
stop X
suspend
resume
start X
? (A sketch of that sequence is below.)
If that works, does starting acpid help?
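A sketch of that test with systemd (assuming the display manager is reachable via the display-manager.service alias; substitute gdm/sddm/lightdm as appropriate):

[code]
sudo systemctl stop display-manager    # stop X
sudo systemctl suspend                 # suspend, then wake the machine
sudo systemctl start display-manager   # start X again
[/code]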

#7
Posted 01/02/2018 02:34 PM   
Suspend/resume was fine. Then on starting X again, the kernel logged errors:

[code]
[  108.102157] NVRM: RmInitAdapter failed! (0x26:0xffff:1114)
[  108.102186] NVRM: rm_init_adapter failed for device bearing minor number 0
[  108.259364] NVRM: RmInitAdapter failed! (0x26:0xffff:1114)
[  108.259390] NVRM: rm_init_adapter failed for device bearing minor number 0
[  108.409171] NVRM: RmInitAdapter failed! (0x26:0xffff:1114)
[  108.409222] NVRM: rm_init_adapter failed for device bearing minor number 0
[  108.558814] NVRM: RmInitAdapter failed! (0x26:0xffff:1114)
[  108.558843] NVRM: rm_init_adapter failed for device bearing minor number 0
[  108.709373] NVRM: RmInitAdapter failed! (0x26:0xffff:1114)
[  108.709498] NVRM: rm_init_adapter failed for device bearing minor number 0
[  108.858905] NVRM: RmInitAdapter failed! (0x26:0xffff:1114)
[  108.858934] NVRM: rm_init_adapter failed for device bearing minor number 0
[/code]


and X failed to launch with these errors:

[code]
[   328.964] (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA GPU at PCI:1:0:0.  Please
[   328.964] (EE) NVIDIA(GPU-0): check your system's kernel log for additional error
[   328.964] (EE) NVIDIA(GPU-0): messages and refer to Chapter 8: Common Problems in the
[   328.964] (EE) NVIDIA(GPU-0): README for additional information.
[   328.964] (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA graphics device!
[   328.964] (EE) NVIDIA(0): Failing initialization of X screen 0
[/code]
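As a side note, whether the GPU is still responding on the bus in this failed state can be checked roughly like this (a sketch; 01:00.0 is the GPU's address from the lspci output above):

[code]
sudo lspci -s 01:00.0 -vv | grep -i -e 'status' -e 'lnksta'
sudo hexdump -C /sys/bus/pci/devices/0000:01:00.0/config | head -n 2
# a config space reading back as all 0xff would mean the device fell off the bus
[/code]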


If testing acpid is still relevant, can you specify at which point in the test sequence I should start it?

#8
Posted 01/02/2018 05:21 PM   
At least there's an error now. acpid should be irrelevant.
Can you check if the 384 driver works?

#9
Posted 01/02/2018 06:44 PM   
Reproduced the same issue on 384.98.

#10
Posted 01/02/2018 07:03 PM   
Also reproduced on driver version 390.12.
Also reproduced on an Asus X542UQ (NVIDIA GM108 940MX).

#11
Posted 01/09/2018 03:57 PM   
Another idea: have you tried the
pcie_port_pm=off
kernel parameter?
Judging by the threads in this forum, the ASUS/940MX combo regularly has problems waking up again.

#12
Posted 01/09/2018 08:04 PM   
That makes no difference; the problem still exists.

#13
Posted 01/18/2018 10:19 PM   