Booting any driver newer than 364.19 on my 980M-powered Clevo P650-RG results in nothing but a cursor in the top left of the screen once nvidia-modeset allocates the GPU.
Driver version 364.19 works fine, but it does not compile under newer kernels. Ubuntu 16.10 is now using 4.8, so this is a bit of a problem for me.
The following is seen in the kernel log:
Sep 25 13:57:32 sager kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 247
Sep 25 13:57:32 sager kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module 370.28 Thu Sep 1 19:45:04 PDT 2016
Sep 25 13:57:32 sager kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 370.28 Thu Sep 1 19:18:48 PDT 2016
Sep 25 13:57:32 sager kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Sep 25 13:57:36 sager kernel: nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 245
Sep 25 13:58:03 sager kernel: nvidia-modeset: Allocated GPU:0 (GPU-e2f980da-ea7e-4335-6ae3-41ae731aed6d) @ PCI:0000:01:00.0
Sep 25 13:58:03 sager kernel: NVRM: GPU at PCI:0000:01:00: GPU-e2f980da-ea7e-4335-6ae3-41ae731aed6d
Sep 25 13:58:03 sager kernel: NVRM: Xid (PCI:0000:01:00): 61, 13ee(3360) 00000000 00000000
Sep 25 13:58:03 sager nvidia-persistenced[7440]: Verbose syslog connection opened
Sep 25 13:58:03 sager nvidia-persistenced[7440]: Now running with user ID 123 and group ID 131
Sep 25 13:58:03 sager nvidia-persistenced[7440]: Started (7440)
Sep 25 13:58:03 sager nvidia-persistenced[7440]: device 0000:01:00.0 - registered
Sep 25 13:58:03 sager nvidia-persistenced[7440]: Local RPC service initialized
Sep 25 13:58:06 sager kernel: nvidia-modeset: WARNING: GPU:0: Lost display notification; continuing.
Sep 25 13:58:08 sager kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000957d:0:0:0x00000040
I tested with the new 367.57 drivers, but the problem persists:
Oct 12 08:01:22 sager kernel: [ 1.549905] nvidia-nvlink: Nvlink Core is being initialized, major device number 246
Oct 12 08:01:22 sager kernel: [ 1.574449] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 367.57 Mon Oct 3 20:32:57 PDT 2016
Oct 12 08:01:22 sager kernel: [ 1.577443] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Oct 12 08:01:22 sager kernel: [ 41.810294] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 244
Oct 12 08:01:29 sager kernel: [ 67.048166] nvidia-modeset: Allocated GPU:0 (GPU-e2f980da-ea7e-4335-6ae3-41ae731aed6d) @ PCI:0000:01:00.0
Oct 12 08:01:34 sager nvidia-persistenced: Verbose syslog connection opened
Oct 12 08:01:34 sager nvidia-persistenced: Now running with user ID 123 and group ID 131
Oct 12 08:01:34 sager nvidia-persistenced: Started (9445)
Oct 12 08:01:34 sager nvidia-persistenced: device 0000:01:00.0 - registered
Oct 12 08:01:34 sager nvidia-persistenced: Local RPC service initialized
Oct 12 08:01:36 sager kernel: [ 74.131250] nvidia-modeset: WARNING: GPU:0: Lost display notification; continuing.
Oct 12 08:01:38 sager kernel: [ 76.902229] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000957d:0:0:0x00000040
I reverted to 364.19, which continues to work fine on the same system.
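For anyone comparing logs across driver versions: the lines that matter above are the NVRM Xid line and the nvidia-modeset WARNING/ERROR lines. Here is a rough Python sketch for pulling just those out of a kernel log. The regexes are my own and only modeled on the messages quoted in this thread, so other driver versions may need adjustments; scan_log is a made-up helper name, not part of any NVIDIA tool.

```python
import re

# Patterns modeled on the messages quoted in this thread (assumption:
# other driver versions may word these lines differently).
XID_RE = re.compile(r"NVRM: Xid \((?P<pci>PCI:[0-9a-fA-F:.]+)\): (?P<xid>\d+),")
MODESET_RE = re.compile(
    r"nvidia-modeset: (?P<level>WARNING|ERROR): GPU:\d+: (?P<msg>.*)"
)


def scan_log(lines):
    """Return (kind, detail) tuples for NVIDIA problem lines, in log order."""
    findings = []
    for line in lines:
        match = XID_RE.search(line)
        if match:
            # Keep the PCI address and the numeric Xid code.
            findings.append(("xid", f"{match.group('pci')} Xid {match.group('xid')}"))
            continue
        match = MODESET_RE.search(line)
        if match:
            # Level becomes "warning" or "error"; detail is the message text.
            findings.append((match.group("level").lower(), match.group("msg").strip()))
    return findings
```

Something like scan_log(open("/var/log/kern.log")) should work on a stock Ubuntu install, assuming that is where your kernel log ends up. Running it against a log from the working driver and one from a failing driver gives two short lists that are easy to compare by eye.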
Is this identified as an issue in the internal bug tracker? I’ve spent a lot of time testing each new version (multiple times each) for the last 5 months and posting results in these threads, yet I’ve not seen any feedback acknowledging this issue or asking for any specific additional information.
The only difference I can see is that when persistence mode was disabled I would see a single underline cursor in the top right after it loaded. With persistence mode enabled, the screen was completely black after it loaded.
Did you ever try disconnecting the external monitor and booting without it? This line in the log looks strange: Virtual screen size determined to be 6400 x 2160
Maybe run nvidia-bug-report.sh while the working driver is installed, to have a baseline to compare against?
In 378.09, I am seeing
nvidia-modeset: WARNING: GPU:0: Lost display notification; continuing.
ERROR: GPU:0: Idling display engine timed out
after resuming from hibernate.
The X server freezes, and I have to ssh in to kill it and reboot.
I upgraded to a Core i5 Skylake CPU and 8GB of RAM. (The next step in the future would be a Core i7 Kaby Lake.)
The random "ERROR: GPU:0: Idling display engine timed out" disappeared.
Incidentally, a “hotplug” warning on hibernate disappeared as well. I’m not sure they are related, but even with heavy RAM usage my system is very robust right now.
My laptop is an i7-6700HQ with 32GB of RAM… so not really in need of an upgrade.
This morning I tested with 378.13 and the 4.4 and 4.10 kernels, same black screen with cursor issue as always. Backleveling to 364.19 or earlier fixed the problem, as usual.
Ok, let me sum up what I’m getting from your description and your logs:
-You have a Clevo 650 Optimus laptop with 980m dGPU
-The iGPU is disabled in bios, so no Optimus
-You have one internal HiDPI display and one external HiDPI display connected
-You don’t have an xorg.conf, so xserver runs autoconfigured (xorg.conf not in logs)
-Xserver detects the right resolution 6400 x 2160 but driver bails out on setting modes
-From logs: xrandr says you have a 1280x800 internal display connected<-wrong
Is this correct?
Two things to try:
-Generate an xorg.conf, generate modelines for a standard resolution and force them in xorg.conf
-Activate hybrid mode in the BIOS and try to set up PRIME
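To illustrate the first suggestion, here is a minimal Python sketch that builds an xorg.conf skeleton with one forced modeline. The modeline is what cvt 1920 1080 60 should print; the identifiers Monitor0, Device0 and Screen0 and the helper name render_xorg_conf are placeholders I picked, and whether the NVIDIA driver accepts this mode on the panel in question is untested.

```python
# Modeline as printed by `cvt 1920 1080 60` (name, then timings).
CVT_1080P = (
    "1920x1080_60.00",
    "173.00  1920 2048 2248 2576  1080 1083 1088 1120 -hsync +vsync",
)


def render_xorg_conf(mode_name, timings,
                     monitor_id="Monitor0", device_id="Device0",
                     screen_id="Screen0"):
    """Return xorg.conf text that defines one modeline and selects it.

    All identifiers are placeholders (assumption); adjust them to taste.
    """
    return f'''Section "Monitor"
    Identifier "{monitor_id}"
    Modeline "{mode_name}" {timings}
EndSection

Section "Device"
    Identifier "{device_id}"
    Driver "nvidia"
EndSection

Section "Screen"
    Identifier "{screen_id}"
    Device "{device_id}"
    Monitor "{monitor_id}"
    SubSection "Display"
        Modes "{mode_name}"
    EndSubSection
EndSection
'''
```

Calling print(render_xorg_conf(*CVT_1080P)) gives text you could save as /etc/X11/xorg.conf for a test boot. Keep a copy of any existing file first, since a bad config can leave X unable to start.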
> -You have one internal HiDPI display and one external HiDPI display connected
The external display isn’t HiDPI - it’s a double-wide 2560x1080 screen. I get the same results with this screen disconnected entirely.
> From logs: xrandr says you have a 1280x800 internal display connected<-wrong
You are correct that this is wrong: there is no 1280x800 screen. The built-in 3840x2160 G-Sync screen doesn’t even show any other hardware resolutions under Linux. The only way to change the resolution is to override it in the nVidia driver, which offers a “1920x1080 (software)” option, and in my experience that does not work well at all: sometimes it works, sometimes the Unity launcher freaks out and disappears.
> Things to try:
I’ve tried to stay away from overriding xorg.conf like the old days, because I disconnect/connect this system to different monitors every single day. Also, seven or eight months back, I tried to find out the exact make of the internal display and turned up zero as to any specs, supported modes, or even the manufacturer.
Prime works fine in 364.19 and most drivers before (there are a few previous to that version that didn’t work either, then even older ones that do work), so I’ve been keeping that enabled since I use this laptop on battery 3 hours a day.
Sorry, you’re getting something very wrong here. According to your last logs, you’re not using prime. Your iGPU is disabled. Furthermore, you can’t use prime without an xorg.conf.
You can use prime-select as often as you want to switch to intel or nvidia, but as long as the iGPU is disabled in the BIOS it does not have any effect. Have a look at /var/log/gpu-manager.log; it will probably tell you that there is nothing to switch to.
Edit: When using the Ubuntu nvidia-prime infrastructure, it will generate an xorg.conf for you, but only if the iGPU is enabled.
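A quick way to confirm whether the iGPU is visible to the kernel at all, independent of prime-select, is to list the display-class PCI devices from sysfs. Rough Python sketch below. The PCI class and vendor IDs are the standard ones (base class 0x03 for display controllers, 0x8086 for Intel, 0x10de for NVIDIA); list_display_devices is just a name I made up.

```python
import os

# Well-known PCI vendor IDs; anything else is reported as a hex string.
VENDORS = {0x8086: "Intel", 0x10DE: "NVIDIA"}


def list_display_devices(root="/sys/bus/pci/devices"):
    """Return (pci_address, vendor) for every display-class PCI device."""
    found = []
    for name in sorted(os.listdir(root)):
        dev = os.path.join(root, name)
        try:
            # sysfs exposes both values as hex text, e.g. "0x030000".
            with open(os.path.join(dev, "class")) as f:
                pci_class = int(f.read().strip(), 16)
            with open(os.path.join(dev, "vendor")) as f:
                vendor = int(f.read().strip(), 16)
        except (OSError, ValueError):
            continue  # skip entries that are unreadable or malformed
        # Base class 0x03 covers VGA (0x0300) and 3D (0x0302) controllers.
        if pci_class >> 16 == 0x03:
            found.append((name, VENDORS.get(vendor, hex(vendor))))
    return found
```

If print(list_display_devices()) only shows an NVIDIA entry, the iGPU is disabled and there is nothing for prime-select to switch to.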
I did forget that I had disabled the iGPU some time back as a simpler workaround to the nvidia-367 bug where it would pick the intel driver instead of the modesetting driver when using prime. Since I constantly had to revert to 367, it seemed like an expedient way to make it work and take the iGPU and Prime out of the equation when troubleshooting.
Now, I wouldn’t expect that re-enabling the iGPU would help with a problem that seems specific to nVidia driver versions post 367… but I’ll be damned if the 378 driver does not now work! I know I had tested previous versions with the iGPU re-enabled, but probably not the last two or three versions.