390.12 - External monitor doesn't work, stuttering, ghost cursors

System: Sager NP8658S (same as Clevo P650-RG) Optimus w/ 980m & 4k panel
OS: Ubuntu 17.10, using SDDM/Plasma (KDE)

Symptoms:

With the 384.90 drivers, the BIOS in MSHYBRID mode, and prime-select set to nvidia, the system comes up and works for the most part. However, there is some minor cursor ghosting and stuttering that comes and goes during use.

Additionally, when I plug in the external HDMI 2560x1080 (ASUS MX299Q) display, both screens go black and cannot be recovered (I’m attaching an nvidia-bug-report showing the state after plugging it in). Detaching the external monitor does not make the internal panel usable again. The system is not hung hard at this point (I ssh’d in to get the bug report), but the graphics driver is in a bad enough state that you can’t restart the desktop manager. Rebooting via ssh at this point also takes a long time, as something has to time out.

The external monitor works fine and there are no stuttering/ghosting issues if I go back to the 364.19 drivers (which requires backleveling my kernel and xorg) and switch the BIOS to DISCRETE mode, so there does not appear to be any hardware problem with the monitor or laptop.
nvidia-bug-report.log.gz (337 KB)

Did you check if activating PRIME sync with kernel parameter
nvidia-drm.modeset=1
helps at least with the ghosting?
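
As a rough sketch for checking whether that parameter actually reached the running kernel, something like the following should do. The function name has_modeset_param is made up for illustration; it also accepts the underscore spelling, since the kernel treats dashes and underscores in parameter names as equivalent.

```shell
#!/bin/sh
# has_modeset_param (hypothetical helper): succeeds if the given kernel
# command line string contains nvidia-drm.modeset=1 (or the underscore form).
has_modeset_param() {
    case " $1 " in
        *" nvidia-drm.modeset=1 "*|*" nvidia_drm.modeset=1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the command line of the currently running kernel.
if has_modeset_param "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "nvidia-drm.modeset=1 is on the kernel command line"
else
    echo "nvidia-drm.modeset=1 is NOT on the kernel command line"
fi
```

This only shows that the parameter was passed at boot, not that the driver module picked it up.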


I think it might have. I just tested tonight with the 387.12 drivers, had that kernel parameter, and did not notice ghosting. But, I didn’t really test very long as the system is useless to me without the external monitor, and that still isn’t quite working.

With 387.12 I noticed it does seem to detect the external monitor (when I read the Xorg.0.log) and its resolution, but the monitor never gets enabled (stays black screen). The internal monitor looks like it’s slowly (the screen takes a minute or two to set up) trying to do the right thing and sets up that panel, but compositing gets disabled (I can tell because the conky window on that monitor is not transparent).

At the end, the external monitor’s still black and the right monitor is slow to respond because plasma and Xorg are eating CPU:

Oct  3 21:14:36 sager nvidia-persistenced: Verbose syslog connection opened
Oct  3 21:14:36 sager nvidia-persistenced: Now running with user ID 123 and group ID 131
Oct  3 21:14:36 sager nvidia-persistenced: Started (6017)
Oct  3 21:14:37 sager nvidia-persistenced: device 0000:01:00.0 - registered
Oct  3 21:14:37 sager nvidia-persistenced: Local RPC service initialized
Oct  3 21:14:37 sager kernel: [   35.300764] nvidia-modeset: Allocated GPU:0 (GPU-e2f980da-ea7e-4335-6ae3-41ae731aed6d) @ PCI:0000:01:00.0
Oct  3 21:14:52 sager org.kde.KScreen[11586]: kscreen.xcb.helper: #011Rotation:  "invalid value (0)"
Oct  3 21:14:52 sager org.kde.KScreen[11586]: kscreen.xcb.helper: #011Rotation:  "invalid value (0)"
Oct  3 21:17:10 sager org.kde.KScreen[11586]: kscreen.xcb.helper: #011Rotation:  "invalid value (0)"
Oct  3 21:17:15 sager kernel: [  194.178710] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
Oct  3 21:17:17 sager kernel: [  196.185225] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000957d:0:0
Oct  3 21:17:19 sager kernel: [  198.185212] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000917e:0:0
Oct  3 21:17:20 sager org.kde.KScreen[11586]: kscreen.xcb.helper: #011Rotation:  "invalid value (0)"
Oct  3 21:17:20 sager org.kde.KScreen[11586]: kscreen.xcb.helper: #011Rotation:  "invalid value (0)"
Oct  3 21:20:26 sager org.kde.KScreen[11586]: kscreen.xcb.helper: #011Rotation:  "invalid value (0)"
Oct  3 21:20:26 sager org.kde.KScreen[11586]: kscreen.xcb.helper: #011Rotation:  "invalid value (0)"
Oct  3 21:20:26 sager org.kde.KScreen[11586]: kscreen.xcb.helper: #011Rotation:  "invalid value (0)"

Attaching a new bug report for 387.12 below. This was taken after connecting the external monitor.
nvidia-bug-report.log.gz (298 KB)

parse-edid < Documents/edid.bin
Checksum Correct

Section "Monitor"
        Identifier " 
                     @"
        ModelName " 
                    @"
        VendorName "SDC"
        # Monitor Manufactured week 0 of 2014
        # EDID version 1.3
        # Digital Display
        DisplaySize 340 190
        Gamma 2.20
        Option "DPMS" "true"
        Modeline        "Mode 0" 526.91 3840 3888 3920 3956 2160 2162 2167 2220 -hsync -vsync 
EndSection

Still not working with 387.22 either (tested with discrete mode):

Oct 31 12:13:16 sager kernel: [   50.231900] nvidia-modeset: Allocated GPU:0 (GPU-e2f980da-ea7e-4335-6ae3-41ae731aed6d) @ PCI:0000:01:00.0
Oct 31 12:13:23 sager kernel: [   57.347399] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
Oct 31 12:13:25 sager kernel: [   59.349408] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000957d:0:0

discrete_nvidia-bug-report.log.gz (237 KB)

Same issue with 387.34. Returning to 364 works fine.

Dec  1 14:05:16 sager kernel: [   37.230874] nvidia-modeset: Allocated GPU:0 (GPU-e2f980da-ea7e-4335-6ae3-41ae731aed6d) @ PCI:0000:01:00.0
Dec  1 14:05:23 sager kernel: [   44.332126] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
Dec  1 14:05:26 sager kernel: [   47.101145] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000957d:0:0

BTW, this was still in discrete mode. Testing with newer kernel looks much the same:

Jan  3 09:36:14 sager kernel: [   38.281641] nvidia-modeset: Allocated GPU:0 (GPU-e2f980da-ea7e-4335-6ae3-41ae731aed6d) @ PCI:0000:01:00.0
Jan  3 09:36:14 sager kernel: [   38.359479] NVRM: GPU at PCI:0000:01:00: GPU-e2f980da-ea7e-4335-6ae3-41ae731aed6d
Jan  3 09:36:14 sager kernel: [   38.359482] NVRM: Xid (PCI:0000:01:00): 61, 15d1(1968) 00000000 00000000
Jan  3 09:36:21 sager kernel: [   45.370303] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
Jan  3 09:36:24 sager kernel: [   48.157125] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000957d:0:0

390.12 is showing the same issue:

Jan 12 13:07:26 sager kernel: [    1.561612] nvidia: loading out-of-tree module taints kernel.
Jan 12 13:07:26 sager kernel: [    1.562949] nvidia: module license 'NVIDIA' taints kernel.
Jan 12 13:07:26 sager kernel: [    1.568962] nvidia: module verification failed: signature and/or required key missing - tainting kernel
Jan 12 13:07:26 sager kernel: [    1.576671] nvidia-nvlink: Nvlink Core is being initialized, major device number 244
Jan 12 13:07:26 sager kernel: [    1.578086] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
Jan 12 13:07:26 sager kernel: [    1.579361] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  390.12  Wed Dec 20 07:19:16 PST 2017 (using threaded interrupts)
Jan 12 13:07:26 sager kernel: [    1.629891] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  390.12  Wed Dec 20 06:13:53 PST 2017
Jan 12 13:07:26 sager kernel: [    1.631566] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Jan 12 13:07:26 sager kernel: [    1.632578] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
Jan 12 13:07:26 sager kernel: [    3.212176] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 242
Jan 12 13:07:26 sager kernel: [   16.658708] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input19
Jan 12 13:07:26 sager kernel: [   16.658757] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input20
Jan 12 13:07:26 sager kernel: [   16.658801] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input21
Jan 12 13:07:26 sager kernel: [   16.658846] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input22
Jan 12 13:07:26 sager systemd[1]: Starting NVIDIA Persistence Daemon...
Jan 12 13:07:26 sager systemd[1]: Started NVIDIA Persistence Daemon.
Jan 12 13:07:26 sager nvidia-persistenced: Verbose syslog connection opened
Jan 12 13:07:26 sager nvidia-persistenced: Now running with user ID 103 and group ID 105
Jan 12 13:07:26 sager nvidia-persistenced: Started (6051)
Jan 12 13:07:27 sager nvidia-persistenced: device 0000:01:00.0 - registered
Jan 12 13:07:27 sager nvidia-persistenced: Local RPC service initialized
Jan 12 13:07:27 sager kernel: [   34.342871] nvidia-modeset: Allocated GPU:0 (GPU-e2f980da-ea7e-4335-6ae3-41ae731aed6d) @ PCI:0000:01:00.0
Jan 12 13:07:27 sager kernel: [   34.398397] NVRM: GPU at PCI:0000:01:00: GPU-e2f980da-ea7e-4335-6ae3-41ae731aed6d
Jan 12 13:07:27 sager kernel: [   34.398412] NVRM: Xid (PCI:0000:01:00): 61, 15c7(18a8) 00000000 00000000
Jan 12 13:07:34 sager kernel: [   41.419527] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
Jan 12 13:07:37 sager kernel: [   44.189430] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000957d:0:0
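
For anyone comparing logs across driver versions, the interesting lines can be pulled out of a long log like the one above with a small filter. This is only a sketch: the helper name filter_nvidia_errors is invented here, and the pattern covers just the message types seen in this thread.

```shell
#!/bin/sh
# filter_nvidia_errors (hypothetical helper): keep only nvidia-modeset
# warnings/errors and NVRM Xid reports from a log read on stdin.
filter_nvidia_errors() {
    grep -E 'nvidia-modeset: (WARNING|ERROR)|NVRM: Xid'
}

# Example with two lines from the log above; only the second is kept.
filter_nvidia_errors <<'EOF'
Jan 12 13:07:27 sager nvidia-persistenced: Local RPC service initialized
Jan 12 13:07:37 sager kernel: [   44.189430] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000957d:0:0
EOF
```

On a live system the same function can be fed from the kernel log, for example with journalctl -k -b piped into it.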

Revisiting…
Just noticed the modesetting parameter never stuck because Ubuntu renames the driver module. If you’re still trying, please update to driver 390, Xorg 1.19, and kernel 4.5 or higher, then create a file 99-nvidia-modesetting.conf in /etc/modprobe.d containing

options nvidia_390_drm modeset=1

then run

sudo update-initramfs -u

and reboot.
Create a new nvidia-bug-report afterwards.
Edit: use a kernel no higher than 4.14; 4.15 needs extra patches.
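
Since the module name differs between packagings, it may help to check what the DRM module is actually called on the running system before writing the options line. A minimal sketch, assuming the renamed module shows up in /proc/modules under a name like nvidia_390_drm; the helper name nvidia_drm_module_names is made up for this example.

```shell
#!/bin/sh
# nvidia_drm_module_names (hypothetical helper): read /proc/modules-style
# input on stdin and print the names of loaded modules that look like an
# NVIDIA DRM module, e.g. nvidia_drm or a distro-renamed nvidia_390_drm.
nvidia_drm_module_names() {
    awk '$1 ~ /^nvidia.*drm$/ { print $1 }'
}

# List candidates on the running system. The assumption here is that the
# printed name is the one to use after "options" in the modprobe.d file.
if [ -r /proc/modules ]; then
    nvidia_drm_module_names < /proc/modules
fi
```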

I tested your suggestion to set the drm option with the 390.42 drivers (using the Ubuntu nvidia-drivers-390 package on the latest Bionic with newest Xorg and 4.15 kernel), but am still seeing the same problems as I do with all post-364 drivers.

Discrete mode:

  • Hangs with cursor in the top left on startup. Logs look like previous versions.

MsHybrid mode, with prime-select nvidia:

  • “Almost works”, but is unusable. It will boot into the desktop using nvidia drivers and run 3D apps like glxgears, but the entire desktop (except for the cursor) experiences random 4-second pauses, sometimes several times in the same minute.
  • Some apps like VLC just eat 100% of the CPU and freeze the desktop.
  • Sleeping the system hangs it with a black screen on restart.
  • Attaching external monitor makes system hang for a couple of minutes, then it reactivates laptop screen with panel half off the screen as if it tried to reposition the display halfway between the two monitors. The external monitor never activates. The system runs super slow at that point.

All problems go away after going back to 364 again (and backleveling kernel + Xorg). I can play hundreds of 3D games for hours on end with the 364.19 drivers, so I am still confident there’s no hardware issue.

I’m attaching a bug report showing what I see after attaching the external monitor.

after_external_plugin_nvidia-bug-report.log.gz (134 KB)

As I said in a belated edit: not kernel 4.15, since that one needs an extra patch for the driver. The maximum kernel for modeset=1 is 4.14.

That’s a little awkward for me to test, because I have LUKS encryption and a boot partition that’s not easily resized. I’m also skittish about removing the latest kernel and any packages that depend on it. But I can try to clone the system and test it this weekend.

In other news, still seeing the issue with 390.48:

Mar 29 15:49:53 sager kernel: [   36.014472] nvidia-modeset: Allocated GPU:0 (GPU-e2f980da-ea7e-4335-6ae3-41ae731aed6d) @ PCI:0000:01:00.0
Mar 29 15:50:00 sager kernel: [   43.142994] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
Mar 29 15:50:02 sager kernel: [   45.963607] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000957d:0:0

390.48 + kernel 4.15 + modeset=1 now works. You can check whether the parameter is set with

sudo cat /sys/module/nvidia_drm/parameters/modeset

It should return ‘Y’.
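
If the module was installed under a different name, that exact path may not exist. A hedged sketch that looks under every matching module directory instead; the helper name report_modeset and the glob are assumptions for illustration, and reading the file may still need root.

```shell
#!/bin/sh
# report_modeset (hypothetical helper): print the modeset value for every
# NVIDIA DRM module directory under the given root (default /sys/module).
# Returns non-zero if no such module exposes a modeset parameter.
report_modeset() {
    root="${1:-/sys/module}"
    status=1
    for f in "$root"/nvidia*drm/parameters/modeset; do
        [ -e "$f" ] || continue   # glob did not match anything
        value=$(cat "$f" 2>/dev/null) || value="unreadable (try as root)"
        printf '%s: %s\n' "$f" "$value"
        status=0
    done
    return "$status"
}

report_modeset || echo "no NVIDIA DRM module with a modeset parameter found"
```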

I tested that version with that kernel and the modeset option you suggested, but it did not make any noticeable difference. However, I did not cat that parameter to verify it was actually set, so I will retest tomorrow evening (I just finished reverting everything to 364 so I could play some games tonight) and post the results.

I’ll gather some more nvidia-bug-reports for both BIOS in discrete mode and mshybrid mode when I do that as well.

Why not set up a current Ubuntu on a thumb drive for testing purposes?