Since I installed kernel 4.14.7 I’m getting this error in my logs (asynchronous wait on fence NVIDIA:nvidia.prime:7d9 timed out), but more importantly the machine freezes unpredictably, although not at random. The freeze can happen, for example, when using xrandr to manage a second monitor (I use xrandr because I’m on KDE and kscreen2 does not play nicely with nvidia).
Quite often the freeze happens when moving the mouse across screens (sometimes it recovers, though).
I also saw this happen at boot, likely when KMS kicks in. In fact, removing nvidia-drm.modeset=1 and splash from the kernel command line avoids the freezing.
It seems that running nvidia-config and moving the screens around avoids further lockups, but that could just be a coincidence.
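To confirm which of these options the running kernel actually booted with, the command line can be read back from /proc/cmdline. The following is a minimal Python sketch using only the standard library; the helper names (parse_cmdline, modeset_enabled, read_cmdline) are hypothetical. Parameter names have '-' normalised to '_' because the kernel treats the two as equivalent in parameter names, so nvidia-drm.modeset and nvidia_drm.modeset are the same parameter. shlex only approximates the kernel's quoting rules.

```python
from __future__ import annotations

import shlex


def parse_cmdline(cmdline: str) -> dict[str, str | None]:
    """Split a kernel command line into {name: value}.

    Bare flags such as 'splash' map to None. Dashes in parameter names are
    normalised to underscores, mirroring how the kernel matches names.
    """
    params: dict[str, str | None] = {}
    for token in shlex.split(cmdline):
        name, sep, value = token.partition("=")
        params[name.replace("-", "_")] = value if sep else None
    return params


def modeset_enabled(cmdline: str) -> bool:
    """True if nvidia-drm.modeset=1 (or nvidia_drm.modeset=1) is present."""
    return parse_cmdline(cmdline).get("nvidia_drm.modeset") == "1"


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read().strip()
```

For example, modeset_enabled(read_cmdline()) should return True on a boot that still carries nvidia-drm.modeset=1, and "splash" in parse_cmdline(read_cmdline()) shows whether splash is still set.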
This was not happening with kernel 4.14.6 and earlier.
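To check whether the fence timeouts really start with a particular kernel, it can help to count them per boot, for instance in the output of journalctl -k -b (current boot) or journalctl -k -b -1 (previous boot) saved to a file. A minimal Python sketch, assuming the message format quoted above; the sequence number is parsed as hexadecimal since 7d9 contains a hex digit, and the function names are hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "asynchronous wait on fence NVIDIA:nvidia.prime:7d9 timed out".
# Groups: driver name, timeline name, sequence number (hex).
FENCE_TIMEOUT = re.compile(
    r"asynchronous wait on fence ([^:\s]+):([^:\s]+):([0-9a-f]+) timed out",
    re.IGNORECASE,
)


def fence_timeouts(lines):
    """Yield (driver, timeline, seqno) for each fence-timeout message."""
    for line in lines:
        match = FENCE_TIMEOUT.search(line)
        if match:
            driver, timeline, seqno = match.groups()
            yield driver, timeline, int(seqno, 16)


def count_by_timeline(lines):
    """Count timeouts per 'driver:timeline' pair."""
    return Counter(f"{driver}:{timeline}" for driver, timeline, _ in fence_timeouts(lines))


def count_in_file(path):
    """Convenience wrapper for a saved journalctl -k dump."""
    with open(path, errors="replace") as f:
        return count_by_timeline(f)
```

Running count_in_file on one dump per kernel version would show whether the timeouts are absent on 4.14.6 or merely rarer.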
Not really sure about that, for a couple of reasons:
This error message (failed to load the firmware blob) has been appearing in my logs since kernel 4.14.3, while the faulty behaviour of the card started only with 4.14.7 (and continues with 4.14.8).
It seems to relate to runtime power management. Admittedly that is not OK, but I wonder whether it can also cause the reported behaviour.
Anyway, good catch; I will try to understand why the firmware is not loaded when the file is actually there.
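One way to narrow this down is to check whether the file sits in one of the directories the kernel's firmware loader searches. In 4.14 the default list is /lib/firmware/updates/<release>, /lib/firmware/updates, /lib/firmware/<release> and /lib/firmware, with any directory given via firmware_class.path= tried first. A common reason for a file that exists on disk still failing to load is that the driver is built into the kernel and requests the firmware before the root filesystem is mounted; in that case the firmware has to be in the initramfs or built in via CONFIG_EXTRA_FIRMWARE. The Python sketch below only covers the search-path part; the helper names are hypothetical, and since the actual firmware file name is not given here, any name passed to find_firmware is a placeholder.

```python
import os
import platform


def firmware_search_paths(release=None, custom_path=None):
    """Directories the kernel's default firmware loader tries, in order."""
    release = release or platform.release()
    paths = [
        f"/lib/firmware/updates/{release}",
        "/lib/firmware/updates",
        f"/lib/firmware/{release}",
        "/lib/firmware",
    ]
    if custom_path:  # value of firmware_class.path=, searched before the defaults
        paths.insert(0, custom_path)
    return paths


def find_firmware(name, release=None, custom_path=None, root="/"):
    """Return the first existing file for `name` (e.g. 'i915/<file>.bin'), or None.

    `root` lets you inspect another tree (e.g. an unpacked initramfs)
    instead of the live filesystem.
    """
    for directory in firmware_search_paths(release, custom_path):
        candidate = os.path.join(root, directory.lstrip("/"), name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Pointing root at an unpacked initramfs is a quick way to see whether the file is available early in boot as well as on the installed system.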
Just for info, I’m running 4.14.7 on Haswell with Prime sync and don’t experience your issue. Looking at the commits between .6 and .7 (not that many), there’s nothing that would explain this. But without the firmware, you might have to set i915.enable_rc6=0 to work around problems. Though it’s strange that this is happening only now.
I tried setting i915.enable_rc6=0, but it does not seem to have helped.
Looking at the kernel logs, I noticed an interesting kernel oops; not sure whether it is expected given that the firmware is not being loaded:
(only relevant lines):
I finally managed to load the firmware correctly, but I’m still getting kernel oopses (see below).
Should I then assume that the issue I’m experiencing is connected to the x2apic setting and not to other nvidia stuff? That would be interesting, as it was not happening before… still investigating :)
(the log is in reverse order, newest lines first)
Jan 03 08:57:11 hobbes kernel: asynchronous wait on fence NVIDIA:nvidia.prime:742 timed out
Jan 03 08:57:00 hobbes kernel: Fixing recursive fault but reboot is needed!
Jan 03 08:57:00 hobbes kernel: ---[ end trace d4236cdf2abc96f2 ]---
Jan 03 08:57:00 hobbes kernel: RIP: task_work_run+0x7b/0xa0 RSP: ffffb73d004a3ea0
Jan 03 08:57:00 hobbes kernel: Code: 89 ed 75 e1 41 f6 44 24 4c 04 48 c7 c1 10 1b 9a bc 49 0f 44 ce eb ce 4c 89 ff e8 f1 44 b6 00 48 85 ed 74 15 49 8b 6d 00 4c 89 ef <41> ff 55 08 48 85 ed 49 89 ed 75 ed eb a2 5b 5d 41 5c 41 5d 41
Jan 03 08:57:00 hobbes kernel: rewind_stack_do_exit+0x17/0x20
Jan 03 08:57:00 hobbes kernel: ? kthread+0x117/0x130
Jan 03 08:57:00 hobbes kernel: ? wake_threads_waitq+0x30/0x30
Jan 03 08:57:00 hobbes kernel: do_exit+0x2e0/0xb10
Jan 03 08:57:00 hobbes kernel: Call Trace:
Jan 03 08:57:00 hobbes kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jan 03 08:57:00 hobbes kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 03 08:57:00 hobbes kernel: CR2: ffff9b9850c33080 CR3: 0000000bed20a003 CR4: 00000000003606e0
Jan 03 08:57:00 hobbes kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 03 08:57:00 hobbes kernel: FS: 0000000000000000(0000) GS:ffff9b989f5c0000(0000) knlGS:0000000000000000
Jan 03 08:57:00 hobbes kernel: R13: ffffffffbb0e5cf0 R14: 0000000000000000 R15: ffff9b98530168ec
Jan 03 08:57:00 hobbes kernel: R10: 0000000000000000 R11: 0000000000000040 R12: ffff9b9853016200
Jan 03 08:57:00 hobbes kernel: RBP: fff3a748e8df8948 R08: ffff9b98578ffb50 R09: ffff9b985433c200
Jan 03 08:57:00 hobbes kernel: RDX: 0000000000000000 RSI: 000000000000008b RDI: ffffffffbb0e5cf0
Jan 03 08:57:00 hobbes kernel: RAX: ffff9b985433c200 RBX: ffff9b98530168b8 RCX: 000000000000000b
Jan 03 08:57:00 hobbes kernel: RSP: 0018:ffffb73d004a3ea0 EFLAGS: 00010286
Jan 03 08:57:00 hobbes kernel: RIP: 0010:task_work_run+0x7b/0xa0
Jan 03 08:57:00 hobbes kernel: task: ffff9b9853016200 task.stack: ffffb73d004a0000
Jan 03 08:57:00 hobbes kernel: Hardware name: LENOVO 20EQS58500/20EQS58500, BIOS N1EET74W (1.47 ) 11/21/2017
Jan 03 08:57:00 hobbes kernel: CPU: 7 PID: 794 Comm: irq/139-nvidia Tainted: P D OE 4.14.10-cova #1
Jan 03 08:57:00 hobbes kernel: virtio_ring virtio fuse overlay linear raid0 dm_raid dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log hid_apple ohci_pci ohci_hcd uhci_hcd nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE)
Jan 03 08:57:00 hobbes kernel: Modules linked in: rfcomm rmi_smbus rmi_core bnep uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev btusb btrtl btbcm joydev btintel bluetooth snd_hda_codec_realtek snd_hda_codec_generic wmi_bmof ecdh_gener>
Jan 03 08:57:00 hobbes kernel: general protection fault: 0000 [#2] PREEMPT SMP
Jan 03 08:57:00 hobbes kernel: genirq: exiting task "irq/139-nvidia" (794) is an active IRQ thread (irq 139)
Jan 03 08:57:00 hobbes kernel: ---[ end trace d4236cdf2abc96f1 ]---
Jan 03 08:57:00 hobbes kernel: CR2: ffff9b9850c33080
Jan 03 08:57:00 hobbes kernel: RIP: 0xffff9b9850c33080 RSP: ffffb73d004a3d40
Jan 03 08:57:00 hobbes kernel: Code: 00 00 00 d0 74 fc c0 ff ff ff ff c8 ac e1 44 98 9b ff ff 00 00 00 00 00 00 00 00 70 3f c3 50 98 9b ff ff f0 3b c3 50 98 9b ff ff <80> 30 c3 50 98 9b ff ff 80 30 c3 50 98 9b ff ff 30 67 f7 c0 ff
Jan 03 08:57:00 hobbes kernel: ? ret_from_fork+0x1f/0x30
Jan 03 08:57:00 hobbes kernel: ? do_group_exit+0x33/0xa0
Jan 03 08:57:00 hobbes kernel: ? kthread_create_on_node+0x70/0x70
Jan 03 08:57:00 hobbes kernel: ? kthread+0x117/0x130
Jan 03 08:57:00 hobbes kernel: ? wake_threads_waitq+0x30/0x30
Jan 03 08:57:00 hobbes kernel: ? irq_thread+0x151/0x1a0
Jan 03 08:57:00 hobbes kernel: ? irq_thread_fn+0x1b/0x50
Jan 03 08:57:00 hobbes kernel: ? nv_check_pci_config_space+0x3d8/0x710 [nvidia]
Jan 03 08:57:00 hobbes kernel: ? irq_forced_thread_fn+0x60/0x60
Jan 03 08:57:00 hobbes kernel: ? rm_isr_bh+0x23/0x70 [nvidia]
Jan 03 08:57:00 hobbes kernel: ? irq_thread_dtor+0x90/0x90
Jan 03 08:57:00 hobbes kernel: ? _nv001199rm+0x10e/0x150 [nvidia]
Jan 03 08:57:00 hobbes kernel: ? _nv025510rm+0x71/0xa0 [nvidia]
Jan 03 08:57:00 hobbes kernel: ? _nv006922rm+0x1a8/0x290 [nvidia]
Jan 03 08:57:00 hobbes kernel: ? _nv010340rm+0x318/0x410 [nvidia]
Jan 03 08:57:00 hobbes kernel: ? _nv010340rm+0x33b/0x410 [nvidia]
Jan 03 08:57:00 hobbes kernel: ? _nv026080rm+0x8b/0xf0 [nvidia]
Jan 03 08:57:00 hobbes kernel: ? _nv029074rm+0x115/0x180 [nvidia]
Jan 03 08:57:00 hobbes kernel: Call Trace:
Jan 03 08:57:00 hobbes kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jan 03 08:57:00 hobbes kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 03 08:57:00 hobbes kernel: CR2: ffff9b9850c33080 CR3: 0000000bed20a003 CR4: 00000000003606e0
Jan 03 08:57:00 hobbes kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 03 08:57:00 hobbes kernel: FS: 0000000000000000(0000) GS:ffff9b989f5c0000(0000) knlGS:0000000000000000
Jan 03 08:57:00 hobbes kernel: R13: 0000000000000000 R14: ffff9b9854bc7008 R15: 0000000000000000
Jan 03 08:57:00 hobbes kernel: R10: 0000000002020008 R11: ffffffffc0403760 R12: ffff9b9856f22948
Jan 03 08:57:00 hobbes kernel: RBP: ffff9b9855a2add8 R08: 0000000000000000 R09: ffff9b9850c33080
Jan 03 08:57:00 hobbes kernel: RDX: 0000000000010004 RSI: 0000000000000000 RDI: 000561da8f9835e5
Jan 03 08:57:00 hobbes kernel: RAX: ffff9b9856f223b0 RBX: ffff9b9856f22948 RCX: 0000000000000000
Jan 03 08:57:00 hobbes kernel: RSP: 0018:ffffb73d004a3d40 EFLAGS: 00010246
Jan 03 08:57:00 hobbes kernel: RIP: 0010:0xffff9b9850c33080
Jan 03 08:57:00 hobbes kernel: task: ffff9b9853016200 task.stack: ffffb73d004a0000
Jan 03 08:57:00 hobbes kernel: Hardware name: LENOVO 20EQS58500/20EQS58500, BIOS N1EET74W (1.47 ) 11/21/2017
Jan 03 08:57:00 hobbes kernel: CPU: 7 PID: 794 Comm: irq/139-nvidia Tainted: P OE 4.14.10-cova #1
Jan 03 08:57:00 hobbes kernel: virtio_ring virtio fuse overlay linear raid0 dm_raid dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log hid_apple ohci_pci ohci_hcd uhci_hcd nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE)
Jan 03 08:57:00 hobbes kernel: Modules linked in: rfcomm rmi_smbus rmi_core bnep uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev btusb btrtl btbcm joydev btintel bluetooth snd_hda_codec_realtek snd_hda_codec_generic wmi_bmof ecdh_gener>
Jan 03 08:57:00 hobbes kernel: Oops: 0011 [#1] PREEMPT SMP
Jan 03 08:57:00 hobbes kernel: PGD beda22067 P4D beda22067 PUD 105846c063 PMD 105291f063 PTE 8000001050c33163
Jan 03 08:57:00 hobbes kernel: IP: 0xffff9b9850c33080
Jan 03 08:57:00 hobbes kernel: BUG: unable to handle kernel paging request at ffff9b9850c33080
Jan 03 08:57:00 hobbes kernel: kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
Jan 03 08:57:00 hobbes kscreen_backend_launcher[2855]: kscreen.xcb.helper: Geometry: 1920 0 1920 1200
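Two details of this excerpt are easier to read with a little help. It was pasted newest-first, so the oldest lines (the kscreen geometry message, then the NX-protected page warning and Oops #1) are at the bottom. And the Tainted: field uses single-letter flags: P, O and E match the proprietary, out-of-tree, unsigned nvidia modules (marked (POE) in the module list), while the D in the second report means the kernel had already oopsed once. A minimal Python sketch that restores chronological order and decodes the flags; the flag table is a subset of the kernel's documented taint letters, and the helper names are hypothetical.

```python
import re

# Subset of the single-letter taint flags printed by 4.14-era kernels.
TAINT_FLAGS = {
    "G": "only GPL-compatible modules loaded",
    "P": "proprietary module loaded",
    "F": "module force-loaded",
    "R": "module force-unloaded",
    "M": "machine check exception",
    "B": "bad page referenced",
    "U": "taint requested by userspace",
    "D": "kernel died recently (OOPS or BUG)",
    "A": "ACPI table overridden",
    "W": "kernel issued a warning",
    "C": "staging driver loaded",
    "I": "workaround for platform firmware bug",
    "O": "out-of-tree module loaded",
    "E": "unsigned module loaded",
    "L": "soft lockup occurred",
    "K": "kernel live-patched",
}

# "Tainted: P D OE 4.14.10-cova #1" -> flags "P D OE"; unset flags print as spaces.
TAINTED = re.compile(r"Tainted: (?P<flags>[A-Z ]+?) (?P<release>\d\S*)")


def chronological(newest_first):
    """The excerpt was pasted newest-first; return it oldest-first."""
    return list(reversed(newest_first))


def decode_taint(line):
    """Map each taint letter on a 'Tainted:' line to its meaning ({} if none)."""
    match = TAINTED.search(line)
    if not match:
        return {}
    return {flag: TAINT_FLAGS.get(flag, "unknown flag")
            for flag in match.group("flags") if flag != " "}


def first_line_with(lines, needle):
    """First line containing `needle`, or None."""
    return next((line for line in lines if needle in line), None)
```

In chronological order, the first Tainted: line belongs to Oops #1 and carries only P, O and E; the D appears only on the later general protection fault.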
I ran some more tests; the firmware for the Intel card now loads fine, but the issue is still present.
This happens with or without the x2apic opt-out and with or without the i915.enable_rc6=0 parameter.
The only way to avoid lockups when using xrandr to configure the external monitor is to remove nvidia-drm.modeset=1 from the kernel boot parameters.
Interestingly enough, the issue is much more evident when the laptop is docked, and even then xrandr occasionally (say, 1 time out of 10) works without a screen lockup, so it seems quite likely that a timing issue is involved…
So it seems that even older kernels (4.14.6) show the same behaviour; you were right. I guess I overlooked some other change.
To summarize: in my setup, when nvidia-drm.modeset=1 is present, changing the monitor layout with xrandr, for example
xrandr --auto
and
xrandr --output eDP-1-1 --pos 0x70 --primary --output DP-1 --right-of eDP-1-1 --output DP-3.1 --right-of eDP-1-1
(I use both in a script)
causes freezes and lockups. kscreen2 is disabled. The issue is more evident with an external monitor attached to the docking station, and less evident when the external monitor is connected to the laptop’s own port.
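The script itself is not shown here; as a hypothetical reconstruction, the two commands above can be run in sequence from Python. The settle_seconds pause between them is an added, hypothetical knob for probing the timing theory mentioned above and is not part of the original script; dry_run returns the commands without executing them.

```python
import subprocess
import time

# The two layout commands quoted above, as argument lists.
XRANDR_AUTO = ["xrandr", "--auto"]
XRANDR_LAYOUT = [
    "xrandr",
    "--output", "eDP-1-1", "--pos", "0x70", "--primary",
    "--output", "DP-1", "--right-of", "eDP-1-1",
    "--output", "DP-3.1", "--right-of", "eDP-1-1",
]


def apply_layout(settle_seconds=0.0, dry_run=False):
    """Run both xrandr commands in order and return the argv lists used.

    settle_seconds (hypothetical) sleeps between the two commands so that
    different delays can be compared; dry_run skips execution and sleeping.
    """
    commands = [XRANDR_AUTO, XRANDR_LAYOUT]
    for index, argv in enumerate(commands):
        if dry_run:
            continue
        if index > 0 and settle_seconds > 0:
            time.sleep(settle_seconds)
        subprocess.run(argv, check=True)
    return commands
```

For example, apply_layout(settle_seconds=2) gives the outputs two seconds between the commands; whether any delay changes the outcome is not established in the thread.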
Ok. Can you check if 4.14.5 works? .6 had a change regarding i915 vsync timestamps, which shouldn’t affect your system, but who knows.
Other than that, you have the vgem DRM driver enabled; maybe that’s interfering. Are you running a custom kernel?
The difference in behaviour between docked and direct connection would point to a hardware issue, but then it’s strange that this would only happen with Prime sync enabled.
I took a look at your old logs from the kscreen2 flashing issue, and the only noticeable change was the kernel (4.14.4 at that time). Same X server and nvidia driver.
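Since the kernel is custom-built, a quick check is whether vgem (CONFIG_DRM_VGEM) is enabled in its configuration, either in the build tree's .config or, if the kernel was built with CONFIG_IKCONFIG_PROC, in /proc/config.gz. A minimal Python sketch; the set of options reported is only an example (CONFIG_DRM_I915 and CONFIG_EXTRA_FIRMWARE are included because of the earlier firmware question), and the helper names are hypothetical.

```python
import gzip

OPTIONS_OF_INTEREST = ("CONFIG_DRM_VGEM", "CONFIG_DRM_I915", "CONFIG_EXTRA_FIRMWARE")


def parse_kconfig(text):
    """Parse kernel .config text into {option: value}; unset options map to 'n'."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            options[name] = value.strip('"')
        elif line.startswith("# CONFIG_") and line.endswith(" is not set"):
            options[line[2:-len(" is not set")]] = "n"
    return options


def load_kconfig(path="/proc/config.gz"):
    """Read a plain or gzip-compressed kernel config file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as f:
        return parse_kconfig(f.read())


def report(options, names=OPTIONS_OF_INTEREST):
    """Value of each requested option, or None if it does not appear at all."""
    return {name: options.get(name) for name in names}
```

A value of "m" for CONFIG_DRM_VGEM means vgem is built as a module and could simply be left unloaded or blacklisted for a test, while "y" would require rebuilding without it.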
I will check with 4.14.5, sure.
I’m using a custom kernel, so if you think that some config change could help to understand what’s going on, I can do that with no effort.
And yes, but the flashing issue was solved by stopping kscreen2; then, after some kernel changes, the xrandr issue appeared.
With 4.14.5 the behaviour is quite similar, but not exactly the same: it still freezes, but it takes more “actions” (dragging windows across screens and so on). So basically it is the same. The difference may be that in 4.14.5 I was not loading the firmware.
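If an older release does turn out to behave, the last-good/first-bad pair can be narrowed with a binary search over point releases, and then at commit level with git bisect on the stable tree. A minimal Python sketch of the release-level search; is_bad is a hypothetical callback standing for "boot this kernel, run the xrandr script several times, report whether it froze", and the release list is illustrative. Because the freeze is intermittent (xrandr sometimes works even on an affected kernel), a single clean run is weak evidence that a release is good, so is_bad should repeat the test several times.

```python
def first_bad_release(releases, is_bad):
    """Binary search for the first bad release.

    Assumes releases are in order, releases[0] is known good, releases[-1] is
    known bad, and every release after the first bad one is also bad.
    """
    good, bad = 0, len(releases) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(releases[mid]):
            bad = mid
        else:
            good = mid
    return releases[bad]


# Illustrative list: 4.14 plus point releases 4.14.1 .. 4.14.10.
RELEASES = ["4.14"] + [f"4.14.{n}" for n in range(1, 11)]
```

With ten candidate releases this needs at most four test boots, which matters when each verdict requires several runs of the xrandr script.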
On the same hardware (dual boot) windows manages the situation correctly.
I revisited your old thread: there you mentioned that the flashing was worse with the HP, so you switched to the Dell, and the logs didn’t show the timeout errors. The Dell was connected over HDMI; the HP is now connected over DP. Maybe check whether the Dell now works without freezing, and if so, check why the HP doesn’t (cables?).