Xorg closes after Xid 16 on 387.34

Hello,

Every 20 minutes Xorg closes and

NVRM: GPU at PCI:0000:01:00: GPU-53fd238d-7504-727d-9231-76e00e8c29f0
NVRM: GPU Board Serial Number:
NVRM: Xid (PCI:0000:01:00): 16, Head 00000002 Count 0000002d
NVRM: Xid (PCI:0000:01:00): 16, Head 00000002 Count 0000002e
NVRM: Xid (PCI:0000:01:00): 16, Head 00000002 Count 0000002f
NVRM: Xid (PCI:0000:01:00): 16, Head 00000002 Count 00000030
NVRM: Xid (PCI:0000:01:00): 16, Head 00000002 Count 00000031
NVRM: Xid (PCI:0000:01:00): 16, Head 00000002 Count 00000032
NVRM: Xid (PCI:0000:01:00): 16, Head 00000002 Count 00000033

appears in dmesg.
I use two cards, a 1080 and a 670.
The 1080 is in normal use on Linux, and the 670 is passed through (PCI passthrough) to VMs for Windows gaming.
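
For anyone comparing several of these logs, a small script can pull the Xid events out of saved dmesg text. This is only a rough sketch: the function name `parse_xid_lines` is made up for illustration, and it assumes the Head and Count fields are hexadecimal (the Count values above step through 2d, 2e, 2f, 30, which suggests hex).

```python
import re

# Matches lines such as:
#   NVRM: Xid (PCI:0000:01:00): 16, Head 00000002 Count 0000002d
# The Head/Count part is optional so other Xid lines without it still match.
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<xid>\d+)"
    r"(?:, Head (?P<head>[0-9a-fA-F]+) Count (?P<count>[0-9a-fA-F]+))?"
)


def parse_xid_lines(text):
    """Return one dict per NVRM Xid line found in the given dmesg text."""
    events = []
    for line in text.splitlines():
        match = XID_RE.search(line)
        if match is None:
            continue  # not an Xid line (e.g. the probe-routine messages)
        head = match.group("head")
        count = match.group("count")
        events.append({
            "pci": match.group("pci"),
            "xid": int(match.group("xid")),
            # Assumption: both fields are hexadecimal.
            "head": int(head, 16) if head else None,
            "count": int(count, 16) if count else None,
        })
    return events
```

Feeding it the output of `dmesg` saved to a file gives a list that can be counted or diffed between boots; with the hex assumption, a Count of `0000002d` is reported as 45.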

The messages near the beginning of dmesg (around 3 seconds in) are caused by this setup:

[    3.349358] NVRM: The NVIDIA probe routine was not called for 1 device(s).
[    3.349358] NVRM: This can occur when a driver such as: 
               NVRM: nouveau, rivafb, nvidiafb or rivatv 
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[    3.349358] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
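
To double-check which kernel driver actually owns each card, the standard sysfs PCI layout can be read directly. The following is a minimal sketch, not something taken from the bug report: the helper name `bound_drivers` is hypothetical, and which driver the passthrough card ends up on (often `vfio-pci`) is an assumption, since the thread does not say.

```python
import os


def bound_drivers(sysfs_pci_root="/sys/bus/pci/devices", vendor_id="0x10de"):
    """Map PCI address -> bound kernel driver name (or None) for one vendor.

    0x10de is NVIDIA's PCI vendor ID. Each device directory in sysfs has a
    'vendor' file and, if a driver is bound, a 'driver' symlink whose last
    path component is the driver name.
    """
    result = {}
    for addr in sorted(os.listdir(sysfs_pci_root)):
        dev = os.path.join(sysfs_pci_root, addr)
        try:
            with open(os.path.join(dev, "vendor")) as f:
                vendor = f.read().strip()
        except OSError:
            continue  # not a device directory we can read
        if vendor != vendor_id:
            continue
        driver_link = os.path.join(dev, "driver")
        if os.path.islink(driver_link):
            result[addr] = os.path.basename(os.readlink(driver_link))
        else:
            result[addr] = None  # no driver bound to this device
    return result
```

Calling `bound_drivers()` on the affected machine should list every NVIDIA function (GPU and HDMI audio) with its driver, which makes it easy to see whether the probe message above matches the card reserved for passthrough.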

I have used this setup for the past 14 months or so, and it worked flawlessly until 1-2 weeks ago, when these crashes started appearing.

I don’t know which version last worked because I update the entire system at once (Arch Linux).

I also reinstalled the OS to find out whether something else was causing this, and I don’t have
this problem with the nouveau driver.

Arch Linux
GNOME 3.26.2 (also happens in MATE)
Linux 4.9.77-1-lts (also happens in 4.14.13-1-ARCH)
Intel® Core™ i5-7600K

Thanks for any help.
nvidia-bug-report.log.gz (142 KB)
nvidia-bug-report.log.old.gz (144 KB)

Your setup is a bit odd: you still have the Intel GPU active and connected to a monitor. Is there a reason for this? If not, disable it in the BIOS, remove the display cable, and test again.

Yes, the Intel GPU is active on purpose. To pass through the 1080 when I need a stronger card in my
VM, the 1080 must not be bound by the BIOS. This is why I set the Intel GPU as the primary in my BIOS.

I removed the display cable just to make sure, but it still crashes after a certain time.

Xorg doesn’t crash if both NVIDIA cards are bound by the nvidia driver, but that isn’t really an option.

I uploaded another bug report, this time with the display cable removed from the Intel GPU.

But thanks for the help.
nvidia-bug-report.log.gz (135 KB)

Did you try binding only one GPU to the driver using this kernel parameter?
nvidia.NVreg_AssignGpus=0000:01:00.0
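
It may be worth confirming that the parameter really reached the kernel after the bootloader change. A rough sketch that reads `/proc/cmdline` follows; the function names are made up for illustration, and the parsing is deliberately simple (whitespace-separated tokens, no handling of quoted values).

```python
def module_params_from_cmdline(cmdline, module="nvidia"):
    """Return {param: value} for 'module.param=value' tokens in a kernel command line."""
    params = {}
    prefix = module + "."
    for token in cmdline.split():
        if not token.startswith(prefix) or "=" not in token:
            continue
        # Split only on the first '=' so values may contain '=' themselves.
        key, value = token[len(prefix):].split("=", 1)
        params[key] = value
    return params


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()
```

On the affected system, `module_params_from_cmdline(read_cmdline()).get("NVreg_AssignGpus")` should return `0000:01:00.0` if the option was picked up; `None` would mean it never made it onto the command line.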

Sorry for not replying for so long.

In the meantime I tried your suggestion of using the kernel parameter, but it didn’t change anything. I also tried the new 390 drivers, but that didn’t help either.

Today I installed the Ubuntu-based Pop!_OS to see if it still happens there. As soon as I got the nvidia driver to bind to only one card (it doesn’t matter whether it is the 1080 or the 670), it crashed just as it did on Arch Linux.
Note that I hadn’t even started to configure the VM.

I uploaded the bug report from a crash in which I did only two things: start TeamSpeak and start Firefox. The crash happens at [304.456429]. The driver version on Pop!_OS is 384.111.

Again thanks for any help.
nvidia-bug-report.log.gz (147 KB)