Hitting kernel NULL pointer dereference in nvidia_modeset on 4.10.11-1-ARCH with Quadro P2000

Setup:

  • Brand-new Quadro P2000 with two Dell UP2414Q monitors attached via DisplayPort.
  • Arch Linux, kernel 4.10.11, NVIDIA driver 378.13, X.Org Server 1.19.3. Also reproducible with kernel 4.9.24.
  • The system uses the card's EFI option ROM and has CSM disabled; otherwise driver initialization fails.
  • DRM kernel modesetting is disabled with nvidia-drm.modeset=0; enabling it manually crashes the machine immediately.

Both screens run in MST mode (they are first-generation 4K screens), so each one shows up as two 1920x2160 panels. I was hoping to use TwinView to make the four individual panels appear as two screens to applications, similar to how it works on Windows. The Screen section of my xorg.conf looks like this:

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    #Option         "nvidiaXineramaInfo" "True"
    Option         "Stereo" "0"
    #Option         "nvidiaXineramaInfoOrder" "DFP-6.9, DFP-6.8, DFP-4.9, DFP-4.8"
    #Option         "TwinViewXineramaInfoOverride" "3840x2160+0+0, 3840x2160+3840+0"
    Option         "metamodes" "DP-6.8: 1920x2160 +1920+0, DP-6.9: 1920x2160 +0+0, DP-4.8: nvidia-auto-select +5760+0, DP-4.9: nvidia-auto-select +3840+0"
    Option         "SLI" "Off"
    Option         "MultiGPU" "Off"
    Option         "BaseMosaic" "off"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection
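
For reference, the offsets in the metamodes and TwinViewXineramaInfoOverride values follow directly from the tile layout. Below is a minimal Python sketch of that arithmetic. The helper names are hypothetical and not part of any NVIDIA tool. It assumes every MST tile is 1920x2160 and spells out the mode for all four tiles, whereas the config above uses nvidia-auto-select for the DP-4.x pair. It also lists tiles left to right, so the order differs from the config but the offsets are the same.

```python
# Illustrative only: derives the X offsets used in the xorg.conf above.
# Function names are hypothetical, not an NVIDIA API.

def build_metamodes(monitors, tile_width=1920, tile_height=2160):
    """Build a metamodes string for MST monitors made of two tiles each.

    monitors: list of (left_output, right_output) pairs, ordered left to right.
    """
    entries = []
    x = 0
    for left, right in monitors:
        for name in (left, right):
            entries.append(f"{name}: {tile_width}x{tile_height} +{x}+0")
            x += tile_width  # next tile starts where this one ends
    return ", ".join(entries)


def build_xinerama_override(monitor_count, tile_width=1920, tile_height=2160):
    """Build one Xinerama rectangle per physical monitor (two tiles wide)."""
    width = 2 * tile_width
    return ", ".join(
        f"{width}x{tile_height}+{i * width}+0" for i in range(monitor_count)
    )


if __name__ == "__main__":
    layout = [("DP-6.9", "DP-6.8"), ("DP-4.9", "DP-4.8")]
    print(build_metamodes(layout))
    print(build_xinerama_override(len(layout)))
```

The second function reproduces the two 3840x2160 rectangles from the commented-out override line, which is what should make applications see two screens instead of four panels.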

When I uncomment the three Xinerama lines, nvidia_modeset hits a NULL pointer dereference and the machine crashes with the following backtrace (captured via serial port):

[   34.401002] BUG: unable to handle kernel NULL pointer dereference at           (null)
[   34.408826] IP:           (null)
[   34.412050] PGD 0 
[   34.412051] 
[   34.415552] Oops: 0010 [#1] PREEMPT SMP
[   34.419382] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4 xt_tcpudp tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc 8021q mrp nct6775 hwmon_vid ipmi_ssif lm92 joydev input_leds mousedev led_class nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device dm_thin_pool dm_persistent_data drm_kms_helper intel_rapl dm_bio_prison dm_bufio sb_edac edac_core drm x86_pkg_temp_thermal nls_iso8859_1 intel_powerclamp nls_cp437 igb coretemp vfat snd_hda_codec_realtek syscopyarea ptp sysfillrect fat iTCO_wdt pps_core dm_mod snd_hda_codec_generic iTCO_vendor_support sysimgblt mxm_wmi
[   34.490074]  snd_hda_codec_hdmi kvm_intel fb_sys_fops i2c_algo_bit kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel pcbc snd_hda_codec snd_hda_core snd_hwdep aesni_intel aes_x86_64 snd_pcm crypto_simd glue_helper cryptd snd_timer evdev hid_generic intel_cstate snd ioatdma mac_hid intel_rapl_perf pcspkr i2c_i801 lpc_ich soundcore dca shpchp fjes ipmi_si wmi acpi_power_meter usbhid uas tpm_tis tpm_tis_core tpm usb_storage button sch_fq_codel uhid hid ipmi_devintf ipmi_msghandler ip_tables x_tables xfs libcrc32c crc32c_generic sd_mod ahci crc32c_intel xhci_pci libahci xhci_hcd libata usbcore scsi_mod usb_common serio
[   34.546896] CPU: 7 PID: 1165 Comm: kworker/7:3 Tainted: P           O    4.10.11-1-ARCH #1
[   34.555144] Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0 01/28/2016
[   34.562210] Workqueue: events nvkms_workqueue_callback [nvidia_modeset]
[   34.568821] task: ffff8810349a0000 task.stack: ffffc9000a664000
[   34.574731] RIP: 0010:          (null)
[   34.578474] RSP: 0018:ffffc9000a667c00 EFLAGS: 00010282
[   34.583691] RAX: ffffffffa0ad8850 RBX: ffff8810347cddb0 RCX: 0000000000000001
[   34.590815] RDX: ffff88101e6c56b0 RSI: ffff88101e7595e0 RDI: ffff8810347cddb0
[   34.597939] RBP: ffff88101e6c56b0 R08: ffff88101e7595ec R09: 0000000000000003
[   34.605063] R10: ffffea004079d600 R11: ffffffffa0a76d60 R12: ffff88101e7595e0
[   34.612189] R13: 0000000000000010 R14: ffff88101e6c5408 R15: 0000000000000010
[   34.619313] FS:  0000000000000000(0000) GS:ffff88103f1c0000(0000) knlGS:0000000000000000
[   34.627389] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   34.633125] CR2: 0000000000000000 CR3: 0000000002a09000 CR4: 00000000001406e0
[   34.640249] Call Trace:
[   34.642720]  ? _nv000828kms+0x93/0x100 [nvidia_modeset]
[   34.647961]  ? _nv001007kms+0x179/0x180 [nvidia_modeset]
[   34.653288]  ? _nv000733kms+0xe3/0xa80 [nvidia_modeset]
[   34.658512]  ? update_curr+0xe5/0x190
[   34.662178]  ? nvkms_free+0x1e/0x20 [nvidia_modeset]
[   34.667135]  ? dequeue_entity+0x23d/0xe30
[   34.671161]  ? _nv000964kms+0xd1/0x130 [nvidia_modeset]
[   34.676384]  ? put_prev_entity+0x33/0xa10
[   34.680386]  ? pick_next_task_fair+0x138/0x4c0
[   34.684823]  ? __switch_to+0x272/0x490
[   34.688568]  ? finish_task_switch+0x78/0x200
[   34.692843]  ? _nv001958kms+0x48/0x60 [nvidia_modeset]
[   34.697988]  ? nvkms_workqueue_callback+0xac/0xe0 [nvidia_modeset]
[   34.704159]  ? process_one_work+0x1e5/0x470
[   34.708337]  ? worker_thread+0x48/0x4e0
[   34.712168]  ? kthread+0x101/0x140
[   34.715565]  ? process_one_work+0x470/0x470
[   34.719742]  ? kthread_create_on_node+0x60/0x60
[   34.724266]  ? ret_from_fork+0x2c/0x40
[   34.728009] Code:  Bad RIP value.
[   34.731328] RIP:           (null) RSP: ffffc9000a667c00
[   34.736545] CR2: 0000000000000000
[   34.739856] ---[ end trace 5b033b3bafcf4c5b ]---

nvidia-bug-report output cannot be obtained after the kernel panic, so the attached report was generated with the three lines commented out, as shown above.

nvidia-bug-report.log.gz (126 KB)

Hi, maxf. Thanks for reporting this. Does it still reproduce with 381.09?

Aah, I didn’t realize that was released already. Yes, 381.09 fixes the issue and the last LTS 375.39 is also fine. Marking this as fixed. Thank you!

Great, thanks for confirming!