Fedora 27, occasional crash in NVidia driver on boot.

Occasionally, during boot, I get the crash listed below from the NVidia driver (I use akmod to rebuild the kernel modules after every kernel update). I dual boot Fedora 27 as my primary operating system and Windows 10 for gaming only. The Windows 10 side performs flawlessly and I’ve never had a driver crash of any kind there, so I’m pretty sure it’s not a hardware issue.
I don’t overclock, and I run everything in a stock configuration. Once this issue occurs, I power down the system, wait a few seconds, power up again, and everything works fine. This issue also happens on my two kids’ Fedora 27 boxes, one running an NVidia GTX 480 card and the other an NVidia GTX 770 card. My son’s system runs one of Intel’s first quad-core chips, the Q6600, on an old Gigabyte motherboard, and my daughter’s runs a Sandy Bridge Core i7 on an Asus Maximus Formula III motherboard.

If I can provide any other relevant info, just let me know.

Thanks!

My System Information:

Graphics Card: NVIDIA Corporation GP104 [GeForce GTX 1070] (eVGA brand)
Processor: Intel Core i7-7820X 8-core CPU
Motherboard: Asus TUF X299 Mark 2, BIOS version 0802
Memory: G.SKILL Ripjaws V Series 64GB (4 x 16GB) 288-Pin DDR4 SDRAM DDR4 2800 (running at 2133, stock FSB) F4-2800C14Q-64GVK

Kernel: Linux computer 4.14.13-300.fc27.x86_64 #1 SMP Thu Jan 11 04:00:01 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Error messages from journalctl boot log:

Jan 20 21:37:20 computer kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel
Jan 20 21:37:20 computer kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 239
Jan 20 21:37:20 computer kernel: nvidia 0000:65:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
Jan 20 21:37:20 computer kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module 387.34 Tue Nov 21 03:09:00 PST 2017 (using threaded interrupts)
Jan 20 21:37:20 computer systemd-udevd[696]: Process '/usr/bin/bash -c '/usr/bin/mknod -Z -m 666 /dev/nvidiactl c 195 255'' failed with exit code 1.
Jan 20 21:37:20 computer kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 387.34 Tue Nov 21 02:09:45 PST 2017
Jan 20 21:37:20 computer kernel: [drm] [nvidia-drm] [GPU ID 0x00006500] Loading driver
Jan 20 21:37:20 computer kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000378
Jan 20 21:37:20 computer kernel: IP: rm_get_device_name+0x9a/0x1a0 [nvidia]
Jan 20 21:37:20 computer kernel: PGD 0 P4D 0
Jan 20 21:37:20 computer kernel: Oops: 0000 [#1] SMP PTI
Jan 20 21:37:20 computer kernel: Modules linked in: x86_pkg_temp_thermal intel_powerclamp nvidia_drm(POE+) nvidia_modeset(POE) coretemp nvidia(POE) kvm_intel kv
Jan 20 21:37:20 computer kernel: CPU: 6 PID: 805 Comm: cat Tainted: P OE 4.14.13-300.fc27.x86_64 #1
Jan 20 21:37:20 computer kernel: Hardware name: System manufacturer System Product Name/TUF X299 MARK 2, BIOS 0802 09/06/2017
Jan 20 21:37:20 computer kernel: task: ffff99291155cc00 task.stack: ffffbb82473f4000
Jan 20 21:37:20 computer kernel: RIP: 0010:rm_get_device_name+0x9a/0x1a0 [nvidia]
Jan 20 21:37:20 computer kernel: RSP: 0018:ffffbb82473f7c18 EFLAGS: 00010282
Jan 20 21:37:20 computer kernel: RAX: 0000000000000000 RBX: ffff992902144008 RCX: ffffbb82473f7c44
Jan 20 21:37:20 computer kernel: RDX: 000000000000002c RSI: 0000000000000000 RDI: ffff992902144008
Jan 20 21:37:20 computer kernel: RBP: ffff9929043b3000 R08: 0000000000006173 R09: 0000000000000028
Jan 20 21:37:20 computer kernel: R10: ffffbb82473f7d30 R11: 0000000000025140 R12: 0000000000003842
Jan 20 21:37:20 computer kernel: R13: 0000000000006173 R14: ffff9929043b0000 R15: ffffbb82473f7d88
Jan 20 21:37:20 computer kernel: FS: 00007f551899a500(0000) GS:ffff99291e980000(0000) knlGS:0000000000000000
Jan 20 21:37:20 computer kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 20 21:37:20 computer kernel: CR2: 0000000000000378 CR3: 0000001050bc2006 CR4: 00000000003606e0
Jan 20 21:37:20 computer kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 20 21:37:20 computer kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jan 20 21:37:20 computer kernel: Call Trace:
Jan 20 21:37:20 computer kernel: ? kmem_cache_alloc+0x101/0x1c0
Jan 20 21:37:20 computer kernel: ? nv_procfs_close_registry+0x141/0x1b0 [nvidia]
Jan 20 21:37:20 computer kernel: ? nv_procfs_read_gpu_info+0x2f4/0x350 [nvidia]
Jan 20 21:37:20 computer kernel: ? __kmalloc_node+0x223/0x2e0
Jan 20 21:37:20 computer kernel: ? kvmalloc_node+0x75/0x80
Jan 20 21:37:20 computer kernel: ? seq_read+0xc9/0x3f0
Jan 20 21:37:20 computer kernel: ? lru_cache_add_active_or_unevictable+0x4c/0xf0
Jan 20 21:37:20 computer kernel: ? proc_reg_read+0x42/0x70
Jan 20 21:37:20 computer kernel: ? __vfs_read+0x37/0x160
Jan 20 21:37:20 computer kernel: ? security_file_permission+0x9b/0xc0
Jan 20 21:37:20 computer kernel: ? vfs_read+0x99/0x150
Jan 20 21:37:20 computer kernel: ? SyS_read+0x55/0xc0
Jan 20 21:37:20 computer kernel: ? do_syscall_64+0x67/0x180
Jan 20 21:37:20 computer kernel: ? entry_SYSCALL64_slow_path+0x25/0x25
Jan 20 21:37:20 computer kernel: Code: 18 f8 ff 48 85 db 74 3f 0f b7 93 6a 0c 00 00 48 8b 83 30 1e 00 00 48 8d 4c 24 2c 48 89 df 48 89 c6 66 89 54 24 1e ba 2c 0
Jan 20 21:37:20 computer kernel: RIP: rm_get_device_name+0x9a/0x1a0 [nvidia] RSP: ffffbb82473f7c18
Jan 20 21:37:20 computer kernel: CR2: 0000000000000378
Jan 20 21:37:20 computer kernel: ---[ end trace 88dc2d22137ae2cc ]---
Jan 20 21:37:20 computer kernel: EDAC skx: ECC is disabled on imc 0
Jan 20 21:37:20 computer kernel: intel_rapl: Found RAPL domain package
Jan 20 21:37:20 computer kernel: input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:64/0000:64:00.0/0000:65:00.1/sound/card1/input15
Jan 20 21:37:20 computer kernel: input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:64/0000:64:00.0/0000:65:00.1/sound/card1/input16
Jan 20 21:37:20 computer kernel: input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:64/0000:64:00.0/0000:65:00.1/sound/card1/input17
Jan 20 21:37:20 computer kernel: input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:64/0000:64:00.0/0000:65:00.1/sound/card1/input18
Jan 20 21:37:20 computer kernel: resource sanity check: requesting [mem 0x000a0000-0x000fffff], which spans more than PCI Bus 0000:64 [mem 0x000a0000-0x000bffff
Jan 20 21:37:20 computer kernel: caller os_map_kernel_space.part.4+0xac/0xe0 [nvidia] mapping multiple BARs
Jan 20 21:37:21 computer kernel: NVRM: Your system is not currently configured to drive a VGA console
on the primary VGA device. The NVIDIA Linux graphics driver
requires the use of a text-mode VGA console. Use of other console
drivers including, but not limited to, vesafb, may result in
corruption and stability problems, and is not supported.
Jan 20 21:37:21 computer kernel: nvidia-modeset: Allocated GPU:0 (GPU-33998961-dfd8-8eaf-68d0-1b1c3775a41c) @ PCI:0000:65:00.0
Jan 20 21:37:21 computer kernel: [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
Jan 20 21:37:21 computer kernel: [drm] No driver support for vblank timestamp query.
Jan 20 21:37:21 computer kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:65:00.0 on minor 0
Jan 20 21:37:21 computer systemd[1]: Started udev Wait for Complete Device Initialization.
Jan 20 21:37:21 computer audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-udev-settle comm=
Jan 20 21:37:21 computer systemd[1]: Started LVM2 metadata daemon.
Jan 20 21:37:21 computer audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=lvm2-lvmetad comm="system
Jan 20 21:37:21 computer systemd[1]: Starting LVM2 PV scan on device 8:19...
Jan 20 21:37:21 computer lvm[656]: WARNING: lvmetad is being updated by another command (pid 827).
Jan 20 21:37:21 computer lvm[656]: WARNING: Not using lvmetad because cache update failed.

We reproduced this by running “while true; do cat /proc/driver/nvidia/gpus/0000:02:00.0/information; done” in one SSH session and then starting nvidia-persistenced in another. We are tracking it under private bug 2050360.
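
For anyone who wants to try the same reproduction, here is a rough sketch of the read loop as a small script. It is a bounded variation of the loop described above, not the exact command we ran: it stops after a fixed number of reads instead of looping forever. The PCI address is the one from our test machine and will differ on yours (the log in the original post shows the GPU at 0000:65:00.0). The GPU_INFO variable and the read_gpu_info helper are illustrative names only and are not part of the driver.

```shell
#!/bin/sh
# Sketch: repeatedly read the GPU's procfs "information" file.
# While this runs, start nvidia-persistenced from a second session.

# Assumption: change the PCI address to match your GPU
# (for example, as reported by `lspci | grep -i nvidia`).
GPU_INFO="${GPU_INFO:-/proc/driver/nvidia/gpus/0000:02:00.0/information}"

# Hypothetical helper: read $GPU_INFO the requested number of times
# (default 1000) and print how many of those reads succeeded.
read_gpu_info() {
    reads="${1:-1000}"
    ok=0
    i=0
    while [ "$i" -lt "$reads" ]; do
        # Output is discarded; only the read itself matters here.
        if cat "$GPU_INFO" >/dev/null 2>&1; then
            ok=$((ok + 1))
        fi
        i=$((i + 1))
    done
    echo "$ok"
}
```

To use it, source the script and call read_gpu_info 100000 in one session, then run nvidia-persistenced as root in another. If the race is hit, the kernel log should show the same rm_get_device_name oops as in the original post.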

We have fixed this issue, and the fix will be available in the next driver release.