This problem made my computer unbootable. Because DKMS installs the driver for every kernel, none of my installed kernels would boot properly, so I reinstalled Debian. I then switched to sgfxi to install the drivers without DKMS, so that I can boot another kernel and remove the drivers if this happens again.
Oddly enough, Debian doesn’t come with persistent logs turned on, but I managed to enable them. This is the report I get:
Jan 08 22:47:03 ronin kernel: divide error: 0000 [#1] PREEMPT SMP
Jan 08 22:47:03 ronin kernel: Modules linked in: btrfs zstd_compress zstd_decompress xxhash xor raid6_pq edac_mce_amd kvm_amd kvm eeepc_wmi asus_wmi sparse_keymap irqbypass rfkill
Jan 08 22:47:03 ronin kernel: crc32c_intel i2c_piix4 libata i2c_algo_bit dca ptp xhci_pci pps_core scsi_mod xhci_hcd rtc_cmos gpio_amdpt gpio_generic i2c_designware_platform i2c_d
Jan 08 22:47:03 ronin kernel: CPU: 2 PID: 347 Comm: systemd-udevd Not tainted 4.14.0-11.1-liquorix-amd64 #1 liquorix 4.14-14
Jan 08 22:47:03 ronin kernel: Hardware name: System manufacturer System Product Name/ROG STRIX B350-F GAMING, BIOS 3401 12/04/2017
Jan 08 22:47:03 ronin kernel: task: ffff88020eb73500 task.stack: ffffc900014c8000
Jan 08 22:47:03 ronin kernel: RIP: 0010:nvGetClocks+0x176/0x260 [nvidiafb]
Jan 08 22:47:03 ronin kernel: RSP: 0018:ffffc900014cb7f8 EFLAGS: 00010246
Jan 08 22:47:03 ronin kernel: RAX: 0000000000000000 RBX: ffff8802152e2420 RCX: 0000000000000000
Jan 08 22:47:03 ronin kernel: RDX: 0000000000000000 RSI: ffffc900014cb834 RDI: ffff8802152e2420
Jan 08 22:47:03 ronin kernel: RBP: ffff8802152e2518 R08: ffffc900014cb838 R09: 0000000000000000
Jan 08 22:47:03 ronin kernel: R10: 0000000000000068 R11: 00000000002e18c8 R12: 0000000000062570
Jan 08 22:47:03 ronin kernel: R13: 000000000000000e R14: 0000000000000010 R15: 0000000000000008
Jan 08 22:47:03 ronin kernel: FS: 00007fe8ed56b400(0000) GS:ffff88021ec80000(0000) knlGS:0000000000000000
Jan 08 22:47:03 ronin kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 08 22:47:03 ronin kernel: CR2: 00007f546a15c870 CR3: 000000020df0a000 CR4: 00000000003406e0
Jan 08 22:47:03 ronin kernel: Call Trace:
Jan 08 22:47:03 ronin kernel: NVCalcStateExt+0x189/0x8e0 [nvidiafb]
Jan 08 22:47:03 ronin kernel: nvidiafb_set_par+0x47c/0x9f0 [nvidiafb]
Jan 08 22:47:03 ronin kernel: fbcon_init+0x59e/0x780
Jan 08 22:47:03 ronin kernel: visual_init+0xca/0x120
Jan 08 22:47:03 ronin kernel: do_bind_con_driver+0x2ab/0x640
Jan 08 22:47:03 ronin kernel: do_take_over_console+0x22d/0x470
Jan 08 22:47:03 ronin kernel: fbcon_event_notify+0x90d/0xa20
Jan 08 22:47:03 ronin kernel: blocking_notifier_call_chain+0x5d/0x80
Jan 08 22:47:03 ronin kernel: register_framebuffer+0x1d5/0x2f0
Jan 08 22:47:03 ronin kernel: nvidiafb_probe+0x6b2/0xa80 [nvidiafb]
Jan 08 22:47:03 ronin kernel: pci_device_probe+0x1e4/0x340
Jan 08 22:47:03 ronin kernel: driver_probe_device+0x3d4/0x4a0
Jan 08 22:47:03 ronin kernel: __driver_attach+0xd1/0xe0
Jan 08 22:47:03 ronin kernel: ? driver_probe_device+0x4a0/0x4a0
Jan 08 22:47:03 ronin kernel: bus_for_each_dev+0x57/0x80
Jan 08 22:47:03 ronin kernel: bus_add_driver+0x191/0x210
Jan 08 22:47:03 ronin kernel: driver_register+0x78/0xf0
Jan 08 22:47:03 ronin kernel: ? nvidiafb_setcolreg+0x2a0/0x2a0 [nvidiafb]
Jan 08 22:47:03 ronin kernel: do_one_initcall+0x46/0x190
Jan 08 22:47:03 ronin kernel: do_init_module+0x58/0x2f9
Jan 08 22:47:03 ronin kernel: load_module+0x1dfd/0x2760
Jan 08 22:47:03 ronin kernel: ? SyS_finit_module+0x91/0xb0
Jan 08 22:47:03 ronin kernel: SyS_finit_module+0x91/0xb0
Jan 08 22:47:03 ronin kernel: do_syscall_64+0x64/0x190
Jan 08 22:47:03 ronin kernel: entry_SYSCALL64_slow_path+0x25/0x25
Jan 08 22:47:03 ronin kernel: RIP: 0033:0x7fe8ece94da9
Jan 08 22:47:03 ronin kernel: RSP: 002b:00007ffe2837a368 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Jan 08 22:47:03 ronin kernel: RAX: ffffffffffffffda RBX: 000055d1362632f0 RCX: 00007fe8ece94da9
Jan 08 22:47:03 ronin kernel: RDX: 0000000000000000 RSI: 00007fe8ecb9f2d5 RDI: 0000000000000010
Jan 08 22:47:03 ronin kernel: RBP: 00007fe8ecb9f2d5 R08: 0000000000000000 R09: 0000000000000000
Jan 08 22:47:03 ronin kernel: R10: 0000000000000010 R11: 0000000000000246 R12: 0000000000000000
Jan 08 22:47:03 ronin kernel: R13: 000055d1362591a0 R14: 0000000000020000 R15: 000055d136249140
Jan 08 22:47:03 ronin kernel: Code: f0 0f 00 00 3d 00 03 00 00 74 73 3d 30 03 00 00 74 6c 41 8b 89 04 05 00 00 0f b6 c5 44 0f b6 c9 c1 e9 10 0f af c2 31 d2 83 e1 0f <41> f7 f1 d3 e
Jan 08 22:47:03 ronin kernel: RIP: nvGetClocks+0x176/0x260 [nvidiafb] RSP: ffffc900014cb7f8
You attached the same report twice; both are from 15:01:47, but the crash occurred at 15:06:29.
General hints: did you check if you’re affected by the Ryzen bug?
https://github.com/suaefar/ryzen-test
Have you checked whether the GPU works in another system? Did an earlier driver version work?
I went through an RMA for the segfault bug that script tests for, and I no longer get those segfaults. I do still get occasional random hard crashes that AMD seems unable to fix; I’ve tried various BIOS settings, and these crashes leave nothing of note in the system log. But they are definitely not related to this problem.
The GPU works. The driver also works if I start X right after installing it, without rebooting; something during boot crashes the driver. One possibility is that my CRT monitor has something to do with it: in the boot log I get “nvidiafb: unable to detect display type”. Judging by the backtraces, the crash does look related to nvidiafb.
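To check that reading of the backtrace mechanically, here is a small Python sketch that pulls the faulting function out of the first kernel `RIP:` line and counts call-trace frames per module. It assumes the `journalctl` line format of the log above (`... kernel: func+0xoff/0xsize [module]`); frames printed with a leading `?` are unreliable and are skipped, and frames without a bracketed module are counted under the placeholder name `vmlinux`. The function names are hypothetical helpers, not part of any tool.

```python
import re
from collections import Counter

# Kernel RIP line, e.g. "RIP: 0010:nvGetClocks+0x176/0x260 [nvidiafb]"
RIP_RE = re.compile(
    r"RIP: (?:[0-9a-f]{4}:)?(?P<func>\w+)\+0x(?P<off>[0-9a-f]+)/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)
# Call-trace frame, e.g. "kernel: NVCalcStateExt+0x189/0x8e0 [nvidiafb]".
# Frames printed as "? func+..." are unreliable; this pattern does not match them.
FRAME_RE = re.compile(
    r"kernel: (?P<func>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<module>\w+)\])?\s*$"
)


def faulting_function(log_text):
    """Return (function, offset, module) from the first kernel RIP line, or None."""
    m = RIP_RE.search(log_text)
    if m is None:
        return None
    return m.group("func"), int(m.group("off"), 16), m.group("module")


def modules_in_trace(log_text):
    """Count reliable call-trace frames per module; built-in code counts as 'vmlinux'."""
    counts = Counter()
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            counts[m.group("module") or "vmlinux"] += 1
    return counts
```

Fed the log above, the faulting function comes out as `nvGetClocks` in `nvidiafb`, and the only module named in the reliable frames is `nvidiafb`; the proprietary `nvidia` module does not appear in the trace at all.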
It looks like the bug report script won’t overwrite an existing report, so I ended up copying the old file again, thinking it had been updated.
OK, you just found the solution that I missed: nvidiafb is not part of the official NVIDIA driver. It ships with the kernel and has to be blacklisted, just like nouveau.
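For reference, blacklisting is usually done with a `blacklist nvidiafb` line in a `.conf` file under `/etc/modprobe.d/`; on Debian you would typically also regenerate the initramfs (`update-initramfs -u`) in case the module is loaded from there. The Python sketch below only checks whether such a line is already present. It assumes the modprobe.d syntax described in `modprobe.d(5)` (text after `#` is ignored, `-` and `_` are interchangeable in module names) and by default scans only `/etc/modprobe.d`, not `/lib/modprobe.d` or `/run/modprobe.d`. Both function names are hypothetical.

```python
from pathlib import Path


def blacklisted_modules(config_text):
    """Return the module names named by `blacklist` lines in modprobe.d-style text."""
    names = set()
    for raw in config_text.splitlines():
        # modprobe.d ignores everything after a '#'
        line = raw.split("#", 1)[0].strip()
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "blacklist":
            # '-' and '_' are interchangeable in module names
            names.add(parts[1].replace("-", "_"))
    return names


def is_blacklisted(module, conf_dir="/etc/modprobe.d"):
    """True if any *.conf file in conf_dir blacklists `module` (hypothetical helper)."""
    wanted = module.replace("-", "_")
    for path in sorted(Path(conf_dir).glob("*.conf")):
        if wanted in blacklisted_modules(path.read_text(errors="replace")):
            return True
    return False


if __name__ == "__main__":
    print("nvidiafb blacklisted:", is_blacklisted("nvidiafb"))
```

A missing or empty directory simply yields `False`, so the script is safe to run before any blacklist file has been created.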