Crash with kernel 4.5 and 4.6

I can’t run the latest NVIDIA drivers with kernel 4.5 or 4.6 on Fedora.
I get a memory dump even with a vanilla kernel.

Connecting over SSH from a remote machine gives this dump:

[ 744.657781] nvidia: module license 'NVIDIA' taints kernel.
[ 744.657786] Disabling lock debugging due to kernel taint
[ 744.661716] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 744.667867] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=none:owns=io+mem
[ 744.667960] nvidia-nvlink: Nvlink Core is being initialized, major device number 242
[ 744.667976] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 364.12 Wed Mar 16 21:11:26 PDT 2016
[ 749.419241] page:ffffea0007dff000 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0
[ 749.419242] flags: 0x2ffffc00004000(head)
[ 749.419242] page dumped because: VM_BUG_ON_PAGE(1 && PageCompound(page))
[ 749.419264] ------------[ cut here ]------------
[ 749.419265] kernel BUG at include/linux/page-flags.h:272!
[ 749.419266] invalid opcode: 0000 [#1] SMP
[ 749.419278] Modules linked in: nvidia(POE) ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_filter ebtable_broute bridge stp llc ebtable_nat ebtables ip6table_security ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_raw ip6table_mangle ip6table_filter ip6_tables iptable_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw iptable_mangle rc_dib0700_rc5 intel_rapl x86_pkg_temp_thermal coretemp kvm_intel snd_hda_codec_hdmi kvm arc4 ath9k ath9k_common ath9k_hw snd_hda_codec_realtek vfat snd_hda_codec_generic fat irqbypass crct10dif_pclmul snd_hda_intel mac80211 iTCO_wdt snd_hda_codec iTCO_vendor_support crc32_pclmul snd_hda_core snd_hwdep snd_seq dib7000p ghash_clmulni_intel ath3k snd_seq_device dvb_usb_dib0700
[ 749.419288] ath dib7000m snd_pcm cfg80211 dib0090 btusb dib0070 btrtl btbcm dib3000mc dibx000_common dvb_usb dvb_core btintel rc_core bluetooth joydev i2c_i801 snd_timer snd rfkill lpc_ich soundcore tpm_infineon ie31200_edac mei_me mei edac_core tpm_tis shpchp tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc uas usb_storage btrfs xor raid6_pq mxm_wmi crc32c_intel serio_raw e1000e firewire_ohci firewire_core atl1c crc_itu_t ptp pps_core wmi video fjes
[ 749.419290] CPU: 1 PID: 5516 Comm: Xorg Tainted: P W OE 4.5.0-301.fc24.x86_64 #1
[ 749.419290] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Z77X-UD5H, BIOS F14 08/22/2012
[ 749.419291] task: ffff880213925b80 ti: ffff8801f8e94000 task.ti: ffff8801f8e94000
[ 749.419349] RIP: 0010:[] [] nv_alloc_contig_pages+0x165/0x4d0 [nvidia]
[ 749.419350] RSP: 0018:ffff8801f8e976a0 EFLAGS: 00010246
[ 749.419350] RAX: 000000000000003c RBX: 0000000000000000 RCX: 000000000000001f
[ 749.419351] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000292
[ 749.419351] RBP: ffff8801f8e976e8 R08: 0000000000000000 R09: 000000000000003c
[ 749.419351] R10: ffffffff81a80399 R11: 0000000000040000 R12: ffff8801f7fc0000
[ 749.419352] R13: ffff880000000000 R14: ffff880080000000 R15: ffff8800c4ecc5a0
[ 749.419353] FS: 00007f3f20211a40(0000) GS:ffff88021ec80000(0000) knlGS:0000000000000000
[ 749.419353] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 749.419353] CR2: 00007f3f19af7510 CR3: 0000000201c96000 CR4: 00000000001406e0
[ 749.419354] Stack:
[ 749.419355] 0000000000000007 ffff8800c4ecc640 ffff8800dc973800 00000000c7a4d645
[ 749.419355] 0000000000000004 0000000000000008 0000000000000001 ffff8800dc973800
[ 749.419356] ffff8800dc973800 ffff8801f8e97750 ffffffffa0912743 0000000000000000
[ 749.419356] Call Trace:
[ 749.419395] [] nv_alloc_pages+0x103/0x1d0 [nvidia]
[ 749.419478] [] ? _nv009867rm+0x16fd/0x1a00 [nvidia]
[ 749.419481] [] ? swiotlb_sync_sg_for_cpu+0x60/0x60
[ 749.419540] [] _nv014758rm+0xe6/0x270 [nvidia]
[ 749.419597] [] ? _nv014758rm+0x44/0x270 [nvidia]
[ 749.419690] [] ? _nv013171rm+0x4b5/0x660 [nvidia]
[ 749.419782] [] ? _nv013164rm+0xa8/0x350 [nvidia]
[ 749.419873] [] ? _nv013165rm+0x44/0x230 [nvidia]
[ 749.419956] [] ? _nv002409rm+0x36d/0x970 [nvidia]
[ 749.420048] [] ? _nv013289rm+0x6c6/0xee0 [nvidia]
[ 749.420138] [] ? _nv013289rm+0xb8a/0xee0 [nvidia]
[ 749.420227] [] ? _nv013289rm+0xad9/0xee0 [nvidia]
[ 749.420316] [] ? _nv013283rm+0x1be/0x410 [nvidia]
[ 749.420404] [] ? _nv013283rm+0x18a/0x410 [nvidia]
[ 749.420494] [] ? _nv013284rm+0x130/0x300 [nvidia]
[ 749.420581] [] ? _nv013285rm+0xd2/0x130 [nvidia]
[ 749.420640] [] ? _nv011273rm+0x613/0x690 [nvidia]
[ 749.420686] [] ? _nv003413rm+0x15b/0x260 [nvidia]
[ 749.420767] [] ? _nv015943rm+0x1b95/0x3d40 [nvidia]
[ 749.420847] [] ? _nv015943rm+0xa56/0x3d40 [nvidia]
[ 749.420930] [] ? _nv002413rm+0x1ed0/0x3a40 [nvidia]
[ 749.420992] [] ? _nv010220rm+0x6e/0xd0 [nvidia]
[ 749.421049] [] ? _nv014886rm+0xc60/0xd50 [nvidia]
[ 749.421106] [] ? _nv000769rm+0x32a/0x660 [nvidia]
[ 749.421164] [] ? rm_init_adapter+0x6a/0x100 [nvidia]
[ 749.421165] [] ? setup_irq+0x90/0x90
[ 749.421204] [] ? nv_open_device+0x12b/0x600 [nvidia]
[ 749.421205] [] ? kmem_cache_alloc+0x197/0x1d0
[ 749.421243] [] ? nvidia_open+0x1be/0x2e0 [nvidia]
[ 749.421281] [] ? nvidia_frontend_open+0x55/0xa0 [nvidia]
[ 749.421283] [] ? chrdev_open+0xb0/0x180
[ 749.421284] [] ? do_dentry_open+0x201/0x2e0
[ 749.421285] [] ? cdev_put+0x30/0x30
[ 749.421286] [] ? vfs_open+0x59/0x60
[ 749.421287] [] ? path_openat+0x1d3/0x14d0
[ 749.421288] [] ? legitimize_mnt+0x12/0x60
[ 749.421289] [] ? do_filp_open+0x91/0x100
[ 749.421290] [] ? __alloc_fd+0x3f/0x170
[ 749.421291] [] ? do_sys_open+0x130/0x220
[ 749.421292] [] ? SyS_open+0x1e/0x20
[ 749.421294] [] ? entry_SYSCALL_64_fastpath+0x12/0x6d
[ 749.421302] Code: 83 e1 01 4c 0f 44 c7 45 8b 40 1c 4c 89 66 08 48 89 46 10 44 89 46 1c 48 8b 07 f6 c4 40 74 4a 48 c7 c6 b0 aa 1a a1 e8 8b 9f 8c e0 <0f> 0b 48 c7 c2 60 cd e6 a0 48 c7 c6 48 aa 1a a1 31 ff e8 54 18
[ 749.421341] RIP [] nv_alloc_contig_pages+0x165/0x4d0 [nvidia]
[ 749.421342] RSP
[ 749.421349] ---[ end trace bb769003a6aaaa92 ]---

Thanks
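For anyone triaging a similar dump, the three lines that matter most are the failed assertion, the `kernel BUG at` location and the `RIP` symbol. A minimal sketch of pulling them out of a saved log follows; the function name `summarize_oops` and the regular expressions are illustrative assumptions and are not part of any NVIDIA or kernel tooling.

```python
import re

def summarize_oops(log_text):
    """Extract the BUG location, failed assertion and faulting symbol from an oops dump."""
    summary = {}
    # e.g. "kernel BUG at include/linux/page-flags.h:272!"
    m = re.search(r"kernel BUG at (\S+):(\d+)!", log_text)
    if m:
        summary["bug_file"] = m.group(1)
        summary["bug_line"] = int(m.group(2))
    # e.g. "page dumped because: VM_BUG_ON_PAGE(...)"
    m = re.search(r"page dumped because: (.+)", log_text)
    if m:
        summary["reason"] = m.group(1).strip()
    # e.g. "RIP: ... nv_alloc_contig_pages+0x165/0x4d0 [nvidia]"
    m = re.search(r"RIP.*?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?", log_text)
    if m:
        summary["function"] = m.group(1)
        summary["module"] = m.group(2)  # None when the symbol is not in a module
    return summary

# Three lines copied from the dump above.
sample = """\
[ 749.419242] page dumped because: VM_BUG_ON_PAGE(1 && PageCompound(page))
[ 749.419265] kernel BUG at include/linux/page-flags.h:272!
[ 749.419349] RIP: 0010:[] [] nv_alloc_contig_pages+0x165/0x4d0 [nvidia]
"""
print(summarize_oops(sample))
```

In this dump the assertion fires in `nv_alloc_contig_pages` inside the `nvidia` module.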

Same problem here. I did not try 4.6, but starting with 4.5 it crashes the same way. I use a Precision M4600 laptop with a Quadro 2000M (F23).

For now I work around this with a crude hack:

--- kernel/nvidia/nv-vm.c 2016-03-17 03:58:29.000000000 +0100
+++ kernel.org/nvidia/nv-vm.c 2016-04-04 18:51:06.212730640 +0200
@@ -363,7 +363,7 @@
     nv_printf(NV_DBG_MEMINFO,
             "NVRM: VM: %s: %u pages\n", __FUNCTION__, at->num_pages);
 
-    if (IS_VGX_HYPER())
+    //if (IS_VGX_HYPER())
         return nv_alloc_coherent_pages(nv, at);
 
     at->order = get_order(at->num_pages * PAGE_SIZE);
@@ -437,7 +437,7 @@
     nv_printf(NV_DBG_MEMINFO,
             "NVRM: VM: %s: %u pages\n", __FUNCTION__, at->num_pages);
 
-    if (IS_VGX_HYPER())
+    //if (IS_VGX_HYPER())
         return nv_free_coherent_pages(at);
 
     if (!NV_ALLOC_MAPPING_CACHED(at->flags))

Ugly as hell, but it works: the IS_VGX_HYPER path does not use SetPageReserved() (reached through NV_LOCK_PAGE), and that call seems to be what triggers the assertion. It will do until someone figures out something more correct.

Thanks for your hack.

I’m wondering whether it’s a Fedora config problem, because other users don’t complain about this kernel
(http://rglinuxtech.com/?p=1647).
I filed a bug with Fedora two months ago, but they said “No support for this driver see with Nvidia”.
Sorry for my English.

A quick check suggests that disabling CONFIG_DEBUG_VM_PGFLAGS prevents the problem (unsurprisingly), but the question is where the problem lies: in the kernel or in the NVIDIA driver? I hope one of the NVIDIA devs will clarify or check that.
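To make the interaction concrete, here is a toy model in Python of what the dump suggests is happening. It is not kernel code: `Page`, `KernelBug` and `set_page_reserved` are hypothetical stand-ins, and the assumption is that the `VM_BUG_ON_PAGE(1 && PageCompound(page))` check from the dump is only evaluated when the debug option is enabled, which matches the observation that disabling the option avoids the crash.

```python
class Page:
    """Toy stand-in for struct page; tracks only what this example needs."""
    def __init__(self, compound=False):
        self.compound = compound  # page belongs to a compound (multi-page) allocation
        self.reserved = False

class KernelBug(Exception):
    """Stands in for the kernel hitting BUG()."""

def set_page_reserved(page, debug_vm_pgflags):
    # Models the assertion reported in the dump:
    #   VM_BUG_ON_PAGE(1 && PageCompound(page))
    # Assumption: the check is skipped entirely when the debug option is off.
    if debug_vm_pgflags and page.compound:
        raise KernelBug("VM_BUG_ON_PAGE(1 && PageCompound(page))")
    page.reserved = True

# The dumped page carries the "head" flag, i.e. it is a compound page.
page = Page(compound=True)
set_page_reserved(page, debug_vm_pgflags=False)  # passes silently
try:
    set_page_reserved(Page(compound=True), debug_vm_pgflags=True)
except KernelBug as bug:
    print("BUG:", bug)
```

The model shows why both workarounds hide the crash: one turns the check off, the other avoids the code path that marks a compound page as reserved. It says nothing about which side should change.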

Please attach the nvidia bug report and the kernel config file.

It looks like the change in 4.5 is affecting more than just the NVIDIA driver:

https://www.virtualbox.org/changeset/60372/vbox
https://bugzilla.redhat.com/show_bug.cgi?id=1307033
https://bugzilla.redhat.com/show_bug.cgi?id=1317296

Filed an internal bug to track this issue: Bug 200189979

When I recompiled the kernel without CONFIG_DEBUG_VM_PGFLAGS, I got the NVIDIA driver to work without patching.
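Before rebuilding, it may be worth checking whether the running kernel actually has the option set. A minimal sketch follows, assuming the distribution installs the kernel config as `/boot/config-<release>` (Fedora does this; other systems may not); the helper names are made up for illustration.

```python
import platform

def config_option_state(config_text, option):
    """Return the option's value ('y', 'm', ...) or None if it is unset or absent."""
    prefix = option + "="
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as "# CONFIG_FOO is not set" and never match.
        if line.startswith(prefix):
            return line[len(prefix):]
    return None

def running_kernel_config_path():
    # Assumption: config is installed as /boot/config-<kernel release>.
    return "/boot/config-" + platform.release()

if __name__ == "__main__":
    path = running_kernel_config_path()
    try:
        with open(path) as f:
            state = config_option_state(f.read(), "CONFIG_DEBUG_VM_PGFLAGS")
    except OSError:
        print("could not read", path)
    else:
        print("CONFIG_DEBUG_VM_PGFLAGS:", state if state else "not set")
```

If the option is not set, the assertion from the dump should not be reachable on that kernel.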

Fedora’s bug is being tracked here: 1335173 – CONFIG_DEBUG_VM_PGFLAGS causes kernel panic for the X.org server: kernel BUG at /usr/src/kernels/4.5.3-300.fc24.x86_64/include/linux/page-flags.h:272