GTX 580 - 375.10 - Weston/EGLStream produces crash in kernel
Hi,

I'm running Arch Linux with the 4.8.6 Kernel and have the weston-eglstream package installed from AUR.
When I boot to the console by adding "3" to my kernel command line, unload the nvidia-drm module with "# modprobe -r nvidia-drm", and then reload it with the modeset=1 parameter, I can't start weston with "weston --use-egldevice". The screen goes black and weston never shows up. When I ssh into the machine and follow the journal with journalctl -f, I see this kernel message:

Nov 02 11:25:57 archlinux kernel: usercopy: kernel memory overwrite attempt detected to ffff8803e8ec7ce0 (<process stack>) (8 bytes)
Nov 02 11:25:57 archlinux kernel: ------------[ cut here ]------------
Nov 02 11:25:57 archlinux kernel: kernel BUG at mm/usercopy.c:75!
Nov 02 11:25:57 archlinux kernel: invalid opcode: 0000 [#2] PREEMPT SMP
Nov 02 11:25:57 archlinux kernel: Modules linked in: nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) drm_kms_helper drm syscopyarea sysfillrect sysimgblt fb_sys_fops ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c crc32c_generic dm_mod nct6775 hwmon_vid snd_hda_codec_hdmi nls_iso8859_1 nls_cp437 vfat fat snd_hda_codec_realtek snd_hda_codec_generic input_leds joydev mousedev hid_roccat_lua hid_roccat_common hid_generic intel_rapl x86_pkg_temp_thermal intel_powerclamp btusb coretemp btrtl btbcm eeepc_wmi iTCO_wdt iTCO_vendor_support btintel kvm_intel asus_wmi sparse_keymap led_class mxm_wmi evdev kvm bluetooth
Nov 02 11:25:57 archlinux kernel: mac_hid irqbypass snd_hda_intel rfkill crct10dif_pclmul crc32_pclmul snd_hda_codec usbhid crc32c_intel ghash_clmulni_intel snd_hda_core hid snd_hwdep aesni_intel aes_x86_64 lrw gf128mul glue_helper snd_pcm ablk_helper cryptd e1000e snd_timer i2c_i801 intel_cstate snd intel_rapl_perf mei_me psmouse ptp pcspkr i2c_smbus soundcore mei pps_core lpc_ich shpchp fan thermal fjes wmi battery video tpm_tis tpm_tis_core tpm button squashfs sch_fq_codel loop vboxnetflt(O) vboxnetadp(O) pci_stub vboxpci(O) vboxdrv(O) ip_tables x_tables ext4 crc16 jbd2 fscrypto mbcache sr_mod cdrom sd_mod serio_raw atkbd libps2 ahci libahci libata xhci_pci ehci_pci xhci_hcd ehci_hcd scsi_mod usbcore usb_common i8042 serio [last unloaded: nvidia]
Nov 02 11:25:57 archlinux kernel: CPU: 2 PID: 3591 Comm: weston Tainted: P D O 4.8.6-1-ARCH #1
Nov 02 11:25:57 archlinux kernel: Hardware name: System manufacturer System Product Name/P8Z68-V PRO, BIOS 3603 11/09/2012
Nov 02 11:25:57 archlinux kernel: task: ffff8803f6249c80 task.stack: ffff8803e8ec4000
Nov 02 11:25:57 archlinux kernel: RIP: 0010:[<ffffffff81205f5f>] [<ffffffff81205f5f>] __check_object_size+0x13f/0x1d6
Nov 02 11:25:57 archlinux kernel: RSP: 0018:ffff8803e8ec7c88 EFLAGS: 00010282
Nov 02 11:25:57 archlinux kernel: RAX: 0000000000000062 RBX: ffff8803e8ec7ce0 RCX: 0000000000000000
Nov 02 11:25:57 archlinux kernel: RDX: 0000000000000000 RSI: ffff88041ec8dba8 RDI: ffff88041ec8dba8
Nov 02 11:25:57 archlinux kernel: RBP: ffff8803e8ec7ca8 R08: 000000000003d43f R09: 0000000000000005
Nov 02 11:25:57 archlinux kernel: R10: ffff8803e5382a00 R11: 000000000000037a R12: 0000000000000008
Nov 02 11:25:57 archlinux kernel: R13: 0000000000000000 R14: ffff8803e8ec7ce8 R15: ffff8803e5382a00
Nov 02 11:25:57 archlinux kernel: FS: 00007f05e5e0de80(0000) GS:ffff88041ec80000(0000) knlGS:0000000000000000
Nov 02 11:25:57 archlinux kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 02 11:25:57 archlinux kernel: CR2: 00007f05e51c34a0 CR3: 00000003f75fb000 CR4: 00000000000406e0
Nov 02 11:25:57 archlinux kernel: Stack:
Nov 02 11:25:57 archlinux kernel: ffff8803e8ec7ce0 0000000000000008 00007ffc5b4ce350 00000000ffffffea
Nov 02 11:25:57 archlinux kernel: ffff8803e8ec7cd0 ffffffffa14dc4f1 ffff8803e8ec7dd0 0000000000f00000
Nov 02 11:25:57 archlinux kernel: ffff8803fd3cd240 ffff880404af7f08 ffffffffa15439e9 ffff8803e8ec7d30
Nov 02 11:25:57 archlinux kernel: Call Trace:
Nov 02 11:25:57 archlinux kernel: [<ffffffffa14dc4f1>] nvkms_copyin+0x21/0x50 [nvidia_modeset]
Nov 02 11:25:57 archlinux kernel: [<ffffffffa15439e9>] _nv000272kms+0x69/0x120 [nvidia_modeset]
Nov 02 11:25:57 archlinux kernel: [<ffffffff810d7198>] ? console_unlock+0x318/0x5f0
Nov 02 11:25:57 archlinux kernel: [<ffffffffa03c07c6>] ? nvidia_drm_gem_import_nvkms_memory+0x76/0x110 [nvidia_drm]
Nov 02 11:25:57 archlinux kernel: [<ffffffff810bfd3d>] ? remove_wait_queue+0x4d/0x60
Nov 02 11:25:57 archlinux kernel: [<ffffffffa053ed40>] ? drm_ioctl+0x200/0x4f0 [drm]
Nov 02 11:25:57 archlinux kernel: [<ffffffff810bfe14>] ? __wake_up+0x44/0x50
Nov 02 11:25:57 archlinux kernel: [<ffffffffa03c0750>] ? nvidia_drm_dumb_create+0x190/0x190 [nvidia_drm]
Nov 02 11:25:57 archlinux kernel: [<ffffffff813eb570>] ? n_tty_open+0xd0/0xd0
Nov 02 11:25:57 archlinux kernel: [<ffffffff812088d7>] ? __vfs_write+0x37/0x140
Nov 02 11:25:57 archlinux kernel: [<ffffffff8121c433>] ? do_vfs_ioctl+0xa3/0x5f0
Nov 02 11:25:57 archlinux kernel: [<ffffffff812276a7>] ? __fget+0x77/0xb0
Nov 02 11:25:57 archlinux kernel: [<ffffffff8121c9f9>] ? SyS_ioctl+0x79/0x90
Nov 02 11:25:57 archlinux kernel: [<ffffffff815f7cf2>] ? entry_SYSCALL_64_fastpath+0x1a/0xa4
Nov 02 11:25:57 archlinux kernel: Code: 87 71 81 48 0f 45 d0 48 c7 c6 70 a5 72 81 48 c7 c0 eb 43 73 81 48 0f 45 f0 4d 89 e1 48 89 d9 48 c7 c7 28 0d 73 81 e8 a7 01 f7 ff <0f> 0b 48 89 df e8 57 75 e6 ff 84 c0 0f 84 f8 fe ff ff b8 00 00
Nov 02 11:25:57 archlinux kernel: RIP [<ffffffff81205f5f>] __check_object_size+0x13f/0x1d6
Nov 02 11:25:57 archlinux kernel: RSP <ffff8803e8ec7c88>
Nov 02 11:25:57 archlinux kernel: ---[ end trace 5ad7d5aef591d152 ]---
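
For anyone trying to reproduce this, the steps described before the log can be sketched roughly as below. This is only a sketch: the function names and the `run` dry-run helper are hypothetical and not part of any package, and the real commands need root and a text console with nothing else holding the GPU.

```shell
# run() is a hypothetical helper: with DRY_RUN=1 it only prints the command
# instead of executing it, so the sequence can be reviewed safely first.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Unload nvidia-drm, then load it again with kernel modesetting enabled.
reload_nvidia_drm_with_modeset() {
    run modprobe -r nvidia-drm &&
    run modprobe nvidia-drm modeset=1
}

# Start the EGLStream-enabled weston build on the EGLDevice backend.
start_weston_eglstream() {
    run weston --use-egldevice
}

# From a second machine over ssh, follow the journal while weston starts:
#   journalctl -f
```

Running `reload_nvidia_drm_with_modeset && start_weston_eglstream` with `DRY_RUN=1` set first prints the three commands without touching the system.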


This seems to be a problem specific to that kernel version: I downgraded the driver to 370.28 and the problem is the same. I installed linux-lts (4.4.28) and the 370.28 kernel module for it, and "weston --use-egldevice" works just fine. However, with 370.28 this bug seems to be back: https://devtalk.nvidia.com/default/topic/932343/364-19-gtx-580-weston-simple-egl-fails-to-initialize-egl/

EDIT: 375.10 also works as long as I use the Linux LTS kernel, so this is further evidence that it's specific to 4.8.6. weston-simple-egl still doesn't work.

#1
Posted 11/02/2016 11:15 AM   
Answer Accepted by Original Poster
Thanks for the report.

This is specific to the Linux kernel config option CONFIG_HARDENED_USERCOPY (new in kernel 4.8). It attempts to validate that the kernel address passed to copy_from_user() and copy_to_user() is either on the stack or on the heap (trying to catch bugs where other kernel memory is either copied to user-space, or over-written by user-space).

In the scenario here, memory is safely on the stack, but nvidia-modeset.ko was compiled such that the binary-only part of that kernel module does not contain stack frame information, and CONFIG_HARDENED_USERCOPY cannot recognize that the memory is really on the stack.

In a future release, we'll allocate this memory on the heap, to avoid this interaction problem. In the meantime, the best workaround I can suggest is to rebuild your >= 4.8 kernel without CONFIG_HARDENED_USERCOPY. However, CONFIG_HARDENED_USERCOPY provides very useful checking, so I'd encourage you to go back to a CONFIG_HARDENED_USERCOPY-enabled kernel once an updated NVIDIA driver is available.
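
To see whether a particular kernel was built with the option, a check along these lines may help. It is a minimal sketch: `has_hardened_usercopy` is a hypothetical helper name, and the example path assumes the running kernel exposes its configuration at /proc/config.gz, which is only the case when it was built with CONFIG_IKCONFIG_PROC.

```shell
# has_hardened_usercopy FILE
# Succeeds if the kernel config in FILE enables CONFIG_HARDENED_USERCOPY.
# FILE may be plain text or gzip-compressed: gzip -cdf decompresses gzip
# input and passes anything else through unchanged.
has_hardened_usercopy() {
    gzip -cdf "$1" | grep -q '^CONFIG_HARDENED_USERCOPY=y'
}

# Example, assuming /proc/config.gz exists on the running kernel:
#   if has_hardened_usercopy /proc/config.gz; then
#       echo "CONFIG_HARDENED_USERCOPY is enabled"
#   fi
```

The same function can be pointed at the .config file of a kernel source tree before rebuilding.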

Sorry for the trouble.

Andy Ritger
NVIDIA Linux Graphics

#2
Posted 11/18/2016 06:43 PM   