Linux 3.10+ Driver crash

I’m having issues with the nvidia Linux driver (both 319 and 325 beta) on my Lenovo IdeaPad Y500 laptop (MBG2BMH with GeForce GT650M GPU) running Arch Linux x64.

When starting X, my screen goes black, my fan starts spinning at full speed and after a few seconds, the system dies (turns off, not cleanly).

The issue started occurring after a kernel update to Linux 3.10.
The nvidia driver for this kernel contains the unofficial patches (see https://projects.archlinux.org/svntogit/packages.git/commit/trunk?h=packages/nvidia&id=415c1daa9ccb1ec46c172b304f40929239d87af8 for diff).

X output:

[    49.888] (**) NVIDIA(0): Enabling 2D acceleration
[    56.164] (EE) NVIDIA(0): Failed to initialize the NVIDIA GPU at PCI:1:0:0.  Please
[    56.164] (EE) NVIDIA(0):     check your system's kernel log for additional error
[    56.164] (EE) NVIDIA(0):     messages and refer to Chapter 8: Common Problems in the
[    56.164] (EE) NVIDIA(0):     README for additional information.
[    56.164] (EE) NVIDIA(0): Failed to initialize the NVIDIA graphics device!
[    56.164] (EE) NVIDIA(0): Failing initialization of X screen 0
[    56.164] (II) UnloadModule: "nvidia"
[    56.164] (II) UnloadSubModule: "shadow"
[    56.164] (II) UnloadSubModule: "wfb"
[    56.164] (II) UnloadSubModule: "fb"
[    56.164] (EE) Screen(s) found, but none have a usable configuration.

Kernel output:

Jul 25 16:02:20 nwa kernel: NVRM: GPU at 0000:01:00.0 has fallen off the bus.
Jul 25 16:02:20 nwa kernel: NVRM: os_pci_init_handle: invalid context!
Jul 25 16:02:20 nwa kernel: NVRM: os_pci_init_handle: invalid context!
Jul 25 16:02:20 nwa kernel: NVRM: GPU at 0000:01:00.0 has fallen off the bus.
Jul 25 16:02:20 nwa kernel: NVRM: os_pci_init_handle: invalid context!
Jul 25 16:02:20 nwa kernel: NVRM: os_pci_init_handle: invalid context!
Jul 25 16:02:20 nwa kernel: NVRM: RmInitAdapter failed! (0x25:0x28:1148)
Jul 25 16:02:20 nwa kernel: NVRM: rm_init_adapter(0) failed
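
For anyone who wants to check their own kernel log for the same signature, here is a rough Python sketch. It only looks for the two message shapes quoted above ("has fallen off the bus" and "RmInitAdapter failed!"). The function name `summarize_nvrm` and the script name in the usage comment are made up for illustration. It assumes you have already saved the kernel log to a text file.

```python
import re
import sys

# Patterns copied from the NVRM messages quoted in this thread.
FALLEN_OFF = re.compile(r"NVRM: GPU at (\S+) has fallen off the bus")
INIT_FAILED = re.compile(r"NVRM: RmInitAdapter failed! \(([^)]*)\)")


def summarize_nvrm(log_text):
    """Return (bus_ids, init_codes) found in the given kernel log text.

    bus_ids    -- unique PCI addresses reported as fallen off the bus
    init_codes -- every RmInitAdapter failure code, in order of appearance
    """
    bus_ids = []
    init_codes = []
    for line in log_text.splitlines():
        match = FALLEN_OFF.search(line)
        if match:
            if match.group(1) not in bus_ids:
                bus_ids.append(match.group(1))
            continue
        match = INIT_FAILED.search(line)
        if match:
            init_codes.append(match.group(1))
    return bus_ids, init_codes


# Hypothetical usage: python check_nvrm.py kernel.log
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        ids, codes = summarize_nvrm(handle.read())
    print("GPUs fallen off the bus:", ids or "none")
    print("RmInitAdapter failure codes:", codes or "none")
```

If both lists come back non-empty, you are most likely seeing the same failure as in the log above.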

Relevant topics:
https://bbs.archlinux.org/viewtopic.php?pid=1304291
https://bbs.archlinux.org/viewtopic.php?pid=1304141

Experiencing the exact same issue on my Asus G75VX-T4066H with a GeForce GTX670MX GPU, also running Arch Linux x64. The only difference for me is that my fans do not spin up to full speed before the system dies.

Same issue here with Linux 3.10-2, Arch Linux x64, and the latest nvidia drivers. The GPU falls off the bus on a Lenovo ThinkPad W530 every time I try to start X. My system does not overheat and shut down; it just stays on an unusable black screen. Sometimes I can switch to a different TTY and press Ctrl+Alt+Delete to reboot; sometimes I get a hard freeze (haven’t tried SysRq yet).

I do NOT use Optimus or Bumblebee – I have set “discrete graphics” as the display option in the BIOS and the nvidia card is the only display exposed to the kernel.

Downgrading to Linux 3.9 until this issue is resolved. Sad to see an nvidia release that doesn’t support the latest kernel. :(

A comment on the Arch forums makes me wonder if this is related to the problem experienced by those who use Bumblebee. Many laptops now have motherboards that support the Optimus technology, despite not using it. I know that my particular Asus model was originally meant to ship with Optimus technology, although upon release it did not have it. I assume that the motherboard still has support for it, despite it being deactivated or not used.

I am having the same problems (Lenovo IdeaPad Y500 with GeForce GT650M, Archlinux X86_64).
The laptop does not have Optimus enabled. When I boot into a text console, all is fine, but when I start an X session or enable persistence mode using nvidia-smi, the screen goes blank, the fan goes up to full speed, and it automatically powers down after 20-30 seconds.
Before it powers down, it is accessible via SSH. My logs are identical to TB’s.
I tested the nouveau drivers and they work fine.

Same problems: Lenovo Y500, 2x GT650M (SLI). The text console works fine, but starting the X server turns the fan(s) to full speed and the power goes down after a while.

I had to downgrade my kernel to 3.9.5-301. I also got a new 802.11ac driver, which has to be built against the kernel source.

The incompatibility of 319.32 with the 3.10 kernel was spotted more than a month ago. NVIDIA needs to speed up its patch updates. I believe they don’t get along well with the open source community.

I am experiencing the same issue with a Clevo W150ER, NVIDIA GT 650M, linux-ck-ivybridge 3.10.3-1 on Arch. Why hasn’t NVIDIA fixed this yet? I began to wonder if it was my fault as it has not been working for over a month. I hope they fix it soon. /twiddles thumbs

Hi. I have a Sager NP2096 (Based on Compal JHL90), and have a GeForce 9600M GT.

Having the same log file in /var/lib/dkms/nvidia/319.32/build/make.log as OP.

Haven’t had this issue before, and I used to be able to use it on the 3.9.4 kernel, but that is broken now too. I think this has to do with the nVidia driver, not the kernel.

Same issue on ArchLinux x64, kernel 3.10 and nVidia drivers 319.32. It’s a MSI GE60 laptop with nVidia GTX660M graphics card.

I think you’re mistaken. I did not post a dkms log and building succeeds for me, after applying the (unofficial) 3.10 patches that are going around.

We have filed an internal bug to track this issue on the Y500 notebook: bug ID 1341332.

The issue is also present on Lenovo ThinkPad W530, although I don’t think it’s hardware-specific. The GPU “falls off bus” whenever trying to run anything with Bumblebee. It is also reported to be in switched off state by BBSwitch and I wasn’t able to turn it on, not even through direct ACPI-call.

Presumably this is caused by the patch, which allows drivers to be compiled for kernel 3.10.

I also have this problem on a Lenovo IdeaPad Y510P with GT 750M. I have tried the available patches for 325.08, 319.32 and 319.17, and have also tried downgrading to the 3.9.9-1 kernel, and each combination of those, with no success. The patches allowed the module to compile, but startx fails to execute, with the PCI:1:0:0 output referenced elsewhere.

I have just invested HEAVILY in a full-blown NVIDIA dual-card laptop JUST to avoid having video problems with Radeon. This is all much like the disappointment when I bought an Optimus/ION-containing laptop around 2010 and could never use Linux on it. Imagine buying a state-of-the-art laptop, expecting it to run Linux games beautifully, and then not even being able to open X! Twice now, almost $2000 down the drain!

NVIDIA, please consider open sourcing your drivers - you are a hardware vendor, not a software vendor!

So does that mean we’re going to get a reply when it’s implemented, a status on it, or in some obscure update list?

I am still experiencing this issue with the new driver on my Asus G75VX.

This issue should not be difficult for nvidia to reproduce: every Arch user with certain laptop models has this problem.

I also am still experiencing the same issues with the new (325.15) driver on my Lenovo IdeaPad Y500.

Pity, I was going to recommend a nvidia laptop to a friend of mine who is thinking of running Linux, but since recently nvidia appears to be incapable of making a driver that even works, and given the recent improvements to the radeon open-source driver, I think I’ll be reconsidering.

Same thing here with 3.10 kernels on Fedora 19.

Same thing here on Arch Linux 3.10.5-1-ARCH, with nvidia 325.15-1. The computer is a Dell Inspiron Turbo 14R, Optimus setup with nvidia GeForce GT640M.

Aug 07 16:24:51 lightning sudo[3253]: hexchain : TTY=pts/3 ; PWD=/home/hexchain ; USER=root ; COMMAND=/usr/bin/modprobe nvidia
Aug 07 16:24:51 lightning sudo[3253]: pam_unix(sudo:session): session opened for user root by hexchain(uid=0)
Aug 07 16:24:51 lightning kernel: nvidia: module license 'NVIDIA' taints kernel.
Aug 07 16:24:51 lightning kernel: Disabling lock debugging due to kernel taint
Aug 07 16:24:51 lightning sudo[3253]: pam_unix(sudo:session): session closed for user root
Aug 07 16:24:51 lightning kernel: vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=none:owns=none
Aug 07 16:24:51 lightning kernel: [drm] Initialized nvidia-drm 0.0.0 20130102 for 0000:01:00.0 on minor 1
Aug 07 16:24:51 lightning kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  325.15  Wed Jul 31 18:50:56 PDT 2013
<Executed nvidia-smi in a different terminal>
Aug 07 16:24:53 lightning kernel: nvidia 0000:01:00.0: irq 48 for MSI/MSI-X
Aug 07 16:24:59 lightning kernel: NVRM: GPU at 0000:01:00.0 has fallen off the bus.
Aug 07 16:24:59 lightning kernel: NVRM: os_pci_init_handle: invalid context!
Aug 07 16:24:59 lightning kernel: NVRM: os_pci_init_handle: invalid context!
Aug 07 16:24:59 lightning kernel: NVRM: GPU at 0000:01:00.0 has fallen off the bus.
Aug 07 16:24:59 lightning kernel: NVRM: os_pci_init_handle: invalid context!
Aug 07 16:24:59 lightning kernel: NVRM: os_pci_init_handle: invalid context!
Aug 07 16:24:59 lightning kernel: NVRM: RmInitAdapter failed! (0x25:0x28:1157)
Aug 07 16:24:59 lightning kernel: NVRM: rm_init_adapter(0) failed
Aug 07 16:30:39 lightning kernel: NVRM: request_irq() failed (-22)
Aug 07 16:30:41 lightning kernel: NVRM: request_irq() failed (-22)

Tried disabling MSI using NVreg_EnableMSI=0, no luck.
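
If you want to double-check that an option like NVreg_EnableMSI=0 actually reached the loaded module, something like the sketch below might help. It assumes the driver lists its parameters in /proc/driver/nvidia/params as one `Name: value` pair per line, with the MSI setting shown as `EnableMSI`. I have not confirmed that layout for every driver version, so treat the path and the key name as assumptions. The helper names `parse_params` and `msi_disabled` are made up for illustration.

```python
import sys

# Assumption: the loaded nvidia module exposes its parameters here,
# one "Name: value" pair per line.
PARAMS_PATH = "/proc/driver/nvidia/params"


def parse_params(text):
    """Parse 'Name: value' lines into a dict of strings."""
    params = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            params[name.strip()] = value.strip()
    return params


def msi_disabled(params):
    """True if EnableMSI is reported as 0, False if it has another value,
    None if the key is missing."""
    value = params.get("EnableMSI")
    if value is None:
        return None
    return value == "0"


if __name__ == "__main__":
    try:
        with open(PARAMS_PATH) as handle:
            state = msi_disabled(parse_params(handle.read()))
    except OSError as err:
        print("Could not read", PARAMS_PATH, "-", err, file=sys.stderr)
    else:
        print("MSI disabled:", state)
```

A result of `None` just means the key was not found, which could equally mean the assumed file format is wrong for your driver version.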

Tried booting into multi-user.target (runlevel 3) and directly using the XRandR method described in “Chapter 32. Offloading Graphics Display with RandR 1.4”; same error.

I have the same problem with an IdeaPad Y510P (GT 750M), 3.10 kernel and 325 nVidia drivers.