"RmInitAdapter failed" with 370.23 but 367.35 works fine

My system (Gentoo ~amd64) is working fine with 367.35-r1 on kernel 4.7.2, with a patch applied for kernel 4.7 support, but if I install 370.23 (sans patch, of course), I get a black screen during boot (when the module gets loaded - normally, it would just flicker off briefly), the graphics card fan switches to full-speed and the following message appears in dmesg (retrieved via remote login):

[    7.652856] NVRM: RmInitAdapter failed! (0x53:0x3:1818)
[    7.652882] NVRM: rm_init_adapter failed for device bearing minor number 0

The message actually appears again a few seconds later, presumably when X tries to start.
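
If you want to check whether you are hitting the same failure, something like the following should pull the relevant lines out of the kernel log. This is only a sketch: `nvrm_failures` is a name made up for illustration, and the pattern is based solely on the two messages quoted above.

```shell
# Hypothetical helper: print only the NVRM init-failure lines
# from a kernel log supplied on stdin.
nvrm_failures() {
    grep -E 'NVRM: .*(RmInitAdapter|rm_init_adapter) failed'
}

# Usage (reading dmesg may require root on some systems):
#   dmesg | nvrm_failures
```

No output means neither message is present in the log that was fed in.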

I’ve turned on framebuffer support in the kernel (CONFIG_DRM_KMS_HELPER, CONFIG_DRM_KMS_FB_HELPER, etc.) but that made no difference. (I did that after initially thinking the black screen was simply down to the lack of fbcon support. I noticed the error messages after finding that it made no difference.)

I’ve tried the card in a different PCIe slot (suggested in a thread about a similar error message) and have also tried with iommu=off, remembering that I had an issue with a GeForce GPU in a different system a few years back when the IOMMU was enabled.

I’m using an Asus GTX460 1GB DirectCu in an Asus M5A99X EVO R2.0 motherboard with latest BIOS (2501). CPU is an FX-9590.
nvidia-bug-report.log.gz (57.9 KB)

I have this exact same problem on Arch Linux, and downgrading to 367.35 allows me to boot normally. Here are the details:

EDIT: Whoops, didn’t know about the nvidia-bug-report tool, should have read the parent more closely. Most of the below is probably irrelevant, and the bug report log is now attached.

roast cpl # pacman -Q|grep nvidia
lib32-nvidia-libgl 370.23-1
lib32-nvidia-utils 370.23-1
nvidia 370.23-4
nvidia-libgl 370.23-1
nvidia-settings 370.23-1
nvidia-utils 370.23-1
opencl-nvidia 370.23-1
roast cpl # journalctl -xe
[...]
-- The start-up result is done.
Aug 27 08:25:48 roast login[503]: LOGIN ON tty2 BY cpl
Aug 27 08:29:21 roast kernel: vgaarb: device changed decodes: PCI:0000:06:00.0,olddecodes=io+mem,decodes=none:owns=io+mem
Aug 27 08:29:21 roast kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 244
Aug 27 08:29:21 roast kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  370.23  Mon Aug  8 18:02:36 PDT 2016
Aug 27 08:29:32 roast su[492]: pam_unix(su:session): session closed for user root
Aug 27 08:29:39 roast kernel: NVRM: RmInitAdapter failed! (0x53:0x3:1818)
Aug 27 08:29:39 roast kernel: NVRM: rm_init_adapter failed for device bearing minor number 0
Aug 27 08:33:23 roast sshd[764]: Accepted password for cpl from 192.168.11.9 port 49320 ssh2

The nvidia module is actually loaded:

roast cpl # lsmod|grep nvidia
nvidia_drm             53248  0
nvidia_modeset        765952  1 nvidia_drm
nvidia              11841536  1 nvidia_modeset
drm_kms_helper        118784  1 nvidia_drm
drm                   294912  3 drm_kms_helper,nvidia_drm

Module details:

roast cpl # modinfo nvidia
filename:       /lib/modules/4.7.2-1-ARCH/extramodules/nvidia.ko.gz
alias:          char-major-195-*
version:        370.23
supported:      external
license:        NVIDIA
srcversion:     F4C4F39F6CAEF3621E98A07
alias:          pci:v000010DEd00000E00sv*sd*bc04sc80i00*
alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
depends:
vermagic:       4.7.2-1-ARCH SMP preempt mod_unload modversions
parm:           NVreg_Mobile:int
parm:           NVreg_ResmanDebugLevel:int
parm:           NVreg_RmLogonRC:int
parm:           NVreg_ModifyDeviceFiles:int
parm:           NVreg_DeviceFileUID:int
parm:           NVreg_DeviceFileGID:int
parm:           NVreg_DeviceFileMode:int
parm:           NVreg_UpdateMemoryTypes:int
parm:           NVreg_InitializeSystemMemoryAllocations:int
parm:           NVreg_UsePageAttributeTable:int
parm:           NVreg_MapRegistersEarly:int
parm:           NVreg_RegisterForACPIEvents:int
parm:           NVreg_CheckPCIConfigSpace:int
parm:           NVreg_EnablePCIeGen3:int
parm:           NVreg_EnableMSI:int
parm:           NVreg_TCEBypassMode:int
parm:           NVreg_UseThreadedInterrupts:int
parm:           NVreg_MemoryPoolSize:int
parm:           NVreg_RegistryDwords:charp
parm:           NVreg_RmMsg:charp
parm:           NVreg_AssignGpus:charp

Device info:

roast cpl # cat /proc/driver/nvidia/gpus/0000\:06\:00.0/information
Model:           GeForce GTX 460
IRQ:             33
GPU UUID:        GPU-????????-????-????-????-????????????
Video BIOS:      ??.??.??.??.??
Bus Type:        PCIe
DMA Size:        39 bits
DMA Mask:        0x7fffffffff
Bus Location:    0000:06:00.0
Device Minor:    0

Kernel info:

roast cpl # uname -a
Linux roast 4.7.2-1-ARCH #1 SMP PREEMPT Sat Aug 20 23:02:56 CEST 2016 x86_64 GNU/Linux

Also, before anyone asks, I do have nouveau blacklisted:

roast modprobe.d # lsinitcpio /boot/initramfs-linux.img|grep nvidia
usr/lib/modprobe.d/nvidia.conf
roast modprobe.d # cat /usr/lib/modprobe.d/nvidia.conf
blacklist nouveau

and:

roast modprobe.d # rmmod nouveau
rmmod: ERROR: Module nouveau is not currently loaded
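
A more direct way to confirm that nouveau is not loaded is to look at /proc/modules rather than rely on the rmmod error. A minimal sketch, where `module_loaded` is a made-up helper name and the input is assumed to be in /proc/modules format (module name in the first column):

```shell
# Hypothetical helper: exit 0 if the named module appears in the
# /proc/modules-style listing on stdin, exit 1 otherwise.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Usage:
#   module_loaded nouveau < /proc/modules && echo "nouveau is loaded"
```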

Any assistance would be greatly appreciated.
nvidia-bug-report.log.gz (55.6 KB)

Hi, I have a very similar problem.

I’m running Arch with a GTX 460 card. I don’t get the RmInitAdapter error message; however, I do see the ???s in /proc/driver/nvidia/gpus/0000:06:00.0/information, and when starting X I get a black screen with the nvidia card fan going to maximum.
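
For anyone comparing notes, the placeholder check can be scripted roughly as below. This is a sketch only: `gpu_info_uninitialized` is a hypothetical name, and I am assuming the question marks just mean the driver has not filled those fields in, which may also be true on a healthy system before anything has opened the device.

```shell
# Hypothetical helper: exit 0 if the GPU information text on stdin still
# shows placeholder question marks in the UUID or Video BIOS fields.
gpu_info_uninitialized() {
    grep -Eq '^(GPU UUID|Video BIOS):.*\?\?'
}

# Usage, with the bus address seen in this thread (yours may differ):
#   gpu_info_uninitialized < /proc/driver/nvidia/gpus/0000:06:00.0/information \
#       && echo "GPU fields are still placeholders"
```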

All help and suggestions are very welcome.

I have the same issue; it seems that only GTX 460 GPUs are affected.
nvidia-bug-report.log.gz (113 KB)

Same issue discussed here: “Problems with nvidia 370.23-1, fan to full speed” (Arch Linux Forums, Pacman & Package Upgrade Issues).

tantal_fr, I didn’t find any error in your log. Please attach a log captured as soon as the issue occurs.

sandipt,
Sorry, wrong one. This one should be good.
nvidia-bug-report.log.gz (66.5 KB)

Just a heads up for anyone at nvidia who may be looking at this: the issue still exists with the latest drivers (370.28) on the GTX 460. Bug report attached.

It’s been almost 3 weeks since I’ve been able to use my Linux install - is there anywhere other than this forum post where I can track progress (if any) on this issue? Or, alternatively, if nvidia isn’t going to fix the issue (I get it, it’s an old card, and a niche case) could you please let us know so we can make alternate arrangements?
nvidia-bug-report.log.gz (55.8 KB)

Hi Clucas84, I also hope it gets fixed soon - personally I wouldn’t understand if nvidia doesn’t fix this as they made the choice not to open source their driver so ‘we’ can’t fix it ourselves.

In the meantime I have switched back to the 340xx driver (on Arch), which does work with the latest kernels.

>>on kernel 4.7.2, with a patch applied for kernel 4.7 support, but if I install 370.23 (sans patch, of course)

Is it required to apply this patch? Or can this issue be hit just by installing a fresh Arch Linux or Gentoo with the 4.7.2-1-ARCH kernel and the nvidia driver? Does just running startx trigger it, or are there other reproduction steps?

Hi Sandip,

The unofficial patch is only to allow compiling 367.x on a 4.7 kernel - it’s not required with 370.x and probably wouldn’t apply anyway. (It’s in this thread: https://devtalk.nvidia.com/default/topic/938665/linux/linux-4-7-rc1-367-18-build-errors/.)

The error should be reproducible (assuming it happens with any GTX 460 / motherboard combination) with a clean install. I did try a quick clean, minimal install on a spare hard disk and got the same error. (“Quick” is a relative term where Gentoo installs are concerned, but it helps to have lots of cores and GHz (and watts) to throw at it!)

The error seems to occur at the moment the nvidia kernel modules are loaded, during the early init process prior to starting X. (It’s during the sysinit phase of Gentoo’s openrc init, presumably when eudev triggers the kernel hotplug events and autoloads modules for all hardware that it finds.)

I’m away at the moment but I’ll be back home tomorrow; I could set up my test installation so it boots into text mode without loading the modules, then manually load nvidia-modeset, etc. one at a time to see when the blank screen/full-speed fan and error messages occur, if that would be helpful.

Regards,
Stephen

[Edit] I’ve just done the test; the error message appears and the fans rev up when loading nvidia.ko; it is not necessary to load nvidia-drm, nvidia-uvm or nvidia-modeset.
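
For anyone who wants to repeat the test, the one-at-a-time loading can be sketched like this. The function name and the `LOADER` and `PAUSE` variables are made up for illustration (they only exist so the loop can be dry-run); the real work is plain modprobe.

```shell
# Sketch: load kernel modules one at a time, pausing after each so you can
# watch the screen and fan and check dmesg from another shell.
# LOADER and PAUSE are hypothetical overrides, not standard variables.
load_one_by_one() {
    loader=${LOADER:-modprobe}
    for mod in "$@"; do
        echo "== loading $mod"
        if ! $loader "$mod"; then
            echo "!! $mod failed to load"
            return 1
        fi
        sleep "${PAUSE:-5}"
    done
}

# Usage (as root, from a text console or over ssh):
#   load_one_by_one nvidia nvidia-modeset nvidia-drm
```

Setting `LOADER=echo PAUSE=0` prints the module names instead of loading them, which is a safe way to check the order first.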

Thanks all. Is this issue really specific to the GTX 460? Can I get the output of dmidecode? Also, what desktop environment are you running: KDE, GNOME, Unity or something else? We need reproduction steps so that we can investigate this issue. Does any earlier or later driver fix it? How long does it take to reproduce?

It seems so. See also https://bugs.archlinux.org/task/50510 and https://bbs.archlinux.org/viewtopic.php?pid=1649386.

I’ll attach it.

I’m using awesome wm, but it doesn’t seem to make a difference as the error is already triggered by loading the nvidia module, before even starting X11…

You will need a GTX 460. Then it’s really easy to reproduce. Just load the kernel module…

367.x works, 370.x is broken.
dmidecode.txt (15.8 KB)

Same as mika.fischer.

I use KDE, by the way. Here is my dmidecode output.
dmidecode.txt (28.5 KB)

And another data point from me - dmidecode incoming.

Xfce here, but my test clean-install didn’t have a desktop environment installed at all, just xorg-server so I could do startx (although it turned out that I didn’t need to go that far).

And likewise, 367.x works (currently running 367.44), 370.x doesn’t (I’ve tried 370.23 and 370.28).
dmidecode.txt (17.3 KB)

Same on Arch with GNOME, on a GTX 460.
dmidecode.txt (18.1 KB)

Will attach dmidecode data when I get home.

I’m using xmonad with weston as my compositor, but as one of the previous posters mentioned, it doesn’t matter what your DE or WM is - you can’t even get X to start.

Re: being limited to 460, I can’t say for sure, but that seems to be the case on every forum I’ve looked at.

Thanks for looking into it.
dmidecode.txt (10.6 KB)

Just upgraded Arch with:

pacman -Syu
  • First boot, with the latest driver: the display hangs and the GPU fan goes to 100%. Rebooted via hard reset.
  • Second boot: I added systemd.unit=multi-user.target to the kernel command line to disable the graphical target, then ran nvidia-bug-report.sh (this hung the display, and I rebooted with Ctrl-Alt-Del) (nvidia-bug-report-bad.log.gz).
  • Third boot: I booted to multi-user.target and downgraded to the older drivers. Rebooted from the command line.
  • Fourth boot: everything works. Ran nvidia-bug-report.sh (nvidia-bug-report-good.log.gz).

nvidia-bug-report-bad.log.gz (55.7 KB)
nvidia-bug-report-good.log.gz (105 KB)

Just curious, what cable do you use (DVI-D/DVI-I, dual or single link, HDMI)?
It seems like the card doesn’t recognize the connected display with the new driver.
Maybe that’s not the problem itself but a symptom; judging from the logs, though, that is what is happening.

good:

[    27.310] (**) NVIDIA(0): ConnectedMonitor string: "DFP"
[    28.009] (--) NVIDIA(0): Valid display device(s) on GPU-0 at PCI:1:0:0
[    28.009] (--) NVIDIA(0):     CRT-0
[    28.009] (--) NVIDIA(0):     CRT-1
[    28.009] (--) NVIDIA(0):     DFP-0 (boot)
[    28.009] (--) NVIDIA(0):     DFP-1
[    28.009] (--) NVIDIA(0):     DFP-2
[    28.009] (**) NVIDIA(0): Using ConnectedMonitor string "DFP-0".
.......
[    28.024] (--) NVIDIA(0): Acer F22 (DFP-0): connected
[    28.024] (--) NVIDIA(0): Acer F22 (DFP-0): Internal TMDS
[    28.024] (--) NVIDIA(0): Acer F22 (DFP-0): 330.0 MHz maximum pixel clock
.......
[    28.026] (--) NVIDIA(0): VideoBIOS: 70.04.13.00.01
[    28.026] (II) NVIDIA(0): Detected PCI Express Link width: 16X
[    28.026] (**) NVIDIA(0): Using HorizSync/VertRefresh ranges from the EDID for display
[    28.026] (**) NVIDIA(0):     device Acer F22 (DFP-0) (Using EDID frequencies has been
[    28.026] (**) NVIDIA(0):     enabled on all display devices.)
[    28.029] (II) NVIDIA(0): Validated MetaModes:
[    28.029] (II) NVIDIA(0):     "1680x1050_60+0+0"
[    28.029] (II) NVIDIA(0): Virtual screen size determined to be 1680 x 1050
[    28.051] (--) NVIDIA(0): DPI set to (90, 88); computed from "UseEdidDpi" X config
[    28.051] (--) NVIDIA(0):     option
[    28.051] (--) Depth 24 pixmap format is 32 bpp
[    28.052] (II) NVIDIA: Using 12288.00 MB of virtual memory for indirect memory
[    28.052] (II) NVIDIA:     access.
[    28.088] (II) NVIDIA(0): Setting mode "1680x1050_60+0+0"

bad:

[    17.330] (**) NVIDIA(0): ConnectedMonitor string: "DFP"
[    21.472] (II) NVIDIA(0): NVIDIA GPU GeForce GTX 460 (GF104) at PCI:1:0:0 (GPU-0)
[    21.472] (--) NVIDIA(0): Memory: 1048576 kBytes
[    21.472] (--) NVIDIA(0): VideoBIOS: 70.04.13.00.01
[    21.472] (II) NVIDIA(0): Detected PCI Express Link width: 16X
[    21.472] (II) NVIDIA(0): Validated MetaModes:
[    21.472] (II) NVIDIA(0):     "NULL"
[    21.472] (II) NVIDIA(0): Virtual screen size determined to be 640 x 480
[    21.472] (WW) NVIDIA(0): Unable to get display device for DPI computation.
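
To check your own Xorg log for the same symptom, a rough sketch follows. `xorg_no_display` is a hypothetical helper name, and the pattern only covers the two tell-tale lines from my "bad" log above, so treat it as a starting point rather than a complete test.

```shell
# Hypothetical helper: exit 0 if the Xorg log on stdin shows the NVIDIA
# driver ending up with a "NULL" MetaMode or failing to find a display device.
xorg_no_display() {
    grep -Eq 'NVIDIA\([0-9]+\): +"NULL"|Unable to get display device'
}

# Usage (the log location varies between setups):
#   xorg_no_display < /var/log/Xorg.0.log && echo "driver found no display"
```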

So, the workaround is “go & buy a Radeon”?

Can you at least remove the GTX 460 from the “Supported Products” list of the latest driver?
Because there are no Linux distributions that actually support it.