Problems with GeForce GTX 980 on ASUSTeK G20AJ

The organization I work for recently bought an ASUSTeK G20AJ, which comes with a GTX 980 already installed. We intended to use it for some CUDA testing, but we’ve had serious problems both when trying to install Linux (before the NVIDIA driver is initialized) and when trying to initialize the official driver.

The installation problems are as follows. When trying to install the standard Ubuntu Desktop version (14.04 or 15.04), we only got a black screen directly after the kernel started. Booting with nomodeset and removing quiet and splash did not change this. With the latest CentOS, the kernel messages and early boot messages were displayed, but the computer hung while initializing graphics, even in the “low graphics mode”. I finally managed to install Ubuntu 15.04 using the alternative netboot image in text mode, after removing the vga mode setting from the GRUB boot line and adding nomodeset. (I was assuming that the official NVIDIA Linux driver would then work.)

After booting I can see the card:
root@jocuda:~# lshw -C display
*-display UNCLAIMED
description: VGA compatible controller
product: GM204 [GeForce GTX 980]
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:01:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress vga_controller cap_list
configuration: latency=0
resources: memory:f6000000-f6ffffff memory:e0000000-efffffff memory:f0000000-f1ffffff ioport:e000(size=128)
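
The UNCLAIMED flag in the lshw output above means that no kernel driver is bound to the card. A minimal sketch of the same check through sysfs follows, assuming the PCI address 0000:01:00.0 from the listing. The helper name pci_driver and its optional second argument (an alternative sysfs root) are made up for illustration.

```shell
# Print the name of the driver bound to a PCI device, or "none" if it is unclaimed.
# $1: PCI address, e.g. 0000:01:00.0
# $2: sysfs root (hypothetical convenience parameter, defaults to /sys)
pci_driver() {
    link="${2:-/sys}/bus/pci/devices/$1/driver"
    if [ -e "$link" ]; then
        # The "driver" entry is a symlink to the driver's directory in sysfs.
        basename "$(readlink -f "$link")"
    else
        echo none
    fi
}

pci_driver 0000:01:00.0
```

When the card is unclaimed, as here, this prints "none". Once the nvidia module has bound to the device it should print the driver name instead.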

On first initialization of the card this warning is shown (visible with dmesg):
[ 1.942343] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 1.946340] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=none:owns=io+mem
[ 1.946471] NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:13c0)
NVRM: installed in this system is not supported by the 352.09
NVRM: NVIDIA Linux driver release. Please see ‘Appendix
NVRM: A - Supported NVIDIA GPU Products’ in this release’s
NVRM: README, available on the Linux driver download page
NVRM: at www.nvidia.com.

If you try to insert the module manually, the driver actually crashes:

root@jocuda:~# modprobe nvidia_352
modprobe: ERROR: could not insert 'nvidia_352': No such device

and in dmesg:

[ 74.428908] WARNING: CPU: 0 PID: 1246 at /build/buildd/linux-3.19.0/fs/proc/generic.c:360 proc_register+0x135/0x1c0()
[ 74.428909] proc_dir_entry 'driver/nvidia' already registered
[ 74.428909] Modules linked in: nvidia(POE+) rfcomm bnep eeepc_wmi btusb asus_wmi sparse_keymap bluetooth video mxm_wmi intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm arc4 crct10dif_pclmul crc32_pclmul rtl8821ae ghash_clmulni_intel aesni_intel btcoexist rtl_pci aes_x86_64 lrw rtlwifi gf128mul glue_helper ablk_helper cryptd mac80211 serio_raw cfg80211 lpc_ich snd_hda_intel snd_hda_controller drm snd_hda_codec snd_hwdep mei_me mei shpchp snd_soc_rt5640 snd_soc_rl6231 snd_soc_core snd_compress snd_pcm_dmaengine wmi tpm_infineon snd_pcm snd_timer snd i2c_hid soundcore snd_soc_sst_acpi dw_dmac dw_dmac_core i2c_designware_platform i2c_designware_core 8250_dw spi_pxa2xx_platform acpi_pad mac_hid autofs4 hid_generic usbhid hid e1000e ahci ptp psmouse libahci pps_core sdhci_acpi sdhci
[ 74.428946] CPU: 0 PID: 1246 Comm: modprobe Tainted: P OE 3.19.0-18-generic #18-Ubuntu
[ 74.428947] Hardware name: ASUSTeK COMPUTER INC. G20AJ/G20AJ, BIOS 0703 04/29/2015
[ 74.428948] ffffffff81abae58 ffff8801f3ef3b58 ffffffff817c27cd 0000000000000007
[ 74.428949] ffff8801f3ef3ba8 ffff8801f3ef3b98 ffffffff8107593a ffff8800d8fa5f00
[ 74.428951] ffff8800d9360d00 ffff8801f682cf00 ffff8800d8fa5f85 ffff8801f682cf38
[ 74.428952] Call Trace:
[ 74.428957] [] dump_stack+0x45/0x57
[ 74.428959] [] warn_slowpath_common+0x8a/0xc0
[ 74.428961] [] warn_slowpath_fmt+0x46/0x50
[ 74.428963] [] ? proc_alloc_inum+0x36/0x140
[ 74.428965] [] proc_register+0x135/0x1c0
[ 74.428966] [] proc_mkdir_data+0x52/0x80
[ 74.428968] [] proc_mkdir_mode+0x13/0x20
[ 74.429018] [] nv_register_procfs+0x5c/0x210 [nvidia]
[ 74.429048] [] nvidia_init_module+0x2b1/0x6f5 [nvidia]
[ 74.429052] [] ? 0xffffffffc112c000
[ 74.429071] [] nvidia_frontend_init_module+0x87/0xb9 [nvidia]
[ 74.429074] [] do_one_initcall+0xd8/0x210
[ 74.429077] [] ? kmem_cache_alloc_trace+0x189/0x200
[ 74.429079] [] ? load_module+0x15a4/0x1ce0
[ 74.429081] [] load_module+0x15de/0x1ce0
[ 74.429082] [] ? store_uevent+0x40/0x40
[ 74.429084] [] SyS_finit_module+0x86/0xb0
[ 74.429087] [] system_call_fastpath+0x16/0x1b
[ 74.429088] ---[ end trace c8dcd46c6115082a ]---
[ 74.429090] NVRM: failed to register procfs!
[ 74.429170] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=none,decodes=none:owns=io+mem
[ 74.429186] NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:13c0)
NVRM: installed in this system is not supported by the 352.09
NVRM: NVIDIA Linux driver release. Please see ‘Appendix
NVRM: A - Supported NVIDIA GPU Products’ in this release’s
NVRM: README, available on the Linux driver download page
NVRM: at www.nvidia.com.
[ 74.429217] nvidia: probe of 0000:01:00.0 failed with error -1
[ 74.429234] Error: Driver ‘nvlink’ is already registered, aborting…
[ 74.429434] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 74.429435] NVRM: None of the NVIDIA graphics adapters were initialized!
[ 74.429436] [drm] Module unloaded
[ 74.429497] NVRM: NVIDIA init module failed!

What I’ve tried:

  • Disabling the built-in iGPU
  • Upgrading the BIOS/firmware
  • All available NVIDIA drivers, e.g. the latest beta, the latest xorg-edgers, and so on
  • Changing the PCI Express settings visible in the BIOS

None of this had any effect. Switching to anything other than (I assume) EGA from Linux makes the display go black. The keyboard and network still work, so I’ve been able to test the various driver versions and such.

I do realise that the problems we’ve experienced might be on a lower level than the GTX 980, but as Windows 8.1 works on the computer I think it should be possible to get Linux to work on it as well.

Erik

Please follow the steps here to attach a complete bug report.

This was solved by adding the following two parameters to the kernel command line (e.g. from the GRUB command line at boot, or via GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub on Ubuntu):

pci=nocrs pci=realloc
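
For the /etc/default/grub route, here is a minimal sketch of the edit. It runs against a scratch copy so nothing on the real system is touched. The starting value "nomodeset" is only an assumption for illustration, so use whatever the line already contains on your machine.

```shell
# Scratch file standing in for /etc/default/grub (the starting value is assumed).
grub_file=$(mktemp)
printf '%s\n' 'GRUB_CMDLINE_LINUX_DEFAULT="nomodeset"' > "$grub_file"

# Append the two PCI parameters just before the closing quote of the setting.
sed 's/^\(GRUB_CMDLINE_LINUX_DEFAULT=".*\)"$/\1 pci=nocrs pci=realloc"/' \
    "$grub_file" > "$grub_file.new"

result=$(cat "$grub_file.new")
echo "$result"

rm -f "$grub_file" "$grub_file.new"
```

On the real machine you would make the same change to /etc/default/grub by hand, run sudo update-grub, and reboot. Afterwards, cat /proc/cmdline should list both parameters.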

So there is probably a bug somewhere, but whether it is in the Linux kernel, the chipset on the motherboard, or the custom 980 in this machine is unknown. (The computer in question is closer to a laptop than to a traditional desktop, and the two arguments above can be useful for laptops.)