Could not load nvidia kernel module for vgpu (no such device) on RHEL7

Hello,

I’m trying to install the NVIDIA vGPU driver on RHEL7, but the driver apparently does not recognize the vGPU. I get the same error with both of the following drivers:

  • NVIDIA-Linux-x86_64-384.111-grid.run
  • NVIDIA-Linux-x86_64-390.42-grid.run

Nouveau is blacklisted and disabled:

[root@testgpu ~]# lsmod | grep -i nouveau
[root@testgpu ~]#

The vGPU is seen by the kernel:

[root@testgpu ~]# lspci | grep -i nvidia
02:00.0 VGA compatible controller: NVIDIA Corporation GP100GL (rev a1)
[root@testgpu ~]#

Kernel used is:

[root@testgpu ~]# uname -r
3.10.0-514.26.2.el7.x86_64
[root@testgpu ~]#

Kernel headers and dev packages are installed:

[root@testgpu ~]# rpm -qa | grep "^kernel.*$(uname -r)"
kernel-devel-3.10.0-514.26.2.el7.x86_64
kernel-tools-libs-3.10.0-514.26.2.el7.x86_64
kernel-tools-3.10.0-514.26.2.el7.x86_64
kernel-3.10.0-514.26.2.el7.x86_64
kernel-headers-3.10.0-514.26.2.el7.x86_64
[root@testgpu ~]#

dkms is also installed.

I just ran the installer without any options, answering yes to DKMS and no to the 32-bit compatibility libraries. Here’s the content of the /var/log/nvidia-installer.log file:

[root@testgpu ~]# cat /var/log/nvidia-installer.log 
nvidia-installer log file '/var/log/nvidia-installer.log'
creation time: Wed Apr 18 10:19:47 2018
installer version: 384.111

PATH: /usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/opt/puppetlabs/bin:/root/bin

nvidia-installer command line:
    ./nvidia-installer

Unable to load: nvidia-installer ncurses v6 user interface

Using: nvidia-installer ncurses user interface
-> Detected 1 CPUs online; setting concurrency level to 1.
-> Tagging shared libraries with chcon -t textrel_shlib_t.
-> License accepted.
-> Installing NVIDIA driver version 384.111.
-> Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later. (Answer: Yes)
-> Installing both new and classic TLS OpenGL libraries.
-> Installing both new and classic TLS 32bit OpenGL libraries.
-> Install NVIDIA's 32-bit compatibility libraries? (Answer: No)
-> Will install GLVND GLX client libraries.
-> Will install GLVND EGL client libraries.
-> Skipping GLX non-GLVND file: "libGL.so.384.111"
-> Skipping GLX non-GLVND file: "libGL.so.1"
-> Skipping GLX non-GLVND file: "libGL.so"
-> Skipping EGL non-GLVND file: "libEGL.so.384.111"
-> Skipping EGL non-GLVND file: "libEGL.so"
-> Skipping EGL non-GLVND file: "libEGL.so.1"
Looking for install checker script at ./libglvnd_install_checker/check-libglvnd-install.sh
   executing: '/bin/sh ./libglvnd_install_checker/check-libglvnd-install.sh'...
   Checking for libglvnd installation.
   Checking libGLdispatch...
   Can't load library libGLdispatch.so.0: libGLdispatch.so.0: cannot open shared object file: No such file or directory
Will install libglvnd libraries.
Will install libEGL vendor library config file to /usr/share/glvnd/egl_vendor.d
-> Searching for conflicting files:
-> done.
-> Installing 'NVIDIA Accelerated Graphics Driver for Linux-x86_64' (384.111):
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-glcore.so.384.111'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/xorg/modules/extensions/libglx.so.384.111'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-tls.so.384.111'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/tls/libnvidia-tls.so.384.111'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libGLX_nvidia.so.384.111'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libOpenGL.so.0'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libGLESv1_CM.so.1'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libGLESv2.so.2'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libGLdispatch.so.0'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libGLX.so.0'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libGL.so.1.0.0'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libEGL.so.1'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/xorg/modules/drivers/nvidia_drv.so'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/xorg/modules/libnvidia-wfb.so.384.111'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-gtk2.so.384.111'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-gtk3.so.384.111'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-cfg.so.384.111'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-ml.so.384.111'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/nvidia/gridd/libFlxCore64.so.2015.03'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/nvidia/gridd/libFlxComm64.so.2015.03'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/vdpau/libvdpau_nvidia.so.384.111'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libcuda.so.384.111'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-opencl.so.384.111'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libOpenCL.so.1.0.0'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-fatbinaryloader.so.384.111'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-ptxjitcompiler.so.384.111'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libnvcuvid.so.384.111'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-encode.so.384.111'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-ifr.so.384.111'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-fbc.so.384.111'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-compiler.so.384.111'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-eglcore.so.384.111'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-glsi.so.384.111'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libEGL_nvidia.so.384.111'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libGLESv2_nvidia.so.384.111'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libGLESv1_CM_nvidia.so.384.111'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-egl-wayland.so.1.0.1'...
   executing: '/usr/sbin/ldconfig'...
-> done.
-> Driver file installation is complete.
-> Installing DKMS kernel module:
-> done.
ERROR: Unable to load the 'nvidia-drm' kernel module.
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
[root@testgpu ~]#

The modules are installed successfully:

[root@testgpu ~]# find /usr/lib/modules -name "*.ko" | grep -i nvidia
/usr/lib/modules/3.10.0-514.el7.x86_64/weak-updates/nvidia-uvm.ko
/usr/lib/modules/3.10.0-514.el7.x86_64/weak-updates/nvidia-modeset.ko
/usr/lib/modules/3.10.0-514.el7.x86_64/weak-updates/nvidia-drm.ko
/usr/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/nvidia.ko
/usr/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/nvidia-uvm.ko
/usr/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/nvidia-modeset.ko
/usr/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/nvidia-drm.ko
[root@testgpu ~]#

Kernel messages are:

[root@testgpu ~]# dmesg | grep -i nvidia
[  438.635687] nvidia: loading out-of-tree module taints kernel.
[  438.635692] nvidia: module license 'NVIDIA' taints kernel.
[  438.640539] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[  438.644476] nvidia-nvlink: Nvlink Core is being initialized, major device number 246
[  438.644756] nvidia 0000:02:00.0: enabling device (0100 -> 0103)
[  438.646329] NVRM: The NVIDIA GPU 0000:02:00.0 (PCI ID: 10de:15f8)
NVRM: NVIDIA 384.111 driver release.
NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
NVRM: specific graphics driver download page at www.nvidia.com.
[  438.646530] nvidia: probe of 0000:02:00.0 failed with error -1
[  438.646542] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  438.646543] NVRM: None of the NVIDIA graphics adapters were initialized!
[  438.646596] nvidia-nvlink: Unregistered the Nvlink Core, major device number 246
[root@testgpu ~]#

When trying with modprobe:

[root@testgpu ~]# modprobe -v nvidia
insmod /lib/modules/3.10.0-514.26.2.el7.x86_64/extra/nvidia.ko 
modprobe: ERROR: could not insert 'nvidia': No such device
[root@testgpu ~]#
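
To compare the device the driver rejected with what lspci reports, the PCI ID can be pulled out of the NVRM message. This is only a minimal sketch: the function name extract_pci_id is made up for illustration, and it assumes nothing more than the "PCI ID: vvvv:dddd" format seen in the dmesg output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of any NVIDIA tooling): print the first
# "vendor:device" pair that follows "PCI ID:" in text read from stdin.
extract_pci_id() {
    sed -n 's/.*PCI ID: \([0-9a-fA-F]\{4\}:[0-9a-fA-F]\{4\}\).*/\1/p' | head -n 1
}

# Usage (reading the kernel log usually requires root):
#   dmesg | extract_pci_id
#   lspci -nn -d 10de:    # list NVIDIA devices with numeric IDs for comparison
```

The ID it prints can then be checked against the supported-products appendix that the NVRM message points to.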

The physical GPU is a Tesla P100, so I also tried the driver for that card, but without success.
Can anyone tell me whether I’m doing something wrong and, if so, what?

Thanks.

The -grid driver is for VM guests. Are you installing it on bare metal?

Hello generix,

I am installing in a VM guest.

If it’s a VM, it doesn’t look good:
NVIDIA-Linux-x86_64-390.42-grid.run (only available with M10/M6/M60)

See this guide for how to create the supported vGPU type:
https://docs.nvidia.com/grid/5.0/pdf/grid-vgpu-user-guide.pdf

We followed the newer version of this guide, the one that came with the GRID software.