ERROR: Unable to load the 'nvidia-drm' kernel module - CentOS 7 x86_64, version 396.54

I originally posted here (https://forums.geforce.com/default/topic/1101187/geforce-drivers/error-unable-to-load-the-nvidia-drm-kernel-module-centos-7-x86_64-version-396-54), but I was advised to post in this forum instead.

We’ve had a problem installing NVIDIA drivers for some time now, across several driver versions. Here’s an example:

$ curl http://download.nvidia.com/XFree86/Linux-x86_64/396.54/NVIDIA-Linux-x86_64-396.54.run > nvidia.bin
$ chmod a+rx nvidia.bin
$ ./nvidia.bin -a -q -s -X -Z -z --no-x-check --dkms
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 396.54...

WARNING: Unable to determine the path to install the libglvnd EGL vendor
         library config files. Check that you have pkg-config and the
         libglvnd development libraries installed, or specify a path with
         --glvnd-egl-config-path.

ERROR: Unable to load the 'nvidia-drm' kernel module.

ERROR: Installation has failed.  Please see the file
       '/var/log/nvidia-installer.log' for details.  You may find
       suggestions on fixing installation problems in the README available
       on the Linux driver download page at www.nvidia.com.

Here is the entire nvidia-installer.log:

nvidia-installer log file '/var/log/nvidia-installer.log'
creation time: Tue Mar  5 12:53:22 2019
installer version: 396.54

PATH: <snip>

nvidia-installer command line:
    ./nvidia-installer
    -a
    -q
    -s
    -X
    -Z
    -z
    --no-x-check
    --dkms

Using built-in stream user interface
-> Detected 16 CPUs online; setting concurrency level to 16.
-> Tagging shared libraries with chcon -t textrel_shlib_t.
-> Installing NVIDIA driver version 396.54.
-> Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later. (Answer: Yes)
-> Installing both new and classic TLS OpenGL libraries.
-> Installing both new and classic TLS 32bit OpenGL libraries.
-> Install NVIDIA's 32-bit compatibility libraries? (Answer: Yes)
-> Will install GLVND GLX client libraries.
-> Will install GLVND EGL client libraries.
-> Skipping GLX non-GLVND file: "libGL.so.396.54"
-> Skipping GLX non-GLVND file: "libGL.so.1"
-> Skipping GLX non-GLVND file: "libGL.so"
-> Skipping EGL non-GLVND file: "libEGL.so.396.54"
-> Skipping EGL non-GLVND file: "libEGL.so"
-> Skipping EGL non-GLVND file: "libEGL.so.1"
-> Skipping GLX non-GLVND file: "./32/libGL.so.396.54"
-> Skipping GLX non-GLVND file: "libGL.so.1"
-> Skipping GLX non-GLVND file: "libGL.so"
-> Skipping EGL non-GLVND file: "./32/libEGL.so.396.54"
-> Skipping EGL non-GLVND file: "libEGL.so"
-> Skipping EGL non-GLVND file: "libEGL.so.1"
Looking for install checker script at ./libglvnd_install_checker/check-libglvnd-install.sh
   executing: '/bin/sh ./libglvnd_install_checker/check-libglvnd-install.sh'...
   Checking for libglvnd installation.
   Checking libGLdispatch...
   Checking libGLdispatch dispatch table
   Checking call through libGLdispatch
   All OK
   libGLdispatch is OK
   Checking for libGLX
   libGLX is OK
   Checking for libEGL
   libEGL is OK
   Checking entrypoint library libOpenGL.so.0
   Checking call through libGLdispatch
   Checking call through library libOpenGL.so.0
   dlopen("libOpenGL.so.0") failed: libOpenGL.so.0: cannot open shared object file: No such file or directory
   Checking entrypoint library libGL.so.1
   Checking call through libGLdispatch
   Checking call through library libGL.so.1
   All OK
   Entrypoint library libGL.so.1 is OK
   
   Found libglvnd libraries: libGL.so.1 libEGL.so.1 libGLX.so.0 libGLdispatch.so.0 
   Missing libglvnd libraries: libOpenGL.so.0 
   
-> An incomplete installation of libglvnd was found. All of the essential libglvnd libraries are present, but one or more optional components are missing. Do you want to install a full copy of libglvnd? This will overwrite any existing libglvnd libraries. (Answer: Don't install libglvnd files)
Will not install libglvnd libraries.
-> Skipping GLVND file: "libOpenGL.so.0"
-> Skipping GLVND file: "libOpenGL.so"
-> Skipping GLVND file: "libGLESv1_CM.so.1.2.0"
-> Skipping GLVND file: "libGLESv1_CM.so.1"
-> Skipping GLVND file: "libGLESv1_CM.so"
-> Skipping GLVND file: "libGLESv2.so.2.1.0"
-> Skipping GLVND file: "libGLESv2.so.2"
-> Skipping GLVND file: "libGLESv2.so"
-> Skipping GLVND file: "libGLdispatch.so.0"
-> Skipping GLVND file: "libGLX.so.0"
-> Skipping GLVND file: "libGLX.so"
-> Skipping GLVND file: "libGL.so.1.7.0"
-> Skipping GLVND file: "libGL.so.1"
-> Skipping GLVND file: "libGL.so"
-> Skipping GLVND file: "libEGL.so.1.1.0"
-> Skipping GLVND file: "libEGL.so.1"
-> Skipping GLVND file: "libEGL.so"
-> Skipping GLVND file: "./32/libOpenGL.so.0"
-> Skipping GLVND file: "libOpenGL.so"
-> Skipping GLVND file: "./32/libGLdispatch.so.0"
-> Skipping GLVND file: "./32/libGLESv2.so.2.1.0"
-> Skipping GLVND file: "libGLESv2.so.2"
-> Skipping GLVND file: "libGLESv2.so"
-> Skipping GLVND file: "./32/libGLESv1_CM.so.1.2.0"
-> Skipping GLVND file: "libGLESv1_CM.so.1"
-> Skipping GLVND file: "libGLESv1_CM.so"
-> Skipping GLVND file: "./32/libGL.so.1.7.0"
-> Skipping GLVND file: "libGL.so.1"
-> Skipping GLVND file: "libGL.so"
-> Skipping GLVND file: "./32/libGLX.so.0"
-> Skipping GLVND file: "libGLX.so"
-> Skipping GLVND file: "./32/libEGL.so.1.1.0"
-> Skipping GLVND file: "libEGL.so.1"
-> Skipping GLVND file: "libEGL.so"
WARNING: Unable to determine the path to install the libglvnd EGL vendor library config files. Check that you have pkg-config and the libglvnd development libraries installed, or specify a path with --glvnd-egl-config-path.
Will install libEGL vendor library config file to /usr/share/glvnd/egl_vendor.d
-> Searching for conflicting files:
-> done.
-> Installing 'NVIDIA Accelerated Graphics Driver for Linux-x86_64' (396.54):
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-glcore.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/xorg/modules/extensions/libglx.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-tls.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/tls/libnvidia-tls.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/libGLX_nvidia.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-glvkspirv.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/xorg/modules/drivers/nvidia_drv.so'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/xorg/modules/libnvidia-wfb.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-gtk2.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-gtk3.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-cfg.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-ml.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/vdpau/libvdpau_nvidia.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/libcuda.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-opencl.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/libOpenCL.so.1.0.0'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-fatbinaryloader.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-ptxjitcompiler.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/libnvcuvid.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-encode.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-ifr.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-fbc.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-compiler.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-eglcore.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-glsi.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/libEGL_nvidia.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/libGLESv2_nvidia.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/libGLESv1_CM_nvidia.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-egl-wayland.so.1.0.3'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib/libcuda.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib/libnvidia-fatbinaryloader.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib/libnvidia-ptxjitcompiler.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib/libnvidia-ml.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib/libOpenCL.so.1.0.0'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib/libnvidia-compiler.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib/libnvidia-opencl.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib/libGLX_nvidia.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib/libnvidia-glcore.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib/libnvidia-tls.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib/tls/libnvidia-tls.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib/libnvidia-glvkspirv.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib/vdpau/libvdpau_nvidia.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib/libnvcuvid.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib/libnvidia-encode.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib/libnvidia-eglcore.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib/libnvidia-glsi.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib/libEGL_nvidia.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib/libGLESv2_nvidia.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib/libGLESv1_CM_nvidia.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib/libnvidia-ifr.so.396.54'...
   executing: '/bin/chcon -t textrel_shlib_t /usr/lib/libnvidia-fbc.so.396.54'...
   executing: '/sbin/ldconfig'...
-> done.
-> Driver file installation is complete.
-> Installing DKMS kernel module:
-> done.
ERROR: Unable to load the 'nvidia-drm' kernel module.
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

After a reboot, the GPUs work fine despite this error. Do you have any insight into how to solve it? I’ve looked at http://download.nvidia.com/XFree86/Linux-x86_64/396.54/README/knownissues.html but could not find anything related to this error.

Thank you,
João

Since you’re installing with the --no-x-check option, I suspect an X server is already running. In that case the currently loaded modules can’t be unloaded, so the newly installed modules can’t be loaded.
Please run nvidia-bug-report.sh as root and attach the resulting .gz file to your post. Hovering the mouse over an existing post of yours will reveal a paperclip icon.
https://devtalk.nvidia.com/default/topic/1043347/announcements/attaching-files-to-forum-topics-posts/

Hi generix,

Thanks for your answer. I am running the above commands from an installation script on a machine installed with minimal CentOS 7, which boots into runlevel 3; there is no X server running yet. That said, the script runs after I install gdm, which will start the X server on the next boot, so maybe that has some influence.

I’ll run the command on a clean installation. If it still provides the same result, I will run the nvidia-bug-report.sh and attach it here.

Thanks a lot!
João

OK, then the nouveau module might already be loaded; check for it. With the installer’s -z and -Z options you’re skipping the nouveau check but still blacklisting it, which is why it works after a reboot.
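
A minimal sketch of such a check, assuming a Linux system where /proc/modules is readable (lsmod formats the same data). The helper name module_loaded is made up for illustration:

```shell
#!/bin/sh
# Succeed if module $1 is listed in the first column of $2
# (default: /proc/modules, the same data that lsmod formats).
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}" 2>/dev/null
}

if module_loaded nouveau; then
    echo "nouveau is loaded; the NVIDIA modules will likely fail to load"
else
    echo "nouveau is not loaded"
fi
```

Running this just before the installer would show whether nouveau is what keeps nvidia-drm from loading.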

During that script I uninstall the nouveau module. So you’re telling me that, despite the uninstallation, I still need to unload it. I see, I’ll try to go that route.

I am following a procedure similar to the one in “Installing NVIDIA Drivers on RHEL or CentOS 7” (Advanced Clustering Technologies), but I do not reboot as many times: I reboot only once, after the post-installation procedure. When I do that, I get the error I mentioned initially. It seems that the only way to avoid the error is an additional reboot, so that the machine starts without the nouveau module loaded (which complicates things if I don’t have on-board VGA), and to run the NVIDIA installer only then, at which point it should no longer fail.

Is there any way to avoid this additional reboot? Some googling suggests that it is not possible, at least on Ubuntu, since rmmod nouveau fails while the module is in use.
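
For reference, a small sketch of how to see whether the module is in use, assuming /proc/modules is available; its third column is the module’s use count, and rmmod refuses to remove a module whose count is non-zero. The helper name module_use_count is hypothetical:

```shell
#!/bin/sh
# Print the use count (third column) of module $1 from $2
# (default: /proc/modules); print nothing if the module is not listed.
module_use_count() {
    awk -v m="$1" '$1 == m { print $3 }' "${2:-/proc/modules}" 2>/dev/null
}

count=$(module_use_count nouveau)
if [ -z "$count" ]; then
    echo "nouveau is not loaded"
elif [ "$count" -eq 0 ]; then
    echo "nouveau is loaded but unused; rmmod may succeed"
else
    echo "nouveau is in use (count $count); rmmod would be refused"
fi
```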

Any suggestions? If not, I guess that I will have to ignore the NVIDIA installer error, but it makes debugging a failed installation much harder.

Thanks,
João

The NVIDIA cards of course have a VGA BIOS, so without nouveau the kernel will just fall back to a VGA console.
If you’re rebooting before the nvidia driver installation, you can simply create a nouveau blacklist file in /etc/modprobe.d and run dracut -f.
Otherwise, you would have to use the kernel parameter nouveau.modeset=0 from the start to keep nouveau from grabbing the console.
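
A rough sketch of the blacklist step, not a definitive recipe. The file name blacklist-nouveau.conf and the helper write_nouveau_blacklist are arbitrary choices for illustration, and the extra "options nouveau modeset=0" line is an assumption commonly paired with the blacklist entry:

```shell
#!/bin/sh
# Write a nouveau blacklist file into directory $1.
# The file name is an arbitrary choice; any *.conf name in
# /etc/modprobe.d is read by modprobe.
write_nouveau_blacklist() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' \
        > "$1/blacklist-nouveau.conf"
}

# As root, before the reboot that precedes the driver installation:
#   write_nouveau_blacklist /etc/modprobe.d
#   dracut -f    # rebuild the initramfs so the blacklist applies at early boot
```

The function takes the target directory as an argument so that it can be tried against a scratch directory before touching /etc/modprobe.d.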

Of course, you could also just switch from the .run installer to a packaged driver, e.g. from RPM Fusion.

Hi all,

Thanks generix for all the help.

Basically, from your feedback and from my trials, it’s not possible to blacklist nouveau and install the NVIDIA driver in the same boot. So I split my script in two: the first does everything necessary to blacklist nouveau, and the second does everything else I want, including the NVIDIA driver installation. The reboot between the two keeps nouveau from being loaded, and the NVIDIA driver installation then completes without any issue.
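
For anyone landing here later, the split can be sketched roughly as below. This is not my actual script; the function name pick_stage and the stage labels are made up for illustration, and it assumes /proc/modules is readable:

```shell
#!/bin/sh
# Decide which stage to run, based on whether nouveau appears in the
# module list $1 (default: /proc/modules).
pick_stage() {
    if awk '$1 == "nouveau" { found = 1 } END { exit !found }' \
        "${1:-/proc/modules}" 2>/dev/null; then
        echo blacklist-and-reboot
    else
        echo install-driver
    fi
}

case "$(pick_stage)" in
    blacklist-and-reboot)
        echo "stage 1: blacklist nouveau, rebuild the initramfs, then reboot" ;;
    install-driver)
        echo "stage 2: nouveau is absent; run the NVIDIA installer" ;;
esac
```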

Thank you for all the help.
João