I had the GPU working fine with driver 370.26 (installed from the .run file), along with CUDA and cuDNN. Last week, after some regular system updates, I got stuck in a login loop.
It was not my first login loop, and after some struggle and using https://gist.github.com/wangruohui/df039f0dc434d6486f5d4d098aa52d07#file-install-nvidia-driver-and-cuda-md, I’ve managed to log in. However, the installation reported that it was unable to load the ‘nvidia-drm’ kernel module.
Output:
→ Installing ‘NVIDIA Accelerated Graphics Driver for Linux-x86_64’ (390.42):
executing: ‘/sbin/ldconfig’…
→ done.
→ Driver file installation is complete.
→ Installing DKMS kernel module:
→ done.
ERROR: Unable to load the ‘nvidia-drm’ kernel module.
ERROR: Installation has failed. Please see the file ‘/var/log/nvidia-installer.log’ for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
Previously, I’ve tried to use PPA installation using
Your setup is really confusing. It seems the monitor is connected to the Intel GPU, which doesn’t work because you’re using the kernel parameter ‘nomodeset’. The installer option --no-opengl-files indicates that you’re indeed using the iGPU for graphics and the NVIDIA card for CUDA only. The kernel driver is installed, but it seems to be blacklisted. Run
grep -i nvidia /etc/modprobe.d/*
to look for unusual entries.
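As a rough sketch of the same check, the grep can be narrowed to blacklist directives only. The function name list_gpu_blacklists is made up for illustration, and the directory argument is an assumption added so the check can be pointed at a copy of the configuration.

```shell
#!/bin/sh
# Hypothetical helper: print every "blacklist" directive that mentions
# nvidia or nouveau, prefixed with the file it was found in.
# Note: modprobe itself only reads files in this directory ending in .conf,
# but all files are scanned here so stray entries show up as well.
list_gpu_blacklists() {
    dir="${1:-/etc/modprobe.d}"
    # -H prints the file name, -i ignores case, -E enables alternation
    grep -HiE '^[[:space:]]*blacklist[[:space:]]+.*(nvidia|nouveau)' "$dir"/* 2>/dev/null
}
```

Calling list_gpu_blacklists with no argument scans /etc/modprobe.d. A line such as "blacklist nvidia" or "blacklist nvidia-drm" in the output would explain why the module is not loaded.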
You are right, the monitor is indeed connected to the iGPU. I thought that this would prevent the login loops during installation. Do you advise something else? Should I drop the --no-opengl-files option and change the installation commands?
Update:
I’ve tried to re-install the driver with the monitor connected to the NVIDIA card. I got into a login loop with the same error:
ERROR: Unable to load the 'nvidia-drm' kernel module.
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
Please remove the file
/etc/modprobe.d/nvidia-installer-disable-nouveau.conf
and reinstall using the ubuntu package.
If the module still doesn’t load (use dmesg | grep -i nvidia to check), run
sudo modprobe -v nvidia
to load the module and post any errors.
Sorry, but what do you mean by the ubuntu package: the ppa:graphics-drivers repo? nvidia-current? .deb? .run?
Also, does it matter if I use the intel-GPU or the nvidia-GPU with the monitor while doing the install?
I removed the nvidia-installer-disable-nouveau.conf file (BTW, there is still a blacklist-nouveau.conf that I created following the installation instructions mentioned above), then tried:
sudo service lightdm stop
chmod +x NVIDIA-Linux-x86_64-390.42.run
sudo ./NVIDIA-Linux-x86_64-390.42.run --dkms
and again got ERROR: Unable to load the ‘nvidia-drm’ kernel module, followed by a login loop…
Next, I’ve uninstalled and followed:
dmesg |grep -i nvidia
[ 3.588359] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input14
[ 3.588495] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input15
[ 3.588545] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input16
Trying to load the module also failed:
sudo modprobe -v nvidia
modprobe: FATAL: Module nvidia not found in directory /lib/modules/4.4.0-119-generic
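The "not found" message means no nvidia module file exists under the module tree of the running kernel. A minimal sketch for confirming that is shown here; find_nvidia_modules is a hypothetical helper name, and the second argument (the module root) is an assumption added so the function can be tried against a test directory.

```shell
#!/bin/sh
# Hypothetical helper: print nvidia kernel module files built for a given
# kernel version. Empty output corresponds to modprobe's
# "Module nvidia not found in directory /lib/modules/<version>".
find_nvidia_modules() {
    kver="${1:-$(uname -r)}"      # default: the running kernel
    root="${2:-/lib/modules}"     # default: the system module tree
    [ -d "$root/$kver" ] || return 0
    # matches nvidia.ko, nvidia-drm.ko, nvidia_uvm.ko and compressed variants
    find "$root/$kver" -name 'nvidia*.ko*'
}
```

Running find_nvidia_modules without arguments checks the running kernel. If it prints nothing, the DKMS build for that kernel did not produce a module; dkms status lists which module versions DKMS has built for which kernels.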
Should I maybe use UEFI and not LEGACY mode in the BIOS configuration?
Also, the unfiltered dmesg output contains the following errors:
[ 3.016684] snd_hda_intel 0000:00:1f.3: failed to add i915_bpo component master (-19)
[ 3.307765] vboxdrv: version magic '4.4.0-119-generic SMP mod_unload modversions ' should be '4.4.0-119-generic SMP mod_unload modversions retpoline '
[ 12.706358] vboxdrv: version magic '4.4.0-119-generic SMP mod_unload modversions ' should be '4.4.0-119-generic SMP mod_unload modversions retpoline '
Your gcc is too old. All new Ubuntu kernels are compiled with the retpoline mitigation, so gcc needs retpoline support too in order to compile a loadable module. Upgrade your system/HWE.
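The vboxdrv lines in dmesg show this mismatch directly: the module's version magic lacks the retpoline token that the kernel expects. A small sketch for checking any module's version magic follows; vermagic_retpoline is a hypothetical helper name, while modinfo -F vermagic is the standard way to print the string.

```shell
#!/bin/sh
# Hypothetical helper: report whether a vermagic string carries the
# "retpoline" token. Prints "retpoline" or "no-retpoline".
vermagic_retpoline() {
    # pad with spaces so the token is matched as a whole word
    case " $1 " in
        *" retpoline "*) echo "retpoline" ;;
        *)               echo "no-retpoline" ;;
    esac
}

# Example usage (assumes the vboxdrv module is installed):
#   vermagic_retpoline "$(modinfo -F vermagic vboxdrv)"
```

A module that reports no-retpoline on a kernel whose own modules report retpoline was built with a compiler lacking retpoline support, and the kernel will refuse to load it.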
Switching to an older kernel is only a temporary workaround; you should update your system. Something is wrong there: your system reports version 16.04.4, which is current, but your kernel, xorg and gcc are the ones from 16.04.0, the initial release. So you might have some software that blocks proper updates. Run
sudo apt update
sudo apt upgrade
and carefully read the output for anything unusual, such as deferred packages.
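One way to spot deferred packages without reading the whole output is to filter a simulated upgrade. This is only a sketch: kept_back_packages is a hypothetical helper, and it assumes apt's English wording "The following packages have been kept back:", which is why the usage line forces LC_ALL=C.

```shell
#!/bin/sh
# Hypothetical helper: read apt output on stdin and print the packages
# listed under "The following packages have been kept back:", one per line.
kept_back_packages() {
    awk '
        /^The following packages have been kept back:/ { in_list = 1; next }
        in_list && /^[ \t]/ { for (i = 1; i <= NF; i++) print $i; next }
        { in_list = 0 }   # any non-indented line ends the list
    '
}

# Example usage (-s only simulates, nothing is installed):
#   LC_ALL=C apt-get -s upgrade | kept_back_packages
```

Packages that were put on hold explicitly can be listed separately with apt-mark showhold.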
stezi, please run nvidia-bug-report.sh as root and attach the resulting .gz file to your post. Hovering the mouse over an existing post will reveal a paperclip icon.
[ 9.112473] NVRM: API mismatch: the client has the version 440.82, but
NVRM: this kernel module has the version 390.132. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
Please remove all driver packages
sudo apt remove 'nvidia*'
then reinstall the 440 driver
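The NVRM message quoted above names both versions involved: the user-space components (client) and the loaded kernel module. A minimal sketch for pulling the two numbers out of dmesg is given here; api_mismatch_versions is a hypothetical helper and relies only on the message wording shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print the two
# versions from an "NVRM: API mismatch" message as client=... / kernel=...
api_mismatch_versions() {
    grep -oE '(client|kernel module) has the version [0-9]+(\.[0-9]+)+' |
        awk '{ print $1 "=" $NF }'   # first word names the side, last field is the version
}

# Example usage:
#   dmesg | api_mismatch_versions
```

If the two values differ, packages from two driver series are installed side by side. Once a module is loaded, /proc/driver/nvidia/version shows which kernel module version is actually in use.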
Hi, I’m also getting this error but haven’t been able to figure out the issue in my case. I’m using CentOS 7 and followed these instructions to try to install the correct driver for my GTX 1050. I’ve attached the nvidia-bug-report.sh output: nvidia-bug-report.log (406.6 KB).
What happened is that nvidia-detect suggests the legacy 390 driver series, and when I install nvidia-driver it automatically installs module version 390.132. After removing all nvidia components and blacklisting nouveau ( echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.config ) I was able to get the best resolution from my GPU so far… but in the end I’m not using any nvidia driver. Should I install the 440.82 driver manually? Would I get any benefit from that?