Ubuntu Server 17.10, nVidia v390.30 driver, Cuda V9.0.176 : Cannot load nvidia driver since kernel update

Hi,

I’m using Ubuntu Server 17.10 with the nVidia v390.30 driver and Cuda V9.0.176. Since the kernel update to 4.13.0-39-generic, the nvidia driver can no longer be loaded.

$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
$ echo $?
9
$ lsmod | grep nvidia
$ echo $?
1
$ glxinfo | grep -i nvidia
Xlib:  extension "GLX" missing on display ":0.0".
Xlib:  extension "GLX" missing on display ":0.0".
Xlib:  extension "GLX" missing on display ":0.0".
Xlib:  extension "GLX" missing on display ":0.0".
Xlib:  extension "GLX" missing on display ":0.0".
Xlib:  extension "GLX" missing on display ":0.0".
Xlib:  extension "GLX" missing on display ":0.0".
Error: couldn't find RGB GLX visual or fbconfig
Xlib:  extension "GLX" missing on display ":0.0".
Xlib:  extension "GLX" missing on display ":0.0".
Xlib:  extension "GLX" missing on display ":0.0".
Xlib:  extension "GLX" missing on display ":0.0".
Xlib:  extension "GLX" missing on display ":0.0".
Xlib:  extension "GLX" missing on display ":0.0".
Xlib:  extension "GLX" missing on display ":0.0".
Xlib:  extension "GLX" missing on display ":0.0".
$ echo $?
1
$ dpkg -l | grep nvidia
rc  nvidia-384                                384.111-0ubuntu0.17.10.1                                   amd64        NVIDIA binary driver - version 384.111
rc  nvidia-387                                387.34-0ubuntu0~gpu17.10.2                                 amd64        NVIDIA binary driver - version 387.34
ii  nvidia-390                                390.30-0ubuntu1                                            amd64        NVIDIA binary driver - version 390.30
ii  nvidia-390-dev                            390.30-0ubuntu1                                            amd64        NVIDIA binary Xorg driver development files
ii  nvidia-390-diagnostic                     390.30-0ubuntu1                                            amd64        NVIDIA driver diagnostics utilities
ii  nvidia-modprobe                           390.30-0ubuntu1                                            amd64        Load the NVIDIA kernel driver and create device files
rc  nvidia-opencl-icd-384                     384.111-0ubuntu0.17.10.1                                   amd64        NVIDIA OpenCL ICD
rc  nvidia-opencl-icd-387                     387.34-0ubuntu0~gpu17.10.2                                 amd64        NVIDIA OpenCL ICD
ii  nvidia-opencl-icd-390                     390.30-0ubuntu1                                            amd64        NVIDIA OpenCL ICD
ii  nvidia-prime                              0.8.5                                                      amd64        Tools to enable NVIDIA's Prime
hi  nvidia-settings                           390.48-0ubuntu0~gpu17.10.1                                 amd64        Tool for configuring the NVIDIA graphics driver

Can you please help me ?

Regards.

Sébastien MANSFELD
gpu-manager.log (2 KB)
Xorg.0.log (23.3 KB)

Check
gcc --version
it should be
5.5.0-1ubuntu2
If not, upgrade your system, then rebuild the kernel modules, either with dkms or by purging and reinstalling the kernel.
Otherwise, please run nvidia-bug-report.sh as root and attach the resulting .gz file to your post.

$ gcc --version
gcc (Ubuntu 7.2.0-8ubuntu3.2) 7.2.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

nvidia-bug-report.sh says :

Running nvidia-bug-report.sh...ls: cannot access '/proc/driver/nvidia/./gpus/': No such file or directory

If the bug report script hangs after this point consider running with
--safe-mode and --extra-system-data command line arguments.

 complete.

I’ll run it once more with --safe-mode and --extra-system-data command line arguments.

nvidia-bug-report.log.gz (68.4 KB)

This is the correct gcc 7 version, so please attach the log.gz to have more info.

It’s attached now; please refresh this page with “CTRL+SHIFT+R”.

There’s no kernel driver installed; you probably forgot to use the dkms option on install, which would rebuild the kernel module whenever the kernel changes. Just reinstall the driver and say ‘y’ when asked to use dkms.
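
One way to confirm that no kernel module exists is to look for a built module file under the running kernel's module directory. This is only a minimal sketch: it assumes the usual /lib/modules/<kernel> layout and that the module file name starts with "nvidia", and has_nvidia_module is a hypothetical helper name, not part of any NVIDIA or Ubuntu tool.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a file named nvidia*.ko (optionally
# compressed, e.g. .ko.xz) exists anywhere under the given directory.
has_nvidia_module() {
    moddir="$1"
    [ -d "$moddir" ] || return 1
    found="$(find "$moddir" -type f -name 'nvidia*.ko*' 2>/dev/null | head -n 1)"
    [ -n "$found" ]
}

# Assumption: modules for the running kernel live in /lib/modules/$(uname -r).
if has_nvidia_module "/lib/modules/$(uname -r)"; then
    echo "nvidia module found for kernel $(uname -r)"
else
    echo "no nvidia module found for kernel $(uname -r)"
fi
```

If this reports that no module was found, it is consistent with the empty "lsmod | grep nvidia" output in the first post.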

There was one on the previous linux kernels. Do you know which package I can run “sudo dpkg-reconfigure” on to rebuild the nvidia kernel module on Ubuntu?

Are you sure you installed from .deb? Then it’s strange that dkms didn’t work. Redownload here: Tesla Driver for Ubuntu 17.04 | 390.30 | Linux 64-bit Ubuntu 17.04 | NVIDIA
Or just use the graphics ppa to install a later driver:
[url]https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa[/url]

I did install from the deb (see my dpkg output in the first post of this thread).

In my configuration, I use two repositories :

I have the "http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1704/x86_64" repository for cuda-9-0 and the graphics-drivers PPA (“Index of /graphics-drivers/ppa/ubuntu”) repository for nvidia-390. Is this normal?

During the reinstallation (I’ve just uninstalled nvidia-* and cuda-9-0), I see this :

Setting up nvidia-390 (390.48-0ubuntu0~gpu17.10.3) ...
update-alternatives: using /usr/lib/nvidia-390/alt_ld.so.conf to provide /etc/ld.so.conf.d/i386-linux-gnu_GL.conf (i386-linux-gnu_gl_conf) in auto mode
update-alternatives: using /usr/lib/nvidia-390/alt_ld.so.conf to provide /etc/ld.so.conf.d/i386-linux-gnu_EGL.conf (i386-linux-gnu_egl_conf) in auto mode
update-alternatives: using /usr/share/nvidia-390/glamor.conf to provide /usr/share/X11/xorg.conf.d/glamoregl.conf (glamor_conf) in auto mode
/sbin/ldconfig.real: /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn.so.7 is not a symbolic link

dpkg: error: version '-' has bad syntax: revision number is empty
dpkg: error: version '-' has bad syntax: revision number is empty
dpkg: error: version '-' has bad syntax: revision number is empty
dpkg: error: version '-' has bad syntax: revision number is empty

I’m posting this before I reboot, and I’ll let you know if it works.

It did not work :-(

It’s 9:45 PM in France so I have to leave now.

If you have any idea, please let me know and I’ll try tomorrow.

There seems to be an error in package lists.
You should be able to install the latest driver using
sudo apt install nvidia-396

It’s going to remove cuda-9-0. Is this normal?

The following packages will be REMOVED:
   cuda-9-0 (9.0.176-1)
   cuda-demo-suite-9-0 (9.0.176-1)
   cuda-drivers (390.30-1)
   cuda-runtime-9-0 (9.0.176-1)
   libcuda1-390 (390.48-0ubuntu0~gpu17.10.3)
   nvidia-390 (390.48-0ubuntu0~gpu17.10.3)
   nvidia-390-dev (390.48-0ubuntu0~gpu17.10.3)
   nvidia-opencl-icd-390 (390.48-0ubuntu0~gpu17.10.3)
The following NEW packages will be installed:
   nvidia-396 (396.18-0ubuntu0~gpu17.10.2)

I also see this error :

The following packages were automatically installed and are no longer required:
   cuda-libraries-9-0 (9.0.176-1)
dpkg: error: version '-' has bad syntax: revision number is empty
dpkg: error: version '-' has bad syntax: revision number is empty
dpkg: error: version '-' has bad syntax: revision number is empty
...
/sbin/ldconfig.real: /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn.so.7 is not a symbolic link

Still not working after installing driver v396 and rebooting :

$ lsmod | grep nvidia
$

Ok, I really have to leave.

Thank you very very much for your time and your help.

I hope you will also be available tomorrow.

Then the kernel is probably not registered with dkms. Reinstall cuda as it was before, then purge and reinstall the kernel.

Hi Generix,

Thanks for your answer.

I don’t know anything about dkms. Can you give me some dkms commands to help investigate?

A good start would be

dkms status

which lists all modules that have been built, and for which kernels.
If you want to reinstall a module for a specific kernel:

sudo dkms remove nvidia-384/384.111 -k 4.4.0-116-generic
sudo dkms install nvidia-384/384.111 -k 4.4.0-116-generic

To be safe, run

sudo update-initramfs -u

afterwards.
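
The three commands above can be wrapped in a small dry-run helper that only prints them for a given module, version, and kernel, so they can be reviewed before anything is run. This is a sketch, not an official dkms feature: dkms_rebuild_cmds is a hypothetical name, and the module and version passed to it must match what "dkms status" reports on your machine.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the rebuild commands for one DKMS
# module instead of executing them, so they can be reviewed first.
# $1 = module name, $2 = module version, $3 = kernel (defaults to running kernel)
dkms_rebuild_cmds() {
    module="$1"
    version="$2"
    kernel="${3:-$(uname -r)}"
    echo "sudo dkms remove $module/$version -k $kernel"
    echo "sudo dkms install $module/$version -k $kernel"
    echo "sudo update-initramfs -u"
}

# Example values taken from the commands quoted above.
dkms_rebuild_cmds nvidia-384 384.111 4.4.0-116-generic
```

Leaving out the third argument targets the running kernel, which is the same as passing -k $(uname -r) by hand.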

Here is the output of the commands :

$ dkms status
bbswitch, 0.8: added
nvidia-390, 390.48: added
virtualbox, 5.1.34, 4.13.0-36-generic, x86_64: installed
virtualbox-guest, 5.1.34, 4.13.0-36-generic, x86_64: installed
$ sudo dkms remove nvidia-390/390.48 -k $(uname -r)
Error! There is no instance of nvidia-390 390.48
for kernel 4.13.0-39-generic (x86_64) located in the DKMS tree.
$ sudo dkms install nvidia-390/390.48 -k $(uname -r)
Error! Your kernel headers for kernel 4.13.0-39-generic cannot be found.
Please install the linux-headers-4.13.0-39-generic package,
or use the --kernelsourcedir option to tell DKMS where it's located
$ sudo apt install -V linux-headers-generic
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
   linux-headers-4.13.0-39 (4.13.0-39.44)
   linux-headers-4.13.0-39-generic (4.13.0-39.44)
The following NEW packages will be installed:
   linux-headers-4.13.0-39 (4.13.0-39.44)
   linux-headers-4.13.0-39-generic (4.13.0-39.44)
   linux-headers-generic (4.13.0.39.42)
0 upgraded, 3 newly installed, 0 to remove and 1 not upgraded.
Need to get 11.6 MB of archives.
After this operation, 83.3 MB of additional disk space will be used.
Do you want to continue? [Y/n] 
Get:1 http://fr.archive.ubuntu.com/ubuntu artful-security/main amd64 linux-headers-4.13.0-39 all 4.13.0-39.44 [10.9 MB]
Get:2 http://fr.archive.ubuntu.com/ubuntu artful-security/main amd64 linux-headers-4.13.0-39-generic amd64 4.13.0-39.44 [704 kB]
Get:3 http://fr.archive.ubuntu.com/ubuntu artful-security/main amd64 linux-headers-generic amd64 4.13.0.39.42 [2,294 B]
Fetched 11.6 MB in 0s (39.0 MB/s)
Selecting previously unselected package linux-headers-4.13.0-39.
(Reading database ... 295890 files and directories currently installed.)
Preparing to unpack .../linux-headers-4.13.0-39_4.13.0-39.44_all.deb ...
Unpacking linux-headers-4.13.0-39 (4.13.0-39.44) ...
Selecting previously unselected package linux-headers-4.13.0-39-generic.
Preparing to unpack .../linux-headers-4.13.0-39-generic_4.13.0-39.44_amd64.deb ...
Unpacking linux-headers-4.13.0-39-generic (4.13.0-39.44) ...
Selecting previously unselected package linux-headers-generic.
Preparing to unpack .../linux-headers-generic_4.13.0.39.42_amd64.deb ...
Unpacking linux-headers-generic (4.13.0.39.42) ...
Setting up linux-headers-4.13.0-39 (4.13.0-39.44) ...
Setting up linux-headers-4.13.0-39-generic (4.13.0-39.44) ...
Examining /etc/kernel/header_postinst.d.
run-parts: executing /etc/kernel/header_postinst.d/dkms 4.13.0-39-generic /boot/vmlinuz-4.13.0-39-generic
Setting up linux-headers-generic (4.13.0.39.42) ...
$ sudo dkms install nvidia-390/390.48 -k $(uname -r)
Module nvidia-390/390.48 already installed on kernel 4.13.0-39-generic/x86_64
$ sudo update-initramfs -u
update-initramfs: Generating /boot/initrd.img-4.13.0-39-generic
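
In the "dkms status" output at the top of this session, nvidia-390 is listed only as "added" with no kernel next to it, which means the source was registered with dkms but not built for any kernel. A minimal sketch of checking this from a script follows. It assumes the "name, version, kernel, arch: state" line format shown in this thread (other dkms versions may print it differently), and dkms_state_for is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reads `dkms status` text on stdin and prints the state
# recorded for one module on one kernel. If the module has no entry for that
# kernel, it prints "not built".
# $1 = module name, $2 = kernel release
dkms_state_for() {
    awk -v mod="$1" -v kern="$2" -F'[,:] *' '
        # Kernel-specific lines look like: name, version, kernel, arch: state
        $1 == mod && NF >= 5 && $3 == kern { state = $NF }
        END { print (state == "" ? "not built" : state) }
    '
}

# Sample input copied from the session above.
dkms_state_for nvidia-390 4.13.0-39-generic <<'EOF'
bbswitch, 0.8: added
nvidia-390, 390.48: added
virtualbox, 5.1.34, 4.13.0-36-generic, x86_64: installed
virtualbox-guest, 5.1.34, 4.13.0-36-generic, x86_64: installed
EOF
```

With the sample input this prints "not built". On a live system the same function can be fed with: dkms status | dkms_state_for nvidia-390 "$(uname -r)"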

Let me reboot, to see if it works and I’ll let you know.

It works. I don’t know how or why these headers disappeared, anyway…
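
Since the missing headers were what stopped dkms from building the module, a quick check after each kernel update can catch this before rebooting. The sketch below assumes Ubuntu's layout, where the linux-headers-<release> package installs into /usr/src/linux-headers-<release>; headers_present is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the header tree for a kernel release exists.
# $1 = kernel release, $2 = optional root prefix (used only for testing)
# Assumption: Ubuntu installs headers under /usr/src/linux-headers-<release>.
headers_present() {
    [ -d "${2:-}/usr/src/linux-headers-$1" ]
}

kernel="$(uname -r)"
if headers_present "$kernel"; then
    echo "headers for $kernel are installed"
else
    echo "headers for $kernel are missing; try: sudo apt install linux-headers-$kernel"
fi
```

Keeping the linux-headers-generic metapackage installed, as was done in the session above, should also pull in matching headers on future kernel updates.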

Thank you very much, God bless you abundantly!