Driver update fails on Ubuntu after Cuda-6.5 install from .deb

Hello,

I have the following problem with Cuda installation on my Ubuntu-12.04 server.

  1. I installed Cuda-6.5 from the .deb package, which I downloaded from the nVidia site (file cuda-repo-ubuntu1204-6-5-prod_6.5-19_amd64.deb). As far as I understand, this procedure also installs the nVidia kernel driver as part of the Cuda installation.

  2. I updated the driver to a newer one, using the executable .run file from the nVidia site (file NVIDIA-Linux-x86_64-346.47.run).

The system was doing great after step 1: my GPUs were being recognized and I could run cuda code on them. But once I executed step 2, the driver wouldn’t load and the GPUs became inaccessible. The kernel log file had the following messages that I found relevant:

Mar 12 23:04:32 xxx kernel: [   25.112715] NVRM: API mismatch: the client has the version 346.47, but
Mar 12 23:04:32 xxx kernel: [   25.112716] NVRM: this kernel module has the version 343.19.  Please
Mar 12 23:04:32 xxx kernel: [   25.112717] NVRM: make sure that this kernel module and all NVIDIA driver
Mar 12 23:04:32 xxx kernel: [   25.112717] NVRM: components have the same version.

So the driver was only partially updated when I executed the .run file: the user-space components moved to the new version, but the loaded kernel module stayed at the old one. Indeed, the log above refers to two different versions, 343.19 and 346.47, which come from the two steps of my installation procedure. The .run file of step 2 was actually rather explicit about the possibility of an issue: it warned me that there might be a conflict between different installation methods.
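
For anyone who wants to spot this condition automatically, here is a minimal Python sketch that pulls the two version numbers out of the NVRM "API mismatch" message. The function name and the regular expressions are my own and assume the message wording shown in the log above; other driver releases may word it differently.

```python
import re

# Patterns for the two halves of the NVRM "API mismatch" message,
# assuming the wording seen in the kernel log quoted in this thread.
CLIENT_RE = re.compile(r"NVRM: API mismatch: the client has the version (\d+(?:\.\d+)+)")
MODULE_RE = re.compile(r"NVRM: this kernel module has the version (\d+(?:\.\d+)+)")


def find_api_mismatch(log_text):
    """Return (client_version, module_version) for the first mismatch, or None."""
    client = CLIENT_RE.search(log_text)
    module = MODULE_RE.search(log_text)
    if client is None or module is None:
        return None
    return client.group(1), module.group(1)


# The two relevant lines copied from the kernel log in this post.
SAMPLE_LOG = (
    "Mar 12 23:04:32 xxx kernel: [   25.112715] NVRM: API mismatch: "
    "the client has the version 346.47, but\n"
    "Mar 12 23:04:32 xxx kernel: [   25.112716] NVRM: this kernel module "
    "has the version 343.19.  Please\n"
)

if __name__ == "__main__":
    print(find_api_mismatch(SAMPLE_LOG))  # ('346.47', '343.19')
```

In the output, the first value is the user-space (client) version and the second is the version of the kernel module that is actually loaded.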

BTW, I remedied the situation by running (note --reinstall):

sudo apt-get install --reinstall nvidia-343 nvidia-modprobe nvidia-settings

Hence my question: what is the official method of updating the kernel driver after Cuda has been installed using the .deb package?

Thanks!

The .run file installer for the driver should have taken care of this. I suspect that your .run file install did not complete successfully. The kernel module interface must be rebuilt, and it looks like that did not happen for some reason. Perhaps the installer could not find the appropriate symbols/headers for the kernel you have installed.

Try re-installing the 346.47 via the .run file and look for error messages or look in the installer log.
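
After the reinstall, it may also help to confirm which module version is actually loaded. A minimal Python sketch, assuming the loaded driver exposes /proc/driver/nvidia/version and that the file contains the usual "Kernel Module  <version>" wording; the function names and the sample string are illustrative, not taken from your machine:

```python
import re
from pathlib import Path

# File exposed by the NVIDIA kernel module while it is loaded.
PROC_FILE = Path("/proc/driver/nvidia/version")

# Assumed wording: "... Kernel Module  <version>  <build date>".
VERSION_RE = re.compile(r"Kernel Module\s+(\d+(?:\.\d+)+)")


def parse_module_version(text):
    """Extract the module version from the proc file contents, or None."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


def loaded_module_version():
    """Return the version of the loaded module, or None if it is not loaded."""
    if not PROC_FILE.exists():
        return None
    return parse_module_version(PROC_FILE.read_text())


if __name__ == "__main__":
    # Illustrative sample line, not real output from the machine in this thread.
    sample = "NVRM version: NVIDIA UNIX x86_64 Kernel Module  343.19  <build date>"
    print(parse_module_version(sample))
    print(loaded_module_version())
```

If this still reports the old version after the .run install and a reboot, the old module is the one being loaded, whatever the installer log says.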

I’ve rerun the .run file. Same effect as before. Here’s the log file:

>cat /var/log/nvidia-installer.log 
nvidia-installer log file '/var/log/nvidia-installer.log'
creation time: Fri Mar 13 19:23:37 2015
installer version: 346.47

PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

nvidia-installer command line:
    ./nvidia-installer

Using: nvidia-installer ncurses user interface
-> Detected 8 CPUs online; setting concurrency level to 8.
-> License accepted.
-> Installing NVIDIA driver version 346.47.
-> The NVIDIA driver appears to have been installed previously using a different installer. To prevent potential conflicts, it is recommended either to update the existing installation using the same mechanism by which it was originally installed, or to uninstall the existing installation before installing this driver.

Please review the message provided by the maintainer of this alternate installation method and decide how to proceed:

The package that is already installed is named nvidia-343.

You can upgrade the driver by running:
`apt-get install nvidia-343 nvidia-modprobe nvidia-settings`

You can remove nvidia-343, and all related packages, by running:
`apt-get remove --purge nvidia-343 nvidia-modprobe nvidia-settings`

This package is maintained by NVIDIA (cudatools@nvidia.com).


(Answer: Continue installation)
-> Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later. (Answer: Yes)
-> Installing both new and classic TLS OpenGL libraries.
-> Installing both new and classic TLS 32bit OpenGL libraries.
-> Install NVIDIA's 32-bit compatibility libraries? (Answer: Yes)
-> Skipping installation of the libvdpau wrapper library.
-> Searching for conflicting files:
-> done.
-> Installing 'NVIDIA Accelerated Graphics Driver for Linux-x86_64' (346.47):
   executing: '/sbin/ldconfig'...
-> done.
-> Driver file installation is complete.
-> Installing DKMS kernel module:
-> done.
-> Running post-install sanity check:
-> done.
-> Post-install sanity check passed.
-> Running runtime sanity check:
-> done.
-> Runtime sanity check passed.
-> Would you like to run the nvidia-xconfig utility to automatically update your X configuration file so that the NVIDIA X driver will be used when you restart X?  Any pre-existing X configuration file will be backed up. (Answer: No)
-> Installation of the NVIDIA Accelerated Graphics Driver for Linux-x86_64 (version: 346.47) is now complete.  Please update your XF86Config or xorg.conf file as appropriate; see the file /usr/share/doc/NVIDIA_GLX-1.0/README.txt for details.

Seems pretty self-explanatory?

-> The NVIDIA driver appears to have been installed previously using a different installer. To prevent potential conflicts, it is recommended either to update the existing installation using the same mechanism by which it was originally installed, or to uninstall the existing installation before installing this driver.
...
You can remove nvidia-343, and all related packages, by running:
`apt-get remove --purge nvidia-343 nvidia-modprobe nvidia-settings`

So do that first. Then run your .run file installer.

Tried doing this. This seems to want to remove all of Cuda packages:

>sudo apt-get remove --purge nvidia-343 nvidia-modprobe nvidia-settings
[sudo] password for xxx: 
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  cuda-cusparse-dev-6-5 cuda-cublas-dev-6-5 cuda-samples-6-5
  cuda-command-line-tools-6-5 cuda-cufft-6-5 cuda-curand-dev-6-5
  cuda-documentation-6-5 cuda-driver-dev-6-5 cuda-cublas-6-5 cuda-toolkit-6-5
  cuda-cudart-dev-6-5 cuda-license-6-5 cuda-npp-6-5 cuda-cudart-6-5
  openjdk-7-jre-lib cuda-cusparse-6-5 cuda-curand-6-5 cuda-core-6-5
  cuda-visual-tools-6-5 cuda-cufft-dev-6-5 libkms1 cuda-npp-dev-6-5
  cuda-misc-headers-6-5
Use 'apt-get autoremove' to remove them.
The following packages will be REMOVED:
  cuda* cuda-6-5* cuda-drivers* cuda-runtime-6-5* nvidia-343* nvidia-343-dev*
  nvidia-343-uvm* nvidia-modprobe* nvidia-settings*
0 upgraded, 0 newly installed, 9 to remove and 0 not upgraded.
After this operation, 242 MB disk space will be freed.
Do you want to continue [Y/n]?

Seems like that defeats the purpose of the driver update.
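
To preview this kind of removal without changing anything, apt-get has a simulate mode (-s). Here is a minimal Python sketch that lists the packages named in the "will be REMOVED" block of such output. The function name is my own, and the parsing assumes the layout shown in the output above (indented package names after the header line).

```python
def packages_to_remove(apt_output):
    """Return the package names listed under 'will be REMOVED' in apt-get output."""
    names = []
    in_block = False
    for line in apt_output.splitlines():
        if line.startswith("The following packages will be REMOVED"):
            in_block = True
            continue
        if in_block:
            if line.startswith(" "):
                # A trailing '*' marks packages that will be purged; strip it.
                names.extend(token.rstrip("*") for token in line.split())
            else:
                break  # first non-indented line ends the block
    return names


# Relevant part of the apt-get output quoted in this post.
SAMPLE_OUTPUT = """The following packages will be REMOVED:
  cuda* cuda-6-5* cuda-drivers* cuda-runtime-6-5* nvidia-343* nvidia-343-dev*
  nvidia-343-uvm* nvidia-modprobe* nvidia-settings*
0 upgraded, 0 newly installed, 9 to remove and 0 not upgraded.
"""

if __name__ == "__main__":
    removed = packages_to_remove(SAMPLE_OUTPUT)
    cuda_related = [name for name in removed if name.startswith("cuda")]
    print(len(removed), "packages would be removed")
    print("CUDA packages among them:", cuda_related)
```

The input could come from something like `apt-get -s remove --purge nvidia-343 nvidia-modprobe nvidia-settings`. On the sample above it finds 9 packages, 4 of them CUDA metapackages, which is why the rest of the toolkit is then offered for autoremove.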

I dislike the package methods personally.

You can probably still update your driver if that is your intent, using the package manager method, to whatever is currently available in the repo.

I prefer to install everything via runfile installers. If you install CUDA via a runfile installer, you can update the driver later as often as you want, via a runfile installer.

I don’t mind using the package manager, but I’m unsure how to use it in my situation. The only download I saw for the most recent driver was the .run file, and it doesn’t seem to work. I’d be happy to use a package for the driver update, but I don’t remember seeing one anywhere.

It would be good if nVidia could explain what process they had in mind for the driver update, once Cuda has been installed off of a package.