Solved: NVIDIA driver installation fails.

Hi,

I’m trying to install the NVIDIA driver on my system using the runfile (NVIDIA-Linux-x86_64-185.18.36-pkg2.run).

I downloaded it from the NVIDIA website through the driver download page (Download NVIDIA, GeForce, Quadro, and Tesla Drivers).

I have all the prerequisites for installing the driver:

  1. kernel version: kernel-3.10.0-693.11.6.el7.x86_64
  2. kernel headers: kernel-headers-3.10.0-693.11.6.el7.x86_64.rpm
  3. kernel devel: kernel-devel-3.10.0-693.11.6.el7.x86_64
  4. kernel tools: kernel-tools-3.10.0-693.11.6.el7.x86_64; kernel-tools-libs-3.10.0-693.11.6.el7.x86_64
  5. gcc: gcc-4.8.5-16.el7_4.1.x86_64

The installation fails with the following message:

Using: nvidia-installer ncurses user interface
-> Tagging shared libraries with chcon -t textrel_shlib_t.
-> License accepted.
-> Installing NVIDIA driver version 185.18.36.
-> No precompiled kernel interface was found to match your kernel; would you
like the installer to attempt to download a kernel interface for your kernel
from the NVIDIA ftp site (ftp://download.nvidia.com)? (Answer: No)
-> No precompiled kernel interface was found to match your kernel; this means
that the installer will need to compile a new kernel interface.
-> Performing CC sanity check with CC="cc".
-> Performing CC version check with CC="cc".
-> Using the kernel source path
'/lib/modules/3.10.0-693.11.6.el7.x86_64/source/' as specified by the
'--kernel-source-path' commandline option.
ERROR: Unable to determine the version of the kernel sources located in
'/lib/modules/3.10.0-693.11.6.el7.x86_64/source/'. Please make sure you
have installed the kernel source files for your kernel and that they are
properly configured; on Red Hat Linux systems, for example, be sure you
have the 'kernel-source' or 'kernel-devel' RPM installed. If you know
the correct kernel source files are installed, you may specify the
kernel source path with the '--kernel-source-path' command line option.

I also passed the kernel source path explicitly with --kernel-source-path, but the installer still fails.

Can someone help me fix this?

Why are you trying to install such an old driver?

Anyway, the error indicates that your kernel headers don’t match your kernel. If you do a clean install of the OS, without updating anything, you should be able to work around this. Otherwise you will need to track down the difference.
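
One way to track down such a mismatch is to look at the build tree that belongs to the running kernel. The following is only a sketch: the function name check_kernel_tree is made up for illustration, and it assumes that a usable kernel-devel tree is reachable through the build symlink under /lib/modules and contains include/generated/utsrelease.h, which is where kernels of this generation normally record their release string.

```shell
#!/bin/sh
# Report whether the kernel build tree for a given release looks usable.
# Usage: check_kernel_tree <release> [modules_root]
# (check_kernel_tree is a hypothetical helper, not part of any NVIDIA tool.)
check_kernel_tree() {
    release="$1"
    root="${2:-/lib/modules}"
    build="$root/$release/build"
    if [ ! -e "$build" ]; then
        # No directory, or a dangling symlink: kernel-devel for this
        # release is probably not installed.
        echo "missing"
    elif [ -f "$build/include/generated/utsrelease.h" ]; then
        # Assumption: this header carries the release string of the tree.
        echo "ok"
    else
        echo "incomplete"
    fi
}

# Check the tree that belongs to the kernel that is running right now.
check_kernel_tree "$(uname -r)"
```

If this prints "missing" while rpm -q kernel-devel lists a package, the installed kernel-devel version most likely differs from the output of uname -r.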

Hi,

I tried installing nvidia-diag-driver-local-repo-rhel7-390.30-1.0-1.x86_64.rpm first, but it gave me the same error.

My end goal is to benchmark the GPUs installed in my system using HPL, and I’m following http://hpl-calculator.sourceforge.net/Howto-HPL-GPU.pdf

That guide says to use the older version, so I tried that as well, but it didn’t work either.

Am I missing something?

Thanks,
Karan

@txbob: I did a clean install of the OS without updating anything, but the installer still gives me the same error.

Can someone give me a link to the local runfile equivalent of nvidia-diag-driver-local-repo-rhel7-390.30-1.0-1.x86_64.rpm?

I can only get an rpm file when I download it through the driver download page on the NVIDIA website.

The 390.30 version seems fine to install.
You can find drivers either here:
Official Drivers | NVIDIA
https://developer.nvidia.com/cuda-toolkit
or use the Ubuntu repo:
How can I install CUDA 9 on Ubuntu 17.10 - Ask Ubuntu

I tried the install on a freshly built system and installed all the required kernel-headers and kernel-devel packages (matching my kernel version), but the logs still say:

ERROR: Unable to determine the version of the kernel sources located in
'/lib/modules/3.10.0-693.11.6.el7.x86_64/source/'. Please make sure you
have installed the kernel source files for your kernel and that they are
properly configured; on Red Hat Linux systems, for example, be sure you
have the 'kernel-source' or 'kernel-devel' RPM installed. If you know
the correct kernel source files are installed, you may specify the
kernel source path with the '--kernel-source-path' command line option.

I have a UEFI system. Do I need to disable it first?

Could that be the reason for the failure?

Which software did you try to install that returned these errors?

What is the output of

uname -a

?
You may be able to declare the path explicitly, like:

sh Nvidia-version_you_are_using-.run --kernel-source-path=/lib/modules/2.6.24-16-server/build/include/

You will need to adjust the path for your environment.
Source of the example: https://www.linuxquestions.org/questions/linux-software-2/nvidia-driver-install-script-unable-to-determine-the-version-of-the-kernel-sources-690731/
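
To avoid typing the kernel release by hand, the path can be derived from uname -r. This is only a sketch: kernel_build_dir is a made-up helper, the runfile name is a placeholder, and it assumes the build symlink under /lib/modules points at the tree installed by kernel-devel. Whether the installer accepts that tree is a separate question.

```shell
#!/bin/sh
# Print the kernel build directory for a given release.
# (kernel_build_dir is a hypothetical helper used only in this example.)
kernel_build_dir() {
    echo "/lib/modules/$1/build"
}

# Placeholder runfile name; substitute the file you actually downloaded.
RUNFILE="NVIDIA-Linux-x86_64-<version>.run"

# Print the command instead of running it, so it can be reviewed first.
echo "sh $RUNFILE --kernel-source-path=$(kernel_build_dir "$(uname -r)")"
```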

Two of them.

First I tried the latest:

  1. 390.30 [Tesla Driver for Linux RHEL 7 | 390.30 | Linux 64-bit RHEL 7 | NVIDIA]

Then I tried the one from the how-to HPL-GPU guide:

  2. 185.18.36, the older runfile version [Download NVIDIA, GeForce, Quadro, and Tesla Drivers]. I’m using this one because I am trying to benchmark the GPUs in my system using HPL, and I’m following the how-to HPL-GPU guide [http://hpl-calculator.sourceforge.net/Howto-HPL-GPU.pdf]

Thanks.

Since you are using CentOS, you can use the method from this post for reference:
https://www.centos.org/forums/viewtopic.php?t=61162#p258109

rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
yum install nvidia-detect.x86_64
yum install $(nvidia-detect)

Line 3 installs the nvidia-detect tool, and line 4 should then install the driver package it suggests.

However, if you have already installed something, you need to remove the existing NVIDIA installation first:
http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#handle-uninstallation
For HPL on GPUs, I found an updated reference at HPL - A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers. It has an entry for 2016, though I haven’t tried that method myself.

yum install fails: it says the public key for nvidia-diag-driver-local-repo-rhel7-390.30-1.0-1.x86_64.rpm is not installed.

yum install nvidia-diag-driver-local-repo-rhel7-390.30-1.0-1.x86_64.rpm
Examining nvidia-diag-driver-local-repo-rhel7-390.30-1.0-1.x86_64.rpm: nvidia-diag-driver-local-repo-rhel7-390.30-1.0-1.x86_64
Marking nvidia-diag-driver-local-repo-rhel7-390.30-1.0-1.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package nvidia-diag-driver-local-repo-rhel7-390.30.x86_64 0:1.0-1 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package    Arch    Version    Repository    Size

Installing:
 nvidia-diag-driver-local-repo-rhel7-390.30
            x86_64  1.0-1      /nvidia-diag-driver-local-repo-rhel7-390.30-1.0-1.x86_64    89 M

Transaction Summary

Install 1 Package

Total size: 89 M
Installed size: 89 M
Is this ok [y/d/N]: y
Downloading packages:
warning: /root/nvidia-diag-driver-local-repo-rhel7-390.30-1.0-1.x86_64.rpm: Header V3 RSA/SHA512 Signature, key ID 7fa2af80: NOKEY

Public key for nvidia-diag-driver-local-repo-rhel7-390.30-1.0-1.x86_64.rpm is not installed
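
The NOKEY warning above names the key that signed the package. As a rough sketch, the key ID can be pulled out of the warning and compared against the keys rpm already knows about. The helper name nokey_id is made up for this example; the warning text is copied from the output above, and the rpm query is skipped on systems without rpm.

```shell
#!/bin/sh
# Extract the key ID from an rpm/yum "NOKEY" warning line.
# (nokey_id is a hypothetical helper used only in this example.)
nokey_id() {
    printf '%s\n' "$1" | sed -n 's/.*key ID \([0-9A-Fa-f]*\): NOKEY.*/\1/p'
}

warning='warning: /root/nvidia-diag-driver-local-repo-rhel7-390.30-1.0-1.x86_64.rpm: Header V3 RSA/SHA512 Signature, key ID 7fa2af80: NOKEY'
id=$(nokey_id "$warning")
echo "signing key: $id"

# Imported keys appear as gpg-pubkey-<keyid>-<timestamp> pseudo-packages.
if command -v rpm >/dev/null 2>&1; then
    rpm -q gpg-pubkey | grep -i "gpg-pubkey-$id-" || echo "key $id is not imported"
fi
```

If the key is not imported, yum refuses the package unless the key is added with rpm --import or the signature check is skipped for that transaction.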

Try:

wget http://us.download.nvidia.com/tesla/390.30/nvidia-diag-driver-local-repo-rhel7-390.30-1.0-1.x86_64.rpm

and install from the rpm:

i) `rpm -i nvidia-diag-driver-local-repo-rhel7-390.30-1.0-1.x86_64.rpm`
ii) `yum clean all`
iii) `yum install cuda-drivers`
iv) `reboot`

source
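
After the reboot it is worth confirming that the driver actually loaded. The following is a minimal sketch, assuming the kernel module is named nvidia and shows up in /proc/modules; nvidia_module_loaded is a made-up helper, and nvidia-smi is only called if it is on the PATH.

```shell
#!/bin/sh
# Succeed if the nvidia kernel module appears in a modules listing.
# The listing defaults to /proc/modules; a file can be passed for testing.
# (nvidia_module_loaded is a hypothetical helper used only in this example.)
nvidia_module_loaded() {
    grep -q '^nvidia ' "${1:-/proc/modules}" 2>/dev/null
}

if nvidia_module_loaded; then
    echo "nvidia module is loaded"
    # nvidia-smi ships with the driver; ask it for the driver version.
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=driver_version --format=csv,noheader
    fi
else
    echo "nvidia module is not loaded"
fi
```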

I can’t do a direct install. I have to download it and then run it.

wget does download it.

I know, but I’m not allowed to use wget. My only option is to download it manually and then install it :(

What happens if you execute this in the terminal?

wget http://us.download.nvidia.com/tesla/390.30/nvidia-diag-driver-local-repo-rhel7-390.30-1.0-1.x86_64.rpm

OK, so --nogpgcheck worked!

yum install --nogpgcheck nvidia-diag-driver-local-repo-rhel7-390.30-1.0-1.x86_64.rpm
Examining nvidia-diag-driver-local-repo-rhel7-390.30-1.0-1.x86_64.rpm: nvidia-diag-driver-local-repo-rhel7-390.30-1.0-1.x86_64
Marking nvidia-diag-driver-local-repo-rhel7-390.30-1.0-1.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package nvidia-diag-driver-local-repo-rhel7-390.30.x86_64 0:1.0-1 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package    Arch    Version    Repository    Size

Installing:
 nvidia-diag-driver-local-repo-rhel7-390.30
            x86_64  1.0-1      /nvidia-diag-driver-local-repo-rhel7-390.30-1.0-1.x86_64    89 M

Transaction Summary

Install 1 Package

Total size: 89 M
Installed size: 89 M
Is this ok [y/d/N]: y
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : nvidia-diag-driver-local-repo-rhel7-390.30-1.0-1.x86_64 1/1
Verifying : nvidia-diag-driver-local-repo-rhel7-390.30-1.0-1.x86_64 1/1

Installed:
nvidia-diag-driver-local-repo-rhel7-390.30.x86_64 0:1.0-1

Complete!

Can you tell me what I am supposed to do next, after these steps?

i) `rpm -i nvidia-diag-driver-local-repo-rhel7-390.30-1.0-1.x86_64.rpm`
ii) `yum clean all`
iii) `yum install cuda-drivers`
iv) `reboot`

** The important thing is that I need to install the CUDA 9.1 toolkit after this, because I need the cuBLAS libraries to compile HPL. I am looking at the CUDA installation guide (Installation Guide Linux :: CUDA Toolkit Documentation). Will yum install cuda-drivers conflict with installing the CUDA toolkit afterwards?

Basically, you could have used the method from:
https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=RHEL&target_version=7&target_type=rpmnetwork

But that is the network installer for CUDA; in my case it’s the local installer (rpm).

So do I need to install the base installer and both patches from here (https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=RHEL&target_version=7&target_type=rpmlocal)?