I am having a similar issue on RHEL 7.6. The kernel version is 3.10.0-957.10.1.el7.x86_64.
Entry in /etc/default/grub:
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=rootvg/lv_root rd.lvm.lv=rootvg/lv_swap rd.lvm.lv=rootvg/lv_usr rd.driver.blacklist=nouveau nouveau.modeset=0 rhgb quiet splash nomodeset"" audit=1"
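A change to /etc/default/grub only reaches the running kernel after grub.cfg has been regenerated with grub2-mkconfig and the machine rebooted, so it can be worth confirming that the blacklist parameter is really on the live command line. A minimal sketch, assuming a POSIX shell; cmdline_has is a hypothetical helper name, not part of any tool:

```shell
# Succeeds if the given kernel parameter appears as a whole word
# in the kernel command line.
cmdline_has() {
    # $1: parameter to look for, e.g. rd.driver.blacklist=nouveau
    # $2: file to read (defaults to /proc/cmdline on the running system)
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Possible use on the affected host (not run here):
#   cmdline_has rd.driver.blacklist=nouveau || echo "parameter not active"
```

Matching whole words avoids a false positive from a longer parameter that merely contains the same text.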
I have Tesla V100 GPU cards assigned to a VMware VM via PCI passthrough, and the device is visible in the guest operating system:
[root@server ~]# lspci | grep -i nvidia
13:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
[root@server ~]#
I have blacklisted nouveau and regenerated the initramfs as well.
[root@server modprobe.d]# cat nvidia-installer-disable-nouveau.conf
# generated by nvidia-installer
blacklist nouveau
options nouveau modeset=0
[root@server modprobe.d]# pwd
/etc/modprobe.d
[root@server modprobe.d]# cat blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
[root@server modprobe.d]#
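A blacklist file on its own does not prove nouveau is out of the picture: the module can still be loaded, or still be packed into the initramfs. A minimal sketch of a check, assuming a stock RHEL 7 host with dracut's lsinitrd available; module_loaded is a hypothetical helper name:

```shell
# Succeeds if the named module appears in lsmod-style output read from stdin.
module_loaded() {
    # $1: module name, e.g. nouveau
    # NR > 1 skips the "Module  Size  Used by" header line.
    awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}

# Possible use on the affected host (not run here):
#   lsmod | module_loaded nouveau && echo "nouveau is still loaded"
#   lsinitrd | grep -i nouveau    # lists nouveau files still in the current initramfs
```

Comparing the first column exactly keeps a module that only lists nouveau under "Used by" from being counted as a match.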
While installing the latest NVIDIA CUDA driver, I get the error below.
[root@server satmp]# more /var/log/cuda-installer.log
INFO: Checking compiler version…
INFO: gcc location: /bin/gcc
INFO: gcc version: gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC)
INFO: Window setup complete
INFO: Cleaning up window
INFO: Complete
INFO: Initializing menu
INFO: Setting install of manpages to 0
INFO: Path doesn't end with '/', adding it
INFO: Setup complete
INFO: Cleaning up window
INFO: Complete
INFO: Components to install:
INFO: Driver
INFO: /bin/lsb_release
INFO: Executing NVIDIA-Linux-x86_64-418.39.run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check --install-libglvnd 2>&1
INFO: Verifying archive integrity… OK
INFO: Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 418.39…
…
…
…
…
INFO:
INFO: WARNING: One or more modprobe configuration files to disable Nouveau are
INFO: already present at:
INFO: /etc/modprobe.d/nvidia-installer-disable-nouveau.conf. Please be
INFO: sure you have rebooted your system since these files were written.
INFO: If you have rebooted, then Nouveau may be enabled for other
INFO: reasons, such as being included in the system initial ramdisk or
INFO: in your X configuration file. Please consult the NVIDIA driver
INFO: README and your Linux distribution’s documentation for details on
INFO: how to correctly disable the Nouveau kernel driver.
INFO:
INFO:
INFO: Welcome to the NVIDIA Software Installer for Unix/Linux
INFO:
INFO: Detected 20 CPUs online; setting concurrency level to 20.
INFO: Installing NVIDIA driver version 418.39.
INFO: For some distributions, Nouveau can be disabled by adding a file in the
INFO: modprobe configuration directory. Would you like nvidia-installer to
INFO: attempt to create this modprobe file for you? (Answer: Yes)
INFO:
INFO: One or more modprobe configuration files to disable Nouveau have been
INFO: written. For some distributions, this may be sufficient to disable
INFO: Nouveau; other distributions may require modification of the initial
INFO: ramdisk. Please reboot your system and attempt NVIDIA driver installation
INFO: again. Note if you later wish to reenable Nouveau, you will need to delete
INFO: these files: /usr/lib/modprobe.d/nvidia-installer-disable-nouveau.conf,
INFO:
INFO: Performing CC sanity check with CC="/bin/cc".
INFO: Kernel source path: '/lib/modules/3.10.0-957.10.1.el7.x86_64/source'
INFO: Kernel output path: '/lib/modules/3.10.0-957.10.1.el7.x86_64/build'
INFO: Performing Compiler check.
INFO: Performing Dom0 check.
INFO: Performing Xen check.
INFO: Performing PREEMPT_RT check.
INFO: Performing vgpu_kvm check.
INFO: Cleaning kernel module build directory.
INFO: Building kernel modules
: [##############################] 100%
INFO:
INFO: ERROR: Unable to load the kernel module 'nvidia.ko'. This happens most
INFO: frequently when this kernel module was built against the wrong or
INFO: improperly configured kernel sources, with a version of gcc that
INFO: differs from the one used to build the target kernel, or if another
INFO: driver, such as nouveau, is present and prevents the NVIDIA kernel
INFO: module from obtaining ownership of the NVIDIA GPU(s), or no NVIDIA
INFO: GPU installed in this system is supported by this NVIDIA Linux
INFO: graphics driver release.
INFO:
INFO: Please see the log entries 'Kernel module load error' and 'Kernel
INFO: messages' at the end of the file '/var/log/nvidia-installer.log' for
INFO: more information.
INFO:
INFO:
INFO: ERROR: Installation has failed. Please see the file
INFO: '/var/log/nvidia-installer.log' for details. You may find
INFO: suggestions on fixing installation problems in the README available
INFO: on the Linux driver download page at www.nvidia.com.
INFO:
INFO: Kernel module compilation complete.
INFO: Unable to determine if Secure Boot is enabled: No such file or directory
INFO: Kernel module load error: No such device
INFO: Kernel messages:
INFO: [ 8.960145] type=1131 audit(1552995203.835:68): pid=1 uid=0
INFO: auid=4294967295 ses=4294967295 msg='unit=plymouth-read-write comm="systemd"
INFO: exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
INFO: [ 8.972153] type=1130 audit(1552995203.847:69): pid=1 uid=0
INFO: auid=4294967295 ses=4294967295 msg='unit=rhel-import-state comm="systemd"
INFO: exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
INFO: [ 8.986702] type=1130 audit(1552995203.862:70): pid=1 uid=0
INFO: auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup
INFO: comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=?
INFO: [ 9.002907] type=1305 audit(1552995203.878:71): audit_enabled=1 old=1
INFO: auid=4294967295 ses=4294967295 res=1
INFO: [ 9.002940] type=1305 audit(1552995203.878:72): audit_pid=11273 old=0
INFO: auid=4294967295 ses=4294967295 res=1
INFO: [ 9.231390] NET: Registered protocol family 40
INFO: [ 9.337251] IPv6: ADDRCONF(NETDEV_UP): ens192: link is not ready
INFO: [ 9.340742] vmxnet3 0000:0b:00.0 ens192: intr type 3, mode 0, 9 vectors
INFO: [ 9.341455] vmxnet3 0000:0b:00.0 ens192: NIC Link is Up 10000 Mbps
INFO: [ 10.504129] random: crng init done
INFO: [ 110.286629] VFIO - User Level meta-driver version: 0.3
INFO: [ 110.317196] ipmi message handler version 39.2
INFO: [ 110.320216] ipmi device interface
INFO: [ 110.335144] nvidia: loading out-of-tree module taints kernel.
INFO: [ 110.335151] nvidia: module license 'NVIDIA' taints kernel.
INFO: [ 110.335153] Disabling lock debugging due to kernel taint
INFO: [ 110.399900] nvidia: module verification failed: signature and/or
INFO: required key missing - tainting kernel
INFO: [ 110.482205] nvidia-nvlink: Nvlink Core is being initialized, major
INFO: device number 240
INFO: [ 110.482925] NVRM: This PCI I/O region assigned to your NVIDIA device is
INFO: NVRM: BAR1 is 0M @ 0x0 (PCI:0000:13:00.0)
INFO: [ 110.482927] NVRM: The system BIOS may have misconfigured your GPU.
INFO: [ 110.482931] nvidia: probe of 0000:13:00.0 failed with error -1
INFO: [ 110.482951] NVRM: The NVIDIA probe routine failed for 1 device(s).
INFO: [ 110.482952] NVRM: None of the NVIDIA graphics adapters were initialized!
INFO: [ 110.483092] nvidia-nvlink: Unregistered the Nvlink Core, major device
INFO: number 240
INFO: Finished with code: 256
[ERROR]: Install of driver component failed.
INFO: CUDA Toolkit 10.1
INFO: /bin/lsb_release
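In a log of this length, the lines that describe the actual failure are the NVRM and nvidia probe messages (here, BAR1 reported as 0M @ 0x0 and the probe of 0000:13:00.0 failing with error -1). A small sketch for pulling just those lines out of an installer log, assuming a POSIX shell with grep; nvidia_kernel_messages is a hypothetical helper name:

```shell
# Print only the NVIDIA-related kernel messages from a log file.
nvidia_kernel_messages() {
    # $1: path to a log, e.g. /var/log/nvidia-installer.log
    grep -E 'NVRM:|nvidia: probe|nvidia-nvlink:' "$1"
}

# Possible use on the affected host (not run here):
#   nvidia_kernel_messages /var/log/cuda-installer.log
#   lspci -vv -s 13:00.0    # the "Region" lines show the BARs the guest assigned
```

Comparing the filtered messages with the Region lines from lspci shows whether the guest gave the GPU any address space for BAR1 at all.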