NVIDIA driver/CUDA installation causes CentOS 7 to hang on boot; unable to access the user interface.

I’ve been tasked with installing CUDA on some new servers that all came with CentOS 7 installed. I followed the instructions for installing CUDA, and everything goes smoothly until I restart the computer. On restart a boot log checklist is displayed and the computer hangs there indefinitely. I can get to a command line with Ctrl+Alt+F2, and the good news is that CUDA works properly: the samples compile and run fine. However, I can find no way to get the GUI working without uninstalling the NVIDIA driver and switching back to the nouveau driver that came with the system, which breaks CUDA.

You need to remove all traces of the nouveau driver before installing the NVIDIA driver.

Something like this:

Switch to runlevel 3.

as root:

echo -e "blacklist nouveau\noptions nouveau modeset=0" > /etc/modprobe.d/disable-nouveau.conf
dracut --force

Then reboot into runlevel 3 and run the CUDA 7 runfile installer.
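
Before running the installer, it may be worth confirming that the blacklist actually took effect. Here is a minimal shell sketch, assuming the file name from the command above; nouveau_blacklisted is a hypothetical helper name, not part of any package:

```shell
# Hypothetical helper: succeeds if the given modprobe config file
# contains an uncommented "blacklist nouveau" line.
nouveau_blacklisted() {
    grep -Eq '^[[:space:]]*blacklist[[:space:]]+nouveau[[:space:]]*$' "$1"
}

# Path assumed from the echo command above.
conf=/etc/modprobe.d/disable-nouveau.conf
if [ -r "$conf" ] && nouveau_blacklisted "$conf"; then
    echo "blacklist entry found in $conf"
else
    echo "no blacklist entry in $conf"
fi

# After the reboot, nouveau should no longer appear in the module list.
if lsmod 2>/dev/null | grep -q '^nouveau '; then
    echo "nouveau is still loaded"
else
    echo "nouveau is not loaded"
fi
```

If nouveau still shows up as loaded after the reboot, the initramfs was probably not rebuilt, so rerunning dracut --force would be the first thing to try.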

Thanks for your help, however the system still hangs in the same spot.

I uninstalled the driver and CUDA, then added the blacklist using the command you gave me. I found more commands for removing nouveau via Google, including yum remove xorg-x11-drv-nouveau, and then ran dracut --force. After all this and after running the installers again, I have the same issue.

edit:
Looking at the boot log more closely: previously the boot would hang after a different [ OK ] line each time, but now the following line, which I hadn’t seen before, is always displayed:
[* ] A start job is running for Wait for Plymouth Boot Screen to Quit

I don’t know the history of the system(s) nor have you provided any logs or indicated where in the boot process it is hanging. It’s possible that there are other conflicting nvidia components.

All of the following are as root.

What is the result of running

yum list nvidia-*

Which version of CUDA are you trying to install?
Are you using a runfile installer, or a package manager method?
Do you have an nvidia GPU? Which one?

What is the result of running:

lspci -v |grep NV

What is the result of running:

dmesg |grep NVRM

and

dmesg |grep nouv

On my latest attempt I used the ELRepo repository to install the driver, so the result of

yum list nvidia-* is:
Installed Packages:
nvidia-x11-drv.x86_64 346.59-1.el7.elrepo
Available Packages
nvidia-detect.x86_64 346.59-1.el7.elrepo
nvidia-x11-drv-304xx.x86_64 304.125-1.el7.elrepo
nvidia-x11-drv-304xx-32bit.x86_64 304.125-1.el7.elrepo
nvidia-x11-drv-32bit.x86_64 346.59-1.el7.elrepo
nvidia-x11-drv-340xx.x86_64 340.76-1.el7.elrepo
nvidia-x11-drv-340xx-32bit.x86_64 340.76-1.el7.elrepo

The version of CUDA I’m installing is 7.0

I’ve tried to use both the runfile installer and the package manager method.

The servers each have an NVIDIA Tesla K20c.

lspci -v |grep NV:
83:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20c]

dmesg |grep NVRM:
[ 2.058067] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 340.87 Thu Mar 19 23:39:02 PDT 2015
[ 1803.786443] NVRM: API mismatch: the client has the version 346.46, but
NVRM: this kernel module has the version 340.87. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version
[ 1803.786454] NVRM: nvidia_frontend_ioctl: minor 255, module->ioctl failed, error -22

dmesg |grep nouv:
[0.0000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-229.4.2.el7.x86_64 root=/dev/mapper/centos-root ro rd.lvm.lv=centos/swap vconsole.font=latarcyrheb-sun16.rd.lvm.lv=centos/root crashkernel=auto vconsole.keymap-us rhgb quiet nouveau.modeset=0 rd.driver.blacklist=nouveau
[0.0000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.10.0-229.4.2.el7.x86_64 root=/dev/mapper/centos-root ro rd.lvm.lv=centos/swap vconsole.font=latarcyrheb-sun16.rd.lvm.lv=centos/root crashkernel=auto vconsole.keymap-us rhgb quiet nouveau.modeset=0 rd.driver.blacklist=nouveau

So this is a problem:

[ 2.058067] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 340.87 Thu Mar 19 23:39:02 PDT 2015
[ 1803.786443] NVRM: API mismatch: the client has the version 346.46, but
NVRM: this kernel module has the version 340.87. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version

346.46 comes from the CUDA 7 installer. I'm not sure where 340.87 comes from, probably a repo. 340.87 cannot be used with CUDA 7.

You cannot mix runfile and repo installation methods.

When I cull through the data you have presented, I find elements of the following nvidia drivers:

346.59, 346.46, 340.87
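
To spot this kind of mix quickly, the distinct versions can be pulled out of the kernel log. This is a rough shell sketch; nvrm_versions is a hypothetical helper name, and the pattern only assumes the message wording shown in the NVRM lines above:

```shell
# Hypothetical helper: read kernel log text on stdin and print each
# distinct NVIDIA driver version mentioned in NVRM messages.
nvrm_versions() {
    # Match "Kernel Module 340.87" and "version 346.46" style fragments,
    # keep only the trailing version number, then de-duplicate.
    grep -Eo '(Kernel Module|version) +[0-9]+\.[0-9]+' \
        | awk '{print $NF}' \
        | sort -u
}
```

Run it as root with dmesg | nvrm_versions. More than one line of output means the kernel module and the user-space components disagree, which is the API mismatch reported above. Versions installed through yum (346.59 here) do not appear in dmesg and still have to be checked with yum list nvidia-*.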

I suggest starting over with a clean install of CentOS 7, switching to runlevel 3, removing nouveau, and using only the CUDA 7 runfile installer.

Alternatively, you can study the linux getting started guide, which includes tips about how to clean up when switching from one install method to the other:

http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html#handle-uninstallation

Hi, I am having the same problem as the original poster. However, the samples are not running. I am getting the error

modprobe: ERROR: could not insert 'nvidia_uvm': Required key not available.

Does anyone know what the problem could be? I am using the runfile installation method and have already made sure that the nouveau driver is disabled.

Hi,
I am new to NVENC. Running NVENC with a file as input works perfectly fine.
But when I try the same thing with the DeckLink card's output as the input to NVENC using ffmpeg, it fails with a "Cannot init CUDA" error.

Below are the command lines I used.

DeckLink card only (works fine):
ffmpeg -y -format_code hp59 -f decklink -i 'DeckLink Mini Recorder 4K' -map 0 -vf scale=1280:720 -c:v libx264 -ac 2 -ar 48000 -c:a libfdk_aac -b:a 96k output.ts

File-based NVENC (works fine):
./ffmpeg -y -hwaccel cuvid -c:v h264_cuvid -i Despicable3.mp4 -map 0 -vf scale_cuda=1280:720 -c:v h264_nvenc -ac 2 -ar 48000 -c:a libfdk_aac -b:a 96k out.ts

DeckLink card with NVENC (gives the error):
./ffmpeg -y -format_code hp59 -f decklink -i 'DeckLink Mini Recorder 4K' -map 0 -vf scale=1280:720 -c:v hevc_nvenc -profile:v main -level 3.1 -pix_fmt yuv420p test67.ts

Notes:
I am running on a server with CentOS 7
CUDA 9.0 is installed
nvidia-kmod-384.59-2.el7.x86_64
GPU = P5000

Is there a problem with my command line, or am I missing something?

Error
[hevc_nvenc @ 0x355b520] Cannot init CUDA
[hevc_nvenc @ 0x355b520] cuCtxPushCurrent failed
Error initializing output stream 0:0 -- Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height
Conversion failed!

installation
[root@sedev soft]# lspci -v |grep NV
04:00.0 VGA compatible controller: NVIDIA Corporation GP104GL [Quadro P5000] (rev a1) (prog-if 00 [VGA controller])
Subsystem: NVIDIA Corporation Device 11b2
04:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)
Subsystem: NVIDIA Corporation Device 11b2
[root@sedev soft]# dmesg |grep NVRM
[ 3.284404] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 384.59 Wed Jul 19 23:53:34 PDT 2017 (using threaded interrupts)
[ 2100.112771] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 384.59 Wed Jul 19 23:53:34 PDT 2017 (using threaded interrupts)
[root@sedev soft]# dmesg |grep nouv
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-514.26.2.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet nouveau.modeset=0 rd.driver.blacklist=nouveau video=vesa:off
[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.10.0-514.26.2.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet nouveau.modeset=0 rd.driver.blacklist=nouveau video=vesa:off

Hi, everyone! I have resolved this problem.
Please run the following commands:

yum install xorg-x11-server-Xorg

Xorg -configure :0

cp /root/xorg.conf.new /etc/X11/xorg.conf

I was able to start X successfully!

Good luck!
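
Since the cp above overwrites the existing /etc/X11/xorg.conf, keeping a copy of the old file first seems prudent. Here is a small shell sketch of that idea; install_xorg_conf is a hypothetical helper name and the .bak suffix is an arbitrary choice:

```shell
# Hypothetical helper: copy a newly generated config over the live one,
# saving whatever was there before as <target>.bak.
install_xorg_conf() {
    new=$1
    target=$2
    # Refuse to continue if the generated file is missing or unreadable.
    [ -r "$new" ] || { echo "missing $new" >&2; return 1; }
    # Back up the current config, if there is one.
    if [ -e "$target" ]; then
        cp -p "$target" "$target.bak" || return 1
    fi
    cp "$new" "$target"
}
```

With the paths from this thread, the call (as root) would be install_xorg_conf /root/xorg.conf.new /etc/X11/xorg.conf. If X fails to start afterwards, the previous file can be restored from /etc/X11/xorg.conf.bak.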

However, CUDA does not work!
After I install the CUDA Toolkit, /etc/X11/xorg.conf contains only the following lines:

#RPM Fusion - nvidia-xorg.conf
Section "Device"
Identifier "Videocard0"
Driver "nvidia"
ENDSection

I guess this is the reason X cannot be started.

Dear All,
I have resolved this problem!
I found that there is a problem with /etc/X11/xorg.conf.
After installing the CUDA Toolkit, /etc/X11/xorg.conf is as follows:

#RPM Fusion - nvidia-xorg.conf
Section "Device"
      Identifier "Videocard0"
      Driver "nvidia"
ENDSection

When we ran startx, an error appeared: no screens found.
Obviously, the xorg.conf installed by the CUDA Toolkit contains no configuration for the monitor, mouse, etc.

Therefore, we should configure xorg.conf:


Step 1:
#yum install xorg-server-Xorg
#X -configure xorg.conf

Step 2:
#vim xorg.conf
In Section "Files":
Add    ModulePath       "usr/lib64/nvidia/modules"   below    ModulePath     "/usr/lib64/xorg/modules"
In Section "InputDevice":
Replace kbd with nvidia


Then run startx. X should now start.

Good luck!
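
To see at a glance what a given xorg.conf is missing, its section names can be listed. This is a minimal shell sketch; xorg_sections is a hypothetical helper name, and it assumes the keyword is spelled Section with exactly that capitalization, as in the files quoted in this thread:

```shell
# Hypothetical helper: print the name of every Section in an xorg.conf,
# one per line, in file order.
xorg_sections() {
    sed -nE 's/^[[:space:]]*Section[[:space:]]+"([^"]+)".*/\1/p' "$1"
}
```

For the stub file shown above, xorg_sections /etc/X11/xorg.conf prints only Device. A file generated by Xorg -configure typically also lists sections such as ServerLayout, Files, InputDevice, Monitor and Screen.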

Dear All,

I am getting the same error: startx reports "no screens found". I tried to follow the steps above.

Step 1:
yum install xorg-server-Xorg
Output of Step 1 is

Loaded plugins: fastestmirror, langpacks
base | 3.6 kB 00:00:00
cuda | 2.5 kB 00:00:00
epel/x86_64/metalink | 6.9 kB 00:00:00
epel | 4.3 kB 00:00:00
extras | 3.4 kB 00:00:00
updates | 3.4 kB 00:00:00
(1/4): extras/7/x86_64/primary_db | 128 kB 00:00:00
(2/4): epel/x86_64/updateinfo | 843 kB 00:00:00
(3/4): updates/7/x86_64/primary_db | 3.6 MB 00:00:00
(4/4): epel/x86_64/primary_db | 4.8 MB 00:00:01
Determining fastest mirrors
 * base: ftp.iitm.ac.in
 * epel: epel.mirror.net.in
 * extras: ftp.iitm.ac.in
 * updates: ftp.iitm.ac.in
No package xorg-server-Xorg available.
Error: Nothing to do

Below are the contents of my /etc/X11/xorg.conf file:
Section "Device"
Identifier "Videocard0"
Driver "nvidia"
EndSection

Also, where can I find the xorg.conf file mentioned in Step 2 below?

Step 2:
#vim xorg.conf
In Section "Files":
Add ModulePath "usr/lib64/nvidia/modules" below ModulePath "/usr/lib64/xorg/modules"
In Section "InputDevice":
Replace kbd with nvidia

Please suggest the right way to resolve my problem.

Dear Shubhra,

Usually, after installing Xorg with the command "yum install xorg-server-Xorg", the generated xorg.conf is located in the home folder of the root user.

Good luck!

Best Wishes,

Youshan

Dear Ysliu,

I couldn’t find that file in the root user's home folder, but a file with the same name, xorg.conf, is located in /etc/X11.
Do I need to change that file, and if so, what changes are needed? Can you please explain in detail? I have been stuck on this problem for the last 15 days.

Thanks in advance!!

Dear Shubhra,
Your output shows that the first step did not succeed.
Please see the following lines from your output:

No package xorg-server-Xorg available.
Error: Nothing to do

The xorg.conf in /etc/X11 was generated by the CUDA Toolkit, not by the command "yum install xorg-server-Xorg".
Good luck!

Best Wishes,
Youshan

Dear Shubhra,
Try the following commands:
yum install xorg-x11-server-Xorg
Xorg -configure xorg.conf
Good luck!
Best Wishes,
Youshan

#yum install xorg-x11-server-Xorg
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
 * base: ftp.iitm.ac.in
 * epel: epel.mirror.net.in
 * extras: ftp.iitm.ac.in
 * updates: ftp.iitm.ac.in
Package xorg-x11-server-Xorg-1.19.3-11.el7.x86_64 already installed and latest version
Nothing to do

Seems like it is already installed.
After that I tried Xorg -configure xorg.conf (working directory: /etc/X11/) and got the output below:

X -configure xorg.conf
Unrecognized option: xorg.conf
(EE)
Fatal server error:
(EE) Unrecognized option: xorg.conf

What am I supposed to do now?

Dear Shubhra,
X -configure prints an error message when you pass it xorg.conf as an argument.
Try Xorg -configure :0 instead.
Good luck!
Youshan