How to disable X server when installing CUDA8.0?

clockzhong · July 13, 2017, 1:00am

Hi, all,

I’ve strictly followed the CUDA 8.0 installation process described here: Installation Guide Linux :: CUDA Toolkit Documentation
but it doesn’t cover the X server issue. When I run the installer on Ubuntu 16.10, it reports that I need to stop the X server before installing CUDA 8.0.
The error message is:
…
Installing the NVIDIA display driver…
It appears that an X server is running. Please exit X before installation. If you’re sure that X is not running, but are getting this error, please delete any X lock files in /tmp.

===========
= Summary =

Driver: Installation Failed
Toolkit: Installation skipped
Samples: Installation skipped

Question one: Could someone add steps for stopping the X server to the CUDA 8.0 installation guide?

Question two: Could someone explain how to stop the X server until the official guide is updated? I used the steps described here:
https://askubuntu.com/questions/799184/how-can-i-install-cuda-on-ubuntu-16-04/913302
It uses:
sudo service lightdm stop
and
sudo service lightdm start
But I couldn’t restart the GUI after following these steps, and I don’t know why. Can anyone help? Thanks in advance.
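
The installer message quoted above mentions X lock files in /tmp. A minimal sketch for checking them follows. The helper name x_lock_files is made up for illustration, and the .X0-lock style naming is the usual Xorg convention rather than something stated in this thread.

```shell
# List X lock files (for example .X0-lock) in a directory; /tmp by default.
x_lock_files() {
    dir="${1:-/tmp}"
    for f in "$dir"/.X*-lock; do
        # When nothing matches, the glob stays literal and this test skips it.
        [ -e "$f" ] && echo "$f"
    done
    return 0
}

x_lock_files

# Remove a stale lock only if no X server is running any more, e.g.:
# pgrep -x Xorg            # prints a PID if Xorg is still running
# sudo rm /tmp/.X0-lock
```

If the function prints nothing, there are no lock files left and the installer error most likely comes from an X server that is still running.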

Robert_Crovella · July 13, 2017, 1:24am

Really?

So you followed this instruction:

“Reboot into text mode (runlevel 3).” ?

http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#runfile-installation

Did you also note that Ubuntu 16.10 is not listed as one of the supported environments for CUDA 8?

http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#system-requirements

sudo service lightdm stop

can certainly (as an alternative to the recommended method of switching to runlevel 3) be used to stop the display manager on Ubuntu. Restarting the GUI should not be an issue, but you should not use

sudo service lightdm start

but instead restart your system, which will be typical after the driver install anyway, for runfile installation (which is the method you are using).

If the GUI does not return after a reboot (assuming you are using the lightdm method, not the runlevel 3 method), then the driver install was not successful for some reason. But there is no information in your post to diagnose that, or even to indicate that it is the case.
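
The sequence described in this reply (stop lightdm, run the runfile installer, then reboot instead of restarting lightdm) could be sketched as below. This is only an illustration: the runfile name is a placeholder, the run helper and the APPLY variable are hypothetical, and by default the script just prints the commands.

```shell
#!/bin/sh
# Sketch only: prints each command unless APPLY=1 is set in the environment.
# RUNFILE is a placeholder; substitute the runfile you actually downloaded.
RUNFILE="${RUNFILE:-cuda_8.0_linux.run}"

run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

run sudo service lightdm stop   # stop the display manager, and X with it
run sudo sh "$RUNFILE"          # run the CUDA runfile installer
run sudo reboot                 # reboot rather than "service lightdm start"
```

Run it from a text console (for example Ctrl+Alt+F1), since stopping lightdm ends the graphical session.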

clockzhong · July 13, 2017, 1:32am

txbob, thanks for your reply. I’ll check once my system recovers.
My GUI crashed again after another attempt at installing CUDA 8.0.
Trying to install CUDA 8.0, I’ve reinstalled Ubuntu 5 times in the last 2 days.

Robert_Crovella · July 13, 2017, 1:36am

What GPUs (both NVIDIA and non-NVIDIA) do you have in your system? Is the GUI display being hosted on a monitor attached to an NVIDIA GPU? Or is it attached to something else (another GPU, the motherboard, etc.)?

clockzhong · July 13, 2017, 2:19am

It’s an ASUS laptop, and the monitor is attached to the NVIDIA GTX 960M. My environment worked very well with CUDA 7.x, but TensorFlow was upgraded 3 days ago and forced me to upgrade CUDA to 8.0, and I have been in trouble ever since.

I’m reinstalling the whole system again and have just finished the Ubuntu 16.10 installation. I feel the difference between Ubuntu 16.10 and Ubuntu 16.04 is not that big. I’ll post the latest result soon. Thanks, txbob.

clockzhong · July 13, 2017, 3:19am

This time I didn’t use “sudo service lightdm stop” to stop the X server. Instead I used “sudo systemctl set-default multi-user.target” to boot into runlevel 3, as described here:
https://askubuntu.com/questions/788323/change-runlevel-on-16-04
According to that page, “runlevel” is an old mechanism and we need to use “target” instead.
Before installing CUDA 8.0, I could switch between text and GUI mode successfully with this approach.
But after I installed CUDA 8.0, I couldn’t switch back to GUI mode: the system got stuck during boot. Is that because I installed the driver? Should I uninstall the driver?

I couldn’t even boot into the GRUB bootloader. I don’t know why. It seems I need to reinstall the system for the 6th time.
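
For readers following along, the runlevel-to-target switch described in this post can be sketched as below. The helper name target_for_runlevel is hypothetical; the two target names are the standard systemd equivalents of runlevels 3 and 5. The systemctl lines are left commented out because they change the boot default.

```shell
# Map a classic SysV runlevel to its systemd target name.
target_for_runlevel() {
    case "$1" in
        3) echo "multi-user.target" ;;   # text mode, no display manager
        5) echo "graphical.target" ;;    # GUI login
        *) echo "unknown runlevel: $1" >&2; return 1 ;;
    esac
}

# Boot into text mode from the next reboot on (uncomment to apply):
# sudo systemctl set-default "$(target_for_runlevel 3)"
# Restore the GUI default after the install:
# sudo systemctl set-default "$(target_for_runlevel 5)"
# Show the current default:
# systemctl get-default
```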

Robert_Crovella · July 13, 2017, 3:41am

Yes, in spite of your initial questions, the issue here has nothing to do with how you stop the display manager.

The issue is that a plain CUDA 8 install does not work correctly on some Optimus laptops. You have an Optimus laptop that shipped from the factory (I think) with Windows. That was the supported environment for it, and when you installed Linux, you took on the challenge of getting the Optimus config to work correctly in a Linux environment. As you’re discovering, it is nontrivial.

You indicated that you had CUDA 7.5 working correctly. Whatever method you used to do that, should work on CUDA 8. The method should not change. But the install guide does not contain the necessary info to make the install work correctly on every possible Optimus laptop out there.

One method I have used successfully on some Optimus laptops is to install the GPU driver with the --no-opengl-files install switch (or CUDA with --no-opengl-libs). This assumes that you have a Linux install that is working correctly with the base (Intel) graphics. That will leave your Intel display driver stack intact. The corruption of that stack by the NVIDIA driver install is the proximal reason for your GUI going haywire.

However this only works on some laptops that will power the NVIDIA GPU even if it is not “in use”. In some cases this may be controllable via the laptop BIOS. It also has various implications: your NVIDIA GPU can never be used for laptop graphics, and any display ports (e.g. perhaps DVI, etc, it depends on the laptop design) that are hardwired to the NVIDIA GPU cannot be used for display.
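
A rough sketch of how the switch mentioned above might be passed to the installer. The helper name opengl_skip_flag and the runfile name are placeholders for illustration; the pairing of flags to installers (--no-opengl-libs for the CUDA runfile, --no-opengl-files for the standalone driver runfile) is an assumption that should be checked against the installer's --help output. The script only prints the resulting command.

```shell
# Pick the "leave the OpenGL stack alone" switch for the installer in use.
opengl_skip_flag() {
    case "$1" in
        cuda)   echo "--no-opengl-libs" ;;    # CUDA toolkit runfile
        driver) echo "--no-opengl-files" ;;   # standalone NVIDIA driver runfile
        *) echo "usage: opengl_skip_flag cuda|driver" >&2; return 1 ;;
    esac
}

# RUNFILE is a placeholder for the installer you downloaded.
RUNFILE="${RUNFILE:-cuda_8.0_linux.run}"
echo "sudo sh $RUNFILE $(opengl_skip_flag cuda)"
```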

The more involved methodologies involve using something like Bumblebee or PRIME to make the Optimus graphics scenario on Linux function approximately the way it works on Windows.

Good luck!

clockzhong · July 13, 2017, 3:49am

One thing still can’t be explained: why did CUDA 7.x work well?

I’ll try the package installation instead of the *.run file, and I’ll also switch Ubuntu from 16.10 to 16.04.

Robert_Crovella · July 13, 2017, 3:52am

If you did a package install instead of runfile, that may have been the difference. I think a package install is worth a try. I don’t know exactly what packages it will pull in in your case, and it may be that it pulls in the right stuff to avoid GUI corruption. Otherwise I can’t explain it. But as I said already, if you can identify exactly what method you used to make CUDA 7.5 work, the same method should work with CUDA 8.
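
A minimal sketch of the package (deb) route, following the general pattern in the CUDA installation guide. The repository package name is a placeholder, the run helper and the APPLY variable are hypothetical, and the script only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch only: prints each command unless APPLY=1 is set in the environment.
# REPO_DEB is a placeholder for the repository package downloaded from NVIDIA.
REPO_DEB="${REPO_DEB:-cuda-repo-DISTRO_VERSION_ARCH.deb}"

run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

run sudo dpkg -i "$REPO_DEB"    # register the CUDA package repository
run sudo apt-get update         # refresh the package index
run sudo apt-get install cuda   # pull in the toolkit and a matching driver
```

Unlike the runfile, this route lets the package manager choose the driver packages, which may be why it avoids the GUI breakage discussed above.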

clockzhong · July 13, 2017, 11:30am

txbob, thanks for your great help. After switching to the *.deb package installation, I could install CUDA 8.0 and can use it on my ASUS laptop now. I’m still not sure which difference caused the problem, the different installation packages or Ubuntu 16.10 vs. 16.04. Anyway, it works now.

I’m installing TensorFlow now, but found that its latest version requires CUDA 8.0 and cuDNN 5.x. I had installed the latest versions of both, so I need to downgrade cuDNN from 6.0 to 5.1 and try again. The good thing is that I don’t need to reinstall the whole system this time.
I need to buy a top-level desktop for my work, or environment setup will keep wasting too much time.
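
To see which cuDNN version is installed before and after such a downgrade, one option is to read the version macros from the cudnn.h header. The sketch below assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL (as the 5.x and 6.x headers do) and that it lives under /usr/local/cuda/include; both the path and the helper name cudnn_version are assumptions.

```shell
# Print the cuDNN version found in a cudnn.h header as MAJOR.MINOR.PATCHLEVEL.
cudnn_version() {
    header="${1:-/usr/local/cuda/include/cudnn.h}"
    awk '/^#define CUDNN_MAJOR /      { major = $3 }
         /^#define CUDNN_MINOR /      { minor = $3 }
         /^#define CUDNN_PATCHLEVEL / { patch = $3 }
         END { print major "." minor "." patch }' "$header"
}

# Usage (pass another path if your header is installed elsewhere):
# cudnn_version
# cudnn_version /usr/include/cudnn.h
```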

clockzhong · July 13, 2017, 11:51am

Finally, TensorFlow works again!!! I can continue my work after a 2-day pause.
