Installing CUDA breaks my X server

cuda128 · August 31, 2015, 2:21pm

Hello,

I’m trying to install a Tesla K40c in an HP ProLiant DL380 Gen9. HP confirmed it’s supported. The server also has an onboard graphics card:

lspci |grep VGA
01:00.1 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200EH (rev 01)

When I perform a fresh install of Ubuntu 14.04.3, X works fine. I install the server edition and then run apt-get install ubuntu-desktop. All is fine after that (I can log in via the GUI). As soon as I install the driver for the Tesla K40c, X breaks: I get a login prompt, but logging in just takes me back to the login prompt (the GUI never fully loads).

I’m also wondering why Linux reports my Matrox card as UNCLAIMED when I run lshw -C display. It shows that both before and after the Tesla driver install, yet X works. (I’m pretty certain UNCLAIMED means that no driver has claimed the device yet.)

# lshw -C display
  *-display
       description: 3D controller
       product: NVIDIA Corporation
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:08:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress bus_master cap_list
       configuration: driver=nvidia latency=0
       resources: iomemory:39f0-39ef iomemory:39f0-39ef irq:16 memory:93000000-93ffffff memory:39fe0000000-39fefffffff memory:39ff0000000-39ff1ffffff
  *-display UNCLAIMED
       description: VGA compatible controller
       product: MGA G200EH
       vendor: Matrox Electronics Systems Ltd.
       physical id: 0.1
       bus info: pci@0000:01:00.1
       version: 01
       width: 32 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list
       configuration: latency=0
       resources: memory:91000000-91ffffff memory:92a88000-92a8bfff memory:92000000-927fffff
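
The UNCLAIMED check can be scripted. The following is a minimal sketch, assuming lshw output in the format shown above; the function name unclaimed_display_devices is made up for illustration and is not part of lshw.

```shell
#!/bin/sh
# Print the "bus info" of every display device that lshw marks UNCLAIMED,
# i.e. a device that no kernel driver is bound to.
# Reads `lshw -C display` output on stdin.
unclaimed_display_devices() {
    awk '
        # A line such as "  *-display UNCLAIMED" starts a new device record.
        /^[ \t]*\*-/ { unclaimed = ($0 ~ /UNCLAIMED/) }
        # Inside an unclaimed record, print only the bus address.
        unclaimed && /bus info:/ { sub(/^.*bus info:[ \t]*/, ""); print }
    '
}
```

Usage would be something like `sudo lshw -C display | unclaimed_display_devices`. As a cross-check, `lspci -k -s 01:00.1` lists a "Kernel driver in use" line only when a kernel driver is bound to that device.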

I’m also a bit confused about the difference between the cuda__linux.run file and the NVIDIA-Linux-x86_64-346.89.run file. Ultimately, I want to use the onboard card for the X GUI and the Tesla for computing. Can someone kindly point me in the right direction?

http://developer.download.nvidia.com/compute/cuda/7_0/Prod/doc/CUDA_Getting_Started_Linux.pdf

I’ve tried both the runfile method and the repo method.

I was also not seeing the /dev/nvidia* device nodes from section 4.4 (Device Node Verification), but I do see them after going through the installation. I’ve run the samples, so I know CUDA is working, but it’s breaking my Xorg server.
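
The device node check from section 4.4 can be written as a small helper. This is only a sketch: the function name count_nvidia_nodes and its optional directory argument are hypothetical, added so the check can be tried against a directory other than /dev.

```shell
#!/bin/sh
# Count NVIDIA device nodes (nvidia0, nvidiactl, ...) in a directory.
# The directory defaults to /dev.
count_nvidia_nodes() {
    dir=${1:-/dev}
    count=0
    for node in "$dir"/nvidia*; do
        # An unmatched glob stays as a literal string, so test that the entry exists.
        [ -e "$node" ] && count=$((count + 1))
    done
    echo "$count"
}
```

Running `count_nvidia_nodes` with no argument inspects /dev. A result of 0 right after boot does not by itself mean the driver is broken: when X does not use the NVIDIA driver, the nodes may only appear once something initializes the driver as root, which would match seeing them only after going through the installation.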

Robert_Crovella · August 31, 2015, 3:07pm

The linux getting started guide that you linked is a useful resource. I’ll offer some suggestions.

1. Start over with a clean install. The runfile and repo methods of installation are not compatible with each other, which you will discover if you read that guide carefully. A history of using one will corrupt the other.
2. Get your system running the way you want it (X GUI and all) without the K40c GPU installed.
3. Follow the instructions in the guide to remove the nouveau driver.
4. Install the K40c GPU. At this point, you haven’t installed the driver, so your GUI should still be working.
5. Grab the latest Linux driver only for your K40c, the runfile installer, such as this one:
   http://www.nvidia.com/download/driverResults.aspx/88814/en-us
6. Run the driver installer from step 5, but select “no” if prompted to install any OpenGL libraries and select “no” if prompted to modify the xorg.conf file.
7. At this point, if your GUI problems have not surfaced, then you are probably past the trouble. You can then run the CUDA 7 Linux runfile installer (not the repo method) and simply select “no” when prompted to install the driver.
8. If your login loop has returned after step 6, try re-running whatever steps you used originally (e.g. apt-get install ubuntu-desktop) to get the X desktop up and running. You might need to do a forced reinstall or a purge and reinstall.
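
The steps above can be sketched as a script. Treat it as a hedged outline rather than a tested procedure: the function names and the DRY_RUN switch are invented for illustration, the runfile paths are placeholders you supply, stopping lightdm assumes the stock Ubuntu 14.04 desktop display manager, and the nouveau blacklist follows the Ubuntu instructions in the getting started guide. By default it only prints the commands.

```shell
#!/bin/sh
# Print each command, and execute it only when DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Step 3: blacklist nouveau and rebuild the initramfs.
# Reboot afterwards, before installing the driver.
prepare_nouveau_blacklist() {
    run sudo sh -c 'printf "blacklist nouveau\noptions nouveau modeset=0\n" > /etc/modprobe.d/blacklist-nouveau.conf'
    run sudo update-initramfs -u
}

# Steps 6 and 7: driver first, then the CUDA runfile.
install_driver_then_cuda() {
    driver_run=$1   # path to the NVIDIA-Linux-x86_64-*.run driver installer
    cuda_run=$2     # path to the CUDA 7 runfile installer

    # Assumption: the display manager is lightdm; X must not be running
    # while the driver installer runs.
    run sudo service lightdm stop

    # Skip the NVIDIA OpenGL libraries so the onboard card keeps serving X.
    # Answer "no" if the installer offers to modify xorg.conf.
    run sudo sh "$driver_run" --no-opengl-files

    # Interactive CUDA installer: answer "no" when asked to install the driver.
    run sudo sh "$cuda_run"
}
```

Calling `prepare_nouveau_blacklist` and later `install_driver_then_cuda ./NVIDIA-Linux-x86_64-346.89.run <cuda runfile>` prints the planned commands for review; setting DRY_RUN=0 makes the same calls execute them.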

cuda128 · August 31, 2015, 3:54pm

Thanks for the reply. It seems I’m never prompted about installing OpenGL or configuring X, so I’m assuming I’ll have to pass the installer some parameters. Looking at the source of the runfile, I see a --no-opengl-files option.

Robert_Crovella · August 31, 2015, 4:06pm

The driver installer has command-line help that is available by specifying --help, I believe.
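
A small sketch for narrowing that help text down to the options that matter here. The helper name x_related_options is made up, and the search pattern is only a guess at which option names are relevant.

```shell
#!/bin/sh
# Filter installer help text (read from stdin) down to the lines that
# mention OpenGL files or X configuration.
x_related_options() {
    grep -iE 'opengl|xconfig|x-check'
}
```

For example, `sh ./NVIDIA-Linux-x86_64-346.89.run --advanced-options | x_related_options`. If an option does not show up under --help, the longer --advanced-options listing is the next place to look.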

cuda128 · August 31, 2015, 6:24pm

Thanks a bunch, it seems to be working:

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Tesla K40c"
  CUDA Driver Version / Runtime Version          7.0 / 7.0
  CUDA Capability Major/Minor version number:    3.5
  Total amount of global memory:                 11520 MBytes (12079136768 bytes)
  (15) Multiprocessors, (192) CUDA Cores/MP:     2880 CUDA Cores
  GPU Max Clock rate:                            745 MHz (0.75 GHz)
  Memory Clock rate:                             3004 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 1572864 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 8 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.0, CUDA Runtime Version = 7.0, NumDevs = 1, Device0 = Tesla K40c
Result = PASS