I'm having some trouble installing CUDA support on an IBM SoftLayer machine with an NVIDIA K80.
Since we'd like to bring up a series of these machines, we need to either fix this, get a different GPU, or move from SoftLayer to another host.
Install steps I took:
- preinstall -
lspci|grep -i nvidia
83:00.0 3D controller: NVIDIA Corporation Device 102d (rev a1)
84:00.0 3D controller: NVIDIA Corporation Device 102d (rev a1)
root@brain2:~# uname -m && cat /etc/*release
x86_64
...
DISTRIB_DESCRIPTION="Ubuntu 14.04.3 LTS"
root@brain2:~# gcc --version
gcc (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4
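Since the same checks will be repeated on every machine in the series, here is a minimal sketch of them as shell functions whose output a script can test. The function names (`count_nvidia_devices`, `distro_description`, `is_x86_64`) are hypothetical helpers, not part of any NVIDIA tooling, and the release check assumes Ubuntu's `/etc/lsb-release` format shown above.

```shell
# Hypothetical helpers for the pre-install checks above (not NVIDIA tools).

# Count NVIDIA PCI devices in `lspci` output read from stdin.
count_nvidia_devices() {
  grep -ci 'nvidia'
}

# Print the DISTRIB_DESCRIPTION value from an lsb-release style file
# (defaults to /etc/lsb-release, as on Ubuntu).
distro_description() {
  sed -n 's/^DISTRIB_DESCRIPTION="\(.*\)"$/\1/p' "${1:-/etc/lsb-release}"
}

# Succeed only on 64-bit x86; pass an architecture string to test,
# otherwise `uname -m` is used.
is_x86_64() {
  [ "${1:-$(uname -m)}" = "x86_64" ]
}
```

On this machine `lspci | count_nvidia_devices` should print 2, since one K80 board exposes two GPUs.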
Didn't verify the checksum, since no checksum for cuda_7.5.18_linux.run is listed at https://developer.nvidia.com/cuda-downloads/checksums
(and no file sizes are listed for the files that are there, btw)
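For when a checksum does get published, a minimal sketch of the comparison, assuming the published value is an MD5 sum. `verify_md5` is a hypothetical helper; it feeds the file to `md5sum` on stdin so the output is just the hash followed by `-`.

```shell
# Hypothetical helper: succeed if FILE's MD5 matches the EXPECTED hex string.
verify_md5() {
  file="$1"
  # Normalize the published value to lowercase, as md5sum prints lowercase.
  expected=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  # Reading from stdin avoids md5sum escaping unusual file names.
  actual=$(md5sum < "$file" | cut -d' ' -f1)
  [ "$actual" = "$expected" ]
}
```

Usage would be something like `verify_md5 cuda_7.5.18_linux.run <published md5>`, with the real value taken from the checksums page.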
- downloaded runfile
wget http://developer.download.nvidia.com/compute/cuda/7.5/Prod/local_installers/cuda_7.5.18_linux.run
- disabled the nouveau driver:
root@brain2:~# more /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
Didn't reboot, since I only have command-line access to the machine anyway.
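A sketch of the nouveau step, assuming Ubuntu 14.04. NVIDIA's Linux getting-started guide follows the blacklist file with regenerating the initramfs (`sudo update-initramfs -u`) and then a reboot. A modprobe.d file only affects later module loads, so a nouveau module that is already loaded stays loaded until the reboot, with or without a GUI. `write_nouveau_blacklist` and `module_loaded` are hypothetical helpers; the latter reads `/proc/modules`, the same data `lsmod` prints.

```shell
# Hypothetical helper: write the two-line blacklist shown above.
# Defaults to the path used in this post; pass another path to test.
write_nouveau_blacklist() {
  printf 'blacklist nouveau\noptions nouveau modeset=0\n' \
    > "${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
}

# Hypothetical helper: succeed if module NAME is listed in a
# /proc/modules-style file (first field is the module name).
module_loaded() {
  grep -q "^$1 " "${2:-/proc/modules}"
}

# Typical follow-up on Ubuntu (needs root, not run here):
#   update-initramfs -u
#   reboot
#   # afterwards this should fail, i.e. nouveau is not loaded:
#   module_loaded nouveau
```

Note that `module_loaded nvidia` does not match `nvidiafb`: the framebuffer module and the CUDA driver module are different modules.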
- ran runfile
chmod +x cuda_7.5.18_linux.run
sudo sh cuda_7.5.18_linux.run
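The driver part of the runfile compiles a kernel module against the headers of the running kernel, and NVIDIA's Linux guide lists `sudo apt-get install linux-headers-$(uname -r)` as a prerequisite on Ubuntu. A minimal pre-flight check, assuming Ubuntu's layout in which `/lib/modules/<kernel>/build` is a symlink into the matching `/usr/src/linux-headers-<kernel>` directory; `kernel_headers_present` is a hypothetical helper.

```shell
# Hypothetical helper: succeed if headers for kernel KVER look installed.
# `[ -d ]` follows symlinks, so a dangling build link (headers package
# missing) counts as absent. ROOT is an optional prefix for testing.
kernel_headers_present() {
  _kver="${1:-$(uname -r)}"
  _root="${2:-}"
  [ -d "$_root/lib/modules/$_kver/build" ]
}

# Example (as root, not run here):
#   kernel_headers_present || apt-get install linux-headers-$(uname -r)
```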
- noticed missing libraries for the samples; tried to install them, gave up:
sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev
freeglut3-dev : Depends: libgl1-mesa-dev but it is not going to be installed or
                         libgl-dev
libglu1-mesa-dev : Depends: libglu1-mesa (= 8.0.2-0ubuntu3) but 9.0.0-2 is to be installed
                   Depends: libgl1-mesa-dev but it is not going to be installed or
                            libgl-dev
libxmu-dev : Depends: libxmu6 (= 2:1.1.0-3) but 2:1.1.1-1 is to be installed
Tried adding sources:
echo "deb http://archive.ubuntu.com/ubuntu precise main universe" > /etc/apt/sources.list
more /etc/apt/sources.list
sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev
No dice; gave up on the samples.
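Two things about the `echo` line above: `>` truncates `/etc/apt/sources.list` and leaves only that one line, whereas `>>` appends; and `precise` is the Ubuntu 12.04 codename, while this machine runs 14.04 (`trusty`). Mixing repositories from two releases can produce dependency conflicts like the ones shown. A minimal sketch of an idempotent append, assuming it runs as root (as in the `root@brain2` prompt); `add_apt_source_line` is a hypothetical helper.

```shell
# Hypothetical helper: append LINE to an apt sources file unless an
# identical line is already present. Uses >> (append), never > (truncate).
add_apt_source_line() {
  line="$1"
  file="${2:-/etc/apt/sources.list}"
  grep -qxF "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (as root, not run here), using the running release's codename:
#   add_apt_source_line "deb http://archive.ubuntu.com/ubuntu $(lsb_release -sc) main universe"
#   apt-get update
```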
-
reboot
-
Verified device nodes: FAIL! No /dev/nvidia* nodes exist. Tried nvidia-smi, which also fails:
root@brain2:~# nvidia-smi
modprobe: FATAL: Module nvidia not found.
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
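`modprobe: FATAL: Module nvidia not found` means modprobe found no `nvidia` module registered for the running kernel; it looks modules up through the `modules.dep` index under `/lib/modules/<kernel>/`, as the nvidiafb debug output shows. A sketch for checking whether any `nvidia.ko` file exists for a given kernel version, without assuming where under `/lib/modules` the installer put it; `find_nvidia_module` is a hypothetical helper.

```shell
# Hypothetical helper: list nvidia.ko files installed for kernel KVER.
# An empty result is consistent with "Module nvidia not found". Note that
# modprobe uses modules.dep, so a module copied in by hand also needs
# `depmod -a` before modprobe can see it. ROOT is an optional test prefix.
find_nvidia_module() {
  _kver="${1:-$(uname -r)}"
  _root="${2:-}"
  # -name matches exactly, so nvidiafb.ko is not reported.
  find "$_root/lib/modules/$_kver" -name 'nvidia.ko' 2>/dev/null
}
```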
FWIW, the nvidiafb module does load without errors:
root@brain2:~# modprobe nvidiafb -vvvv
modprobe: INFO: ../libkmod/libkmod.c:354 kmod_set_log_fn() custom logging function 0x7fcd9d64b090 registered
modprobe: DEBUG: ../libkmod/libkmod-index.c:790 index_mm_open() file=/lib/modules/3.13.0-74-generic/modules.dep.bin
modprobe: DEBUG: ../libkmod/libkmod-index.c:790 index_mm_open() file=/lib/modules/3.13.0-74-generic/modules.alias.bin
modprobe: DEBUG: ../libkmod/libkmod-index.c:790 index_mm_open() file=/lib/modules/3.13.0-74-generic/modules.symbols.bin
modprobe: DEBUG: ../libkmod/libkmod-index.c:790 index_mm_open() file=/lib/modules/3.13.0-74-generic/modules.builtin.bin
modprobe: DEBUG: ../libkmod/libkmod-module.c:529 kmod_module_new_from_lookup() input alias=nvidiafb, normalized=nvidiafb
modprobe: DEBUG: ../libkmod/libkmod-module.c:535 kmod_module_new_from_lookup() lookup modules.dep nvidiafb
modprobe: DEBUG: ../libkmod/libkmod.c:544 kmod_search_moddep() use mmaped index 'modules.dep' modname=nvidiafb
modprobe: DEBUG: ../libkmod/libkmod.c:392 kmod_pool_get_module() get module name='nvidiafb' found=(nil)
modprobe: DEBUG: ../libkmod/libkmod.c:400 kmod_pool_add_module() add 0x7fcd9e7e2760 key='nvidiafb'
modprobe: DEBUG: ../libkmod/libkmod.c:392 kmod_pool_get_module() get module name='vgastate' found=(nil)
modprobe: DEBUG: ../libkmod/libkmod.c:392 kmod_pool_get_module() get module name='vgastate' found=(nil)
modprobe: DEBUG: ../libkmod/libkmod.c:400 kmod_pool_add_module() add 0x7fcd9e7e6540 key='vgastate'
modprobe: DEBUG: ../libkmod/libkmod-module.c:184 kmod_module_parse_depline() add dep: /lib/modules/3.13.0-74-generic/kernel/drivers/video/vgastate.ko
modprobe: DEBUG: ../libkmod/libkmod.c:392 kmod_pool_get_module() get module name='fb_ddc' found=(nil)
modprobe: DEBUG: ../libkmod/libkmod.c:392 kmod_pool_get_module() get module name='fb_ddc' found=(nil)
modprobe: DEBUG: ../libkmod/libkmod.c:400 kmod_pool_add_module() add 0x7fcd9e7e26f0 key='fb_ddc'
modprobe: DEBUG: ../libkmod/libkmod-module.c:184 kmod_module_parse_depline() add dep: /lib/modules/3.13.0-74-generic/kernel/drivers/video/fb_ddc.ko
modprobe: DEBUG: ../libkmod/libkmod.c:392 kmod_pool_get_module() get module name='i2c_algo_bit' found=(nil)
modprobe: DEBUG: ../libkmod/libkmod.c:392 kmod_pool_get_module() get module name='i2c_algo_bit' found=(nil)
modprobe: DEBUG: ../libkmod/libkmod.c:400 kmod_pool_add_module() add 0x7fcd9e7e6810 key='i2c_algo_bit'
modprobe: DEBUG: ../libkmod/libkmod-module.c:184 kmod_module_parse_depline() add dep: /lib/modules/3.13.0-74-generic/kernel/drivers/i2c/algos/i2c-algo-bit.ko
modprobe: DEBUG: ../libkmod/libkmod-module.c:190 kmod_module_parse_depline() 3 dependencies for nvidiafb
modprobe: DEBUG: ../libkmod/libkmod-module.c:556 kmod_module_new_from_lookup() lookup nvidiafb=0, list=0x7fcd9e7e26d0
modprobe: DEBUG: ../libkmod/libkmod-module.c:441 kmod_module_unref() kmod_module 0x7fcd9e7e2760 released
modprobe: DEBUG: ../libkmod/libkmod.c:408 kmod_pool_del_module() del 0x7fcd9e7e2760 key='nvidiafb'
modprobe: DEBUG: ../libkmod/libkmod-module.c:441 kmod_module_unref() kmod_module 0x7fcd9e7e6810 released
modprobe: DEBUG: ../libkmod/libkmod.c:408 kmod_pool_del_module() del 0x7fcd9e7e6810 key='i2c_algo_bit'
modprobe: DEBUG: ../libkmod/libkmod-module.c:441 kmod_module_unref() kmod_module 0x7fcd9e7e26f0 released
modprobe: DEBUG: ../libkmod/libkmod.c:408 kmod_pool_del_module() del 0x7fcd9e7e26f0 key='fb_ddc'
modprobe: DEBUG: ../libkmod/libkmod-module.c:441 kmod_module_unref() kmod_module 0x7fcd9e7e6540 released
modprobe: DEBUG: ../libkmod/libkmod.c:408 kmod_pool_del_module() del 0x7fcd9e7e6540 key='vgastate'
modprobe: INFO: ../libkmod/libkmod.c:321 kmod_unref() context 0x7fcd9e7e22e0 released
Installed the boot script listed under 'Device Node Verification' at http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html#runfile-verifications, which fails as above.
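The guide's script appears to create the nodes only after `modprobe nvidia` succeeds, so it stops at the same error. It derives the node count from `lspci`: one `/dev/nvidiaN` per NVIDIA "3D controller" or "VGA compatible controller", numbered from 0, plus `/dev/nvidiactl`. A sketch of that counting logic plus a check for which nodes are missing; `expected_nvidia_nodes` and `missing_nvidia_nodes` are hypothetical helpers, and nothing here creates nodes.

```shell
# Hypothetical helper: from `lspci` output on stdin, print the node names
# the driver is expected to provide (nvidia0..nvidiaN-1, then nvidiactl).
expected_nvidia_nodes() {
  _n=$(grep -i 'nvidia' | grep -cE '3D controller|VGA compatible controller')
  _i=0
  while [ "$_i" -lt "$_n" ]; do
    echo "nvidia$_i"
    _i=$((_i + 1))
  done
  echo "nvidiactl"
}

# Hypothetical helper: read node names on stdin and print those that do
# not exist in DEV (defaults to /dev).
missing_nvidia_nodes() {
  _dev="${1:-/dev}"
  while read -r _node; do
    [ -e "$_dev/$_node" ] || echo "$_node"
  done
}
```

On this machine `lspci | expected_nvidia_nodes | missing_nvidia_nodes` would list all three of nvidia0, nvidia1 and nvidiactl.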
-
Changed GRUB to do a text-only boot (though that happens anyway on these remote machines), rebooted: same story, no /dev/nvidia*.
-
Re-ran the runfile; the result, as before, is:
Driver: Installed
Toolkit: Installed in /usr/local/cuda-7.5
Samples: Installed in /root, but missing recommended libraries
- rebooted: still no /dev/nvidia*, and nvidia-smi still fails to communicate with the driver…
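Two further checks, as a sketch. As far as I know the driver part of the runfile is handled by `nvidia-installer`, which writes its log to `/var/log/nvidia-installer.log`, so errors from the module build should show up there. And if an `nvidia.ko` does exist, it must have been built for the running kernel: `modinfo -F vermagic <path to nvidia.ko>` prints a string whose first field is the kernel release it was built for. `installer_problems` and `vermagic_matches` are hypothetical helpers.

```shell
# Hypothetical helper: print ERROR/WARNING lines (with line numbers) from an
# installer log; defaults to the nvidia-installer log location.
installer_problems() {
  grep -nE 'ERROR|WARNING' "${1:-/var/log/nvidia-installer.log}"
}

# Hypothetical helper: succeed if the first field of a vermagic string
# equals KVER (defaults to the running kernel, `uname -r`).
vermagic_matches() {
  _built_for="${1%% *}"
  [ "$_built_for" = "${2:-$(uname -r)}" ]
}

# Example (not run here):
#   vermagic_matches "$(modinfo -F vermagic /path/to/nvidia.ko)"
```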