TensorFlow cannot find cuDNN [Ubuntu 16.04 + CUDA 7.5]

I’m trying to run “$ python cifar10_train.py” in a TensorFlow environment in Anaconda2 and I’m getting this output:

I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:99] Couldn't open CUDA library libcudnn.so. LD_LIBRARY_PATH:
I tensorflow/stream_executor/cuda/cuda_dnn.cc:1562] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
Segmentation fault (core dumped)

$ nvidia-smi
Sun May 15 23:45:12 2016
+------------------------------------------------------+
| NVIDIA-SMI 361.42     Driver Version: 361.42         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 750 Ti  Off  | 0000:01:00.0      On |                  N/A |
| 40%   29C    P8     1W /  38W |    157MiB /  2047MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0       915    G   /usr/lib/xorg/Xorg                             108MiB |
|    0      1327    G   compiz                                          37MiB |
+-----------------------------------------------------------------------------+

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17

$ ls -la /usr/local/cuda/lib64/libcudnn*
lrwxrwxrwx 1 3319 users 13 Feb 9 12:48 libcudnn.so -> libcudnn.so.4
lrwxrwxrwx 1 3319 users 17 Feb 9 12:48 libcudnn.so.4 -> libcudnn.so.4.0.7
-rwxrwxr-x 1 3319 users 61453024 Feb 8 17:12 libcudnn.so.4.0.7
-rw-rw-r-- 1 3319 users 62025862 Feb 8 17:12 libcudnn_static.a

$ sudo nano ~/.bash_profile
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-7.5/lib64
export CUDA_HOME=/usr/local/cuda
export PATH=/usr/local/cuda-7.5/bin:$PATH
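
A sanity check that might help pin this down (rough sketch; the path is just the one from the exports above, and note that ~/.bash_profile is only read by login shells, so it may need to be sourced first):

# is the export visible in the shell that launches Python?
echo $LD_LIBRARY_PATH
# does the dynamic loader cache know about cuDNN at all?
ldconfig -p | grep -i cudnn
# can Python dlopen it directly, the same way TensorFlow tries to?
python -c "import ctypes; ctypes.CDLL('libcudnn.so'); print('loaded libcudnn.so')"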

I encountered the same problem tonight on Ubuntu 15.10. Could it be that CUDA 7.5 is only officially compatible with Ubuntu 15.04 and 14.04?

It seems like the right way to do this (now that the Ubuntu 16.04 repo has been updated) is to use the steps detailed at https://devtalk.nvidia.com/default/topic/932554/cuda-setup-and-installation/-ubuntu-16-04-install-cuda-7-5/ and https://devtalk.nvidia.com/default/topic/926383/testing-ubuntu-16-04-for-cuda-development-awesome-integration-/.

It looks like doing the following

sudo apt-get install nvidia-cuda-toolkit
sudo apt-get install nvidia-cuda-361-updates
(download and extract cuDNN)
sudo cp include/cudnn.h /usr/include
sudo cp lib64/libcudnn* /usr/lib/x86_64-linux-gnu/
sudo chmod a+r /usr/lib/x86_64-linux-gnu/libcudnn*
nvidia-smi
nvcc -V

makes things work great on the command line, but I’m not sure how to get TensorFlow (during setup via ./configure) to use this version, since it isn’t installed in /usr/local/cuda.

Anybody have thoughts on this? I’m tempted to create some symlinks, but that seems like a hack. Any thoughts appreciated!
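
In the meantime, here is a rough way to see where the repo packages actually put things (plain dpkg queries; the package names are just the ones from the commands above plus nvidia-cuda-dev, nothing TensorFlow-specific):

# where the toolkit package installed nvcc
dpkg -L nvidia-cuda-toolkit | grep bin/nvcc
# where the dev package put the CUDA headers and runtime library
dpkg -L nvidia-cuda-dev | grep -E 'cuda\.h|libcudart' | head
# and the cuDNN files copied in above
ls -la /usr/include/cudnn.h /usr/lib/x86_64-linux-gnu/libcudnn*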

And it looks like rockpereira has the same installed versions as I do (per nvidia-smi and nvcc -V), so it’s possible I’ll just be hit with the same problems down the road. Might be a red herring.

The Xenial + PPA repo has the CUDA toolkit, but I could not get NVIDIA’s cuDNN to work with that either.

To install just the driver & CUDA toolkit from the repo

Add the ppa repo
$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt-get update

Install the recommended driver (currently nvidia-364)
$ sudo ubuntu-drivers autoinstall
$ sudo reboot

Install everything else
$ sudo apt-get install nvidia-{prime,profiler,settings,visual-profiler}
$ sudo apt-get install nvidia-cuda-{dev,doc,gdb,toolkit}

The CUDA folder is
/usr/lib/nvidia-cuda-toolkit
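
To double-check that the PPA driver and the repo toolkit are the ones actually in use (generic checks, nothing specific to this setup):

# driver the PPA recommends / has installed
ubuntu-drivers devices
# driver actually loaded after the reboot
nvidia-smi
# toolkit version and location from the repo package
which nvcc
nvcc -V
ls /usr/lib/nvidia-cuda-toolkit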

I was able to get TensorFlow built from source with the Xenial Ubuntu repos. The biggest issue I had was that TensorFlow expected all the CUDA and cuDNN libraries and headers to be installed under /usr/local/cuda. To deal with that I essentially had to create temporary symlinks, which was ugly. The upside was that I could use CUDA from the Xenial repo, cuDNN downloaded from NVIDIA, and the standard gcc (5.3.1), and they played nicely together. For the record I did the following:

sudo apt-get install nvidia-cuda-toolkit
sudo apt-get install nvidia-cuda-361-updates
sudo apt-get install nvidia-nsight
sudo apt-get install nvidia-profiler
sudo apt-get install libcupti-dev zlib1g-dev

# Put symlinks in /usr/local/cuda
sudo mkdir /usr/local/cuda
cd /usr/local/cuda
sudo ln -s  /usr/lib/x86_64-linux-gnu/ lib64
sudo ln -s  /usr/include/ include
sudo ln -s  /usr/bin/ bin
sudo ln -s  /usr/lib/x86_64-linux-gnu/ nvvm
sudo mkdir -p extras/CUPTI
cd extras/CUPTI
sudo ln -s  /usr/lib/x86_64-linux-gnu/ lib64
sudo ln -s  /usr/include/ include
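
# (Not part of the original steps -- just a quick check, assuming the link
# farm above, that everything now resolves under /usr/local/cuda.)
ls -la /usr/local/cuda
ls -la /usr/local/cuda/extras/CUPTI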

# Install cuDNN
# http://askubuntu.com/questions/767269/how-can-i-install-cudnn-on-ubuntu-16-04
# Download cuDNN as detailed above and extract
cd ~/Downloads/cuda
sudo cp include/cudnn.h /usr/include
sudo cp lib64/libcudnn* /usr/lib/x86_64-linux-gnu/
sudo chmod a+r /usr/lib/x86_64-linux-gnu/libcudnn*

# ... Install TensorFlow from source ...
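# (Sketch only, not verbatim from my shell history -- roughly what
# "install TensorFlow from source" involved at the time; your checkout,
# configure prompts, and wheel name may differ.)
git clone https://github.com/tensorflow/tensorflow
cd tensorflow
./configure   # enable GPU support; accept /usr/local/cuda for both CUDA and cuDNN
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-*.whl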

Success!

(tf3) rock@ubuntu:~/anaconda2/envs/tf3/lib/python3.5/site-packages/tensorflow/models/image/cifar10 $ python cifar10_train.py
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally

Downloading cifar-10-binary.tar.gz 100.0%
Successfully downloaded cifar-10-binary.tar.gz 170052171 bytes.
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 750 Ti
major: 5 minor: 0 memoryClockRate (GHz) 1.2545
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.79GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 750 Ti$
2016-05-16 21:58:10.281520: step 0, loss = 4.68 (6.3 examples/sec; 20.168 sec/batch)
2016-05-16 21:58:13.245997: step 10, loss = 4.66 (616.7 examples/sec; 0.208 sec/batch)
2016-05-16 21:58:15.522023: step 20, loss = 4.64 (631.4 examples/sec; 0.203 sec/batch)
2016-05-16 21:58:17.588216: step 30, loss = 4.62 (622.5 examples/sec; 0.206 sec/batch)
2016-05-16 21:58:19.709766: step 40, loss = 4.60 (587.4 examples/sec; 0.218 sec/batch)

2016-05-16 23:01:24.944477: step 15160, loss = 0.78 (565.1 examples/sec; 0.227 sec/batch)
2016-05-16 23:01:27.175203: step 15170, loss = 0.80 (614.2 examples/sec; 0.208 sec/batch)
2016-05-16 23:01:30.004882: step 15180, loss = 0.77 (570.4 examples/sec; 0.224 sec/batch)
2016-05-16 23:01:32.958172: step 15190, loss = 0.86 (557.9 examples/sec; 0.229 sec/batch)

Chiming in to say that this also worked for me. Not the “cleanest” setup (apt regularly notifies me that /usr/lib/x86_64-linux-gnu/libcudnn.so.5 is not a symbolic link), but…it works! You definitely saved me some time, thank you for that.
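
If the apt warning bothers you, recreating the symlink chain by hand should quiet it (my guess at the fix; substitute whatever libcudnn.so.5.x.y version you actually copied over):

cd /usr/lib/x86_64-linux-gnu
# replace 5.0.5 with the exact version of the file you copied in
sudo ln -sf libcudnn.so.5.0.5 libcudnn.so.5
sudo ln -sf libcudnn.so.5 libcudnn.so
sudo ldconfig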

If you add the graphics PPA and install the latest drivers, it should work without any additional modification, as long as you have the CUDA runtime library installed:

sudo apt install libcudart7.5
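
To confirm the runtime library is installed and visible to the loader (just a sanity check):

apt-cache policy libcudart7.5
ldconfig -p | grep libcudart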

I’m using the cuDNN 5.0.5 .deb packages from NVIDIA’s download site (installed with “sudo dpkg -i libcudnn5*.deb”) and the Ubuntu repo versions of the rest of the CUDA toolkit and libraries. With this approach I didn’t have to copy the cuDNN files into the system directories (the deb puts them there for you), but I did use the symlink approach above to make it look like everything is installed in /usr/local/cuda. This worked great for the “./configure” step, thanks!
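
For anyone following the same route, you can list the deb contents to see exactly where the files landed before pointing the symlinks at them (assuming the packages are named libcudnn5 and libcudnn5-dev, per the .deb filenames):

dpkg -l | grep cudnn
dpkg -L libcudnn5 | grep libcudnn
dpkg -L libcudnn5-dev | grep cudnn.h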

“nvidia-cuda-361-updates” isn’t showing up for me in the 16.04 repo. Did you mean “nvidia-361-updates” or “libcuda1-361-updates”?

thanks

To placebo10:
You’re right, but you will still need the nvidia-cuda* packages:

$ sudo apt-get install nvidia-cuda-{dev,doc,gdb,toolkit}

I used the CUDA runfile downloaded from NVIDIA instead of a repo install. The post is about two months old, and TensorFlow v0.9 is now available.

https://devtalk.nvidia.com/default/topic/936429/-solved-tensorflow-with-gpu-in-anaconda-env-ubuntu-16-04-cuda-7-5-cudnn-/#4880949

I’d suggest staying away from runfiles and using .debs only.