CUDA 6.5 - cuDNN not working.

Hi,

I am a non-root user working on a cluster machine. The cluster has a number of GPU nodes (Tesla K20m, K40m, …). The system is running Scientific Linux release 6.6 (Carbon).

The system admin has installed CUDA 5.5 and CUDA 6.5.

I have locally installed the bleeding-edge version of Theano. I have also tried the current release version and some archived versions, with no change in the errors reported.

The issue I am having is getting cuDNN to work with CUDA 6.5. I know how to install cuDNN locally and initially tried v2 (as recommended in the archive of older cuDNN versions).

When I run Theano with cuDNN v2, I get the following message:

Using gpu device 0: Tesla K20m (CNMeM is enabled with initial size: 90.0% of memory, cuDNN Version is too old. Update to v5, was 2000.)

I have tried doing what it suggests and updated cuDNN to v5 (although the NVIDIA site says this is only for CUDA 7.5).

When I run the Theano code again, Theano recognises that cuDNN 5005 is installed, but I later get crashes in the code (which don’t occur when running the code on a CPU, or on a GPU with cuDNN disabled).
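For anyone puzzled by the numbers 2000 and 5005 in these messages: as far as I know, cuDNN releases of this era encode their version as a single integer, major * 1000 + minor * 100 + patch level. A minimal sketch of the decoding, assuming that encoding holds (the helper name `decode_cudnn_version` is made up for illustration and is not part of Theano or cuDNN):

```python
def decode_cudnn_version(value):
    """Split a cuDNN version integer into (major, minor, patch).

    Assumes the encoding major * 1000 + minor * 100 + patch,
    which is how cuDNN v2 through v5 report their version.
    """
    major, rest = divmod(value, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


# The two values that appear in the Theano messages in this thread.
for reported in (2000, 5005):
    print(reported, "->", "%d.%d.%d" % decode_cudnn_version(reported))
# 2000 -> 2.0.0
# 5005 -> 5.0.5
```

So "was 2000" means cuDNN 2.0.0 was found, and 5005 corresponds to cuDNN 5.0.5.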

Any advice would be greatly appreciated. If I had CUDA 7.5 installed I would understand why it was telling me that the cuDNN version is too old, but I don’t! I am asking my system admin to update CUDA to 7.5, but this may take far longer than I have.

I can provide the log for the crash that occurs; however, I’m not sure how relevant it is.

What GPU driver version is installed on your machine? You can get this with nvidia-smi.

Hi,

Thanks for the response.

+------------------------------------------------------+
| NVIDIA-SMI 352.39     Driver Version: 352.39         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40m          Off  | 0000:02:00.0     Off |                    0 |
| N/A   40C    P0    73W / 235W |    113MiB / 11519MiB |     91%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K40m          Off  | 0000:03:00.0     Off |                    0 |
| N/A   30C    P8    20W / 235W |     22MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K40m          Off  | 0000:83:00.0     Off |                    0 |
| N/A   35C    P8    20W / 235W |     22MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K40m          Off  | 0000:84:00.0     Off |                    0 |
| N/A   35C    P8    20W / 235W |     22MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

Since you have 352.39 loaded, which is a GPU driver that is compatible with CUDA 7.5, it would be possible for you to install CUDA 7.5 locally in your own workspace, without your sysadmin having to make any changes to the machine setup. (Are you sure CUDA 7.5 isn’t installed somewhere on that machine? Maybe it’s installed but not the default.)
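As a rough illustration of the driver check, here is a small sketch that compares the version reported by nvidia-smi against a minimum per CUDA release. The minimum Linux driver versions in the table are the ones I recall from the CUDA toolkit release notes, so treat them as assumptions and verify them against the release notes for your toolkit; the function names are hypothetical.

```python
# Minimum Linux driver version per CUDA toolkit release.
# ASSUMPTION: values recalled from the CUDA release notes; please verify.
MIN_DRIVER = {
    "6.5": (340, 29),
    "7.5": (352, 31),
}


def parse_version(text):
    """Turn a dotted version string such as '352.39' into (352, 39)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports(driver_version, cuda_version):
    """Return True if the driver meets the assumed minimum for that CUDA release."""
    return parse_version(driver_version) >= MIN_DRIVER[cuda_version]


# Driver version taken from the nvidia-smi output posted in this thread.
print(driver_supports("352.39", "7.5"))  # True
```

The tuple comparison is the point of the sketch: comparing "352.39" and "352.4" as plain strings or floats would give misleading results.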

You would do this by getting the CUDA 7.5 runfile installer appropriate for your OS, and running it. When asked whether to install the GPU driver, you would answer “no”, and when prompted for the CUDA install paths you would give it local directories in your own workspace.

You would then need to set the PATH and LD_LIBRARY_PATH environment variables to point to your local copy of CUDA 7.5. After that, you should be able to use a cuDNN version that is compatible with CUDA 7.5.
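A minimal sketch of the environment setup, written in Python so it can be used from a job-launch script. It assumes the toolkit keeps its binaries in `bin` and its libraries in `lib64` under the install directory (the usual layout on 64-bit Linux); the install path and the function name `local_cuda_env` are hypothetical.

```python
import os


def local_cuda_env(cuda_root, base_env=None):
    """Return a copy of the environment with a local CUDA install put first.

    ASSUMPTION: binaries live in <cuda_root>/bin and libraries in
    <cuda_root>/lib64, the usual layout on 64-bit Linux.
    """
    env = dict(os.environ if base_env is None else base_env)
    additions = {
        "PATH": os.path.join(cuda_root, "bin"),
        "LD_LIBRARY_PATH": os.path.join(cuda_root, "lib64"),
    }
    for name, new_dir in additions.items():
        old = env.get(name, "")
        # Prepend so the local copy wins over any system-wide CUDA.
        env[name] = new_dir + (os.pathsep + old if old else "")
    return env


# Hypothetical install location inside the user's own workspace.
env = local_cuda_env("/home/user/cuda-7.5", {"PATH": "/usr/bin"})
print(env["PATH"])
print(env["LD_LIBRARY_PATH"])
```

The returned dictionary can be passed as `env=` to `subprocess.run` when launching the training script. The variables need to be in place before the process starts, since the dynamic loader reads LD_LIBRARY_PATH at launch. If the local cuDNN copy lives in a separate directory, that directory would need to be added to LD_LIBRARY_PATH in the same way.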

Apart from that approach, there aren’t any other options. The Theano version you are using expects a particular cuDNN version (obviously), and that cuDNN version requires a newer version of CUDA than the one you are using. There is no way to use cuDNN v5 with CUDA 6.5.

Hi,

Thanks for the advice, I will try to install CUDA locally as you have suggested.