cuDNN fails with CUDNN_STATUS_INTERNAL_ERROR on MNIST sample execution

My System:

OS: Ubuntu 16.04
GPU: GTX 1080
CUDA: 8.0.61
cuDNN: 6.0.21

I’ve installed CUDA / cuDNN using the following routine:

#downloaded from https://developer.nvidia.com/cuda-80-ga2-download-archive
sudo dpkg -i cuda.deb
sudo apt update
sudo apt install cuda
export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64

#downloaded from https://developer.nvidia.com/cuda-80-ga2-download-archive
sudo dpkg -i cuda-patch.deb
sudo apt update
sudo apt upgrade

#downloaded from https://developer.nvidia.com/rdp/cudnn-download
sudo dpkg -i cudnn.deb
sudo dpkg -i cudnn-dev.deb
sudo dpkg -i cudnn-doc.deb
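
Note that the two export lines above only affect the current shell session. To make them persist across logins, I also appended them to ~/.bashrc (a minimal sketch, assuming bash and the default /usr/local/cuda symlink that the .deb installer creates):

#persist the CUDA paths for future shells (assumes bash and the default /usr/local/cuda symlink)
echo 'export PATH=$PATH:/usr/local/cuda/bin' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64' >> ~/.bashrc
source ~/.bashrc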

To test CUDA I use the following routine:

cd /usr/local/cuda/samples
sudo make clean && sudo make -j$(nproc) -Wno-deprecated-gpu-targets
cd bin/x86_64/linux/release
./deviceQuery
./bandwidthTest

Both tests result in PASS.

To test cuDNN I use the following routine from http://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html#verify:

cd /usr/src/cudnn_samples_v6/mnistCUDNN
sudo make clean && sudo make -j$(nproc) -Wno-deprecated-gpu-targets
./mnistCUDNN

This fails with the following output:

cudnnGetVersion() : 6021 , CUDNN_VERSION from cudnn.h : 6021 (6.0.21)
Host compiler version : GCC 5.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 20  Capabilities 6.1, SmClock 1797.0 Mhz, MemSize (Mb) 8107, MemClock 5005.0 Mhz, Ecc=0, boardGroupID=0
Using device 0

Testing single precision
CUDNN failure
Error: CUDNN_STATUS_INTERNAL_ERROR
mnistCUDNN.cpp:394
Aborting...

Can someone tell me if I did something wrong or how to fix this?

What happens if you run the mnistCUDNN test as root?
Do you still get the same error?

With root privileges the cuDNN test routine passes.

You’re not supposed to have to do this. That was just a diagnostic test.

If you’re still having trouble running it as an ordinary user, there are a few things to check:

  1. CUDNN_STATUS_INTERNAL_ERROR when using cudnn7.0 with CUDA 8.0 - CUDA Setup and Installation - NVIDIA Developer Forums (https://devtalk.nvidia.com/default/topic/1024761/cuda-setup-and-installation/cudnn_status_internal_error-when-using-cudnn7-0-with-cuda-8-0/)

  2. cuda - Tensorflow only works under root after drivers update - Stack Overflow
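
Beyond those two threads, it is also worth ruling out a plain permissions or daemon problem. A rough sketch of the kind of checks I mean (device paths and service name are the usual Ubuntu defaults):

# can an ordinary user read/write the GPU device nodes?
ls -l /dev/nvidia*
# is the persistence daemon running, and under which user?
systemctl status nvidia-persistenced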

You are on the right track. I only wanted to check my cuDNN installation because I was running into the same error mentioned in the Stack Overflow thread in the first place.

Could you elaborate on the accepted answer?

My cuDNN test routine still fails if I simply add

sudo usermod -a -G nvidia-persistenced $USER

to my installation routine.
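
For reference, the group change only takes effect for new login sessions. A quick sketch of how to confirm whether the current session has actually picked it up:

# does the current session already include the group?
id -nG | grep -w nvidia-persistenced
# does the group exist at all, and who is in it?
getent group nvidia-persistenced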

Any update on this? Unfortunately, I can’t comment on the Stack Overflow thread due to unmet reputation requirements.

I have the same issues…

OS: Ubuntu 16.04
GPU: GTX 1080 Ti
CUDA: Cuda compilation tools, release 9.0, V9.0.176
cuDNN: 7.1.3
driver version: NVIDIA-SMI 384.111 Driver Version: 384.111

I have passed the CUDA tests:
./deviceQuery
./bandwidthTest

I need to add sudo to pass the mnistCUDNN test.

The following command did not help:
sudo usermod -a -G nvidia-persistenced $USER

I cannot use cuDNN to run TensorFlow right now.

Please advise!

This bug persists for me to this day. Thus, I would also appreciate an answer after all this time.

My ‘workaround’ is just to launch the IDE or the script with root privileges. This opens up a lot more problems, but at least I can run training on the GPU.
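
Concretely, launching looks roughly like this (just a sketch; train_mnist.py is a placeholder name, and the explicit env forwarding is there because sudo resets the environment by default):

# run the training script as root while forwarding the CUDA paths
sudo env PATH=$PATH LD_LIBRARY_PATH=$LD_LIBRARY_PATH python3 train_mnist.py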

My issue was resolved by using the following steps: (Don’t know which one fixed the issue, though)

  1. log out and cool-reboot/shutdown
  2. sudo rm -rf .nv/
  3. sudo usermod -a -G nvidia-persistenced $USER
  4. log out and cool-reboot/shutdown again
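
If I had to guess, step 2 is the important one: .nv/ is the per-user CUDA/cuDNN cache in the home directory, and if it was ever written by a root-run process it can end up owned by root and unusable for the normal user. A sketch of how to check before deleting (assuming the cache is in its default location, ~/.nv):

# who owns the cache directory and its contents?
ls -ld ~/.nv
ls -lR ~/.nv | head
# remove it; sudo is only needed if root owns files in there
sudo rm -rf ~/.nv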

This indeed fixed the bug for me. How did you find the solution? It would have been nice to know 6 months ago when I encountered it.

From the link NVIDIA mentioned:

You’re not supposed to have to do this. That was just a diagnostic test.

If you’re still having trouble running it as an ordinary user, there are a few things to check:

  1. https://devtalk.nvidia.com/default/topic/1024761/cuda-setup-and-installation/cudnn_status_internal_error-when-using-cudnn7-0-with-cuda-8-0/

  2. cuda - Tensorflow only works under root after drivers update - Stack Overflow

Also, from what I found on the web, logging out indeed seems to help more than a hot reboot or a cold reboot.

I don’t think the reboot alone will do much good. I had this problem for about half a year and shut the PC down countless times. I think the clearing of the cache is the critical part. But that is on me. I completely missed it in the other thread. Anyway, thanks for the info.
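
One more note for future readers: if the cache keeps getting into a bad state, the CUDA JIT part of it (~/.nv/ComputeCache) can also be relocated or disabled per user through environment variables (a sketch only; I have not needed this myself):

# point the JIT cache at an explicitly user-owned location
export CUDA_CACHE_PATH=$HOME/.cache/cuda
# or disable the on-disk JIT cache entirely
export CUDA_CACHE_DISABLE=1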