Over the weekend my test machine, which is configured with CUDA 5.0, hit a snag. We were previously able to run nvidia-smi without any issues, but today I get this error:
# nvidia-smi
Failed to initialize NVML: Function not found.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
WARNING:
You should always run with libnvidia-ml.so that is installed with your NVIDIA Display Driver. By default it's installed in /usr/lib and /usr/lib64. libnvidia-ml.so in TDK package is a stub library that is attached only for build purposes (e.g. machine that you build your application doesn't have to have Display Driver installed).
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
The only helpful text it gives is that I should always run with the libnvidia-ml.so installed in /usr/lib and /usr/lib64, and the library is present in both locations. I’m concerned that a user may have performed an update that is causing the failure, but I’m not sure where to begin searching for a solution. I can still compile and execute CUDA code, so the problem seems to be limited to nvidia-smi.
Any input on where to search for problematic files would be appreciated.