nvidia-smi programmatically
Hi all,
I need to retrieve programmatically some of the information reported by nvidia-smi.
I'd prefer to avoid a fork/exec of nvidia-smi, or running strace on it and replicating the same
ioctl calls. It seems that the struct cudaDeviceProp doesn't contain everything I need, such as
fan speed, temperature, GPU utilization and so on.

Am I missing something?
#1
Posted 04/13/2012 03:44 PM   
You should take a look at NVML:

http://developer.nvidia.com/nvidia-management-library-nvml
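
For the three values asked about above (temperature, fan speed, GPU utilization), a minimal sketch of an NVML query might look like the following. It is not taken from the NVML package. To keep it compilable without nvml.h, it loads libnvidia-ml.so.1 with dlopen and declares hand-written stand-ins for the few NVML types it uses. Those stand-ins are assumptions meant to mirror the header, and the helper `read_gpu_stats` and struct `gpu_stats` are hypothetical names. A normal program would simply include nvml.h and link with -lnvidia-ml.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hand-written stand-ins for the few nvml.h declarations used here.
 * Assumption: they mirror the real header; prefer including nvml.h. */
typedef struct nvmlDevice_st *nvmlDevice_t;
typedef struct { unsigned int gpu; unsigned int memory; } nvmlUtilization_t;
enum { NVML_SUCCESS = 0, NVML_TEMPERATURE_GPU = 0 };

typedef int (*nvml_init_fn)(void);
typedef int (*nvml_shutdown_fn)(void);
typedef int (*nvml_handle_fn)(unsigned int, nvmlDevice_t *);
typedef int (*nvml_temp_fn)(nvmlDevice_t, int, unsigned int *);
typedef int (*nvml_fan_fn)(nvmlDevice_t, unsigned int *);
typedef int (*nvml_util_fn)(nvmlDevice_t, nvmlUtilization_t *);

/* Hypothetical result struct for this sketch. */
struct gpu_stats { unsigned int temp_c, fan_pct, gpu_util_pct; };

/* Fills *out for the GPU at `index`.
 * Returns 0 on success, -1 if NVML is unavailable or any query fails
 * (for example, fan speed is not reported on passively cooled boards). */
int read_gpu_stats(unsigned int index, struct gpu_stats *out)
{
    void *lib = dlopen("libnvidia-ml.so.1", RTLD_NOW);
    if (!lib)
        return -1;

    nvml_init_fn     nvml_init     = (nvml_init_fn)dlsym(lib, "nvmlInit");
    nvml_shutdown_fn nvml_shutdown = (nvml_shutdown_fn)dlsym(lib, "nvmlShutdown");
    nvml_handle_fn   get_handle    = (nvml_handle_fn)dlsym(lib, "nvmlDeviceGetHandleByIndex");
    nvml_temp_fn     get_temp      = (nvml_temp_fn)dlsym(lib, "nvmlDeviceGetTemperature");
    nvml_fan_fn      get_fan       = (nvml_fan_fn)dlsym(lib, "nvmlDeviceGetFanSpeed");
    nvml_util_fn     get_util      = (nvml_util_fn)dlsym(lib, "nvmlDeviceGetUtilizationRates");

    int rc = -1;
    if (nvml_init && nvml_shutdown && get_handle && get_temp && get_fan &&
        get_util && nvml_init() == NVML_SUCCESS) {
        nvmlDevice_t dev;
        nvmlUtilization_t util;
        if (get_handle(index, &dev) == NVML_SUCCESS &&
            get_temp(dev, NVML_TEMPERATURE_GPU, &out->temp_c) == NVML_SUCCESS &&
            get_fan(dev, &out->fan_pct) == NVML_SUCCESS &&
            get_util(dev, &util) == NVML_SUCCESS) {
            out->gpu_util_pct = util.gpu;
            rc = 0;
        }
        nvml_shutdown();
    }
    dlclose(lib);
    return rc;
}
```

Call `read_gpu_stats(0, &stats)` from your own application and print the fields when it returns 0. On older glibc versions the program has to be linked with -ldl.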

#2
Posted 04/13/2012 03:45 PM   
Thank you, exactly what I was looking for.

#3
Posted 04/13/2012 05:53 PM   
[quote name='mfatica' date='13 April 2012 - 05:45 PM' timestamp='1334331949' post='1395856']
You should take a look at NVML:

http://developer.nvidia.com/nvidia-management-library-nvml
[/quote]

I tried to use it today, but I'm getting a driver mismatch.

Calling nvmlInit() gives me this error:

Error: API mismatch: the NVIDIA kernel module has version 295.41,
but this NVIDIA driver component has version 295.45. Please make
sure that the kernel module and all NVIDIA driver components
have the same version.

I'm using CUDA 4.2, and the driver offered on the download page is
the one I have installed, 295.41. Looking on the
FTP site I found 295.40 and 295.49, but no trace of the 295.45 needed
by NVML.

Is there any chance of getting an NVML build for the current driver, 295.41,
or at least the 295.45 driver required by NVML?
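
To check the kernel module half of that error message from code, one option on Linux is to read /proc/driver/nvidia/version. The sketch below assumes the version number is the first digit-led token after the words "Kernel Module" on the NVRM line. That format is an assumption about the file, and both function names are hypothetical helpers for this sketch.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Extracts the version token that follows "Kernel Module" in a line such as
 * "NVRM version: NVIDIA UNIX x86_64 Kernel Module  295.41  ...".
 * Returns 0 on success, -1 if no version-like token is found. */
int parse_kernel_module_version(const char *line, char *out, size_t out_len)
{
    const char *p = strstr(line, "Kernel Module");
    if (!p || out_len == 0)
        return -1;
    p += strlen("Kernel Module");

    /* Walk token by token until one starts with a digit. */
    while (*p) {
        while (*p && isspace((unsigned char)*p))
            p++;
        if (isdigit((unsigned char)*p))
            break;
        while (*p && !isspace((unsigned char)*p))
            p++;
    }
    if (!*p)
        return -1;

    /* Copy the token, truncating if the buffer is too small. */
    size_t n = 0;
    while (p[n] && !isspace((unsigned char)p[n]) && n + 1 < out_len)
        n++;
    memcpy(out, p, n);
    out[n] = '\0';
    return 0;
}

/* Reads the version of the currently loaded kernel module from procfs.
 * Returns 0 on success, -1 if the file is missing or has no match. */
int read_kernel_module_version(char *out, size_t out_len)
{
    char line[256];
    FILE *f = fopen("/proc/driver/nvidia/version", "r");
    if (!f)
        return -1;
    int rc = -1;
    while (rc != 0 && fgets(line, sizeof line, f))
        rc = parse_kernel_module_version(line, out, out_len);
    fclose(f);
    return rc;
}
```

Comparing this string with the version named in the error shows which side is the odd one out: here the kernel module is 295.41, so the 295.45 component is the one that does not belong to the installed driver.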

#4
Posted 05/04/2012 03:50 PM   
Hey kalman,

I installed the r295 driver locally on a 64-bit machine, but I wasn't able to reproduce the error.

[code]
[0 ralexander@ralexander-test:~]
$ nvidia-smi
Fri May 4 13:20:33 2012
+------------------------------------------------------+
| NVIDIA-SMI 3.295.41 Driver Version: 295.41 |
|-------------------------------+----------------------+----------------------+
...
[/code]

Can you run:
$ which nvidia-smi

It seems possible that an earlier version of nvidia-smi is on your path.

Can you also try reinstalling the r295.41 driver?

Thanks,
Robert Alexander

#5
Posted 05/04/2012 08:41 PM   
[quote name='Robert Alexander' date='04 May 2012 - 10:41 PM' timestamp='1336164070' post='1404389']
Hey kalman,

I installed the r295 driver locally on a 64 bit machine, but I wasn't able to reproduce the error.

[code]
[0 ralexander@ralexander-test:~]
$ nvidia-smi
Fri May 4 13:20:33 2012
+------------------------------------------------------+
| NVIDIA-SMI 3.295.41 Driver Version: 295.41 |
|-------------------------------+----------------------+----------------------+
...
[/code]

Can you run:
$ which nvidia-smi

It seems possible that an earlier version of nvidia-smi is on your path.

Can you also try reinstalling the r295.41 driver?

Thanks,
Robert Alexander
[/quote]


The problem doesn't arise when using nvidia-smi. I'm trying to use NVML, and the first
thing to do is to call nvmlInit() in my own application. The error I'm getting
is not related to nvidia-smi (which is working fine).

I downloaded tdk_2.295.1_linux.tar.gz from http://developer.nvidia.com/nvidia-management-library-nvml
as suggested by mfatica (see post above), and copied the header nvml.h and the library
libnvidia-ml.so from the package (no installer?). I was able to compile
(apart from a comma at the end of an enumerator list in nvml.h) and link correctly,
but at runtime I get the driver mismatch error. It seems that libnvidia-ml.so was built
against another driver (despite the archive being named tdk_2.295.1).

Investigating further, I ran ldd on nvidia-smi and it seems that my system
already has a libnvidia-ml.so (in /usr/lib), so I guess there was no need to use
the library inside tdk_2.295.1_linux.tar.gz; only the header nvml.h is required
from that archive (why not distribute it with the CUDA toolkit then?). After deleting the
libnvidia-ml.so copied from the TDK, my application doesn't complain anymore (at least
when calling nvmlInit).
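
The ldd check tells you what the dynamic linker would pick; the same question can be asked from inside the running application by scanning /proc/self/maps for the libnvidia-ml file that is actually mapped. This is only a sketch for Linux, and both function names are hypothetical helpers, not part of NVML.

```c
#include <stdio.h>
#include <string.h>

/* If a /proc/self/maps line refers to a file whose path contains `needle`,
 * copies that path into out and returns 0; otherwise returns -1. */
int path_from_maps_line(const char *line, const char *needle,
                        char *out, size_t out_len)
{
    /* The pathname is the last field and the only one containing '/'. */
    const char *path = strchr(line, '/');
    if (!path || !strstr(path, needle) || out_len == 0)
        return -1;

    size_t n = strcspn(path, "\n");   /* drop the trailing newline */
    if (n >= out_len)
        n = out_len - 1;
    memcpy(out, path, n);
    out[n] = '\0';
    return 0;
}

/* Reports which libnvidia-ml file is mapped into the calling process.
 * Returns -1 if none is mapped (e.g. the program is not linked against it). */
int loaded_nvml_path(char *out, size_t out_len)
{
    char line[4096];
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;
    int rc = -1;
    while (rc != 0 && fgets(line, sizeof line, f))
        rc = path_from_maps_line(line, "libnvidia-ml", out, out_len);
    fclose(f);
    return rc;
}
```

Calling `loaded_nvml_path` in a program linked with -lnvidia-ml and printing the result should show whether the copy from the archive or the one under /usr/lib was picked up.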

While we are at it, running nvidia-smi --help shows the list of supported devices:

Supported products:

Tesla: S1070, S2050, C1060, C2050/70/75, M2050/70/75/90, X2070/90, Gemini


What is Gemini?

#6
Posted 05/05/2012 07:00 PM   
Hi kalman,

Sorry for the confusion. You should always [b]run[/b] with the libnvidia-ml.so that is installed with your NVIDIA Display Driver. By default it's installed in /usr/lib and /usr/lib64.

The libnvidia-ml.so in the TDK package is a stub library that is included only for [b]build[/b] purposes (e.g. so that the machine on which you build your application doesn't need to have the Display Driver installed).

We'll add a note to the README to make this clearer.

Since NVML is targeted at Tesla customers and the CUDA Toolkit is big enough without it, we've decided to keep it separate.

Regards,
Przemyslaw Zych

#7
Posted 05/07/2012 11:22 AM   