334.21 driver returns 999 on cuInit (CUDA)
I've tested the same system setup with a 331.49 driver, which returns 0 correctly.

My small piece of test code:

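#include <stdio.h>
#include <dlfcn.h>
int main(void) {
    /* Load the CUDA driver library at run time instead of linking against it. */
    void *cudalib = dlopen("libcuda.so", RTLD_NOW);
    if (!cudalib) { fprintf(stderr, "dlopen: %s\n", dlerror()); return 1; }
    int (*cu_init)(unsigned int) = (int (*)(unsigned int)) dlsym(cudalib, "cuInit");
    if (!cu_init) { fprintf(stderr, "dlsym: %s\n", dlerror()); return 1; }
    int retval = cu_init(0);  /* 0 is success; 999 is the failure reported here */
    printf("%d\n", retval);
    dlclose(cudalib);
    return 0;
}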


To test:
gcc test.c -o test -ldl
./test

#1
Posted 03/04/2014 04:35 AM   
Hmm, I figured out the main problem: you always have to run a CUDA program as root once, and afterwards all CUDA programs can be run as a regular user.

Even manually running modprobe for nvidia_uvm does not fix this; I still have to run a program (the one above, for example) as root once.

Any help will be really appreciated!

#2
Posted 03/04/2014 10:00 AM   
I recently updated to 334.21-1 on Arch Linux. Prior to the upgrade, the CUDA 5.5 samples all ran correctly as a normal user. Since the upgrade, deviceQueryDrv emits the following error message:

./deviceQueryDrv Starting...

CUDA Device Query (Driver API) statically linked version
/usr/bin/nvidia-modprobe: unrecognized option: "-u"

ERROR: Invalid commandline, please run `/usr/bin/nvidia-modprobe --help` for usage
information.

cuInit(0) returned 999
-> CUDA_ERROR_UNKNOWN
Result = FAIL


When running deviceQueryDrv as root, I get the following slightly different output:

./deviceQueryDrv Starting...

CUDA Device Query (Driver API) statically linked version
modprobe: FATAL: Module nvidia-uvm not found.
cuInit(0) returned 999
-> CUDA_ERROR_UNKNOWN
Result = FAIL


Of note here is the apparently missing nvidia-uvm kernel module. Other threads in this forum mention that this module is unused - perhaps this changed with 334.21-1?

#3
Posted 03/04/2014 07:20 PM   
It works without sudo with the CUDA 6.0 RC.

EDIT: The deviceQueryDrv sample runs without root.

#4
Posted 03/04/2014 08:05 PM   
Just applied for Cuda developer access to get at the RC. I'll reply back when I try it.

#5
Posted 03/04/2014 09:22 PM   
The reason it works after running as root is root has the right to create a device node. Once it's created, users can run programs-- but only because by default it's owned by root, group root, world read/writable... seriously? I'm in Funtoo so I first added nvidia_uvm to /etc/conf.d/modules thus it's always loaded but the node doesn't get created. I also have a local script (/etc/local.d/nv_smi_pm.start) where I switch on persistent mode so I added these lines to it:

mknod -m 660 /dev/nvidia-uvm c 249 0
chgrp video /dev/nvidia-uvm


Now everything works. I suppose you could write a proper udev rule, but I'm not on that.

Update:
I just discovered nvidia-modprobe. If you run it as root:
nvidia-modprobe -c0 -u
it loads the module and creates the node just as it would be auto-created... the --help indicates it was meant to be setuid in order to work for everyone but package maintainers might have other ideas. Those default permissions are terribly DoS-happy.

#6
Posted 03/09/2014 08:39 AM   
The device node should really be created by the nvidia-uvm module itself. I've written a udev rule that is hacky but works:

KERNEL=="nvidia_uvm", RUN+="/usr/bin/bash -c '/usr/bin/mknod -m 660 /dev/nvidia-uvm c $(grep nvidia-uvm /proc/devices | cut -d \  -f 1) 0; /usr/bin/chgrp video /dev/nvidia-uvm'"


Please, Nvidia, fix this!

#7
Posted 03/27/2014 03:46 PM   
I used a similar rule under Ubuntu 14.04, just ran into this after I decided to install driver 337.12 from xorg-edgers.

My first issue was that the kernel 3.12 patch to the uvm module was outdated in the xorg-edgers repo of the driver, so I kept getting a module build error... so I did the changes manually to the file and compiled with:

dkms install -m nvidia-337-uvm/337.12

Next, I realized that I had this issue where CUDA programs work only after sudo, so I tried the rule felixonmars posted. For me it seems to need 666 permissions; otherwise I still get the same issue. I also manually add nvidia & nvidia-uvm to /etc/modules and do an rm /dev/nvidia-uvm before I recreate it. I don't need the chgrp video line. Also, on Ubuntu 14.04 mknod and chgrp are in /bin, not /usr/bin.

Just figured I'd add this here in case someone else is struggling with this...

#8
Posted 04/11/2014 05:19 AM   
For anyone trying to figure out how to fix the patch failure: I just edited /usr/src/nvidia-337-uvm-337.12/dkms.conf and commented out the line
PATCH[0]="buildfix_kernel_3.12.patch"

and then ran the dkms command from comment #8.

I'm running saucy with a 3.11 kernel.

#9
Posted 04/11/2014 01:14 PM   