334.21 driver returns 999 on cuInit (CUDA)

I’ve tested the same system setup with a 331.49 driver, which returns 0 correctly.

My small piece of test code:

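#include <stdio.h>
#include <dlfcn.h>
int main() {
  /* Load the driver library at run time instead of linking against it. */
  void *cudalib = dlopen("libcuda.so", RTLD_NOW);
  if (!cudalib) { fprintf(stderr, "%s\n", dlerror()); return 1; }
  int (*cu_init)(unsigned int) = (int(*)(unsigned int)) dlsym(cudalib, "cuInit");
  if (!cu_init) { fprintf(stderr, "%s\n", dlerror()); return 1; }
  int retval = cu_init(0);  /* 0 is success, 999 is CUDA_ERROR_UNKNOWN */
  printf("%d\n", retval);
  return 0;
}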

To test:

gcc test.c -o test -ldl
./test

Hmm, figured out the main problem: you have to run a CUDA program as root once, and after that all CUDA programs can be run as a regular user.

Even running modprobe nvidia_uvm manually does not fix this; I still have to run a program (the one above, for example) as root once.

Any help will be really appreciated!

I recently updated to 334.21-1 on Arch Linux. Prior to the upgrade, the CUDA 5.5 samples all ran correctly as a normal user. Since the upgrade, deviceQueryDrv emits the following error message:

./deviceQueryDrv Starting...

CUDA Device Query (Driver API) statically linked version 
/usr/bin/nvidia-modprobe: unrecognized option: "-u"

ERROR: Invalid commandline, please run `/usr/bin/nvidia-modprobe --help` for usage
       information.

cuInit(0) returned 999
-> CUDA_ERROR_UNKNOWN
Result = FAIL

When running deviceQueryDrv as root, I get the following slightly different output:

./deviceQueryDrv Starting...

CUDA Device Query (Driver API) statically linked version 
modprobe: FATAL: Module nvidia-uvm not found.
cuInit(0) returned 999
-> CUDA_ERROR_UNKNOWN
Result = FAIL

Of note here is the apparently missing nvidia-uvm kernel module. Other threads in this forum mention that this module is unused - perhaps this changed with 334.21-1?
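The modprobe error above means the module could not be found at all, which is different from a module that is installed but not loaded. A minimal sketch for checking the "loaded" half, assuming the usual /proc/modules layout with the module name in the first column; module_loaded is a hypothetical helper name, not part of the driver package. For the "installed" half, modinfo nvidia-uvm should fail if no such module exists for the running kernel.

```shell
# Report (via exit status) whether a kernel module is currently loaded.
# $1: module name; hyphens are mapped to underscores, which is how the
#     kernel lists module names
# $2: module list to read; defaults to /proc/modules
module_loaded() {
    name=$(printf '%s' "$1" | tr '-' '_')
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Example:
#   module_loaded nvidia-uvm && echo "loaded" || echo "not loaded"
```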

Works without sudo with the CUDA 6.0 RC.

EDIT: the deviceQueryDrv sample runs without root.

Just applied for Cuda developer access to get at the RC. I’ll reply back when I try it.

The reason it works after running as root is that root has the right to create a device node. Once it’s created, users can run programs, but only because by default it’s owned by root, group root, and world read/writable… seriously? I’m on Funtoo, so I first added nvidia_uvm to /etc/conf.d/modules so that it’s always loaded, but the node doesn’t get created. I also have a local script (/etc/local.d/nv_smi_pm.start) where I switch on persistence mode, so I added these lines to it:

mknod -m 660 /dev/nvidia-uvm c 249 0
chgrp video /dev/nvidia-uvm

Now everything works. I suppose you could write a proper udev rule, but I’m not on that.
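The major number 249 in the mknod line is whatever the kernel happened to assign on that machine, so it may differ elsewhere. A rough sketch of looking it up from /proc/devices instead of hardcoding it; char_major is a hypothetical helper name, and the device name nvidia-uvm is taken from this thread.

```shell
# Print the major number registered for a character device name.
# $1: device name as listed in /proc/devices (e.g. nvidia-uvm)
# $2: file to read; defaults to /proc/devices
# Exits non-zero if the name is not in the character-device section.
char_major() {
    awk -v dev="$1" '
        /^Block devices:/ { exit }               # stop before the block-device section
        $2 == dev { print $1; found = 1; exit }  # exact name match only
        END { exit !found }
    ' "${2:-/proc/devices}"
}

# Example (as root):
#   major=$(char_major nvidia-uvm) && mknod -m 660 /dev/nvidia-uvm c "$major" 0
```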

Update:
I just discovered nvidia-modprobe. If you run it as root:

nvidia-modprobe -c0 -u

it loads the module and creates the node just as if it had been auto-created. The --help output indicates it is meant to be setuid so that it works for everyone, but package maintainers might have other ideas. Those default permissions are terribly DoS-happy.
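To tell apart the two failure modes discussed here (node never created versus node created with permissions the current user lacks), something like the following could be run as the regular user. This is only a sketch; node_status is a hypothetical helper name and /dev/nvidia-uvm is the path used in this thread.

```shell
# Classify a device node path from the current user's point of view.
# Prints one of: missing, not-a-char-device, no-access, ok
node_status() {
    if [ ! -e "$1" ]; then
        echo missing
    elif [ ! -c "$1" ]; then
        echo not-a-char-device
    elif [ -r "$1" ] && [ -w "$1" ]; then
        echo ok
    else
        echo no-access
    fi
}

# Example:
#   node_status /dev/nvidia-uvm
```

A result of "missing" points at the node-creation problem, while "no-access" points at ownership or mode (for example, the user not being in the video group when the node is 660).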

The device node should really be created by the nvidia-uvm module itself. I’ve made a hacky udev rule that works:

KERNEL=="nvidia_uvm", RUN+="/usr/bin/bash -c '/usr/bin/mknod -m 660 /dev/nvidia-uvm c $(grep nvidia-uvm /proc/devices | cut -d \  -f 1) 0; /usr/bin/chgrp video /dev/nvidia-uvm'"

Please, Nvidia, fix this!

I used a similar rule under Ubuntu 14.04; I ran into this after I decided to install driver 337.12 from xorg-edgers.

My first issue was that the kernel 3.12 patch to the uvm module was outdated in the xorg-edgers package of the driver, so I kept getting a module build error… so I made the changes to the file manually and compiled with:

dkms install -m nvidia-337-uvm/337.12

Next, I realized that I had the same issue: CUDA programs work only after one has been run with sudo. So I tried the rule felixonmars posted, and for me it seems to need 666 permissions; otherwise I still get the same issue. I also manually add nvidia and nvidia-uvm to /etc/modules and do an rm /dev/nvidia-uvm before I recreate it. I don’t need the chgrp video line. Also, on Ubuntu 14.04 mknod and chgrp are in /bin, not /usr/bin.

Just figured I’d add this here in case someone else is struggling with this…

For anyone trying to figure out how to fix the patch failure: I just edited /usr/src/nvidia-337-uvm-337.12/dkms.conf and commented out the line

PATCH[0]="buildfix_kernel_3.12.patch"

and then ran the dkms command from comment #8.

I’m running saucy with a 3.11 kernel.
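The dkms.conf edit described above can be sketched non-interactively as below. disable_patch is a hypothetical helper name; it only prints the modified file, so the result can be reviewed before replacing the original.

```shell
# Print a dkms.conf with every PATCH[n]= line that names the given patch
# commented out. The input file itself is not modified.
# $1: path to dkms.conf   $2: patch file name
disable_patch() {
    awk -v p="$2" '
        /^PATCH\[[0-9]+\]=/ && index($0, p) { print "#" $0; next }
        { print }
    ' "$1"
}

# Example (as root), using the path and patch name from this thread:
#   conf=/usr/src/nvidia-337-uvm-337.12/dkms.conf
#   disable_patch "$conf" buildfix_kernel_3.12.patch > "$conf.new" && mv "$conf.new" "$conf"
```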