CUDA broken in 396.24.02 and 396.24.10 Vulkan beta drivers on Linux

Starting with 396.24.02 and continuing with 396.24.10, any CUDA apps (including hevc_nvenc and h264_nvenc) fail to run; e.g. the simple deviceQuery sample returns error 30. hevc_nvenc, h264_nvenc and deviceQuery all work fine on the 396.18.11 Vulkan beta driver.
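(For reference, deviceQuery is part of the CUDA samples; with a default CUDA 9.2 install the samples land in the home directory and it can be built and run like this — adjust the path if you put the samples somewhere else:)

cd ~/NVIDIA_CUDA-9.2_Samples/1_Utilities/deviceQuery
make
./deviceQuery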

I’ve been having an issue with CUDA and the latest drivers as well, including 396.45. I thought it was an issue with my deployment/install; glad to see I am not alone. I’ve just been reverting to older driver versions where it works. 396.24 seems to be OK, but not 396.24.02.

No problems here: CUDA 9.2, nvidia-driver 396.45.
Error 30 in deviceQuery would point to the nvidia-uvm module not being loaded or the dev nodes not being created.
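A quick way to check both (just the obvious sanity checks):

lsmod | grep nvidia_uvm     # is the UVM kernel module loaded?
ls -l /dev/nvidia*          # are /dev/nvidia0, /dev/nvidiactl and /dev/nvidia-uvm present?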

Thanks for the reply. Did a bit more digging, and in my case I see the nvidia-uvm module being loaded with 396.24 but not with 396.45. I was able to load it manually with modprobe but still could not get CUDA working, probably related to the dev nodes you mentioned. I’m not too familiar with that process, but it seems like something that should be set up during Xorg initialization. I’ll do a bit more digging into this.

The nvidia-uvm module is only for CUDA, so Xorg has nothing to do with it. Loading that module and creating the dev nodes is specific to the distro/repo. If that doesn’t work for whatever reason, NVIDIA provides the setuid helper nvidia-modprobe. If that doesn’t exist, the common workaround is to run deviceQuery as root once after each boot.
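What the helper does can also be done by hand, roughly like this (as root; the UVM major number is dynamic, so read it from /proc/devices instead of hardcoding it):

modprobe nvidia-uvm
major=$(grep nvidia-uvm /proc/devices | awk '{print $1}')
mknod -m 666 /dev/nvidia-uvm c "$major" 0
mknod -m 666 /dev/nvidia-uvm-tools c "$major" 1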

I ran into this problem trying to run the latest DXVK (a D3D11 translation layer running under Wine) simultaneously with hevc_nvenc encoding of Vulkan swapchain frame grabs. The latest DXVK seems to rely on driver versions newer than 396.18.11 and has removed many workarounds for bugs in earlier NVIDIA Vulkan drivers, so it freezes a lot in Witcher 3 unless I revert to DXVK 0.52. When trying the recommended 396.24.02/396.24.10 drivers, though, ffmpeg can’t find CUDA to run hevc_nvenc or h264_nvenc. I installed the latest CUDA toolkit to track the problem down, and no CUDA programs, including the simple deviceQuery, work with 396.24.02 or 396.24.10, regardless of whether the device nodes are created and uvm is loaded manually or not. I even tried the hack of using the 396.18.11 libcuda together with the 396.24.10 driver, and that fails with a different error than 30.

Just tested 396.45 along with DXVK 0.63. The Vulkan app runs fine, but CUDA is broken the same as it was with 396.24.02 and 396.24.10: ffmpeg + hevc_nvenc does not start, and any CUDA app, such as the simple deviceQuery, returns error 30 even after a manual modprobe of nvidia_uvm with the device nodes (/dev/nvidia0 and /dev/nvidiactl) set up.

These have been my general findings after playing around with this a bit more as well. My main reason for going to these driver versions is DXVK (which seems to work), but then CUDA apps break. Not sure what distro you are on; Gentoo here.

Same here. I’m on Slackware 14.2, which is unsupported by NVIDIA for CUDA, but I’m able to install CUDA and get it working fine on any driver other than 396.24.02, 396.24.10, or 396.45, all of which throw the error 30. I don’t use CUDA apps myself; I just need hevc_nvenc in ffmpeg to work, and it bombs out with the CUDA init error on these drivers. 396.18.11 with DXVK 0.52 seems pretty stable on Witcher 3 at least; it’s my current fallback until this gets resolved.
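For anyone who just wants to reproduce the nvenc side without a real input file, something like this is enough to trigger the same init failure (testsrc2 is only there so no input file is needed; any input would do):

ffmpeg -f lavfi -i testsrc2=duration=1:size=1280x720:rate=30 -c:v hevc_nvenc -f null -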

Hm, I haven’t had problems with older versions of the 396 branch, but .45 broke CUDA for me (fails device enumeration).

For the moment I’m back on .24 driver.

Like I said, WFM.
Gentoo, kernel 4.9, CUDA 9.2, driver 396.45
Please check if nvidia-modprobe throws any errors:
nvidia-modprobe -u -c 0
should load the uvm module and create the correct /dev nodes:
crw-rw---- 1 root video 195, 0 23. Jul 12:08 /dev/nvidia0
crw-rw---- 1 root video 195, 255 23. Jul 12:08 /dev/nvidiactl
crw-rw---- 1 root video 195, 254 23. Jul 12:08 /dev/nvidia-modeset
crw-rw-rw- 1 root root 249, 0 23. Jul 12:09 /dev/nvidia-uvm
crw-rw-rw- 1 root root 249, 1 23. Jul 12:09 /dev/nvidia-uvm-tools
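If the module loads and the nodes look correct but CUDA still fails, please also check the kernel log for anything UVM-related:

dmesg | grep -i uvm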

The system boots without uvm loaded. Running ‘nvidia-modprobe -u -c 0’ loads nvidia-uvm and sets up dev nodes as you listed. No errors in dmesg. CUDA apps simply do not work. Reverting to the 396.24 driver makes CUDA work again (and without the need to manually load nvidia-uvm).

I’m using a 4.17 kernel, but that seems to be the only difference between our systems. I’ll try going back to an older one to see if that helps any. Other than that, I’m not really sure what else to try.

I’m running 4.14.50 on Slackware. ffmpeg/hevc_nvenc and deviceQuery work fine with 396.18.11 with no manual modprobe:

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.2, CUDA Runtime Vers
Result = PASS
bash-4.3$ ls -l /dev/nvidia*
crw-rw-rw- 1 root root 195, 254 Jul 22 19:30 /dev/nvidia-modeset
crw-rw-rw- 1 root root 243, 0 Jul 22 19:45 /dev/nvidia-uvm
crw-rw-rw- 1 root root 243, 1 Jul 22 21:58 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root root 195, 0 Jul 22 19:30 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Jul 22 19:30 /dev/nvidiactl

Using manual modprobe on 396.45:
nvidia-modprobe -u -c 0

ls /dev/nvidia*
/dev/nvidia-modeset /dev/nvidia-uvm-tools /dev/nvidiactl
/dev/nvidia-uvm /dev/nvidia0

./deviceQuery
./deviceQuery Starting…

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 30
→ unknown error
Result = FAIL

ffmpeg / hevc_nvenc also fails with a CUDA init error.

Hello.

(sorry for my bad English)

Using Slackware64-14.2, kernel 4.17.9.

Same problem as mentioned in the previous posts.

Driver versions newer than NVIDIA-Linux-x86_64-396.24.run do not recognize CUDA.

Below is some information.

NVIDIA-Linux-x86_64-396.45.run

nvidia-modprobe -u -c 0

ls -l /dev | grep -i nvidia

crw-rw-rw-  1 root        root      195,   0 Jul 23 09:10 nvidia0
crw-rw-rw-  1 root        root      195, 255 Jul 23 09:10 nvidiactl
crw-rw-rw-  1 root        root      195, 254 Jul 23 09:10 nvidia-modeset
crw-rw-rw-  1 root        root      511,   0 Jul 23 09:12 nvidia-uvm
crw-rw-rw-  1 root        root      511,   1 Jul 23 09:12 nvidia-uvm-tools

$ nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:07:04_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148

./deviceQuery

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL

NVIDIA-Linux-x86_64-396.24.run

ls -l /dev | grep -i nvidia

crw-rw-rw-  1 root        root      195,   0 Jul 23 09:28 nvidia0
crw-rw-rw-  1 root        root      195, 255 Jul 23 09:28 nvidiactl
crw-rw-rw-  1 root        root      195, 254 Jul 23 09:28 nvidia-modeset
crw-rw-rw-  1 root        root      511,   0 Jul 23 09:29 nvidia-uvm
crw-rw-rw-  1 root        root      511,   1 Jul 23 09:29 nvidia-uvm-tools

$ nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:07:04_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148

./deviceQuery

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1050 Ti"
  CUDA Driver Version / Runtime Version          9.2 / 9.2
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 4037 MBytes (4233035776 bytes)
  ( 6) Multiprocessors, (128) CUDA Cores/MP:     768 CUDA Cores
  GPU Max Clock rate:                            1418 MHz (1.42 GHz)
  Memory Clock rate:                             3504 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 1048576 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.2, CUDA Runtime Version = 9.2, NumDevs = 1, Device0 = GeForce GTX 1050 Ti
Result = PASS

*detects correctly - NVIDIA-Linux-x86_64-396.24.run

*does not detect correctly - NVIDIA-Linux-x86_64-396.24.02.run
*does not detect correctly - NVIDIA-Linux-x86_64-396.37.run (included in package cuda_9.2.148_396.37_linux.run)
*does not detect correctly - NVIDIA-Linux-x86_64-396.45.run

Hello, these kernel versions also fail with 396.45 (posting so others don’t waste time trying them):

4.9.114 (previous longterm)
4.14.50 (current longterm)
4.16.18 (EOL)

Having upgraded to 396.45, I’m encountering the same issue. The UVM module is loaded and all device nodes are present, but deviceQuery now always returns error 30. This is with kernel 4.17.11. Downgrading to 396.24 restored CUDA functionality.

Hm, the common denominator for people having problems with 396.45 (and not .24) seems to be Gentoo. Can you folks confirm?

It wouldn’t be the first time that drivers broke on Gentoo; this package breaks surprisingly often there, and the ebuilds have near-zero testing as far as I can tell (the ebuild is a collection of semi-hardcoded copies of libraries from the extracted installer to the filesystem, and every time NVIDIA changes something in the installer naming scheme, it silently breaks).

At some point I will probably get around to writing a proper test suite and submitting it to Jeroen Roovers (the package maintainer), as I’m fairly certain there is not much actual testing being done before publishing these ebuilds (Vulkan, OpenCL and CUDA have all broken for me multiple times over the span of the last few years because of forgotten ICD files or shared libraries, etc.). And the worst thing is that these breakages could easily have been prevented by a simple test suite of trivial “hello world” applications exercising basic functionality prior to releasing to users, along the lines of the sketch below.
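Something like this would already catch most of these regressions (a rough sketch; the deviceQuery path is wherever you built the CUDA samples):

#!/bin/sh
# Minimal GPU smoke test: fail if any of the basic APIs stops initializing.
set -e
vulkaninfo > /dev/null                    # Vulkan ICD loads and enumerates devices
clinfo | grep -q 'Device Name'            # OpenCL reports at least one device
./deviceQuery | grep -q 'Result = PASS'   # CUDA runtime initializes
echo "GPU smoke test passed"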

/rant

One other respondent and I are on Slackware 14.2, not Gentoo. 396.18.11 was the last Vulkan beta that runs CUDA (and ffmpeg hevc_nvenc, which needs CUDA) successfully. 396.24.02, 396.24.10, and 396.45 all fail even the simple deviceQuery, which works fine on 396.18.11, 396.24, the 390 series, etc. ldd shows no missing lib refs in libcuda.so. I’m starting to wonder if some major change was made in these driver releases, such as a different compiler or some Spectre/Meltdown mitigations, which is now breaking things.
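The libcuda check was simply the following, with the path adjusted to wherever your distro puts the driver libraries; no output means nothing is missing:

ldd /usr/lib64/libcuda.so | grep 'not found'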

@trayb: could be.

Edit: tested .45 installed via NVIDIA’s installer and a stable kernel, with the same result (after reverting back to .24, CUDA works once again).

I had the same issue after upgrading.
Did some experimentation with my kernel config. Apparently .45 needs “NUMA Memory Allocation and Scheduler Support” (CONFIG_NUMA) enabled, at least on my particular hardware/software combo.
I had always assumed that option was only for multi-socket systems.
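In case anyone wants to check their own kernel first, the option shows up in the kernel config like this (the config location varies by distro, e.g. /proc/config.gz or /boot/config-$(uname -r)):

zgrep CONFIG_NUMA= /proc/config.gz 2>/dev/null || grep CONFIG_NUMA= /boot/config-"$(uname -r)"
# CONFIG_NUMA=y is the setting that worked for me with 396.45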