Starting with 396.24.02 and continuing with 396.24.10, any cuda app (including hevc_nvenc and h264_nvenc) fails to run, i.e. a simple deviceQuery returns error 30. hevc_nvenc, h264_nvenc and deviceQuery all work fine on the 396.18.11 beta vulkan driver.
I’ve been having an issue with CUDA and the latest drivers as well, including 396.45. I thought it was an issue with my deployment/install; glad to see I am not alone. I’ve just been reverting to older driver versions where it works. 396.24 seems to be OK, but not 396.24.02.
No problems here, cuda 9.2, nvidia-driver 396.45
Error 30 in deviceQuery would point to the nvidia-uvm module not being loaded or dev nodes not being created.
Thanks for the reply. Did a bit more digging, and in my case I see the nvidia-uvm module being loaded with 396.24 but not with 396.45. I was able to load it manually with modprobe but still could not get CUDA working, probably related to the dev nodes you mentioned. I’m not too familiar with that process, but it seems like something that should be set up during xorg initialization. I’ll do a bit more digging into this.
The nvidia-uvm module is only for cuda, so xorg has nothing to do with it. Loading that module/creating the dev nodes is specific to distro/repo. If that doesn’t work for whatever reason, nvidia provides the suid helper nvidia-modprobe. If that doesn’t exist, the common workaround is to run deviceQuery as root once after each boot.
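The dev-node step can be scripted; here’s a minimal sketch along the lines of the manual setup described above. The helper function and file argument are illustrative only; on a live system you would read the real /proc/devices and run the mknod as root.

```shell
# Sketch of the manual nvidia-uvm setup discussed above (illustrative,
# not distro-specific): find the dynamic major number the kernel assigned
# to nvidia-uvm in a /proc/devices-style listing, then create the node.
uvm_major() {
  # Lines look like "243 nvidia-uvm"; print the major number.
  awk '$2 == "nvidia-uvm" { print $1 }' "$1"
}

# On a real system (needs root and the driver installed):
# modprobe nvidia-uvm
# mknod -m 666 /dev/nvidia-uvm c "$(uvm_major /proc/devices)" 0
```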
I ran into this problem trying to run the latest dxvk (a D3D11 translation layer running under wine) simultaneously with hevc_nvenc encoding of vulkan swapchain framegrabs. The latest dxvk seems to rely on driver versions later than 396.18.11 and has removed many bug workarounds for earlier nvidia vulkan drivers, so it freezes a lot in witcher 3 unless I revert to dxvk 0.52. With the recommended 396.24.02/396.24.10 drivers, though, ffmpeg can’t find cuda to run hevc_nvenc or h264_nvenc. I installed the latest cuda to track the problem down, and no cuda programs, including the simple deviceQuery, work with 396.24.02 or 396.24.10, whether or not the device nodes are created and uvm is loaded manually. I even tried the hack of using the 396.18.11 libcuda together with the 396.24.10 driver, and that fails with a different error than 30.
Just tested 396.45 along with dxvk 0.63. The vulkan app runs fine, but cuda is broken the same as with 396.24.02 and 396.24.10: ffmpeg + hevc_nvenc does not start, and any cuda app such as the simple deviceQuery returns error 30, even after a manual modprobe of nvidia_uvm with the device nodes (/dev/nvidia0 and /dev/nvidiactl) set up.
Those have been my general findings after playing around with this a bit more, too. My main reason for going to these driver versions is DXVK (which seems to work), but then CUDA apps break. Not sure what distro you are on; Gentoo here.
Same here. I’m on Slackware 14.2, which is unsupported by Nvidia for cuda, but I’m able to install cuda and get it working fine on any driver other than 396.24.02, 396.24.10, or 396.45, all of which throw the error 30. I don’t use cuda apps myself; I just need hevc_nvenc in ffmpeg to work, and it bombs out with the cuda init error on these drivers. 396.18.11 with dxvk 0.52 seems pretty stable on witcher 3 at least; it’s my current fallback until this gets resolved.
The system boots without uvm loaded. Running ‘nvidia-modprobe -u -c 0’ loads nvidia-uvm and sets up dev nodes as you listed. No errors in dmesg. CUDA apps simply do not work. Reverting to the 396.24 driver makes CUDA work again (and without the need to manually load nvidia-uvm).
I’m using a 4.17 kernel, but that seems to be the only difference in our systems. I’ll try going back to an older one to see if that helps any. Other than that, not really sure what else to try.
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:07:04_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148
./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:07:04_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148
./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1050 Ti"
CUDA Driver Version / Runtime Version 9.2 / 9.2
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 4037 MBytes (4233035776 bytes)
( 6) Multiprocessors, (128) CUDA Cores/MP: 768 CUDA Cores
GPU Max Clock rate: 1418 MHz (1.42 GHz)
Memory Clock rate: 3504 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 1048576 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.2, CUDA Runtime Version = 9.2, NumDevs = 1, Device0 = GeForce GTX 1050 Ti
Result = PASS
*does not detect correctly - NVIDIA-Linux-x86_64-396.24.02.run
*does not detect correctly - NVIDIA-Linux-x86_64-396.37.run ( included in package cuda_9.2.148_396.37_linux.run )
*does not detect correctly - NVIDIA-Linux-x86_64-396.45.run
Having upgraded to 396.45, I’m encountering the same issue. The UVM module is loaded and all device nodes are present, but deviceQuery now always returns error 30. This is with kernel 4.17.11. Downgrading to 396.24 restored CUDA functionality.
Hm, the common denominator for people having problems with 396.45 (and not .24) seems to be gentoo; can you folks confirm?
It wouldn’t be the first time that drivers broke on gentoo; this package breaks surprisingly often there, and the ebuilds get near zero testing as far as I can tell (the ebuild is a collection of semi-hardcoded copies of libraries from the extracted installer to the filesystem, so every time Nvidia changes something in the installer’s naming scheme, it silently breaks).
At some point I will probably get around to writing a proper test suite and submitting it to Jeroen Roovers (the package maintainer), as I’m fairly certain there is not much actual testing being done before publishing these ebuilds (vulkan, opencl and cuda have all broken for me multiple times over the span of the last few years because of forgotten icd files or shared libraries etc.). And the worst thing is that these breakages could easily have been prevented by a simple test suite with trivial “hello world” applications exercising basic functionality prior to releasing to the users.
One other respondent and I are on Slackware 14.2, not Gentoo. 396.18.11 was the last vulkan beta that runs cuda (and ffmpeg hevc_nvenc, which needs cuda) successfully. 396.24.02, 396.24.10, and 396.45 all fail even the simple deviceQuery, which works fine on 396.18.11, 396.24, the 390 series, etc. ldd shows no missing lib refs in libcuda.so. I’m starting to wonder if some major change was made in these driver releases, such as a different compiler or some spectre/meltdown mitigations, which is now breaking things.
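In case anyone wants to repeat the ldd check: a small sketch that counts unresolved references in saved ldd output. The libcuda.so path in the comment is an assumption; it varies per distro.

```shell
# Sketch: count "=> not found" entries in captured ldd output, e.g. from
# `ldd /usr/lib64/libcuda.so > ldd.txt` (the library path is an example).
missing_refs() {
  grep -c '=> not found' "$1" || true   # grep -c exits 1 when the count is 0
}
```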
I had the same issue after upgrading.
Did some experimentation with my kernel config. Apparently .45 needs “NUMA Memory Allocation and Scheduler Support” (CONFIG_NUMA) enabled, at least on my particular hardware/software combo.
I had always assumed that option was only for multi-socket systems.
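For anyone wanting to check their own kernel for that option, a quick sketch; the config file path is an assumption and varies by distro (/boot/config-$(uname -r) on many systems, or /proc/config.gz if CONFIG_IKCONFIG is built in).

```shell
# Sketch: check whether CONFIG_NUMA is enabled in a plain-text kernel
# config file such as /boot/config-$(uname -r) (path is an assumption).
numa_enabled() {
  grep -q '^CONFIG_NUMA=y' "$1"
}
```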