Hi, we’ve been trying various driver versions (both RPM and .run) on our EL6/7 Dell R740xds with no success. I’ll paste a bunch of output below, but in short the driver seems to be only half working: the card is detected, but some of the output doesn’t make sense (note the question marks in /proc) and X won’t start.
$ cat /proc/driver/nvidia/gpus/0000:3b:00.0/information
Model: Tesla P100-PCIE-12GB
IRQ: 324
GPU UUID: GPU-????????-????-????-????-????????????
Video BIOS: ??.??.??.??.??
Bus Type: PCIe
DMA Size: 47 bits
DMA Mask: 0x7fffffffffff
Bus Location: 0000:3b:00.0
Device Minor: 0
$ dmesg -T | grep -i -e nvidia -e nvrm
[Tue Dec 5 12:19:34 2017] nvidia: loading out-of-tree module taints kernel.
[Tue Dec 5 12:19:34 2017] nvidia: module license 'NVIDIA' taints kernel.
[Tue Dec 5 12:19:34 2017] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[Tue Dec 5 12:19:34 2017] nvidia-nvlink: Nvlink Core is being initialized, major device number 243
[Tue Dec 5 12:19:34 2017] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 384.98 Thu Oct 26 15:16:01 PDT 2017 (using threaded interrupts)
[Tue Dec 5 12:19:34 2017] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 384.98 Thu Oct 26 14:41:13 PDT 2017
[Tue Dec 5 12:19:35 2017] [drm] [nvidia-drm] [GPU ID 0x00003b00] Loading driver
[Tue Dec 5 12:20:15 2017] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 238
[Tue Dec 5 12:20:15 2017] nvidia 0000:3b:00.0: irq 324 for MSI/MSI-X
From Xorg.0.log:
[ 47.728] (II) NVIDIA dlloader X Driver 384.98 Thu Oct 26 14:06:45 PDT 2017
[ 47.728] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[ 47.728] (++) using VT number 1
[ 47.730] (EE) No devices detected.
[ 47.730] (EE)
Fatal server error:
[ 47.730] (EE) no screens found(EE)
[ 47.730] (EE)
I’ve tried 375, 381 and 384 drivers. I’ve also updated the R740xd to the latest BIOS available and run the NVIDIA Firmware Update Utility (v5.402.0) from Dell’s support site. I’ve tried using the version of the driver downloaded from both Dell’s support site and from NVIDIA’s site directly.
Since nvidia-smi works, I wondered whether something was just off with the /proc output, but with X still failing to start I’m not sure where to go next.
Without a monitor attached, you have to add
Option "AllowEmptyInitialConfiguration"
to the Device section of xorg.conf (use plain ASCII double quotes; typographic quotes will not parse).
The PCI BusID in xorg.conf is decimal, not hexadecimal: 0x3b = 59, so the GPU at 0000:3b:00.0 is BusID "PCI:59:0:0".
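For reference, here is a minimal Python sketch of that hex-to-decimal conversion, assuming the address is in the domain:bus:device.function form used by /proc and lspci (e.g. 0000:3b:00.0). The helper names `pci_to_busid` and `device_section` and the `Identifier` value "Tesla0" are hypothetical; the generated Device section just combines the decimal BusID with the AllowEmptyInitialConfiguration option suggested above.

```python
import re

# lspci/sysfs style address: optional hex domain, then bus:device.function in hex.
_PCI_RE = re.compile(
    r"(?:([0-9a-fA-F]{4,}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])"
)


def pci_to_busid(address: str) -> str:
    """Convert e.g. '0000:3b:00.0' into the decimal xorg.conf form 'PCI:59:0:0'."""
    m = _PCI_RE.fullmatch(address)
    if m is None:
        raise ValueError(f"unrecognised PCI address: {address!r}")
    domain = int(m.group(1) or "0", 16)
    bus, dev, fn = (int(g, 16) for g in m.group(2, 3, 4))
    # xorg.conf(5) writes a non-zero PCI domain as PCI:bus@domain:device:function.
    if domain:
        return f"PCI:{bus}@{domain}:{dev}:{fn}"
    return f"PCI:{bus}:{dev}:{fn}"


def device_section(address: str, identifier: str = "Tesla0") -> str:
    """Build a headless Device section; 'Tesla0' is a hypothetical identifier."""
    return "\n".join([
        'Section "Device"',
        f'    Identifier "{identifier}"',
        '    Driver     "nvidia"',
        f'    BusID      "{pci_to_busid(address)}"',
        '    Option     "AllowEmptyInitialConfiguration"',
        "EndSection",
    ])


if __name__ == "__main__":
    print(device_section("0000:3b:00.0"))
```

For the card in this thread, the BusID line comes out as `BusID "PCI:59:0:0"`.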
Thank you. I’m still not quite in (there seem to be some NoMachine issues), but I notice that after correcting the BusID and adding that option, the /proc/driver output is now correct.
Does X have to be running for the contents of /proc/driver/nvidia to be complete and correct?
I’ll fight with NoMachine a bit further and hopefully reply shortly saying all’s well.
Just to clarify for the first few posts: this is expected behavior. The proc interface is created as soon as the nvidia.ko kernel module is loaded, but some of the data isn’t queried from the GPU until the driver is actually initialized. That’s why you see the real values while something else (such as an X server or nvidia-persistenced) is keeping the /dev/nvidia* devices open, and question marks otherwise. nvidia-smi still works because it opens /dev/nvidia* itself, queries the information, and then closes the devices again.
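To see which of those two states a machine is in, a small diagnostic sketch in Python (Linux only, assuming the /proc layout shown at the top of the thread) can list the fields of the information file that still contain question marks and list the processes currently holding /dev/nvidia* open. The function names are hypothetical and this is not an NVIDIA tool; run it as root so every process’s fd directory is readable. It only reads /proc, so it does not itself initialize the GPU.

```python
import glob
import os


def parse_information(text: str) -> dict:
    """Parse 'Key: value' lines from /proc/driver/nvidia/gpus/<bus>/information."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def unqueried_fields(fields: dict) -> list:
    """Fields still showing '?' placeholders, i.e. not yet read from the GPU."""
    return [key for key, value in fields.items() if "?" in value]


def nvidia_device_holders() -> dict:
    """Map PID -> sorted /dev/nvidia* paths that the process has open."""
    holders = {}
    if not os.path.isdir("/proc"):
        return holders
    for pid in filter(str.isdigit, os.listdir("/proc")):
        fd_dir = os.path.join("/proc", pid, "fd")
        try:
            targets = [os.readlink(os.path.join(fd_dir, fd)) for fd in os.listdir(fd_dir)]
        except OSError:
            continue  # process exited, fd closed meanwhile, or permission denied
        open_nv = sorted({t for t in targets if t.startswith("/dev/nvidia")})
        if open_nv:
            holders[int(pid)] = open_nv
    return holders


if __name__ == "__main__":
    for path in glob.glob("/proc/driver/nvidia/gpus/*/information"):
        with open(path) as f:
            missing = unqueried_fields(parse_information(f.read()))
        print(path, "not yet queried:", missing or "none")
    print("holders of /dev/nvidia*:", nvidia_device_holders() or "none")
```

With no holders, expect GPU UUID and Video BIOS to be reported as not yet queried, matching the first post.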
If you want to keep the GPU initialized all the time even if no other clients are using it, that’s what nvidia-persistenced is for.
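A quick way to confirm the GPU is being kept initialized is to ask nvidia-smi for each GPU’s persistence mode. The Python sketch below assumes nvidia-smi is on PATH and accepts `--query-gpu=persistence_mode --format=csv,noheader` (one Enabled/Disabled line per GPU); with nvidia-persistenced running in its default configuration it should report Enabled. The function names are hypothetical.

```python
import shutil
import subprocess


def parse_persistence(output: str) -> list:
    """Turn csv,noheader output ('Enabled'/'Disabled' per line) into booleans."""
    return [line.strip() == "Enabled" for line in output.splitlines() if line.strip()]


def persistence_mode() -> list:
    """Per-GPU persistence mode via nvidia-smi; empty list if nvidia-smi is absent."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=persistence_mode", "--format=csv,noheader"],
        universal_newlines=True,  # text mode that also works on older Python 3
    )
    return parse_persistence(out)


if __name__ == "__main__":
    try:
        modes = persistence_mode()
    except subprocess.CalledProcessError as exc:
        modes = []
        print("nvidia-smi failed:", exc)
    for index, enabled in enumerate(modes):
        print(f"GPU {index}: persistence {'on' if enabled else 'off'}")
```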