Hello,
I have been trying to test remote direct rendering on a CentOS 7.3 box with a Tesla P4 on board (I should add that the host motherboard also has an integrated graphics chip). While I can connect a screen to this box for debugging purposes, it is meant to be a headless (remote) server that I would use from a local computer through either a VNC client or vglconnect.
Before going further into the details, my question is:
Can I do remote direct rendering on a Tesla P4?
Assuming that I can, here are the steps I have taken so far, without success:
- installed CentOS 7.3 with:
“Server with GUI”, “mate-desktop-environment”, “mate-desktop”, “xfce-desktop”
and:
systemctl enable gdm.service
systemctl set-default graphical.target
Do I actually need graphical.target as the default, even though this is a headless server?
- installed the NVIDIA driver with:
rpm -i nvidia-diag-driver-local-repo-rhel7-390.30-1.0-1.x86_64.rpm
yum clean all
yum install cuda-drivers
reboot
- ran nvidia-xconfig:
nvidia-xconfig --use-display-device=none --busid="PCI:1:0:0" --virtual=1280x1024
- enabled Direct Rendering Manager kernel modesetting:
modprobe -r nvidia-drm ; modprobe nvidia-drm modeset=1
Do I actually need this step?
- installed VirtualGL, TurboVNC and TigerVNC (so far, however, I have only tried TigerVNC)
- stopped GDM and ran:
vglserver_config
- restarted gdm.service, and also tried starting /usr/bin/X :0 directly; in either case X tries to start but fails:
systemctl status gdm.service
● gdm.service - GNOME Display Manager
Loaded: loaded (/usr/lib/systemd/system/gdm.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2018-03-12 17:12:17 PDT; 1h 10min ago
Process: 3081 ExecStartPost=/bin/bash -c TERM=linux /usr/bin/clear > /dev/tty1 (code=exited, status=0/SUCCESS)
Main PID: 3078 (gdm)
CGroup: /system.slice/gdm.service
└─3078 /usr/sbin/gdm
Mar 12 17:12:19 myserver.server.com gdm[3078]: GdmDisplay: display lasted 1.000655 seconds
Mar 12 17:12:20 myserver.server.com gdm[3078]: Child process 3106 was already dead.
Mar 12 17:12:20 myserver.server.com gdm[3078]: GdmDisplay: display lasted 0.962944 seconds
Mar 12 17:12:21 myserver.server.com gdm[3078]: Child process 3110 was already dead.
Mar 12 17:12:21 myserver.server.com gdm[3078]: GdmDisplay: display lasted 0.939461 seconds
Mar 12 17:12:22 myserver.server.com gdm[3078]: Child process 3123 was already dead.
Mar 12 17:12:22 myserver.server.com gdm[3078]: GdmDisplay: display lasted 0.959484 seconds
Mar 12 17:12:23 myserver.server.com gdm[3078]: Child process 3127 was already dead.
Mar 12 17:12:23 myserver.server.com gdm[3078]: GdmDisplay: display lasted 0.958884 seconds
Mar 12 17:12:23 myserver.server.com gdm[3078]: GdmLocalDisplayFactory: maximum number of X display failures reached: check X server log for errors
The relevant part of Xorg.0.log follows:
[ 3817.519] (II) Loading sub module "fb"
[ 3817.519] (II) LoadModule: "fb"
[ 3817.519] (II) Loading /usr/lib64/xorg/modules/libfb.so
[ 3817.519] (II) Module fb: vendor="X.Org Foundation"
[ 3817.519] compiled for 1.17.2, module version = 1.0.0
[ 3817.519] ABI class: X.Org ANSI C Emulation, version 0.4
[ 3817.519] (II) Loading sub module "wfb"
[ 3817.519] (II) LoadModule: "wfb"
[ 3817.520] (II) Loading /usr/lib64/xorg/modules/libwfb.so
[ 3817.520] (II) Module wfb: vendor="X.Org Foundation"
[ 3817.520] compiled for 1.17.2, module version = 1.0.0
[ 3817.520] ABI class: X.Org ANSI C Emulation, version 0.4
[ 3817.520] (II) Loading sub module "ramdac"
[ 3817.520] (II) LoadModule: "ramdac"
[ 3817.520] (II) Module "ramdac" already built-in
[ 3817.521] (**) NVIDIA(0): Depth 24, (--) framebuffer bpp 32
[ 3817.521] (==) NVIDIA(0): RGB weight 888
[ 3817.521] (==) NVIDIA(0): Default visual is TrueColor
[ 3817.521] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
[ 3817.521] (**) NVIDIA(0): Option "UseDisplayDevice" "None"
[ 3817.521] (**) NVIDIA(0): Enabling 2D acceleration
[ 3817.521] (**) NVIDIA(0): Option "UseDisplayDevice" set to "none"; enabling NoScanout
[ 3817.521] (**) NVIDIA(0): mode
[ 3817.521] (EE) NVIDIA(0): Failed to initialize the GLX module; please check in your X
[ 3817.521] (EE) NVIDIA(0): log file that the GLX module has been loaded in your X
[ 3817.521] (EE) NVIDIA(0): server, and that the module is the NVIDIA GLX module. If
[ 3817.521] (EE) NVIDIA(0): you continue to encounter problems, Please try
[ 3817.521] (EE) NVIDIA(0): reinstalling the NVIDIA driver.
[ 3818.188] (EE) NVIDIA(GPU-0): UseDisplayDevice "None" is not supported with GRID
[ 3818.188] (EE) NVIDIA(GPU-0): displayless
[ 3818.188] (EE) NVIDIA(GPU-0): Failed to select a display subsystem.
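A minimal sketch for pulling only the error lines out of a log like the one above. The function name xorg_errors is made up for illustration, and the default path /var/log/Xorg.0.log is an assumption about where the log lives on this box.

```shell
#!/bin/sh
# Print only the error lines, marked (EE), from an Xorg log file.
# Assumption: the log is at /var/log/Xorg.0.log unless a path is passed.
xorg_errors() {
    log="${1:-/var/log/Xorg.0.log}"
    # Match "[ timestamp] (EE)" so that the marker legend near the top of
    # the log, which also mentions (EE), is not reported as an error.
    grep -E '^\[[ 0-9.]+\] \(EE\)' "$log"
}
```

Calling xorg_errors with no argument reads the default path; passing a file name (for example a saved copy of the log) reads that file instead.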
Of course, I can connect with both TigerVNC and vglconnect, but with TigerVNC I get indirect rendering (tested with glxinfo), while with vglconnect I get:
vglrun glxinfo
name of display: localhost:10.0
[VGL] ERROR: Could not open display :0.
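For reference, a small sketch of the glxinfo check for direct versus indirect rendering. The function name rendering_is_direct is hypothetical; it relies only on the "direct rendering: Yes" / "direct rendering: No" line that glxinfo prints.

```shell
#!/bin/sh
# Read glxinfo output on stdin and succeed only if it reports direct rendering.
# glxinfo prints a line that starts with "direct rendering: Yes" or
# "direct rendering: No".
rendering_is_direct() {
    grep -q '^direct rendering: Yes'
}
```

Inside the VNC session this would be used as: glxinfo | rendering_is_direct && echo direct || echo indirect. The same pipe works with vglrun glxinfo once the 3D X server on :0 can be opened.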
Here is the output of nvidia-smi:
Mon Mar 12 17:54:55 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:01:00.0 Off |                    0 |
| N/A   71C    P0    25W /  75W |      0MiB /  7611MiB |      5%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Here is my xorg.conf:
# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig: version 390.30 (buildmeister@swio-display-x64-rhel04-14) Wed Jan 31 22:46:17 PST 2018
Section "ServerLayout"
    Identifier "Layout0"
    Screen 0 "Screen0"
    InputDevice "Keyboard0" "CoreKeyboard"
    InputDevice "Mouse0" "CorePointer"
EndSection
Section "Files"
    FontPath "/usr/share/fonts/default/Type1"
EndSection
Section "InputDevice"
    # generated from default
    Identifier "Mouse0"
    Driver "mouse"
    Option "Protocol" "auto"
    Option "Device" "/dev/input/mice"
    Option "Emulate3Buttons" "no"
    Option "ZAxisMapping" "4 5"
EndSection
Section "InputDevice"
    # generated from default
    Identifier "Keyboard0"
    Driver "kbd"
EndSection
Section "Monitor"
    Identifier "Monitor0"
    VendorName "Unknown"
    ModelName "Unknown"
    HorizSync 28.0 - 33.0
    VertRefresh 43.0 - 72.0
    Option "DPMS"
EndSection
Section "Device"
    Identifier "Device0"
    Driver "nvidia"
    VendorName "NVIDIA Corporation"
    BoardName "Tesla P4"
    BusID "PCI:1:0:0"
EndSection
Section "Screen"
    Identifier "Screen0"
    Device "Device0"
    Monitor "Monitor0"
    DefaultDepth 24
    Option "UseDisplayDevice" "None"
    SubSection "Display"
        Virtual 1280 1024
        Depth 24
    EndSubSection
EndSection
As you may have guessed, if you have followed me this far, I am a bit at a loss. Is there any way I could upload the nvidia-bug-report.log.gz?
Any help would be greatly appreciated.
Thanks!