[435.17] Am I observing the proper results with the new PRIME?

Hey there,

I just installed the 435.17 beta drivers on my Razer Blade 15" (mid-2019 Base Model, with RTX 2060). They seem to work fine so far, and power management is enabled (0x02).
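
For context, the 0x02 here is the NVreg_DynamicPowerManagement kernel module option described in the runtime power management chapter of the driver README, which I set via a modprobe.d snippet along these lines (the filename is just my choice):

$ cat /etc/modprobe.d/nvidia-pm.conf
options nvidia NVreg_DynamicPowerManagement=0x02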

I’m currently running with no external screen, and on a regular X session. However, I’m curious about the output of nvidia-smi:

Sat Aug 17 11:30:27 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.17       Driver Version: 435.17       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   39C    P8     7W /  N/A |    370MiB /  5934MiB |      7%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2822      G   /usr/lib/xorg/Xorg                            97MiB |
|    0      2967      G   /usr/bin/gnome-shell                         141MiB |
|    0      3416      G   ...uest-channel-token=16562083045299223019   130MiB |
+-----------------------------------------------------------------------------+

Why are the Xorg/gnome-shell processes running on the NVIDIA GPU? Aren’t they supposed to run on the iGPU (Intel), unless explicitly run with the __NV_PRIME_RENDER_OFFLOAD=1 flag?
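
For reference, my understanding from the PRIME render offload docs is that only applications launched with the offload variables should land on the NVIDIA GPU, roughly like this (glxgears is just an arbitrary test program):

$ __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxgears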

glxinfo gives similarly odd results:

$ glxinfo | grep vendor
server glx vendor string: NVIDIA Corporation
client glx vendor string: NVIDIA Corporation
OpenGL vendor string: NVIDIA Corporation

Again, am I not supposed to see the Intel driver used here, unless explicitly running with __NV_PRIME_RENDER_OFFLOAD?

My Xorg config just has:

Section "ServerLayout"
  Identifier "layout"
  Option "AllowNVIDIAGPUScreens"
EndSection

xrandr --listproviders seems to list the providers properly:

Providers: number : 2
Provider 0: id: 0x27d cap: 0x1, Source Output crtcs: 4 outputs: 7 associated providers: 1 name:NVIDIA-0
Provider 1: id: 0x43 cap: 0x6, Sink Output, Source Offload crtcs: 3 outputs: 1 associated providers: 1 name:modesetting

Both

i915.modeset=1

and

nvidia_drm.modeset=1

are set.
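
They can also be double-checked at runtime through sysfs, for example:

$ cat /sys/module/i915/parameters/modeset
$ cat /sys/module/nvidia_drm/parameters/modeset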

Am I missing something, or is this the expected behavior?

Thanks!

I dug a little more into my Xorg config, and it does indeed look like Xorg was only loading the nvidia module, not the intel/modesetting one. I installed i965-va-driver, turned off nvidia_drm.modeset, and changed my Xorg config to this:

Section "ServerLayout"
        Identifier "layout"
        Screen 0  "Screen0"
        Option "AllowNVIDIAGPUScreens"
EndSection

Section "Monitor"
        Identifier   "Monitor0"
        VendorName   "Monitor Vendor"
        ModelName    "Monitor Model"
EndSection

Section "Device"
        Identifier  "Card0"
        Driver      "modesetting"
        BusID       "PCI:0:2:0"
EndSection

Section "Device"
        Identifier  "Card1"
        Driver      "nvidia"
        BusID       "PCI:1:0:0"
EndSection


Section "Screen"
        Identifier "Screen0"
        Device     "Card0"
        Monitor    "Monitor0"
EndSection
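
A quick way to confirm which drivers Xorg actually loads with this config is to grep the log (this is how I noticed earlier that only the nvidia module was being pulled in; the log path may differ depending on your distro/session):

$ grep -i "loadmodule" /var/log/Xorg.0.log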

It now seems to work mostly as expected:

$ glxinfo | grep vendor
server glx vendor string: SGI
client glx vendor string: Mesa Project and SGI
OpenGL vendor string: Intel Open Source Technology Center

With PRIME:

$ __NV_PRIME_RENDER_OFFLOAD_PROVIDER=NVIDIA-G0 __GLX_VENDOR_LIBRARY_NAME=nvidia glxinfo | grep vendor
server glx vendor string: NVIDIA Corporation
client glx vendor string: NVIDIA Corporation
OpenGL vendor string: NVIDIA Corporation

So, this seems to work as intended so far. However, nvidia-smi still has references to Xorg:

$ nvidia-smi
Sat Aug 17 13:31:25 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.17       Driver Version: 435.17       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   38C    P8     1W /  N/A |     17MiB /  5934MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     18943      G   /usr/lib/xorg/Xorg                            15MiB |
+-----------------------------------------------------------------------------+

Perhaps this is expected behavior: since Xorg loads the glxserver_nvidia module, it must hold some reference to the GPU. However, I am under the impression that this prevents the GPU from turning off completely, as it seems to stay at around 1W of power draw.

Digging into the powertop stats, it looks like the /sys/bus/pci/devices/0000:01:00.0/power/control attribute isn’t set to ‘auto’ but to ‘on’, which seems to prevent the GPU from sleeping (as expected and explained in the docs). However, the udev rules don’t seem to set that runtime PM value automatically for this device.
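
For anyone wanting to check this themselves, these are the sysfs reads/writes involved (the PCI address is from my machine and will differ elsewhere):

$ cat /sys/bus/pci/devices/0000:01:00.0/power/control
on
$ echo auto | sudo tee /sys/bus/pci/devices/0000:01:00.0/power/control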

Setting it to ‘auto’ manually (as above), the GPU indeed seems to power off:

$ nvidia-smi
Sat Aug 17 13:35:25 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.17       Driver Version: 435.17       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   38C    P3    N/A /  N/A |     17MiB /  5934MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     18943      G   /usr/lib/xorg/Xorg                            15MiB |
+-----------------------------------------------------------------------------+
WARNING: infoROM is corrupted at gpu 0000:01:00.0

All good on my end; I hope this helps others who might be in the same situation. It may also help expand the docs a bit :)

I haven’t tested plugging in an external monitor yet, but I’ll keep you posted.

Thanks for FINALLY doing this, Nvidia. I was starting to lose hope :)

If you manage to figure out the udev issue for /sys/bus/pci/devices/0000:01:00.0/power/control, please post here. I’ve tried countless different ways to do this, including a late-running systemd unit. So far the only approach that works seems to be setting it manually.
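
For reference, the kind of rule I’ve been trying looks roughly like the following, adapted from memory from the runtime power management notes in the driver README (so double-check the exact match keys); it still doesn’t seem to take effect here:

$ cat /etc/udev/rules.d/80-nvidia-pm.rules
# Enable runtime PM for NVIDIA VGA/3D controllers when the driver binds
ACTION=="bind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", TEST=="power/control", ATTR{power/control}="auto"
ACTION=="bind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030200", TEST=="power/control", ATTR{power/control}="auto"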

I’ve successfully gotten to the same point you have here, but what interests me is that nvidia-smi reports P3 as the power state (and N/A as the power draw), along with the infoROM warning at the bottom. We should be seeing something in the P12-P15 range, so I’m assuming this is actually a bug in the driver. nvidia-smi takes about 2 seconds to return, so I’m relatively certain it has to wake the card up and that the power management is actually working; the reporting is just odd.
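
For what it’s worth, you can check the runtime PM state without waking the card via the kernel’s runtime_status attribute (same PCI address as above; it should read "suspended" when the GPU is actually powered down), and there is also a narrower nvidia-smi query, which will still wake the card:

$ cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
$ nvidia-smi --query-gpu=pstate,power.draw --format=csv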

@SenojEkul: I’ll check it out sometime in the future

@g.prime: Yes, I guess it’s a glitch in the driver, but power consumption seems good

However, I have an issue with external (HDMI) displays. When the driver is in offload mode (i.e. you have an NVIDIA-G0 provider rather than an NVIDIA-0 one), it reports 0 outputs, despite DFP-0 through DFP-5 being listed in the Xorg log.

When running with the regular, non-offloaded driver, the outputs work as intended.

Anyone have an idea?

@xplodgui,

Having the NVIDIA device load as a GPU screen (i.e. the NVIDIA-G0 provider) and display a desktop rendered by a different provider is RandR’s “display offload sink” capability, also sometimes referred to as “reverse PRIME”. Acting as a display offload sink is currently not supported.

@aplattner,

Thanks for your feedback, that’s what I suspected. Is this something planned for the near future? That would make the driver 100% perfect for all Optimus laptops.

Thanks!

@aplattner

Is there some kind of user feedback area where we can say we would like this feature? In my case, my laptop works well, but I have only one display output, and my Thunderbolt 3 eGPU is plugged into it. I am very happy that I can render from my eGPU to my laptop display, but I would be even happier to be able to use my external monitor (plugged into my eGPU) without having to launch a separate X session on it.