PRIME and PRIME Synchronization

Good news! The latest 370.23 beta driver release (http://www.nvidia.com/Download/driverResults.aspx/105855/en-us) contains initial, experimental support for PRIME Synchronization! For reasons explained below, the functionality can’t be officially supported yet, but if you’re brave enough, all the pieces are there to try it out.

I noticed that there is some confusion about what exactly PRIME is and how it works. In addition to explaining how to set up PRIME Synchronization, I’m taking this opportunity to clear up some questions and confusion. If you have any more questions, ask away.

Enjoy!

What is PRIME?

PRIME is a collection of features in the Linux kernel, X server, and various drivers to enable GPU offloading with multi-GPU configurations under Linux. It was initially conceived to allow one GPU to display output rendered by another GPU, such as in laptops with both a discrete GPU and an integrated GPU (e.g., NVIDIA Optimus-enabled laptops).

Why is PRIME necessary?

When you imagine how Optimus works, you probably envision something like “GPU switching,” where there would be two GPUs and a hardware switch. The switch, or multiplexer (mux), would allow you to change which GPU drives the screen. When you start an intensive game, it would switch the display to the higher power discrete NVIDIA GPU and use it until you are done playing.

When it comes to GPU switching on Macs or older GPU switching PCs, you would be more or less correct, but modern Optimus-enabled PCs use something known as a “muxless” configuration: there is no switch. Instead, only the integrated GPU is connected to the display, and the NVIDIA GPU is floating, connected only to the system memory. Without a software solution, there would be no way to display the output from the NVIDIA GPU. The goal of PRIME is to allow the NVIDIA GPU to share its output with the integrated GPU so that it can be presented to the display.

Because PRIME requires the integrated and discrete GPUs to work in tandem to display the intended output, it cannot simply be a feature of any one driver; it has to be supported by the Linux graphics ecosystem as a whole.

How does PRIME work?

At a high level, features in the Linux kernel’s Direct Rendering Manager (https://en.wikipedia.org/wiki/Direct_Rendering_Manager#DMA_Buffer_Sharing_and_PRIME) enable drivers to exchange system memory buffers with each other in a vendor-neutral format. Userspace can leverage this functionality in a variety of ways to share rendering results between drivers and their respective GPUs.

The X server presents two methods for sharing rendering results between drivers: “output,” and “offload.” If you use the proprietary NVIDIA driver with PRIME, you’re probably most familiar with “output.”

“Output” allows you to use the discrete GPU as the sole source of rendering, just as it would be in a traditional desktop configuration. A screen-sized buffer is shared from the dGPU to the iGPU, and the iGPU does nothing but present it to the screen.

“Offload” attempts to mimic more closely the functionality of Optimus on Windows. Under normal operation, the iGPU renders everything, from the desktop to the applications. Specific 3D applications can be rendered on the dGPU, and shared to the iGPU for display. When no applications are being rendered on the dGPU, it may be powered off. NVIDIA has no plans to support PRIME render offload at this time.
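
For context only: with the open-source driver stack, render offload is typically invoked per application through the DRI_PRIME environment variable. This does not apply to the proprietary NVIDIA driver and is shown here purely to illustrate the model.

# Open-source stack only; renders the given application on the secondary GPU
DRI_PRIME=1 glxinfo | grep "OpenGL renderer"
DRI_PRIME=1 glxgears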

How do I set up PRIME with the NVIDIA driver?

The best way to think about PRIME “output” mode is that it allows the iGPU’s displays to be configured as if they belonged to the dGPU.

With that in mind, there are a few steps required to take advantage of this functionality:

  • If you’re setting up PRIME on an Optimus laptop, there are likely no displays connected to the dGPU. By default, the NVIDIA X driver will bail out if it finds no displays. To ensure that the X server can start with no heads connected directly to the dGPU, you must add Option "AllowEmptyInitialConfiguration" to the "Screen" section of xorg.conf.
  • By default, your xorg.conf will be set up such that only the NVIDIA driver is loaded. To make the iGPU’s heads available for configuration, the "modesetting" driver must be specified as well. See the README link below for an example, and the sketch just after this list.
  • The X server must be told to configure iGPU displays using PRIME. This can be done with the xrandr command-line tool, via 'xrandr --setprovideroutputsource modesetting NVIDIA-0'. If this fails, you can verify the available graphics providers with 'xrandr --listproviders'.
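
A minimal xorg.conf sketch along those lines might look like the following; treat it as a rough starting point rather than a definitive config. The BusID is a placeholder (check 'lspci' for your dGPU’s actual PCI address), and the README example linked below is the authoritative reference.

Section "ServerLayout"
    Identifier "layout"
    Screen 0 "nvidia"
    Inactive "intel"
EndSection

Section "Device"
    Identifier "nvidia"
    Driver "nvidia"
    # Placeholder; replace with your dGPU's PCI address from 'lspci'
    BusID "PCI:1:0:0"
EndSection

Section "Screen"
    Identifier "nvidia"
    Device "nvidia"
    # Allow X to start even though no display is attached to the dGPU
    Option "AllowEmptyInitialConfiguration"
EndSection

Section "Device"
    Identifier "intel"
    Driver "modesetting"
EndSection

Section "Screen"
    Identifier "intel"
    Device "intel"
EndSection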

More detailed instructions can be found in the README (Chapter 32, “Offloading Graphics Display with RandR 1.4”).

Once PRIME is configured as described above, the iGPU’s heads may be configured as if they were the dGPU’s with any RandR-aware tool, be it xrandr or a distro-provided graphical tool. The easiest way to get something to display is 'xrandr --auto', but more complicated configuration is possible as well.

Most likely, you’re going to want a script to do the xrandr/RandR configuration automatically on startup of the X server. If you’re using startx, this can be done simply by adding the xrandr commands to your .xinitrc. If you’re using a display manager such as LightDM, there are display-manager-specific instructions for running the xrandr commands at startup. The Arch Linux Wiki (https://wiki.archlinux.org/index.php/NVIDIA_Optimus#Display_Managers) has a good section on the topic. If you’re using Ubuntu, Canonical provides a set of scripts, enabled by the ‘nvidia-prime’ package, that let you switch PRIME on and off via an added menu in nvidia-settings, but these scripts are neither provided nor officially supported by NVIDIA.
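
For example, a minimal .xinitrc for the startx case might look like this (the session command at the end is just a placeholder; substitute whatever window manager or desktop session you actually use):

xrandr --setprovideroutputsource modesetting NVIDIA-0
xrandr --auto
# Placeholder session; replace with your window manager or desktop environment
exec openbox-session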

What is “PRIME Synchronization” about?

A lack of synchronization at a critical step in the pipeline has resulted in ugly artifacts under PRIME configurations for years, and I’ve been working to fix it. Let me explain:


[Figure: Simplified Pipeline with OpenGL VSync]

In a normal desktop configuration, games and other applications render into the GPU’s video memory, and the dGPU display engine pipes the result to the display, most commonly refreshing at 60 Hz. Without something commonly known as “vsync” to synchronize the application’s rendering to the screen’s refresh interval, you can’t be sure that the screen won’t refresh showing half of one frame and half of another. When this happens, you get an ugly artifact known as tearing (see “Screen tearing” on Wikipedia). Fortunately, it’s a solved problem.


[Figure: Simplified Pipeline with OpenGL VSync and PRIME Synchronization]

With PRIME, there’s an extra step. Games and other applications continue to render into the dGPU’s video memory, but the final result needs to be placed into the shared buffer in system memory so that it can be scanned out by the iGPU’s display engine. Traditional vsync can synchronize the rendering of the application with the copy into system memory, but there needs to be an additional mechanism to synchronize the copy into system memory with the iGPU’s display engine. Such a mechanism would have to involve communication between the dGPU’s and the iGPU’s drivers, unlike traditional vsync.

Up until recently, the Linux kernel and X server lacked the required functionality to allow the dGPU and iGPU drivers to communicate and synchronize the copy with the scanout. Because of this limitation, there was virtually nothing any one driver could do to provide the necessary synchronization; it required improvements to the greater ecosystem.

Over the past many months, I’ve been working to implement and upstream the necessary improvements to the X server and to the iGPU kernel and userspace drivers so that we could leverage them from within our driver. They have finally landed (see “PRIME Synchronization & Double Buffering Land In The X.Org Server” on Phoronix). Unfortunately, the changes required breaking the binary interface (ABI) between the X server and its drivers, so it may be a while before they propagate to mainstream distros.

How does PRIME Synchronization work?

It’s not much different than vsync. Rather than sharing just one screen-sized buffer from the dGPU driver to the iGPU driver, we share two (https://en.wikipedia.org/wiki/Multiple_buffering#Double_buffering_in_computer_graphics). The iGPU driver asks the dGPU driver to copy the dGPU’s current X screen contents into the iGPU’s buffer that is hidden from view. When the copy operation completes, the iGPU flips to display the updated buffer. When the iGPU driver notices that the screen has refreshed and the updated buffer is being displayed, it starts the whole process again with the now-hidden buffer. This way, we can ensure that the dGPU driver never has to copy into a buffer that is currently being displayed, eliminating the chance for tearing.

Of course nothing is ever that simple to implement in practice, but conceptually it’s nothing new.

How do I set up PRIME Synchronization?

If all of the requirements for PRIME Synchronization are fulfilled, it is enabled automatically.

To support PRIME Synchronization, the system needs:

  • Linux kernel 4.5 or higher
  • An X server with video driver ABI 23 or higher (not yet officially released; use commit 2a79be9)
  • Compatible drivers

The “modesetting” driver tracks the X server, so the driver shipped with an X server of ABI 23 or higher will be compatible when run against Intel iGPUs.
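
If you’re not sure what your system currently has, a couple of quick checks can help (log paths and output formats vary by distro, and rootless X servers typically log under ~/.local/share/xorg/ instead):

uname -r                               # kernel version; 4.5 or newer is required
grep "ABI class" /var/log/Xorg.0.log   # ABI versions reported by the running X server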

The latest 370.23 beta driver release (http://www.nvidia.com/Download/driverResults.aspx/105855/en-us) contains an initial implementation of PRIME Synchronization. Because the X ABI has yet to be frozen, however, it is subject to change. Any change to the ABI will break compatibility with the NVIDIA driver, so we cannot officially support the new functionality until the ABI is frozen. If you wish to test the new features despite them being experimental, the latest r370 driver release supports X servers built from Git commit 2a79be9. X servers with video driver ABI 23 built from other commits may or may not work. It’s best to stick to the commit that the driver has been built against.

To start X with an unofficially supported ABI (ABI 23 included), add the following section to your xorg.conf:

Section "ServerFlags"
    Option "IgnoreABI" "1"
EndSection

The NVIDIA driver’s PRIME Synchronization support relies on DRM-KMS, which is disabled by default due to its current incompatibility with SLI. To enable it, run ‘sudo rmmod nvidia-drm; sudo modprobe nvidia-drm modeset=1’. In other words, load the nvidia-drm module with the parameter modeset=1.
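
To make the setting persist across reboots, one common approach is a modprobe configuration file; the file name below is arbitrary, and depending on your distro you may also need to regenerate your initramfs for it to take effect at boot:

# /etc/modprobe.d/nvidia-drm-modeset.conf (file name is arbitrary)
options nvidia-drm modeset=1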

If the above requirements are fulfilled, PRIME Synchronization should be enabled. The functionality is still experimental, so it’s possible that there may be kinks to work out.

If PRIME Synchronization is enabled, OpenGL applications can synchronize to the iGPU’s heads as they would with a normal dGPU head, and the names of PRIME heads can be specified via __GL_SYNC_DISPLAY_DEVICE.
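
For example, to force a single OpenGL application to sync to a particular PRIME head (DP-1-1 here is a hypothetical output name; use the names reported by xrandr):

__GL_SYNC_DISPLAY_DEVICE=DP-1-1 glxgears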

If for whatever reason you have support for PRIME Synchronization but wish to disable it, you may do so via 'xrandr --output <output> --set "PRIME Synchronization" 0' and re-enable it via 'xrandr --output <output> --set "PRIME Synchronization" 1', where <output> is the name of the PRIME head.
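
For instance, with a hypothetical PRIME head named DP-1-1:

xrandr --output DP-1-1 --set "PRIME Synchronization" 0   # disable
xrandr --output DP-1-1 --set "PRIME Synchronization" 1   # re-enable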

> “Offload” attempts to mimic more closely the functionality of Optimus on Windows. Under normal operation, the iGPU renders everything, from the desktop to the applications. Specific 3D applications can be rendered on the dGPU, and shared to the iGPU for display. When no applications are being rendered on the dGPU, it may be powered off.

Will we see this implemented in the future? There are several issues with “output” mode. I don’t want to log out and log back in every time I want to do anything on the dGPU, and when I tested it on Ubuntu 16.04 with nvidia-prime, it didn’t even turn the dGPU off after I switched back to the Intel GPU.

Please make this a priority :/


Alex,

First off, I’m very glad to see this released, I’ve been watching the other thread here for a long time. Thanks for the hard work.

Were there any changes required to xorg.conf for Optimus laptops? I haven’t been able to get the latest release running with both dGPU and iGPU outputs (or both modesetting and NVIDIA-0 xrandr providers).
To be clear, PRIME works on this laptop with older driver/xorg versions.

I’m running Arch, and I built xorg-server, xorg-xrandr, etc. from git. I checked out the exact commit you specified from the xserver repo.

xrandr --listproviders with the suggested xorg config:

Providers: number : 2
Provider 0: id: 0x86 cap: 0x2, Sink Output crtcs: 3 outputs: 2 associated providers: 0 name:modesetting
Provider 1: id: 0x44 cap: 0x0 crtcs: 4 outputs: 1 associated providers: 0 name:modesetting

dmesg shows nvidia-modeset is doing its thing:

[  451.506110] nvidia-modeset: Allocated GPU:0 (GPU-3f2b6e11-0497-faba-b749-9aba663089a2) @ PCI:0000:01:00.0
[  451.556706] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[  451.556709] [drm] No driver support for vblank timestamp query.

X seems to decide nvidia and nv are not needed, so it unloads them and fails to load GLX (full log linked below):

[  3562.233] (EE) Failed to initialize GLX extension (Compatible NVIDIA X driver not found)

libglx belongs to nvidia:

/usr/lib/xorg/modules/extensions/libglx.so is owned by nvidia-libgl-beta 370.23-1
/usr/lib/xorg/modules/extensions/libglx.so.1 is owned by nvidia-libgl-beta 370.23-1
/usr/lib/xorg/modules/extensions/libglx.so.370.23 is owned by nvidia-libgl-beta 370.23-1
/usr/lib/xorg/modules/extensions/libglx.xorg is owned by xorg-server-git 1.18.0.485.r15570.g2a79be9-1

My X and nvidia packages:

lib32-libxxf86vm 1.1.4-1
lib32-nvidia-libgl-beta 370.23-1
lib32-nvidia-utils-beta 370.23-1
lib32-opencl-nvidia-beta 370.23-1
libxxf86dga 1.1.4-1
libxxf86vm 1.1.4-1
nvidia-beta 370.23-1
nvidia-gdk 352.55-1
nvidia-libgl-beta 370.23-1
nvidia-utils-beta 370.23-1
opencl-nvidia-beta 370.23-1
xf86dgaproto 2.1-3
xf86driproto 2.1.1-3
xf86vidmodeproto 2.3.1-3
xorg-bdftopcf 1.0.5-1
xorg-font-util 1.3.1-1
xorg-font-utils 7.6-4
xorg-fonts-alias 1.0.3-1
xorg-fonts-encodings 1.0.4-4
xorg-fonts-misc 1.0.3-4
xorg-mkfontdir 1.0.7-2
xorg-mkfontscale 1.1.2-1
xorg-server-common-git 1.18.0.485.r15570.g2a79be9-1
xorg-server-devel-git 1.18.0.485.r15570.g2a79be9-1
xorg-server-git 1.18.0.485.r15570.g2a79be9-1
xorg-server-xdmx-git 1.18.0.485.r15570.g2a79be9-1
xorg-server-xephyr-git 1.18.0.485.r15570.g2a79be9-1
xorg-server-xnest-git 1.18.0.485.r15570.g2a79be9-1
xorg-server-xvfb-git 1.18.0.485.r15570.g2a79be9-1
xorg-server-xwayland-git 1.18.0.485.r15570.g2a79be9-1
xorg-setxkbmap 1.3.1-1
xorg-util-macros 1.19.0-1
xorg-xauth 1.0.9-1
xorg-xbacklight 1.2.1-1
xorg-xdpyinfo 1.3.2-1
xorg-xev 1.2.2-1
xorg-xinit-git 1:1.3.4.r12.g4525e14-1
xorg-xinput 1.6.2-1
xorg-xkbcomp 1.3.1-1
xorg-xkill 1.0.4-1
xorg-xmessage 1.0.4-2
xorg-xmodmap 1.0.9-1
xorg-xprop 1.2.2-1
xorg-xrandr-git 1.5.0-1
xorg-xrdb 1.1.0-2
xorg-xset 1.2.3-1

Additionally, I temporarily installed xf86-input-libinput and xf86-video-nv from git (master) manually.

Here’s my nvidia-bug-report tarball: https://drive.google.com/file/d/0B6lJHFvpa2H6OWlzaFRDSlNRUUU/view?usp=sharing

Let me know if I can provide any further information.

Thanks!

Isn’t this where libglvnd https://github.com/NVIDIA/libglvnd comes in? The app decides which GPU to render on, calls libglvnd to load the dGPU GL libraries, and switches on the dGPU (with some bbswitch magic or a kernel module). Once the app shuts down, the driver can call the kernel module (or bbswitch) to switch off the dGPU or enter power-saving mode. The driver should know when to do this, since it can tell the dGPU’s utilization via “nvidia-settings”.

Hmm, interesting nshp. If I’m reading it correctly, it looks like the server is getting confused about which GPU should be the real X screen and which should be the GPU screen. I’ll have to see if I can replicate that problem myself. Does it work if you use

Load "intel"

rather than

Load "modesetting"

?

Liahkim112, libglvnd helps for the client side, but it still needs to talk to a server-side GLX implementation. This thread is about PRIME display synchronization, though. If you’d like to discuss PRIME render offload more, let’s please start a new thread for that.

With intel instead of modesetting, X just uses the intel driver and unloads modesetting and nvidia.

Xorg.log with intel: http://ix.io/1fai

The only xrandr provider is Intel as well.

Thanks for the update on the driver. I am using Ubuntu. Are there any instructions for building the X server on Ubuntu without breaking everything?
It would be appreciated if anyone could guide me through building the X server.

Cool, we’re getting close! Thanks for all the hard work and for this informative post, agoins. I have a couple of questions though:

  1. Does this add additional frames of latency between dGPU and the monitor? If so, are those theoretically avoidable? (e.g. by removing unnecessary buffer copies)
  2. Tying into that, if an application waits for vsync, will it block until vsync happens on the iGPU display, or will it wait on sync with some intermediate dGPU<->iGPU system memory buffer? This probably matters for applications which rely on blocking vsync for timing purposes, such as Qt's animation system, or mpv when syncing to the display.

If you don’t want things to break, you’ll really want to wait right now. I’m not even sure whether the version of Ubuntu you’re on ships with kernel 4.5, and even if it does, a floating ABI version for the xserver is a pretty fiddly thing to deal with.

The copy is done immediately after the last vblank, which will result in roughly 1 frame of additional latency compared to doing the copy immediately before the next vblank. Without PRIME Synchronization, when you get tearing, the top is from the most recent frame, and the bottom is from the last frame. With PRIME Synchronization, you’re effectively only seeing the bottom. Input lag is only introduced relative to what you saw above the tear-line before.

Pretty typical of vsync, due to the difficulty of predicting when you need to start relative to the next frame. There are some theoretical ways around this, but there are limitations with what we can do with the iGPU’s display engine. We don’t have any plans for it at this time.

Without PRIME Synchronization, it simply does a copy at 60 Hz with no linkage to the display; I think you’ll find that there is no noticeable input lag compared to before. Half the time, you were looking primarily at the previous frame. Now it’s consistent, and should fix the jerkiness along with the tearing. On displays with refresh rates above 60 Hz, it should be much better as it will now match the iGPU display’s refresh rate.

Applications cannot block directly on the iGPU’s vblank, so there is a mechanism inside the driver that facilitates it. They will swap after the dGPU finishes compositing into the video memory buffer, which happens as soon as possible after the iGPU’s vblank, and before the copy into the system memory buffer. From the application’s perspective, it shouldn’t make a difference.

Agreed. I’d be happy to help you (X.org provides something of a guide here: https://www.x.org/wiki/Development/BuildingX), but as fratti said, it will be fiddly and prone to update-related breakage until the ABI is frozen. I would recommend trying it on a distro install that you don’t care about before doing it on your primary system.
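
If you do go ahead, the rough shape of the build (assuming the autotools flow the X server used at the time, and an out-of-the-way install prefix so your distro packages stay untouched) is something like the sketch below. The clone URL and options are just one plausible setup, not official instructions, and you’ll need the usual X build dependencies installed first:

# Sketch only: URL and paths are one plausible setup from that era
git clone git://anongit.freedesktop.org/git/xorg/xserver
cd xserver
git checkout 2a79be9
./autogen.sh --prefix=/opt/xserver-test   # throwaway prefix; adjust as needed
make -j"$(nproc)"
sudo make install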

If you need some additional help with issues not directly related to PRIME Synchronization, please feel free to send me a PM.

The kernel is not an issue, because you can install any kernel precompiled for Ubuntu; see here: Index of /mainline

Also, the NVIDIA 370 beta driver is well packaged for Ubuntu at the graphics-drivers PPA; see here: Proprietary GPU Drivers : “Graphics Drivers” team

The only thing that could be tricky is building the X server.

Thanks for your kind answer.
There is an auto-xorg-git script from the old xorg-edgers PPA here: ~xorg-edgers/xorg-server/xorg-pkg-tools : files for revision 564
Maybe it helps to build it faster. I tried it once and built some packages, but I didn’t install them.
Your idea to install a fresh distro is better, I think. I am going to set up an extra partition on my SSD, install a new Ubuntu, and start from there.
I will contact you if there is a problem. Thanks again.

After reading this topic, it seems like this issue is PRIME-related. Would this driver update possibly fix it?

The issues in that thread seem to stem from a known bug that manifests when mixing PRIME-driven iGPU heads with native dGPU heads. You most likely have a laptop where the built-in screen is hooked up to the iGPU, and the external monitor port is hooked up to the dGPU.

Fixes are underway, but they didn’t make it into this release. Sorry for the inconvenience.

For what it’s worth, if I try to start the Xserver (at current git 6e5bec2) with 370.23, it’ll instantly fail with

[  2191.550] (EE) Backtrace:
[  2191.550] (EE) 0: /usr/lib/xorg-server/Xorg (OsSigHandler+0x29) [0x5f9489]
[  2191.550] (EE) 1: /usr/lib/libpthread.so.0 (__restore_rt+0x0) [0x7ffbd99e307f]
[  2191.550] (EE) unw_get_proc_name failed: no unwind info found [-10]
[  2191.550] (EE) 2: /usr/lib/nvidia/xorg/modules/extensions/libglx.so (?+0x0) [0x7ffbd6bc7d91]
[  2191.551] (EE)
[  2191.551] (EE) Segmentation fault at address 0x78

However, I’m not doing anything related to this PRIME post; I’m merely using Bumblebee, which works perfectly fine on X server 1.18.4 with exactly the same config and software (apart from recompiling xf86-input-evdev and xf86-video-intel).

At what commit is your 1.19-git?

(Also, not sure if it’s actually possible to tell a rootless Xserver to load modesetting, at least I get:

xf86EnableIOPorts: failed to set IOPL for I/O (Operation not permitted)

)

370.23 Beta is built against Git commit 2a79be9. This is mentioned in the OP, but not very prominently. I edited the original post to feature it more prominently and avoid further confusion. Additional ABI breaks have occurred since that commit, so your observed failure is expected.

Xserver 1.18.4 uses ABI 20, which has been frozen. The NVIDIA driver has official support for all currently frozen ABIs, as well as, unofficially, ABI 23 from Git commit 2a79be9. Later releases will be built against more recent commits.

What a shame. This is really the thing that has been missing from the driver for years. Please please please make this a plan. NVIDIA on laptops is dreadful without it. This is, without a doubt, the number one wished-for feature.

This thread is for discussing display offload. Please direct discussion of render offload to this thread instead.

No problem. I posted over there. Hopefully you actually follow up with tangible results. For me, and most others, the discussion about render offload is over. There’s not much more to say. Get this implemented, and you’ll have happy users. Continue to awkwardly avoid it, and you’ll be the bane of every high-end Linux laptop on the market. That’s right; you should feel important: the development decisions you make have a very tangible impact on folks. You can make a difference. We’re waiting.

Can I ask what advantage I’d get over my current solution (Bumblebee with the primus bridge)? It works best for me so far, as all I need to do is prefix launchers for selected apps with “optirun”.

You won’t have a hacky setup of two X servers running.
Though if it works for you, there’s no need to switch. Currently with PRIME, the NVIDIA card will always be the one used for rendering.

Currently it looks like both the dGPU and the iGPU are double-buffered, though, so the dGPU copies a full frame into the iGPU’s back buffer, which then swaps buffers. To me it seems like the dGPU front buffer is effectively equivalent to the iGPU back buffer: the latter is swapped as soon as the copy from the dGPU front buffer finishes, since no actual drawing happens on the iGPU, yet there is still an intermediate copy from the dGPU front buffer to the iGPU back buffer (i.e., they are not the same memory). Is that just me misinterpreting things, or is there a technical reason for this?

You have to remember that there are also two memory spaces and two tiling modes to contend with. The dGPU renders into a video-memory back buffer in a tiled format and then, as soon as that’s done, starts a de-tiling copy into a pitch-linear iGPU back buffer in system memory. On modern GPUs, this copy can occur in the background while the dGPU starts on the next frame in video memory. Once the transfer to system memory is complete, the iGPU is told to execute a flip.