PRIME render offloading on Nvidia Optimus
As per aplattner's suggestion in https://devtalk.nvidia.com/default/topic/957814/linux/prime-and-prime-synchronization/post/4953175/#4953175, I've taken the liberty of opening a new thread for discussing the PRIME GPU render offload feature on Optimus-based hardware. As you may know, Nvidia's current official support only allows GPU "Output" rather than GPU "Offload", which may be unsatisfactory as it translates into higher power consumption and heat production in laptops.

"Output" allows you to use the discrete GPU as the sole source of rendering, just as it would be in a traditional desktop configuration. A screen-sized buffer is shared from the dGPU to the iGPU, and the iGPU does nothing but present it to the screen.

"Offload" attempts to mimic more closely the functionality of Optimus on Windows. Under normal operation, the iGPU renders everything, from the desktop to the applications. Specific 3D applications can be rendered on the dGPU, and shared to the iGPU for display. When no applications are being rendered on the dGPU, it may be powered off. NVIDIA has no plans to support PRIME render offload at this time.


In the PRIME and PRIME Sync thread, I suggested how PRIME render offload on Nvidia Linux could, if possible, achieve feature parity with Optimus on Windows:
Isn't this where libglvnd (https://github.com/NVIDIA/libglvnd) comes in? The app decides which GPU to render on, calls libglvnd to load the dGPU GL libraries, and switches on the dGPU (some bbswitch magic or a kernel module is needed). Once the app shuts down, the driver can call the kernel module (or bbswitch) to switch off the dGPU or enter power-saving mode. The driver should know when to do this, as it can tell the dGPU's utilization via "nvidia-settings".


In response, aplattner mentioned the following:
libglvnd helps for the client side, but it still needs to talk to a server-side GLX implementation
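
To make the idea a bit more concrete, here is a rough sketch of the client-side half as I imagine it, assuming libglvnd's vendor override (__GLX_VENDOR_LIBRARY_NAME) and bbswitch. As aplattner points out, the matching server-side GLX piece does not exist, so this does not give working offload today:

# power the dGPU on (the bbswitch module must be loaded)
echo ON | sudo tee /proc/acpi/bbswitch

# ask libglvnd to dispatch GLX to the NVIDIA client libraries for one app
__GLX_VENDOR_LIBRARY_NAME=nvidia glxinfo | grep "OpenGL renderer"

# power the dGPU off again once nothing is using it
echo OFF | sudo tee /proc/acpi/bbswitch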


It would be really helpful for us if anyone from Nvidia could at the very least tell us how much work would be needed should Nvidia plan to implement PRIME render offload, and which components are required to attain render offload functionality on Linux with Optimus hardware.

Thank you in advance,
liahkim112

#1
Posted 08/17/2016 04:57 PM   
I agree. Please don't leave us in the dark about this.

#2
Posted 08/17/2016 07:06 PM   
NVIDIA has no plans to support PRIME render offload at this time.


It seems difficult to clearly make the point to NVIDIA developers: this is the single biggest defect in the NVIDIA driver that prevents NVIDIA cards from being usable on Linux. Most folks have laptops these days. People are starving for render offload, and it's simply absurd that it still has not been completed. The fact that there isn't a timeline for this is quite disheartening. Not many users find their way to this forum, which is unfortunate, but believe this: people want this functionality, and they want it bad.

Please, get it together, and implement this. The year is 2016. It's starting to become ridiculous.

#3
Posted 08/20/2016 11:51 PM   
[quote=""][quote]NVIDIA has no plans to support PRIME render offload at this time.[/quote] It seems difficult to clearly make the point to NVIDIA developers: [b]this is the single biggest defect in the NVIDIA driver that prevents NVIDIA cards from being usable on Linux[/b]. Most folks have laptops these days. People are starving for render offload, and it's simply absurd that it still has not been completed. The fact that there isn't a timeline for this is quite disheartening. Not many users find their way to this forum, which is unfortunate, but believe this: people want this functionality, and they want it bad. Please, get it together, and implement this. The year is 2016. It's starting to become ridiculous.[/quote] I agree. Also there is big difference between running game in the current situation and prime render offload. Compiz (for instance) always use the dGPU to render everything so there is FPS drop in games.
The previous poster said:
NVIDIA has no plans to support PRIME render offload at this time.


It seems difficult to clearly make the point to NVIDIA developers: this is the single biggest defect in the NVIDIA driver that prevents NVIDIA cards from being usable on Linux. Most folks have laptops these days. People are starving for render offload, and it's simply absurd that it still has not been completed. The fact that there isn't a timeline for this is quite disheartening. Not many users find their way to this forum, which is unfortunate, but believe this: people want this functionality, and they want it bad.

Please, get it together, and implement this. The year is 2016. It's starting to become ridiculous.


I agree. Also, there is a big difference between running a game in the current situation and with PRIME render offload. Compiz (for instance) always uses the dGPU to render everything, so there is an FPS drop in games.
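
A quick way to check which GPU the desktop is actually being rendered on, assuming glxinfo is installed:

glxinfo | grep "OpenGL renderer"

With the current output-based setups it reports the NVIDIA GPU for everything, compositor included.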

#4
Posted 08/21/2016 11:25 PM   
Not to mention that nouveau does this perfectly.

#5
Posted 08/22/2016 11:18 AM   
By implementing this functionality, we could also fix Optimus completely.

The driver could then have profiles for games and applications, similar to Windows, to render them on the NVIDIA GPU automatically.

#6
Posted 08/22/2016 01:42 PM   
For me, PRIME offloading is a "nice to have" kind of thing but not really a pressing need. Personally, I'll probably just use PRIME to always use the nvidia GPU once PRIME sync is fixed for the following reasons:
  • nvidia's OpenGL implementation is, generally speaking, better and faster than Mesa's
  • The dedicated nvidia GPU in my laptop is faster than the Intel GPU
  • I'm lazy and don't often find myself somewhere without power outlets, so battery life isn't much of a concern

It certainly would be nice, though, in situations where a graphics-intensive application is running in windowed mode. That way, the compositing would be done by the integrated GPU while the application is offloaded, so if the offloaded application drops a frame or two, it won't affect how well the rest of the desktop behaves.

#7
Posted 08/26/2016 07:59 PM   
I registered just to have a say on this; there are currently a few different solutions, none of which are satisfactory! (Rough command examples follow the list.)

  • Bumblebee - one of the few solutions that works, mostly. But it's old.
  • Bumblebee + primus - except primus is outdated and barely functional on newer distros.
  • Output mode - not a good solution for obvious reasons. I'm also completely unable to get this functioning on my GS70 Stealth unless it's to an external display.
  • xrandr offload - same as above.

The only way for nvidia PRIME support to move forward is if Nvidia itself implements it! Nvidia has the source for their drivers, the specs, the know-how; so why the heck isn't it done already? It's been far, far too long. Linux is not a *marginal* OS anymore and deserves better support.
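
For reference, this is roughly what those look like in use (a sketch only; provider names vary, and the xrandr offload path only works with the open-source drivers):

# bumblebee / primus
optirun glxgears
primusrun glxgears

# render offload with the open-source stack (nouveau renders, Intel displays)
xrandr --setprovideroffloadsink nouveau Intel
DRI_PRIME=1 glxgears

There is nothing equivalent to that last pair for the proprietary driver, which is exactly the problem.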

#8
Posted 09/09/2016 04:12 AM
I disagree. VirtualGL is broken, but primus still works very well with the latest stack.

#9
Posted 09/09/2016 05:58 AM
Regardless of which of the current hacked-together solutions more or less works, they're all really subpar in performance and each one has its share of issues. They're quite obviously _not_ the right way to be doing things. @luke-nukem is spot on: the only way this is going to happen is if NVIDIA implements it.

Quite offensively, @aplattner told us to start a new thread about this, and then has completely neglected it. Is this how NVIDIA operates? Sequester & silence?

#10
Posted 09/09/2016 02:17 PM
Sorry I haven't been able to reply earlier. I mostly asked for this thread to keep render offload discussion out of the thread about display offload so people trying to get display offload to work could use that thread.

Render offload is quite complicated, so I don't want to set any false expectations. It's something we're looking into, but I can't promise anything or comment on it beyond that.

Aaron Plattner
NVIDIA Linux Graphics

#11
Posted 09/09/2016 06:07 PM
Thanks for your response @aplattner. It's nice to hear that you guys *are* interested in implementing it. I believe you will be successful!

In case it wasn't already obvious, the overhead of bumblebee/virtualgl/primus isn't really acceptable.

Intel Card:
36639 frames in 5.0 seconds = 7327.653 FPS

NVIDIA Card w/VirtualGL:
19152 frames in 5.0 seconds = 3830.227 FPS

NVIDIA Card w/Primus:
29772 frames in 5.0 seconds = 5954.200 FPS

# lspci|grep VGA
00:02.0 VGA compatible controller: Intel Corporation HD Graphics P530 (rev 06)
01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M2000M] (rev a2)
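
For anyone who wants to reproduce numbers like these, the runs were along the following lines (a sketch; the exact launchers depend on how bumblebee is set up):

# iGPU baseline
glxgears

# dGPU through the VirtualGL bridge
optirun -b virtualgl glxgears

# dGPU through primus
primusrun glxgears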

#12
Posted 09/09/2016 08:38 PM
    [quote="aplattner"]Sorry I haven't been able to reply earlier. I mostly asked for this thread to keep render offload discussion out of the thread about display offload so people trying to get display offload to work could use that thread. Render offload is quite complicated, so I don't want to set any false expectations. It's something we're looking into, but I can't promise anything or comment on it beyond that.[/quote] I came to this thread only after reading the one regarding display offload. Don't get me wrong, we really, really appreciate the efforts put in to the drivers for us. But we're not okay with being *un-equal* or *second-class* to Windows in implementation support. Perhaps Windows makes off-loading easier? I don't know, I don't pretend to. But I do know that the Linux situation isn't acceptable. Let me attempt to describe the absurdity of this situation to you; I just spent 2.5 days tiddling with a few distros and their implementations (Ubuntu, openSUSE, Manjaro, Sabayon). * Ubuntu - Using their own solution which is made up of a few python scripts, a binary program (for gfx detection), some variables in various files in /etc/, and it's a log in/out situation. Basically we'd call it Ubuntu-PRIME. - You need to log out then in if you switch between Intel/Nvidia. - No vsync using Nvidia (since fixed in git, thanks guys!). - No power off for powersaving. - Bumblebee breaks Ubuntu-PRIME. - No easy offloading or dynamic switching. * openSUSE -A somewhat hostile to proprietary drivers distro. DKMS isn't standard and they don't see a need for it (Grrr). There are a few hacked together solutions; bumblebee through 3rd party repo, an [Ubuntu-PRIME](https://build.opensuse.org/project/show/home:bosim:suse-prime) like solution (which I've failed to get working), or using [nvidia-xrun](https://github.com/Witko/nvidia-xrun) which requires using a tty to use the script to start a new xserver using nvidia drivers. None, and I mean *none* of these are satisfactory. - Bumblebee as always, bad performance. Unmaintained (mostly), primusrun is ancient and fails with Tumbleweed (Leap seems okay due to older libs). - Bumblebee on Tumbleweed required installing a 3rd party build of the Mesa libs to be able to run Steam with it. - suse-prime, same as ubuntu-prime. - nvidia-xrun - try running an Unity built game with it, no mouse cursor, bare xserver, far too much work needed to get a satisfactory environment up. If you however, use a window manager and start your app from that, it seems okay. But still, you need to switch to a tty etc. * Manjaro - By far the easiest to use of the bunch, uses bumblebee - cuts performance. * Sabayon - Bumblebee problems as above. * primusrun and bumblebee no-longer play nice with Steam due to library problems. See openSUSE above. Seriously, it's a freaking big mess. About two years ago when i first got my laptop (an MSI GS70 Stealth), using bumblebee and primusrun was fine, it worked and it worked okay-ish with performance at about half of what it could be. Then they became unmaintained, slipped behind the amount of changes that happened in the Linux world such as new GCC and Glibs. I'm *lucky!!,* if I get bumblebee and primus working acceptably across all use cases; and that is getting harder (try using Steam with it on a modern/rolling distro). 
There is no way in hell I'm using Windows to get decent use out of my laptop, that would kill my productivity (I'm a comp-sci & soft-eng student), not to mention that Windows itself is atrocious with its UI (and I can only run W8+on this). Myself, and likely, a very many others, a growing amount, use Linux exclusively and also use it for gaming. This number will definitely continue to grow but only, *only* if things such as driver installation for playing games is painless. Distribution installation and setup itself is relatively painless and likely even easier than Windows, this has improved in leaps and bounds over the last decade. Granted, basic nvidia only installation is as easy as *next, next, next*, it's just Optimus support that is quite entirely lacking. Especially muxless, with external output connected to the nvidia chip. Heck, even output using `intel-virtual-out` relies on bumblebee. Literally the only way this is going to improve is if Nvidia itself improves it. Hackers don't have the knowledge needed, often have to rely on reverse engineering etc. Sorry to reiterate my points from earlier, I felt I hadn't really gotten my points across adequately. I really don't know how to impress upon Nvidia and the lovely folks working on the Nvidia drivers, how important proper PRIME and easy(bumblebee-like) offloading is. There's a huge amount of top-notch gaming laptops out there, and it sucks to be chained to Windows if you need to use the Nvidia gpu for anything requiring decent performance. In short, [b]Nvidia needs to make a promise, a commitment[/b] to supporting Linux [i]well beyond the bare essentials[/i]. It's situations like this that hold Linux back, and there is bugger-all even very well intentioned and skilled hackers can do when Nvidia is the one holding all the cards.
aplattner said: Sorry I haven't been able to reply earlier. I mostly asked for this thread to keep render offload discussion out of the thread about display offload so people trying to get display offload to work could use that thread.

Render offload is quite complicated, so I don't want to set any false expectations. It's something we're looking into, but I can't promise anything or comment on it beyond that.


I came to this thread only after reading the one regarding display offload.
Don't get me wrong, we really, really appreciate the effort put into the drivers for us. But we're not okay with being *un-equal* or *second-class* to Windows in implementation support. Perhaps Windows makes offloading easier? I don't know, and I don't pretend to. But I do know that the Linux situation isn't acceptable.

Let me attempt to describe the absurdity of this situation to you; I just spent 2.5 days fiddling with a few distros and their implementations (Ubuntu, openSUSE, Manjaro, Sabayon). Rough command sketches for a couple of these follow the list.

* Ubuntu - uses its own solution, made up of a few python scripts, a binary program (for gfx detection), and some variables in various files in /etc/, and it's a log-in/log-out situation. Basically we'd call it Ubuntu-PRIME.
- You need to log out and back in if you switch between Intel/Nvidia.
- No vsync using Nvidia (since fixed in git, thanks guys!).
- No power-off for power saving.
- Bumblebee breaks Ubuntu-PRIME.
- No easy offloading or dynamic switching.

* openSUSE - a distro somewhat hostile to proprietary drivers. DKMS isn't standard and they don't see a need for it (grrr). There are a few hacked-together solutions: bumblebee through a 3rd-party repo, an [Ubuntu-PRIME](https://build.opensuse.org/project/show/home:bosim:suse-prime)-like solution (which I've failed to get working), or [nvidia-xrun](https://github.com/Witko/nvidia-xrun), which requires using a tty to run a script that starts a new X server with the nvidia drivers. None, and I mean *none*, of these are satisfactory.
- Bumblebee: as always, bad performance. Unmaintained (mostly); primusrun is ancient and fails with Tumbleweed (Leap seems okay due to older libs).
- Bumblebee on Tumbleweed required installing a 3rd-party build of the Mesa libs to be able to run Steam with it.
- suse-prime: same as Ubuntu-PRIME.
- nvidia-xrun: try running a Unity-built game with it - no mouse cursor, bare X server, far too much work needed to get a satisfactory environment up. If, however, you use a window manager and start your app from that, it seems okay. But you still need to switch to a tty etc.

* Manjaro - by far the easiest of the bunch to use; uses bumblebee, which cuts performance.

* Sabayon - bumblebee problems as above.

* primusrun and bumblebee no longer play nice with Steam due to library problems. See openSUSE above.
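
To give a flavour of what "switching" actually means on these setups (a rough sketch; package and script names vary by distro):

# Ubuntu-PRIME / suse-prime: pick a GPU, then log out and back in
sudo prime-select nvidia
sudo prime-select intel
prime-select query

# nvidia-xrun: switch to a spare tty and start a second X server on the dGPU,
# ideally launching a window manager rather than a bare application
nvidia-xrun openbox-session

Compare that with Windows, where the driver simply picks the right GPU per application.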

Seriously, it's a freaking big mess. About two years ago when I first got my laptop (an MSI GS70 Stealth), using bumblebee and primusrun was fine; it worked okay-ish, with performance at about half of what it could be. Then they became unmaintained and slipped behind the changes happening in the Linux world, such as new GCC and glibc versions. I'm *lucky* if I get bumblebee and primus working acceptably across all use cases, and that is getting harder (try using Steam with it on a modern/rolling distro).

There is no way in hell I'm using Windows to get decent use out of my laptop; that would kill my productivity (I'm a comp-sci & soft-eng student), not to mention that Windows itself is atrocious with its UI (and I can only run W8+ on this).

I, and likely a growing number of others, use Linux exclusively and also use it for gaming. This number will definitely continue to grow, but only, *only*, if things such as driver installation for playing games are painless. Distribution installation and setup itself is relatively painless, likely even easier than Windows; this has improved in leaps and bounds over the last decade. Granted, basic nvidia-only installation is as easy as *next, next, next*; it's just Optimus support that is almost entirely lacking, especially muxless, with an external output connected to the nvidia chip. Heck, even output using `intel-virtual-out` relies on bumblebee.

Literally the only way this is going to improve is if Nvidia itself improves it. Hackers don't have the knowledge needed and often have to rely on reverse engineering etc.

Sorry to reiterate my points from earlier; I felt I hadn't really gotten them across adequately. I really don't know how to impress upon Nvidia, and the lovely folks working on the Nvidia drivers, how important proper PRIME and easy (bumblebee-like) offloading is. There are a huge number of top-notch gaming laptops out there, and it sucks to be chained to Windows if you need to use the Nvidia gpu for anything requiring decent performance.

In short, Nvidia needs to make a promise, a commitment, to supporting Linux well beyond the bare essentials. It's situations like this that hold Linux back, and there is bugger-all even very well-intentioned and skilled hackers can do when Nvidia is the one holding all the cards.

#13
Posted 09/10/2016 06:49 AM
There are many, many more people around who are just endlessly frustrated and have had the same experiences I have; see here: https://www.reddit.com/r/linux_gaming/comments/522103/nvidia_prime_support_is_atrocious/

#14
Posted 09/11/2016 04:10 AM
    [quote=""]Thanks for your response @aplattner. It's nice to hear that you guys *are* interested in implementing it. I believe you will be successful! In case it wasn't already obvious, the overhead of bumblebee/virtualgl/primus isn't really acceptable. Intel Card: 36639 frames in 5.0 seconds = 7327.653 FPS NVIDIA Card w/VirtualGL: 19152 frames in 5.0 seconds = 3830.227 FPS NVIDIA CARD w/Primus: 29772 frames in 5.0 seconds = 5954.200 FPS # lspci|grep VGA 00:02.0 VGA compatible controller: Intel Corporation HD Graphics P530 (rev 06) 01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M2000M] (rev a2)[/quote] glxgears has never been an accurate benchmark to measure performance with bumblebee. You will see completely different results when you play games.
An earlier poster said: Thanks for your response @aplattner. It's nice to hear that you guys *are* interested in implementing it. I believe you will be successful!

In case it wasn't already obvious, the overhead of bumblebee/virtualgl/primus isn't really acceptable.

Intel Card:
36639 frames in 5.0 seconds = 7327.653 FPS

NVIDIA Card w/VirtualGL:
19152 frames in 5.0 seconds = 3830.227 FPS

NVIDIA Card w/Primus:
29772 frames in 5.0 seconds = 5954.200 FPS

# lspci|grep VGA
00:02.0 VGA compatible controller: Intel Corporation HD Graphics P530 (rev 06)
01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M2000M] (rev a2)

glxgears has never been an accurate benchmark to measure performance with bumblebee. You will see completely different results when you play games.
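
If you want a comparison closer to a real rendering load, running something heavier like glmark2 through each launcher gives a much better idea (assuming glmark2 is installed):

glmark2
optirun glmark2
primusrun glmark2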

#15
Posted 09/11/2016 10:07 PM