Bug report: WGL calls permanently reduce performance on Quadro and Titan Xp

(Crosspost: https://stackoverflow.com/questions/46819225/wgl-calls-decrease-performance-of-memory-allocations-on-nvidia-systems)

We are experiencing significant performance problems with our C++ application that uses WGL for window creation (and OpenGL for rendering) on some systems with Nvidia graphics cards. As soon as a window is created, or more precisely, as soon as ChoosePixelFormat or SetPixelFormat is called, memory allocations are significantly slower for the entire lifetime of the process. This happens on a multitude of different systems that we have access to, but is most noticeable for the Quadro M6000 12 GB (driver version 385.69) and the Titan Xp (driver version 387.92) on Windows 8.1 Pro 64 bit and Windows 10 Enterprise 64 bit. To a lower degree, this effect is also measurable for GeForce cards and on Windows 7 Professional 64 bit.

I have made a small test application demonstrating the issue: it runs an expensive job involving a lot of memory allocations, then creates a window with WGL, calls ChoosePixelFormat, destroys the window, and runs the same expensive allocation job again. The allocations are timed and the results printed to the console. On all systems we have tested this on (six different cards, three operating systems), the memory allocations were measurably slower (between 5 % and 20 %) after calling ChoosePixelFormat. With the Quadro or the Titan Xp in the system, however, the allocations after ChoosePixelFormat took more than three times as long as before.

Some test results (we swapped cards between systems to cover almost every combination; we did not keep all benchmarks, but the following is a meaningful subset):

Titan Xp, Windows 8.1 Pro 64 bit

Before Window Creation
Testing memory allocation performance 1 / 3: 5.20743 seconds
Testing memory allocation performance 2 / 3: 5.60933 seconds
Testing memory allocation performance 3 / 3: 5.4247 seconds

After Window Creation
Testing memory allocation performance 1 / 3: 18.0398 seconds
Testing memory allocation performance 2 / 3: 17.9902 seconds
Testing memory allocation performance 3 / 3: 17.9052 seconds


GTX 770, Windows 7 Professional 64 bit

Before Window Creation
Testing memory allocation performance 1 / 3: 4.66427 seconds
Testing memory allocation performance 2 / 3: 4.65927 seconds
Testing memory allocation performance 3 / 3: 4.62726 seconds

After Window Creation
Testing memory allocation performance 1 / 3: 5.69533 seconds
Testing memory allocation performance 2 / 3: 5.71333 seconds
Testing memory allocation performance 3 / 3: 5.72833 seconds


GTX 1080, Windows 8.1 Pro 64 bit

Before Window Creation
Testing memory allocation performance 1 / 3: 5.35666 seconds
Testing memory allocation performance 2 / 3: 5.37008 seconds
Testing memory allocation performance 3 / 3: 5.36607 seconds

After Window Creation
Testing memory allocation performance 1 / 3: 5.7112 seconds
Testing memory allocation performance 2 / 3: 5.69939 seconds
Testing memory allocation performance 3 / 3: 5.71902 seconds

I remembered an old trick that I would pull on my Optimus laptop such that self-written applications automatically use the discrete Nvidia GPU: rename the executable to wow.exe. I tried this and, of course, it works. The performance issues are gone, both in our actual application and the test application. On the Titan Xp, the memory allocations after calling ChoosePixelFormat are miraculously even faster than before calling it. I am pretty sure this happens because of special branching in the video driver done for World of Warcraft which circumvents some “features” that make our life hard. So now we could call it a day and ship our software named wow.exe, but at some point, customers might ask what the name stands for and we would have to come up with a clever acronym or rebrand. This is not a permanent solution, but a pretty bizarre debugging result. (“What video card vendors don’t want you to know: increase the performance of your 3D applications with this one simple trick!”)

When debugging the linked test application, you can see that during window creation, one or more threads are spawned by the video driver that live on after window destruction. We suspect that they are somehow involved in all of this. However, we have neither the time nor the budget to investigate this any further.

Apart from naming our software wow.exe, downgrading to GTX 1080s, or waiting for Nvidia to answer our support queries (reference# 171017-000005) - which they have not done once in four years -, what options do we have now? Has anyone ever encountered and successfully overcome this issue? Is there any way to circumvent ChoosePixelFormat and SetPixelFormat when creating a window and OpenGL (4.5 core) context with WGL on Windows 7/8.1/10? (We also tried setting up the window with GLFW, but I’m pretty sure this calls WGL under the hood - the results were the same.) Also, we would be highly interested in any comment from someone with a Maxwell or Pascal Quadro card or Titan Xp who does not experience these issues. Maybe a small change in our setup could do the trick.

PS: Although this issue sounds a bit bizarre, we are not the first ones to encounter it. It has been described before, but apparently, no one at Nvidia found the time to reply or tackle the issue.

Yeah, Nvidia are notoriously bad with actual OpenGL support (as in actually answering tickets and queries).

mOfl, thanks for reporting this issue with repro steps. There is an internal bug tracking it now.