[Bug] Fixed-function pipeline draw calls occasionally fail to fully render on GeForce GTX 1060, causing flickering

I am the primary developer of a game which uses a UI and animation library that relies on the OpenGL fixed-function pipeline under the hood. This has functioned without issue for years, but when I recently upgraded my development machine to one with a GeForce GTX 1060, previously stable parts of the game began to suffer very intrusive visual flicker.

(I realize the use of the fixed-function pipeline has been deprecated for some time, but the library is deeply integrated into the game’s existing codebase and served its function perfectly for many years until now – and presumably legacy code should not abruptly stop working properly on modern GPUs.)

The crux of the problem is that intermittently (sometimes once every few minutes, but other times several times per second), a single frame will be rendered incorrectly in a very characteristic way, as seen in this screenshot.

All three images in the screenshot are from the same completely static test scene. There is no moving geometry and no reordering of draw calls between frames; every frame should be 100% identical to every other, and yet, as you can see, they are not. From time to time, parts of the scene are skipped outright, even though draw calls issued after them render correctly, and even when the missing geometry lies between the same pair of glBegin() and glEnd() calls as geometry that renders properly.

For example, each multi-word text label in that screenshot is stored by the UI library as an array of vertices describing textured quads and drawn with the following straightforward code:

glBegin(GL_QUADS);
// Each glyph is one textured quad: 4 vertices, with 2 floats per vertex in
// each array (position offsets in vertCoord, texture coordinates in
// texCoord), so 8 floats per glyph. The loop steps one vertex at a time.
for (int i = startChar * 8; i < endChar * 8; i += 2)
{
    glTexCoord2f(texCoord[i], texCoord[i + 1]);
    glVertex3f(x + vertCoord[i], y + vertCoord[i + 1], z);
}
glEnd();

The contents of these arrays never change and all draw commands in that scene are executed in the exact same order every frame.
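Since the arrays are supposed to be immutable, one cheap way to rule out CPU-side corruption before blaming the driver is to hash the vertex data every frame and log whenever the hash changes. A minimal sketch in C (shown in C to match the snippet above; the same idea ports directly to Java, and the helper names `fnv1a` and `log_vertex_hash` are my own, not part of the library):

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* FNV-1a hash over an arbitrary byte buffer. */
static uint64_t fnv1a(const void *data, size_t len)
{
    const unsigned char *p = (const unsigned char *)data;
    uint64_t h = 14695981039346656037ULL;  /* FNV-1a 64-bit offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;             /* FNV-1a 64-bit prime */
    }
    return h;
}

/* Call once per frame on the vertex (or texcoord) array; logs only when the
 * hash differs from the previous frame. If this ever fires after the first
 * frame, the data is in fact being modified on the CPU side. */
static void log_vertex_hash(const float *vertCoord, size_t floatCount)
{
    static uint64_t last;
    static int have_last;
    uint64_t h = fnv1a(vertCoord, floatCount * sizeof(float));
    if (!have_last || h != last) {
        fprintf(stderr, "vertex data hash: 0x%016llX\n",
                (unsigned long long)h);
        last = h;
        have_last = 1;
    }
}
```

If the hash stays constant across good and bad frames, the corruption is happening downstream of the application's data.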

This problem persists whether vsync is enabled or not, and whether the framerate is capped arbitrarily low or allowed to exceed 1000 fps, so it cannot be an issue of frame render time. Several programmers have now examined the rendering code and failed to find any error that accounts for this behavior. One curious pattern we’ve noticed: when glyphs fail to render, it is typically in blocks of 8 glyphs (i.e. 8 quads) at a time, as can be seen in that screenshot; I don’t know what the significance of this might be.

I’ve tested the game’s code on several different AMD and Intel GPUs and it runs without any of these errors on all of them, but it exhibits the same symptoms on a GTX 1070 as it does on my own GTX 1060, leading me to believe this is actually a driver issue of some kind.

I have a moderately stripped-down Java project which can consistently reproduce this behavior within a few seconds of launching, but it is unfortunately tied to many in-dev game assets that I’d prefer not to post publicly at this time. Is there any way to submit this privately?

And here’s a system info report from my development machine, in case it’s helpful:

NVIDIA System Information report created on: 10/04/2017 16:27:45
System name: DESKTOP-81KOD46

[Display]
Operating System: Windows 10 Home, 64-bit
DirectX version: 12.0
GPU processor: GeForce GTX 1060 6GB
Driver version: 385.69
Direct3D API version: 12
Direct3D feature level: 12_1
CUDA Cores: 1280
Core clock: 1506 MHz
Memory data rate: 8008 MHz
Memory interface: 192-bit
Memory bandwidth: 192.19 GB/s
Total available graphics memory: 10202 MB
Dedicated video memory: 6144 MB GDDR5
System video memory: 0 MB
Shared system memory: 4058 MB
Video BIOS version: 86.06.0E.00.DE
IRQ: Not used
Bus: PCI Express x16 Gen3
Device Id: 10DE 1C03 872C1043
Part Number: G410 0030

[Components]

nvui.dll 8.17.13.8569 NVIDIA User Experience Driver Component
nvxdplcy.dll 8.17.13.8569 NVIDIA User Experience Driver Component
nvxdbat.dll 8.17.13.8569 NVIDIA User Experience Driver Component
nvxdapix.dll 8.17.13.8569 NVIDIA User Experience Driver Component
NVCPL.DLL 8.17.13.8569 NVIDIA User Experience Driver Component
nvCplUIR.dll 8.1.970.0 NVIDIA Control Panel
nvCplUI.exe 8.1.970.0 NVIDIA Control Panel
nvWSSR.dll 6.14.13.8569 NVIDIA Workstation Server
nvWSS.dll 6.14.13.8569 NVIDIA Workstation Server
nvViTvSR.dll 6.14.13.8569 NVIDIA Video Server
nvViTvS.dll 6.14.13.8569 NVIDIA Video Server
NVSTVIEW.EXE 7.17.13.8569 NVIDIA 3D Vision Photo Viewer
NVSTTEST.EXE 7.17.13.8569 NVIDIA 3D Vision Test Application
NVSTRES.DLL 7.17.13.8569 NVIDIA 3D Vision Module
nvDispSR.dll 6.14.13.8569 NVIDIA Display Server
NVMCTRAY.DLL 8.17.13.8569 NVIDIA Media Center Library
nvDispS.dll 6.14.13.8569 NVIDIA Display Server
PhysX 09.17.0524 NVIDIA PhysX
NVCUDA.DLL 6.14.13.8569 NVIDIA CUDA 9.0.191 driver
nvGameSR.dll 6.14.13.8569 NVIDIA 3D Settings Server
nvGameS.dll 6.14.13.8569 NVIDIA 3D Settings Server

I don’t think it is possible to guess what the error is in your case without serious debugging. Two things you should try:

  1. Query the latest GL error with glGetError(). If you run into an error and simply continue, you will continue with undefined behavior and anything could happen.
  2. Your card supports ARB_pipeline_statistics_query. Surround your draw calls with all relevant queries to see whether the amount of primitives etc. changes between a correct and a faulty frame.
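For point 1, here is a minimal self-contained sketch of the idea. The helper takes the error getter as a function pointer purely so it can be exercised without a GL context; in your real renderer you would pass glGetError itself and call the helper after each suspect draw block. The helper names are my own invention; the error constants are the standard values the GL spec defines (normally supplied by <GL/gl.h>):

```c
#include <stdio.h>

/* Standard GL error codes (normally provided by <GL/gl.h>). */
#define GL_NO_ERROR          0x0000
#define GL_INVALID_ENUM      0x0500
#define GL_INVALID_VALUE     0x0501
#define GL_INVALID_OPERATION 0x0502
#define GL_STACK_OVERFLOW    0x0503
#define GL_STACK_UNDERFLOW   0x0504
#define GL_OUT_OF_MEMORY     0x0505

static const char *gl_error_name(unsigned int err)
{
    switch (err) {
    case GL_INVALID_ENUM:      return "GL_INVALID_ENUM";
    case GL_INVALID_VALUE:     return "GL_INVALID_VALUE";
    case GL_INVALID_OPERATION: return "GL_INVALID_OPERATION";
    case GL_STACK_OVERFLOW:    return "GL_STACK_OVERFLOW";
    case GL_STACK_UNDERFLOW:   return "GL_STACK_UNDERFLOW";
    case GL_OUT_OF_MEMORY:     return "GL_OUT_OF_MEMORY";
    default:                   return "unknown GL error";
    }
}

/* Drains a GL-style error queue through the supplied getter (pass the real
 * glGetError in your renderer), logging each error with a caller-provided
 * tag so you can tell which draw call produced it. Returns the error count. */
static int drain_gl_errors(unsigned int (*get_error)(void), const char *tag)
{
    int count = 0;
    unsigned int err;
    while ((err = get_error()) != GL_NO_ERROR) {
        fprintf(stderr, "GL error after %s: %s (0x%04X)\n",
                tag, gl_error_name(err), err);
        count++;
    }
    return count;
}

/* Fake error queue standing in for the driver, purely so the helper can be
 * demonstrated without a GL context. */
static const unsigned int demo_queue[] = {
    GL_INVALID_OPERATION, GL_INVALID_ENUM, GL_NO_ERROR
};
static int demo_pos;
static unsigned int demo_get_error(void) { return demo_queue[demo_pos++]; }
```

Note that glGetError returns a queue of errors, so you must loop until it reports GL_NO_ERROR; a single call can miss older errors.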

You will often encounter bugs in your GL applications when switching to a new video card or driver if you have relied on functionality outside the GL specification. “Undefined behavior” may well mean that everything works the way you expect, right up until you change cards and it suddenly stops working.