Out of memory error thrown by the driver instead of OpenGL

We have an application that couples a CUDA simulation with geospatial rendering of the results in OpenGL. Due to the size of the data, we often reach the video memory limit, which is 2 GB for the GTX 680 cards we currently use. The major problem is that we have almost no way to react to this situation.

The system runs out of memory when we resize or create a texture or buffer object, which we do fairly frequently. When video memory runs out, one of four things happens:

  1. The OpenGL call reports GL_OUT_OF_MEMORY, which we receive synchronously through the GL_KHR_debug callback (see the sketch after this list). In this case, we can throw an exception and handle the situation gracefully.
  2. The driver pops up a message (“Request for more GPU memory than is available”) and the application crashes hard, with no way to safely exit or save pending changes.
  3. Nothing happens at first: the allocation fails silently and the context dies. The next time any object of the context is used, arbitrary OpenGL errors are reported, such as incomplete framebuffers or textures with size 0.
  4. The application freezes at the next OpenGL synchronization point such as glFinish or glMapBuffer.

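For reference, here is roughly how we hook into GL_KHR_debug to catch case 1. This is a minimal sketch rather than our production code; it assumes a 4.3+ core context and an already-initialized loader (glad or GLEW), and it relies on the KHR_debug rule that API errors are reported with the error code as the message id:

```cpp
#include <cstdio>

// Assumes the GL loader provides APIENTRY, the GL_DEBUG_* enums and
// glDebugMessageCallback (core since OpenGL 4.3).
static void APIENTRY onGlDebugMessage(GLenum /*source*/, GLenum type, GLuint id,
                                      GLenum /*severity*/, GLsizei /*length*/,
                                      const GLchar* message, const void* /*userParam*/)
{
    // Per KHR_debug, API errors arrive with type GL_DEBUG_TYPE_ERROR and
    // id equal to the GL error code, so GL_OUT_OF_MEMORY is detectable here.
    if (type == GL_DEBUG_TYPE_ERROR && id == GL_OUT_OF_MEMORY)
        std::fprintf(stderr, "GL_OUT_OF_MEMORY: %s\n", message);
}

void installGlDebugCallback()
{
    glEnable(GL_DEBUG_OUTPUT);
    glEnable(GL_DEBUG_OUTPUT_SYNCHRONOUS); // deliver messages on the calling thread
    glDebugMessageCallback(onGlDebugMessage, nullptr);
}
```
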
The very, very bad thing is that case 1 almost never occurs. Unfortunately, the cases above are sorted by increasing probability, and cases 2 and 4, a hard crash and a freeze respectively, are an absolute no-go for an application.

Why is the handling of out-of-memory errors so inconsistent in the Nvidia driver? The driver should not override OpenGL’s error reporting and instantly kill the entire application. And, considering cases 3 and 4, GL_OUT_OF_MEMORY should be reported at some point. What good is the error reporting callback if it dies with, or even before, the context?

We use recent drivers (361.75) on Windows 8.1 Professional. The OpenGL context we request is 4.4 Core Profile, if that makes any difference.

Some of these errors are not under the display driver’s control but come from the operating system itself. For example, if a single rendering command sent to the OS requires more resources than the OS can allocate for it inside the kernel(!) mode driver at that moment, there is nothing the driver can do about some of the consequences the OS imposes.
Those consequences can be failing silently, returning a fatal error that forces the OpenGL driver to abort (your case 2), or simply shutting down the driver.

That’s why you get different kinds of error messages. The OpenGL driver can catch out-of-memory cases in its user-mode part, but when the OS detects an out-of-memory condition on the kernel-mode side, its countermeasures against system failure become more drastic.

There’s little to be done about the fact that your workload exceeds the capabilities of your chosen hardware, other than reducing the burden on the graphics board’s memory by doing smaller things more often, or using one or more workstation-class boards with more VRAM that can handle the CUDA simulation and geospatial rendering you require.

Thank you for your response! It makes sense that the driver, or even OpenGL, is not in full control of the OS and how it handles failure cases. It does not make life easier for a developer, though. Could you maybe shed a bit of light on why the out-of-memory error happens in the first place in an OpenGL application, given that not everything is needed at once? The driver is obviously able to swap data in the background, since we constantly use more memory than the physical memory limit of the GPU, just never all of it at the same time. This decreases performance, of course, but otherwise it works very conveniently. Except that sometimes it does not, even though there might be 1 GB of OpenGL textures marked as non-resident at that moment and you only want to resize a buffer object to 20 MB. From the outside it appears quite non-deterministic whether data is simply swapped out or the system runs out of memory. Does this have to do with fragmentation of the VRAM?
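
For what it’s worth, one thing we could try is querying the NVIDIA-specific GL_NVX_gpu_memory_info extension before large (re)allocations, to at least see how much VRAM the driver currently considers available and how often it has evicted data. A rough sketch; the enum values are taken from the extension specification in case the GL headers in use do not define them:

```cpp
// GL_NVX_gpu_memory_info tokens (values from the extension specification).
#ifndef GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX
#define GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX 0x9049
#define GL_GPU_MEMORY_INFO_EVICTION_COUNT_NVX           0x904A
#define GL_GPU_MEMORY_INFO_EVICTED_MEMORY_NVX           0x904B
#endif

// Driver-reported available video memory, in kilobytes. If the extension is
// not supported, glGetIntegerv raises GL_INVALID_ENUM and the value stays 0.
GLint availableVideoMemoryKb()
{
    GLint kb = 0;
    glGetIntegerv(GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX, &kb);
    return kb;
}

// Number of evictions the driver has performed so far.
GLint evictionCount()
{
    GLint count = 0;
    glGetIntegerv(GL_GPU_MEMORY_INFO_EVICTION_COUNT_NVX, &count);
    return count;
}
```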

The obvious workaround to all of this is to upgrade to Maxwell cards with significantly more VRAM, but academic research is not where the money grows …

I’ve explained case 2. All the other cases really depend on what else is happening in VRAM, on a case-by-case basis. For example, it’s unclear how much memory your CUDA simulation might be holding at the same time. (You can have nvidia-smi.exe dump memory statistics while your app is running to analyze that.)
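
Since your application already uses the CUDA runtime, you can also query free and total device memory from inside the process. A minimal sketch using cudaMemGetInfo, assuming the device you simulate and render on is the current CUDA device:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Report free/total memory of the current CUDA device.
// For an external view, nvidia-smi can log similar numbers, e.g.:
//   nvidia-smi --query-gpu=memory.used,memory.free,memory.total --format=csv -l 1
void reportDeviceMemory()
{
    size_t freeBytes = 0, totalBytes = 0;
    if (cudaMemGetInfo(&freeBytes, &totalBytes) == cudaSuccess)
    {
        std::printf("GPU memory: %.1f MiB free of %.1f MiB\n",
                    freeBytes  / (1024.0 * 1024.0),
                    totalBytes / (1024.0 * 1024.0));
    }
}
```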

Yes, there can be fragmentation. The GPU can also address PCI-E memory to some extent, so the overall workload can be bigger than the installed VRAM, and yes, there is also the possibility of swapping to make things resident.

But in the end, bigger boards that fit the application’s requirements are the only viable solution if you ever get an abort with “Request for more GPU memory than is available” from the driver.

Ok, thank you very much for your input! I will check out nvidia-smi; this sounds intriguing.