Using GL buffers from a second render thread

I’m currently moving my WIP renderer over to doing the rendering on a second thread. All the scene management, including buffer creation, is done on the main thread, and I simply start a loop on the second thread which repeatedly calls rtContextLaunch2D() to render in the background.
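Roughly, the render thread looks like this (a simplified sketch of my setup; names are mine and error handling is omitted):

```cpp
#include <optix.h>
#include <atomic>
#include <thread>

std::atomic<bool> g_keepRendering{ true };

// Runs on the second thread: progressively refine by relaunching.
void renderLoop(RTcontext context, RTsize width, RTsize height)
{
    while (g_keepRendering.load())
    {
        if (rtContextLaunch2D(context, 0, width, height) != RT_SUCCESS)
            break; // real code reports the error
    }
}

// Main thread, after scene and buffer setup:
// std::thread renderThread(renderLoop, context, 128, 128);
// ...
// g_keepRendering = false;
// renderThread.join();
```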

This works fine when I’ve created the output buffer on the main thread with rtBufferCreate(). However, when I instead use a GL buffer for output, created on the main thread and shared with OptiX via rtBufferCreateFromGLBO(), the launch fails with:

rtContextLaunch2D(Context(0x22c05d0), 0, (128 128)): Invalid value (Details: Function "RTresult _rtContextLaunch2D(RTcontext, unsigned int, RTsize, RTsize)" caught exception: Encountered a CUDA error: cudaDriver().CuGraphicsGLRegisterBuffer( &result, buffer, flags ) returned (1): Invalid value)

I’m not surprised this doesn’t work, but how would I go about fixing it?

Just re-reading the docs, I came across this caveat:

Currently, the OptiX host API is not guaranteed to be thread-safe. While it may be successful in some applications to use OptiX contexts in different host threads, it may fail in others. OptiX should therefore only be used from within a single host thread.

Does that mean calling OptiX from multiple threads might not work even if I’m not calling OptiX functions concurrently?

If you serialized the OptiX calls, I would expect this to work. It’s just that calling arbitrary OptiX functions, for example to fill the scene graph with objects, isn’t thread-safe yet.

What’s probably not working in your case is one of two things: either the OpenGL context isn’t current in the thread which registers the buffer, or you don’t have OpenGL contexts shared among the threads, which is what would make the buffer object visible to the second OpenGL context current in that thread when the register call happens.

If you serialized the OptiX calls, I would try to make the OpenGL context current in the second thread around any createBufferFromGLBO() call, which does the initial register, and around any explicit unregister/register pair.
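In code that would look roughly like this (just a sketch; GLFW’s glfwMakeContextCurrent stands in for whatever you use to make the GL context current, i.e. wglMakeCurrent/glXMakeCurrent):

```cpp
#include <optix.h>
#include <optix_gl_interop.h>
#include <GLFW/glfw3.h>

// Make the GL context owning the PBO current on this thread before the call
// that internally registers the buffer with CUDA, then release it again,
// because a GL context can only be current in one thread at a time.
RTbuffer createOutputBufferFromPBO(RTcontext context, GLFWwindow* glContext,
                                   GLuint pbo, RTsize width, RTsize height)
{
    glfwMakeContextCurrent(glContext);

    RTbuffer buffer = 0;
    rtBufferCreateFromGLBO(context, RT_BUFFER_OUTPUT, pbo, &buffer); // initial register happens here
    rtBufferSetFormat(buffer, RT_FORMAT_FLOAT4);
    rtBufferSetSize2D(buffer, width, height);

    // The same applies to any explicit unregister/register pair, e.g. on resize:
    // rtBufferGLUnregister(buffer);
    // ...change the PBO size with GL calls...
    // rtBufferGLRegister(buffer);

    glfwMakeContextCurrent(nullptr);
    return buffer;
}
```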

Thanks Detlef! After doing a bit more reading, I got this to work by creating a second GL context on the render thread which shares resources with the main thread’s context (where the buffers are created).
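In case it helps anyone else, this is roughly what I ended up with (I’m using GLFW; the hidden window only exists to own the second, shared context):

```cpp
#include <GLFW/glfw3.h>

GLFWwindow* g_mainWindow   = nullptr; // created as usual on the main thread
GLFWwindow* g_renderWindow = nullptr; // hidden window owning the shared context

// Called on the main thread (GLFW requires window creation there).
void createSharedContext()
{
    glfwWindowHint(GLFW_VISIBLE, GLFW_FALSE);
    // Passing the main window as the last argument shares its GL objects
    // (buffers, textures) with the new context.
    g_renderWindow = glfwCreateWindow(1, 1, "render", nullptr, g_mainWindow);
}

// Entry point of the render thread.
void renderThreadEntry()
{
    glfwMakeContextCurrent(g_renderWindow); // current only in this thread
    // The register call triggered by rtContextLaunch2D() now runs against a
    // context that can see the PBO created on the main thread.
    // ...rtContextLaunch2D() loop as before...
}
```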

Do you have any idea what the performance implications of this are? OpenGL’s pipeline is a bit of a mystery a lot of the time. Imagine that in the main thread I have a window that just draws a fullscreen quad, textured from my shared buffer, to display the render. Then, in the second (render) thread, I’m just looping rtContextLaunch2D() to progressively refine a render written directly into the shared buffer. Is there an optimal setup for this kind of workflow, or other considerations I need to be aware of?

The main purpose of sharing OpenGL contexts is that the contexts use a single object lookup table, so the same objects can be accessed from multiple contexts. Having OpenGL contexts shared across different threads comes with some locking overhead inside the OpenGL driver. The potential performance impact of that depends on the use case.
It’s a common technique to use one thread for asynchronous resource management, e.g. loading textures on the fly, and a main rendering thread which uses these resources. That needs some care to make sure things currently used for rendering aren’t touched in the resource-handling thread.

For your use case I wouldn’t expect much overhead. You would need to be careful to wait for the renderer to finish displaying the last texture, and to put a critical section around the texture update function so the renderer can’t access the texture while you’re uploading it.

Now, with all that said, what you’re doing is only going to work on single-GPU systems.

When an OptiX context is set up to use more than one device, the final output buffers are not on the device but in pinned host memory. That means there is no way to use an OpenGL PBO to do OpenGL interop on that buffer.
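So if you want to keep GL interop on single-GPU systems and still handle the multi-GPU case, the output buffer setup could branch along these lines (just a sketch; names are mine):

```cpp
#include <optix.h>
#include <optix_gl_interop.h>
#include <GLFW/glfw3.h> // for GLuint

// Pick the output buffer path depending on how many devices the context uses.
RTbuffer createOutputBuffer(RTcontext context, GLuint pbo, RTsize width, RTsize height)
{
    unsigned int deviceCount = 0;
    rtContextGetDeviceCount(context, &deviceCount);

    RTbuffer output = 0;
    if (deviceCount == 1)
    {
        // Single GPU: direct OpenGL interop through the PBO works.
        rtBufferCreateFromGLBO(context, RT_BUFFER_OUTPUT, pbo, &output);
    }
    else
    {
        // Multiple GPUs: the output lives in pinned host memory, so use a
        // plain buffer and copy its contents into a GL texture yourself.
        rtBufferCreate(context, RT_BUFFER_OUTPUT, &output);
    }
    rtBufferSetFormat(output, RT_FORMAT_FLOAT4);
    rtBufferSetSize2D(output, width, height);
    return output;
}
```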

An architecture which supports completely asynchronous ray tracing with OptiX would need to handle that case differently, for example with a ray tracing thread which does all the OptiX work and a rasterizer (main) thread which handles all OpenGL display in the app.
Then you would just need a critical section in which you copy the OptiX result buffer to some shared location on the host and signal the display thread that there is a new image to be uploaded; the display thread then locks that memory during the texture upload, and so forth.
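A minimal sketch of that handshake (function and variable names are made up; a condition variable or double buffering would refine it further):

```cpp
#include <optix.h>
#include <GLFW/glfw3.h>
#include <atomic>
#include <mutex>
#include <vector>

std::mutex         g_imageMutex;
std::vector<float> g_stagingImage;               // shared host location, RGBA float
std::atomic<bool>  g_newImageAvailable{ false };

// Ray tracing thread: after each rtContextLaunch2D(), copy the result out.
void publishResult(RTbuffer outputBuffer, size_t numFloats)
{
    void* src = nullptr;
    rtBufferMap(outputBuffer, &src);
    {
        std::lock_guard<std::mutex> lock(g_imageMutex);
        const float* p = static_cast<const float*>(src);
        g_stagingImage.assign(p, p + numFloats);
        g_newImageAvailable = true;
    }
    rtBufferUnmap(outputBuffer);
}

// Display (main) thread: upload to the GL texture only when a new image exists.
void uploadIfNew(GLuint texture, int width, int height)
{
    if (!g_newImageAvailable)
        return;
    std::lock_guard<std::mutex> lock(g_imageMutex);
    glBindTexture(GL_TEXTURE_2D, texture);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                    GL_RGBA, GL_FLOAT, g_stagingImage.data());
    g_newImageAvailable = false;
}
```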

That’s obviously going to be slower than real OpenGL interop, but on, for example, a Windows system with one graphics board in WDDM mode for display and one or, preferably, more in TCC mode for ray tracing (TCC can’t do OpenGL interop either), that setup would be optimal.
The GUI would run at full speed and the ray tracing wouldn’t be limited by Windows WDDM timeouts (TDR) when running arbitrarily complex ray tracing algorithms.

Also, with NVLINK-capable boards, TCC mode allows OptiX to use peer-to-peer access across the NVLINK bridges automatically, which increases your possible scene size.

This is more of a rendering view of the problem than a real-time one.

Thanks Detlef, that’s very helpful. Sounds like I’ll want to support both of those paths.