Modern GL Interop

I am a little bit confused about the correct way to perform GL/CUDA interop these days, as there is a lot of seemingly old information out there.

I see a mention from this:

That “cudaD3D[9|10|11]SetDirect3DDevice/cudaGLSetGlDevice are no longer required”

I started to write something like this:

	// needs <cassert>, <cuda_runtime.h> and <cuda_gl_interop.h> (included after the GL headers)
	unsigned int deviceCount = 0;   // cudaGLGetDevices() takes unsigned int / int
	int devices[8];

	// run on the thread where the GL context is current
	if (cudaGLGetDevices(&deviceCount, devices, 8, cudaGLDeviceListAll) != cudaSuccess)
		assert(0);

	if (cudaGLSetGLDevice(devices[0]) != cudaSuccess)
		assert(0);

This seems to execute without error as long as it runs on the thread where the GL context is current.

My GL rendering is not on the main thread, and I would also like to use additional CUDA devices that are not connected to the GPU driving the GL display. I have no real idea how to set this up or shut it down properly. The samples in the CUDA SDK appear to use out-of-date methods?

Could someone please explain the correct approach?

There are CUDA sample codes that demonstrate proper CUDA/OpenGL interop. You might want to study those.

http://docs.nvidia.com/cuda/cuda-samples/index.html#simple-opengl

You want to establish the GL context first. Thereafter, a cudaSetDevice() call or the like should associate the CUDA context with the OpenGL context.
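
A minimal sketch of that ordering, assuming the GL context has already been created and made current on the calling thread (the buffer name pbo, the helper name, and the flag choice are just illustrative):

	#include <GL/glew.h>           // or whichever GL loader you already use
	#include <cuda_runtime.h>
	#include <cuda_gl_interop.h>   // after the GL headers

	void initCudaForGL(GLuint pbo, cudaGraphicsResource **pboResource)
	{
		// The GL context is already current on this thread.

		// Ask CUDA which device(s) are driving that context.
		unsigned int glDeviceCount = 0;
		int glDevices[8];
		cudaGLGetDevices(&glDeviceCount, glDevices, 8, cudaGLDeviceListAll);

		// A plain cudaSetDevice() is all the association that is needed;
		// the deprecated cudaGLSetGLDevice() is not required.
		cudaSetDevice(glDevices[0]);

		// Register the GL buffer so kernels can access it later.
		cudaGraphicsGLRegisterBuffer(pboResource, pbo, cudaGraphicsRegisterFlagsNone);
	}

(Error checking omitted for brevity.)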

I did actually look at those, but the comments about deprecated functions have left me confused about what the modern way to do this is.

The simpleGL.cu example calls cudaGLSetGLDevice(), but the header comments for cudaGLSetGLDevice() state:

  • \brief Sets a CUDA device to use OpenGL interoperability
  • \deprecated This function is deprecated as of CUDA 5.0.
  • This function is deprecated and should no longer be used. It is
  • no longer necessary to associate a CUDA device with an OpenGL
  • context in order to achieve maximum interoperability performance.
  • \param device - Device to use for OpenGL interoperability

So it appears to me that the samples are out of date? I guess that means I should ignore that sample, not call cudaGLSetGLDevice(), and instead just call cudaSetDevice() after I have made my GL context current?
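
In other words, I am imagining the per-frame part ends up looking roughly like this (just a sketch; the resource comes from a cudaGraphicsGLRegisterBuffer() call made during setup, and myKernel is a placeholder):

	// Per-frame: map the registered GL buffer, let a kernel write into it, unmap.
	void runCudaFrame(cudaGraphicsResource *pboResource)
	{
		float4 *devPtr = nullptr;
		size_t numBytes = 0;

		cudaGraphicsMapResources(1, &pboResource, 0);
		cudaGraphicsResourceGetMappedPointer((void **)&devPtr, &numBytes, pboResource);

		// myKernel<<<grid, block>>>(devPtr, ...);   // placeholder kernel launch

		// Unmap before GL uses the buffer again.
		cudaGraphicsUnmapResources(1, &pboResource, 0);
	}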

Also, the samples call cudaDeviceReset() on termination to ensure profiling data is captured, so I assume this should be called for proper shutdown before I close a GL context? Or if I wanted to tear down and rebuild my GL context, etc.
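
For teardown I had something like this in mind (again just a sketch; the resource name is illustrative):

	// Shutdown, before destroying the GL context (or before rebuilding it):
	void shutdownCuda(cudaGraphicsResource *pboResource)
	{
		// Unregister every interop resource first.
		cudaGraphicsUnregisterResource(pboResource);

		// Tear down the CUDA context on this device; this also flushes profiling data.
		cudaDeviceReset();

		// Only now destroy the GL context / window.
	}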

Also reading this: CUDA Pro Tip: Always Set the Current Device to Avoid Multithreading Bugs | NVIDIA Technical Blog

In the case where I have a CPU task-pool system, I assume that if I want every thread in my task pool to be able to submit work to CUDA, each of them has to call cudaSetDevice() exactly once. Then, as long as each worker thread only deals with one CUDA device (which could be the same device or a different one per thread), it no longer needs to keep calling cudaSetDevice() each time it issues CUDA commands?
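
That is, something like this sketch of a worker pool (the round-robin device assignment is just an example, and it assumes at least one device is present):

	#include <cuda_runtime.h>
	#include <thread>
	#include <vector>

	// Each pool thread binds its device once, then issues work.
	void workerLoop(int cudaDevice)
	{
		// The current device is a per-thread setting, so one call here is enough
		// as long as this thread only ever targets cudaDevice.
		cudaSetDevice(cudaDevice);

		// ... dequeue tasks and issue cudaMemcpy / kernel launches; they all
		// go to cudaDevice without further cudaSetDevice() calls ...
	}

	int main()
	{
		int deviceCount = 0;
		cudaGetDeviceCount(&deviceCount);

		std::vector<std::thread> pool;
		for (int i = 0; i < 4; ++i)
			pool.emplace_back(workerLoop, i % deviceCount);   // round-robin over devices

		for (auto &t : pool)
			t.join();
	}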

So cudaSetDevice() and cudaDeviceReset() are smart startup/shutdown functions that try to do the right thing depending on which thread or GL context is bound, etc.?

Also, this is the most up-to-date presentation I can find on CUDA/GL interop, so I assume it is the correct way to approach it?

Thanks.