Use D3D11 textures with TextureSampler in OptiX 4.1.1

Hi,

I would like to know the best way of using D3D11 textures with a TextureSampler in OptiX 4.1.1. Since OptiX + D3D11 is obviously not officially supported anymore, I use CUDA as a bridge. Is there a way to make this as efficient as possible?

It seems TextureSampler::setBuffer() doesn’t accept a buffer created with RT_BUFFER_INPUT_OUTPUT | RT_BUFFER_GPU_LOCAL; it just crashes with an assert and an unknown error code:

_buffer = _context->createBuffer(RT_BUFFER_INPUT_OUTPUT | RT_BUFFER_GPU_LOCAL, RT_FORMAT_UNSIGNED_BYTE4, _width, _height);

_buffer->validate(); // no assert crash
_textureSampler->setBuffer(_buffer); // assert crash error unknown.

Therefore it seemed I was not able to share CUDA buffers directly. Next, I tried copying from CUDA memory into an OptiX buffer.

cudaMalloc((void**) &cuData, 4 * _width * _height);
cudaMemset(cuData, 255, 4 * _width * _height);

// Note: RT_BUFFER_INPUT_OUTPUT also leads to an assert crash in setBuffer; this seems undocumented.
_buffer = _context->createBuffer(RT_BUFFER_INPUT, RT_FORMAT_UNSIGNED_BYTE4, _width, _height);
void* devicePointer = _buffer->getDevicePointer(_context->getOptixDeviceOrdinal());
cudaMemcpy(devicePointer, cuData, 4 * _width * _height, cudaMemcpyDeviceToDevice);
_buffer->validate();
_textureSampler->setBuffer(_buffer);

However, the result was black. Since I set every byte to 255, it should have been white.

I also tried RT_BUFFER_INPUT | RT_BUFFER_COPY_ON_DIRTY together with markDirty(), but the result was black as well.

So finally I tried copying from the CPU with map() and unmap(). It works…
But a D3D11 → CUDA → CPU → OptiX workflow is too slow for me.
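
Roughly, the working (but slow) path looks like the sketch below. It is only a minimal sketch; the helper name is made up and the parameters stand in for the member variables used above:

#include <optixu/optixpp_namespace.h>
#include <cuda_runtime.h>
#include <vector>
#include <cstring>

// Sketch: bring a CUDA device buffer (4 bytes per texel) into an RT_BUFFER_INPUT
// texture buffer by going through the host.
void uploadViaHost(optix::Context context, optix::TextureSampler sampler,
                   const void* cuData, unsigned int width, unsigned int height)
{
    const size_t byteCount = 4 * size_t(width) * height;

    // CUDA device memory -> host staging memory.
    std::vector<unsigned char> staging(byteCount);
    cudaMemcpy(staging.data(), cuData, byteCount, cudaMemcpyDeviceToHost);

    // Texture buffers must be plain RT_BUFFER_INPUT (only the host may write).
    optix::Buffer buffer = context->createBuffer(RT_BUFFER_INPUT,
                                                 RT_FORMAT_UNSIGNED_BYTE4,
                                                 width, height);

    // Host staging memory -> OptiX buffer via map()/unmap().
    std::memcpy(buffer->map(), staging.data(), byteCount);
    buffer->unmap();

    sampler->setBuffer(buffer);
}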

Maybe I did something wrong.
Thanks for any suggestions,

Yashiz

Just to be clear: my original purpose was to integrate OptiX into a game framework (D3D11), but the issues reported above concern CUDA + OptiX, which is the approach I tried. Hopefully this post doesn’t get ignored because “D3D11 is not supported anymore”.

My machine:
Windows 7
CUDA 8
OptiX 4.1.1
GTX 1080 Ti + Driver 22.21.13.8494

Thanks.

I found my answers in the forum + docs. From my understanding, the issues above are “by design”:

Texture buffers cannot be RT_BUFFER_INPUT_OUTPUT. They must be RT_BUFFER_INPUT. (forum)
RT_BUFFER_INPUT — Only the host may write to the buffer. (doc)
RT_BUFFER_GPU_LOCAL — Can only be used in combination with RT_BUFFER_INPUT_OUTPUT. (doc)

You can register the DX resource as a CUDA buffer with cudaGraphicsD3D11RegisterResource, then map that resource and pass it to an OptiX buffer using BufferObj::setDevicePointer. I use that to write directly to a DX11 texture that I can blit to the screen. The buffer is created as RT_BUFFER_INPUT_OUTPUT | RT_BUFFER_GPU_LOCAL. I seem to recall reading somewhere that only GPU_LOCAL buffers support CUDA device pointers. The whole register → map → set pointer → unmap → unregister process gets a bit verbose, but it works and is a lot faster than copying over the CPU.
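
Something along these lines (a rough sketch, not my exact code: the function name is made up, the resource is an ID3D11Buffer since cudaGraphicsResourceGetMappedPointer only works for buffer resources, device ordinal 0 assumes a single GPU, and the exact setDevicePointer signature may differ slightly between OptiX versions):

#include <optixu/optixpp_namespace.h>
#include <d3d11.h>
#include <cuda_runtime.h>
#include <cuda_d3d11_interop.h>

// Sketch of register -> map -> set pointer -> unmap -> unregister for sharing
// an ID3D11Buffer with OptiX through CUDA graphics interop.
void shareD3D11BufferWithOptix(optix::Buffer buffer, ID3D11Buffer* d3dBuffer)
{
    // One-time: register the D3D11 resource with CUDA.
    cudaGraphicsResource* resource = nullptr;
    cudaGraphicsD3D11RegisterResource(&resource, d3dBuffer,
                                      cudaGraphicsRegisterFlagsNone);

    // Per frame: map the resource and query its device pointer.
    cudaGraphicsMapResources(1, &resource);
    void* devicePtr = nullptr;
    size_t byteCount = 0;
    cudaGraphicsResourceGetMappedPointer(&devicePtr, &byteCount, resource);

    // Hand the pointer to the OptiX buffer, which was created with
    // RT_BUFFER_INPUT_OUTPUT | RT_BUFFER_GPU_LOCAL.
    buffer->setDevicePointer(0, devicePtr);

    // ... launching OptiX here writes straight into the D3D11 buffer ...

    // Unmap so D3D11 can use the resource again; unregister on shutdown.
    cudaGraphicsUnmapResources(1, &resource);
    cudaGraphicsUnregisterResource(resource);
}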

Hi papaboo,

Thank you for the help. It is good to know that register → map → set pointer → unmap → unregister is a lot faster than copying over the CPU.

No problem and good luck.

I’m using it for my backbuffer and in simple scenes I saw a doubling of the FPS compared to when I was copying via the CPU, so there’s definitely performance to be had.

I still have a CPU path though for debugging, just to be on the safe side.

Hi @papaboo
Hi @yashiz

“register → map → set pointer → unmap → unregister”

I tried to implement that with OptiX 5.0.0, but it does not work yet. What do you think I have done wrong?

My guess is that your problem is that you’re not using RT_BUFFER_GPU_LOCAL. That one is needed to work around OptiX multi-GPU support not playing nice with CUDA pointers. I’m still on 4.1, so I don’t know if that is strictly needed for CUDA interop anymore, but at least it’s a place to start.

Yes, it seems the problem is RT_BUFFER_GPU_LOCAL: if you want to share D3D11 buffers between OptiX and CUDA, the buffers must be RT_BUFFER_GPU_LOCAL. So OptiX sampler buffers cannot be shared.

Thank you both for the answers.

So I need to use RT_BUFFER_GPU_LOCAL, but then a sampler cannot be used; instead I can simply use the buffer as any other OptiX buffer.

Do you know a way to get a device pointer from a cudaArray? (I’m using cudaGraphicsSubResourceGetMappedArray on an ID3D11Texture2D; cudaGraphicsResourceGetMappedPointer cannot be used, as it returns 33 = cudaErrorInvalidResourceHandle for that texture2D.)

cudaGraphicsResourceGetMappedPointer can only be used for buffers, not textures.

Here is a confirmation:
CUDA Runtime API :: CUDA Toolkit Documentation

BTW, for samplers I didn’t share my D3D11 textures but copied them over PCIe, which is OK, especially for multi-GPU setups. Maybe you could also copy the data inside the GPU to make it faster on a single GPU, for example by copying from your cudaArray to a CUDA buffer.
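
For that single-GPU copy, something like cudaMemcpy2DFromArray should do it. A minimal sketch (the function name, the 4-bytes-per-texel format and the tightly packed destination are assumptions):

#include <cuda_runtime.h>

// Sketch: copy a mapped cudaArray (e.g. obtained via cudaGraphicsSubResourceGetMappedArray
// from an ID3D11Texture2D) into linear CUDA device memory, entirely on the GPU.
void copyArrayToLinear(cudaArray_t srcArray, void* dstLinear,
                       size_t width, size_t height)
{
    const size_t bytesPerTexel = 4;                 // e.g. DXGI_FORMAT_R8G8B8A8_UNORM
    const size_t rowBytes = width * bytesPerTexel;  // destination pitch (tightly packed)

    cudaMemcpy2DFromArray(dstLinear, rowBytes, srcArray,
                          0, 0,                     // source x offset (bytes), y offset (rows)
                          rowBytes, height,         // copy width in bytes, height in rows
                          cudaMemcpyDeviceToDevice);
}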