Use D3D11 textures with TextureSampler in Optix 4.1.1
Hi,

I would like to know the best way to use D3D11 textures with TextureSampler in OptiX 4.1.1. OptiX + D3D11 interop is no longer officially supported, so I use CUDA as a bridge. Is there a way to make this as efficient as possible?

It seems TextureSampler::setBuffer() doesn't accept a buffer that was created as RT_BUFFER_INPUT_OUTPUT | RT_BUFFER_GPU_LOCAL; it fails an assertion with an unknown error code:

_buffer = _context->createBuffer(RT_BUFFER_INPUT_OUTPUT | RT_BUFFER_GPU_LOCAL, RT_FORMAT_UNSIGNED_BYTE4, _width, _height);

_buffer->validate(); // no assert crash
_textureSampler->setBuffer(_buffer); // assert crash error unknown.

So it seemed I was not able to share CUDA buffers directly. Next, I tried copying from a CUDA allocation into an OptiX buffer.

cudaMalloc((void**) &cuData, 4 * _width * _height);
cudaMemset(cuData, 255, 4 * _width * _height);

// Note: RT_BUFFER_INPUT_OUTPUT here also leads to an assert crash in setBuffer; this does not seem to be documented.
_buffer = _context->createBuffer(RT_BUFFER_INPUT, RT_FORMAT_UNSIGNED_BYTE4, _width, _height);
void* devicePointer = _buffer->getDevicePointer(_context->getOptixDeviceOrdinal());
cudaMemcpy(devicePointer, cuData, 4 * _width * _height, cudaMemcpyDeviceToDevice);
_buffer->validate();
_textureSampler->setBuffer(_buffer);

However, the result was black. Since I set every byte to 255, it should have been white.
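As a quick sanity check of that expectation, here is a minimal host-side sketch. It assumes the sampler reads the texture as normalized floats, and it uses std::memset as a stand-in for cudaMemset; the names Texel, filledImage and normalized are hypothetical helpers, not part of CUDA or OptiX.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Same layout as one RT_FORMAT_UNSIGNED_BYTE4 element: four 8-bit channels.
struct Texel { unsigned char x, y, z, w; };

// Fills a width * height image byte-wise with `value`, which is what
// cudaMemset does on the device (std::memset is the host stand-in here).
std::vector<Texel> filledImage(std::size_t width, std::size_t height, int value) {
    std::vector<Texel> image(width * height);
    std::memset(image.data(), value, image.size() * sizeof(Texel));
    return image;
}

// Maps an 8-bit channel to [0, 1], as a normalized-float texture read would
// (assumption: the sampler is configured for normalized-float reads).
float normalized(unsigned char channel) { return channel / 255.0f; }
```

With a fill value of 255, every channel of every texel normalizes to 1.0, so a correctly uploaded buffer should indeed sample as opaque white. A black result therefore points at the upload path rather than at the fill pattern.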

I also tried RT_BUFFER_INPUT | RT_BUFFER_COPY_ON_DIRTY together with markDirty(), but the result was black as well.

Finally, I tried copying from the CPU by mapping and unmapping the buffer, and that works.
But a D3D11 -> CUDA -> CPU -> OptiX workflow is too slow for me.

Maybe I did something wrong.
Thanks for any suggestions,

Yashiz

#1
Posted 10/20/2017 02:37 PM   
Just to be clear: my original goal was to integrate OptiX into a game framework (D3D11), but the issues I reported above concern CUDA + OptiX, which is the workaround I tried. Hopefully this post was not ignored because "D3D11 is not supported any more".

My machine:
Windows 7
CUDA 8
Optix 4.1.1
GTX 1080 Ti + Driver 22.21.13.8494

Thanks.

#2
Posted 10/23/2017 07:44 AM   
I found my answers in the forum and the docs. As I understand it, the issues above are "by design":

Texture buffers cannot be RT_BUFFER_INPUT_OUTPUT. They must be RT_BUFFER_INPUT. (forum)
RT_BUFFER_INPUT — Only the host may write to the buffer. (doc)
RT_BUFFER_GPU_LOCAL — Can only be used in combination with RT_BUFFER_INPUT_OUTPUT. (doc)
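Taken together, these three rules mean a GPU_LOCAL buffer can never be attached to a TextureSampler. A minimal sketch of a pre-check that encodes them, so a bad combination can be caught before setBuffer() asserts, might look like this. The BufferBits enum and whyNotTextureBuffer are hypothetical stand-ins with illustrative bit values; real code would test the RT_BUFFER_* values from the OptiX headers instead.

```cpp
#include <string>

// Illustrative stand-ins for the OptiX buffer type and flag bits named above.
// The numeric values are placeholders for this sketch only.
enum BufferBits : unsigned {
    kInput       = 0x1,
    kOutput      = 0x2,
    kInputOutput = kInput | kOutput,
    kGpuLocal    = 0x4,
};

// Returns an empty string if `bits` satisfies the quoted rules for a texture
// buffer, otherwise a short reason why it does not.
std::string whyNotTextureBuffer(unsigned bits) {
    const unsigned type = bits & kInputOutput;
    // Rule from the docs: GPU_LOCAL is only valid together with INPUT_OUTPUT.
    if ((bits & kGpuLocal) && type != kInputOutput)
        return "GPU_LOCAL requires INPUT_OUTPUT";
    // Rule from the forum: texture buffers must be plain INPUT.
    if (type != kInput)
        return "texture buffers must be INPUT";
    return "";
}
```

For example, whyNotTextureBuffer(kInputOutput | kGpuLocal) reports that texture buffers must be INPUT, which matches the first crash described above, while whyNotTextureBuffer(kInput) returns an empty string.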

#3
Posted 10/26/2017 09:57 AM   
You can register the DX resource as a CUDA buffer with cudaGraphicsD3D11RegisterResource, then map that resource and pass the mapped pointer to an OptiX buffer using BufferObj::setDevicePointer. I use that to write directly to a DX11 texture that I can blit to the screen. The buffer is created as RT_BUFFER_INPUT_OUTPUT | RT_BUFFER_GPU_LOCAL. I seem to recall reading somewhere that only GPU_LOCAL buffers support CUDA buffers. The whole register -> map -> set pointer -> unmap -> unregister process gets a bit verbose, but it works and is a lot faster than copying over the CPU.

#4
Posted 10/28/2017 08:12 PM   
Hi papaboo,

Thank you for the help. It is good to know that register -> map -> set pointer -> unmap -> unregister is a lot faster than copying over the CPU.

#5
Posted 10/30/2017 08:43 AM   
No problem and good luck.

I'm using it for my backbuffer and in simple scenes I saw a doubling of the FPS compared to when I was copying via the CPU, so there's definitely performance to be had.

I still have a CPU path though for debugging, just to be on the safe side.

#6
Posted 10/30/2017 08:58 PM   