Error when running CUDA Interop

Hello,
I got a runtime error in my CUDA OptiX interop program:
" Invalid context (Details: Function “RTresult _rtContextLaunch2D(RTcontext, unsigned int, RTsize, RTsize)” caught exception: Cannot map CUDA interop buffers, [14614601])"

In my program, I first generate the vertices and normals with CUDA. Then I pass their device pointers to the OptiX program for rendering. Below is the part of my code that maps the CUDA arrays to OptiX buffers.

m_context = Context::create();

int d;
cudaGetDevice(&d);

outputBuffer = m_context->createBufferForCUDA(RT_BUFFER_OUTPUT, RT_FORMAT_UNSIGNED_BYTE4, width, height);
buffer->setDevicePointer(d, reinterpret_cast<CUdeviceptr>(d_output));
m_context["output_buffer"]->set(outputBuffer);

Buffer vbuffer = m_context->createBufferForCUDA( RT_BUFFER_INPUT, RT_FORMAT_FLOAT3, num_vertices );
vbuffer->setDevicePointer(d, reinterpret_cast<CUdeviceptr>(d_vertex));

Buffer nbuffer = m_context->createBufferForCUDA(RT_BUFFER_INPUT, RT_FORMAT_FLOAT3, num_vertices );
vbuffer->setDevicePointer(d, reinterpret_cast<CUdeviceptr>(d_normal));

mesh[ "vertex_buffer" ]->setBuffer( vbuffer );
mesh[ "normal_buffer" ]->setBuffer( nbuffer );

Can anyone give some clue about this error message?
Thanks!

Please always add the exact system configuration when asking about errors:
OS version, OS bitness, display driver version, installed GPU(s), OptiX version, CUDA Toolkit version, host compiler version and edition.

You created an OptiX context and didn’t specify the devices to be used. If you have multiple CUDA devices installed that might not work the way you expect. (OptiX 3.8.0 Programming Guide chapter 7)

Does the cudaGetDevice() result match the device your OptiX context is running on?
Mind that the first setDevicePointer() argument is the OptiX device number, which is not necessarily identical to the CUDA device number on a multi-GPU system.
Please have a look at the CollisionOptiX.cpp example code and look for the function named GetOptixDeviceOrdinal.
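To illustrate the ordinal mismatch, here is a minimal sketch of the lookup idea. It is not the SDK's GetOptixDeviceOrdinal; the function name findOptixOrdinal and the input table are hypothetical. The table is assumed to be indexed by OptiX device ordinal and to hold the CUDA ordinal of each device. In a real program you would fill it by querying each OptiX device (I believe the attribute is RT_DEVICE_ATTRIBUTE_CUDA_DEVICE_ORDINAL, but please check the SDK headers).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the OptiX SDK.
// cudaOrdinalOfOptixDevice[i] is assumed to hold the CUDA device ordinal
// that OptiX device i runs on (filled from OptiX device queries elsewhere).
// Returns the OptiX ordinal matching cudaDevice, or -1 if none matches.
int findOptixOrdinal(const std::vector<int>& cudaOrdinalOfOptixDevice,
                     int cudaDevice)
{
    for (std::size_t i = 0; i < cudaOrdinalOfOptixDevice.size(); ++i) {
        if (cudaOrdinalOfOptixDevice[i] == cudaDevice) {
            return static_cast<int>(i);
        }
    }
    return -1;  // the CUDA device is not used by the OptiX context
}
```

The value returned here, not the raw cudaGetDevice() result, is what would go into the first argument of setDevicePointer(). On a single-GPU system both are typically 0, so the difference only shows up with multiple devices.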

I would be careful with the name “vertex_buffer”. Some acceleration structure builders have special cases for triangle data, and they use “vertex_buffer” and the other default buffer names you’ll find inside the OptiX Programming Guide. That special handling might be slower when the data already resides on the device.

Thank you for the reply!

Below is my system configuration:
LinuxMint 17.1 (rebecca) 64bit.
NVIDIA driver version 346.46.
Nvidia GeForce GTX 760
OptiX 3.8.0
CUDA 7.0
C++ compiler: gcc 4.8.2 (Ubuntu 4.8.2-19ubuntu1)

I guess an incorrect GPU device is not the issue, because I only have one GPU. cudaGetDevice() returns 0. The GetOptixDeviceOrdinal function also returns 0 and only detects one OptiX device.

I was able to run the program and show the correct image successfully by loading the “vertex” and “normal” array from the host memory without running CUDA. So I guess something was wrong with my interop.

It would be great if you could provide further suggestions. Thanks!

Thanks for the additional information.

Which acceleration structure (AS) builder did you use?
If Sbvh, Trbvh or TriangleKdTree, please try Bvh.

This is what I used for the acceleration structure: builder(“Trbvh”), traverser(“Bvh”)
I tried to use Bvh for both builder and traverser, but it still gives the error :(

I guess the acceleration structure is correct, because I previously ran the program successfully without using CUDA OptiX interop.

My last question was meant to isolate whether there is any issue with the “vertex_buffer” name for builders which have optimizations for triangle meshes, as mentioned in the last paragraph of comment #2. See OptiX 3.8.0 Programming Guide chapter 3.5.3 Table 4.

Since it also doesn’t work with a builder which doesn’t do that and there is no OptiX SDK example which does the same, it would be necessary to have a minimal reproducer in failing state for further analysis. You could attach it here or, if confidential information is involved, send it to the OptiX team via the help e-mail you’ll find inside the OptiX Release Notes.

Thanks a lot for your help and patience. I finally got the interop working.
It turns out that I attached the device pointer to the output buffer before the device memory was allocated.
Also, there is a typo in the code I posted:

In the following lines, vbuffer on the second line should be nbuffer:

Buffer nbuffer = m_context->createBufferForCUDA(RT_BUFFER_INPUT, RT_FORMAT_FLOAT3, num_vertices );
vbuffer->setDevicePointer(d, reinterpret_cast<CUdeviceptr>(d_normal));
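
In case it helps someone else, here is a minimal sketch of a guard that would have caught the first problem early. InteropBuffer and attachDevicePointer are hypothetical stand-ins, not the OptiX API, and the sketch does not call CUDA. The idea is simply to refuse a pointer that has not been allocated yet, instead of failing later at launch with "Cannot map CUDA interop buffers".

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an interop buffer; NOT the OptiX Buffer class.
struct InteropBuffer {
    std::string name;
    const void* devicePtr = nullptr;  // pointer attached so far, if any
};

// Attach ptr to buffer only if it is non-null, i.e. the allocation
// (cudaMalloc in a real program) has already happened.
void attachDevicePointer(InteropBuffer& buffer, const void* ptr)
{
    if (ptr == nullptr) {
        throw std::invalid_argument(
            "device pointer for '" + buffer.name + "' is not allocated yet");
    }
    buffer.devicePtr = ptr;
}
```

In the real program the equivalent order would be: allocate d_output, d_vertex and d_normal first, then call setDevicePointer() on outputBuffer, vbuffer and nbuffer respectively. A null check like this only catches an unallocated pointer; it cannot catch the vbuffer/nbuffer mix-up.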