Optix 4 and CUDA interop, problems switching from Optix 3.8

I have some problems getting the device pointer of an OptiX buffer after updating from OptiX 3.8 to OptiX 4.0.

To create the buffer, this method is used:

void addOutputBufferFormatUser(optix::Context context,
                               std::string name,
                               unsigned int width,
                               unsigned int height)
{
    Buffer buffer;
    GLuint vbo = 0;

    glGenBuffers(1, &vbo);

    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    size_t element_size = sizeof(PointXYZADIJK);

    glBufferData(GL_ARRAY_BUFFER, element_size * width * height, 0, GL_STREAM_DRAW);
    glBindBuffer(GL_ARRAY_BUFFER, 0);

    buffer = context->createBufferFromGLBO(RT_BUFFER_OUTPUT, vbo);

    buffer->setFormat(RT_FORMAT_USER);
    buffer->setElementSize(element_size);
    buffer->setSize(width, height);

    m_context[name]->setBuffer(buffer);
    notify(WARN ,"addOutputBufferFormatUser() created buffer with name %s.", name.c_str());
}

Called by this line:

addOutputBufferFormatUser(m_context, "output_buffer_xyzaijk", m_width, m_height);

Now I would like to transfer the buffer to CUDA by calling:

void
postTrace() const
{
    void* pointerToBuffer = NULL;
    Buffer buffer = m_optiXInterface->getBuffer("output_buffer_xyzaijk");

    if (buffer->get() == NULL)
    {
        notify(FATAL ,"postTrace() buffer null.");
    }

    buffer->getDevicePointer(0, &pointerToBuffer);

    checkCudaErrors(cudaMemcpy(d_rayArray, (PointXYZADIJK*)pointerToBuffer, sizeof(PointXYZADIJK), cudaMemcpyDeviceToDevice));

    // Unmap buffer:
    rtBufferUnmap(buffer->get());
}

Or my older code:

void
postTrace() const
{
    RTresult res = RT_SUCCESS;

    void* pointerToBuffer = NULL;
    Buffer buffer = m_optiXInterface->getBuffer("output_buffer_xyzaijk");

    if (buffer->get() == NULL)
    {
        notify(FATAL ,"postTrace() buffer null.");
    }

    // Read OptiX buffer:
    res = rtBufferGetDevicePointer(buffer->get(), 0, &pointerToBuffer);

    if (res == 0)
    {
        checkCudaErrors(cudaMemcpy(d_rayArray, (PointXYZADIJK*)pointerToBuffer, sizeof(PointXYZADIJK), cudaMemcpyDeviceToDevice));
        notify(WARN ,"postTrace() could copy buffer to cuda.");
    }
    else
    {
        notify(WARN ,"postTrace() could not copy buffer to cuda: result %u.", res);
    }

    // Unmap buffer:
    rtBufferUnmap(buffer->get());
}

Both functions return this error:
display <Invalid value (Details: Function “RTresult _rtBufferGetDevicePointer(RTbuffer, int, void**)” caught exception: Cannot get device pointers from non-CUDA interop buffers.

As mentioned earlier, OptiX 3.8 works perfectly fine. I read about OptiX 4.0 being more strict with interop data but the error message does not ring a bell for me. Any help is highly appreciated!

When you say OptiX 4.0, have you tried that with the most recent version OptiX 4.1.1 as well?

Please provide these system information when reporting OptiX issues to reduce turnaround times:
OS version, installed GPU(s) and amount of VRAM, display driver version, OptiX version (major.minor.micro), CUDA toolkit used to generate the PTX device code for you OptiX application.

Hi and tank you for your response.

I did not test with 4.1.1 yet and will try it now. Do I understand correctly that you don’t see any particular reason why I get this error and this might be just a bug in the OptiX/CUDA SDK I’m using?

System Information:
Problem spotted on Linux x64 openSuse 42.2, kubuntu 14.04.03
Driver 375.39
GPU Nvidia M5000M, 8GB VRAM
OptiX SDK 4.0.2 linux 64
CUDA V8.0.44

If this worked in OptiX 3.8 and not in OptiX 4.0.2 then testing the newest OptiX version is the first step to isolate if this might have been an error in OptiX 4.0.2 which was potentially fixed in a later version. If yes, we’re done. If not, we’d need a complete reproducer in failing state to analyze further.

Hi,
I loaded the 4.1.1 libs into my program and the error stayed the same.

To be more precise:
CUDA V7.0.27, OptiX 3.8.0, 375.39 - Works
CUDA V8.0.44, OptiX 4.0.2, 375.39 - Doesn’t work
CUDA V8.0.44, OptiX 4.1.1, 375.39 - Doesn’t work
CUDA V9.0.176,OptiX 4.1.1, 384.81 - Doesn’t work

Thank you for your support with this issue since I unfortunately cannot see the problem.

Ok, thanks for testing.

4.1.1 is definitely the favorable version over 4.0.2 from performance and fixes point of view.
CUDA 9.0 is not officially supported by OptiX 4 versions.

Does it work when not using OpenGL interop to create the buffer?
That wouldn’t be needed if only OptiX and CUDA are working on that buffer.

What’s the rtBufferUnmap() paired with?

Would you be able to provide a minimal standalone reproducer for this issue?
I’m not using Linux myself, so I would need to file a bug report for someone in the team to investigate.

Another possible way to provide a reproducer to us without the need for the whole application would be an OptiX API Capture. The last post in this thread contains an explanation how to generate one:
[url]https://devtalk.nvidia.com/default/topic/803116/?comment=4436953[/url]

We’d need the whole oac00000 folder as archive. ZIP archive extensions would need to be renamed or our e-mail server would block it, which also happens when attachments are bigger than 10 MB.
I can setup a temporary FTP account for exchange of any reproducer files and send it to your e-mail address under which you registered here when needed.

I was able to reproduce this with a minimal unit test. There is no need for a reproducer anymore.

OptiX 4.0.0 and beyond do not allow to get device pointers from graphics interop buffers and that error message is to be expected.
I’ll ask around for the background of this change.

Thank you very much!
I edited the code to create an OptiX buffer without the VBO and it works. The reason why the buffer was created from OpenGL is that it’s then used in an OpenGL post-processing pipeline.

So if there is a way to get device pointers from graphics interop buffers as it worked previously, it would be really valuable to us.

Many thanks!

The new behavior is correct and that it worked before was a bug and pure chance.

The problem is that the OpenGL-CUDA (OptiX) interop buffer is mapped and unmapped around an rtContextLaunch, which means after the unmap, there isn’t actually a valid device pointer behind that OpenGL-OptiX interop buffer on OptiX side which could be returned by rtBufferGetDevicePointer().

See comments inside the CUDA Driver API manual [url]CUDA Driver API :: CUDA Toolkit Documentation

cuGraphicsMapResources:
"The resources in resources may be accessed by CUDA until they are unmapped. The graphics API from which resources were registered should not access any resources while they are mapped by CUDA. If an application does so, the results are undefined."

cuGraphicsUnmapResources:
Once unmapped, the resources in resources may not be accessed by CUDA until they are mapped again.

You could possibly use OpenGL-CUDA interop directly around your cudaMemcpy().
[url]Programming Guide :: CUDA Toolkit Documentation