Hi again,
First of all, I’m using OptiX 5.1 and CUDA 9.1, but I also tried OptiX 6.0 with a similar test. I took a look at the 6.0 release notes and I think there is not much difference in terms of what we are trying to achieve.
This is the piece of code I used in my test. It worked fine, but it is only the step before bringing the TextureSampler into play.
// Declare buffers (note the type is float2 and the second one is RT_BUFFER_INPUT)
bufferOrigin = m_context->createBuffer(RT_BUFFER_INPUT_OUTPUT, RT_FORMAT_FLOAT2, 2, 2);
bufferDestination = m_context->createBuffer(RT_BUFFER_INPUT, RT_FORMAT_FLOAT2, 2, 2);
m_context["bufferOrigin"]->set(bufferOrigin);
m_context["bufferDestination"]->set(bufferDestination);
// Populate data in the first buffer
optix::float2* data = (optix::float2*)(bufferOrigin->map());
for (int i = 0; i < 4; i++)
data[i] = make_float2(i,i);
bufferOrigin->unmap();
// First call to an OptiX program (this is important because the data we have just written to the buffer
// with map() / unmap() won't be transferred to the GPU until we launch an OptiX program for the first time,
// so it is pointless to do a cudaMemcpy before that, since the data is not on the GPU yet)
m_context->launch(m_printProgramIndex, 100, 100);
// Now we get the pointers
cudaSetDevice(m_idGpuCuda);
float2* bufferOriginDevicePtr = (float2*)(bufferOrigin->getDevicePointer(m_idGpuCuda));
float2* bufferDestinationDevicePtr = (float2*)((bufferDestination)->getDevicePointer(m_idGpuCuda));
// dev2dev copy
cudaError_t lastError = cudaMemcpy(bufferDestinationDevicePtr, bufferOriginDevicePtr, sizeof(float2) * 4, cudaMemcpyDeviceToDevice);
// call to the print program (I do it this way since the buffer is RT_BUFFER_INPUT, so it is not possible
// to get the values back to the CPU)
m_context->launch(m_printProgramIndex, 100, 100);
The RT_PROGRAM I use to print buffer values from the GPU is this:
rtBuffer<optix::float2, 2> bufferOrigin;
rtBuffer<optix::float2, 2> bufferDestination;
RT_PROGRAM void printProgram() {
rtPrintf("Buffer origin at [1,1] : x = %f , y = %f \n", bufferOrigin[make_uint2(1, 1)].x, bufferOrigin[make_uint2(1, 1)].y);
rtPrintf("Buffer destination at [1,1] : x = %f , y = %f \n", bufferDestination[make_uint2(1, 1)].x, bufferDestination[make_uint2(1, 1)].y);
}
After running the code, console output is this:
Buffer origin at [1,1] : x = 3.000000 , y = 3.000000
Buffer destination at [1,1] : x = 0.000000 , y = 0.000000
Buffer origin at [1,1] : x = 3.000000 , y = 3.000000
Buffer destination at [1,1] : x = 3.000000 , y = 3.000000
Lines 1 and 2 belong to the first call to the RT_PROGRAM, before the cudaMemcpy. The result is as expected: given how the data is populated, the origin buffer holds 3.0 at [1,1], and the destination buffer is 0.0f since it hasn’t been populated with data yet. Lines 3 and 4 are the interesting ones, after the cudaMemcpy. As you can see, the destination buffer now holds the correct value.
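To make it clearer why the origin buffer prints 3.0 at [1,1], here is a minimal host-only sketch of the indexing. It assumes the 2D buffer is laid out linearly with x varying fastest (index = y * width + x), which is my understanding of OptiX buffers; for a 2x2 buffer, element [1,1] maps to index 3 under either ordering anyway. `Float2`, `linearIndex` and `fillLikeHostLoop` are hypothetical names used only for this illustration, not part of OptiX or CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for optix::float2 so the sketch compiles without the OptiX headers.
struct Float2 {
    float x;
    float y;
};

// Assumed layout: element (x, y) of a 2D buffer lives at linear index y * width + x.
std::size_t linearIndex(std::size_t x, std::size_t y, std::size_t width) {
    return y * width + x;
}

// Fill a width x height buffer the same way the host loop does: data[i] = (i, i).
std::vector<Float2> fillLikeHostLoop(std::size_t width, std::size_t height) {
    std::vector<Float2> data(width * height);
    for (std::size_t i = 0; i < data.size(); ++i) {
        const float value = static_cast<float>(i);
        data[i] = Float2{value, value};
    }
    return data;
}
```

With these helpers, `fillLikeHostLoop(2, 2)[linearIndex(1, 1, 2)]` is the element written with i = 3, so both components are 3.0. That matches the "Buffer origin" lines, and the "Buffer destination" line after the copy.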
Please tell me if you can reproduce this. As I said, I’m quite short on time until Friday, so I’m sorry if I don’t reply as soon as I would like to ;) .