[SOLVED] 2nd __constant__ buffer in OptiX 7?

I’m currently going through the OptiX 7 SDK source code and docs in preparation for upgrading my application (based on Detlef’s Introduction samples).
(For the actual implementation I will of course wait until the MDL and denoiser samples and Detlef’s port of the optixIntro_07 sample are available.)

In my understanding, “optixLaunch” can update the “params” constant variable (or another custom name defined by OptixPipelineCompileOptions::pipelineLaunchParamsVariableName).
And if NULL is passed as pipelineParams and 0 as pipelineParamsSize, it simply leaves the constant buffer unaffected, keeping the values from the last launch that specified the parameters with a valid buffer; right?

In my current app I use 2 optix::Buffer objects:
One very small one, only for the frame ID update during path tracer accumulation, and a bigger one, which holds a lot of
global data (this one is only updated on accumulation frame 0).

Is there a way in OptiX 7 to define a second constant buffer variable (instead of merging the two buffers)?
Keeping the second buffer only as a device pointer inside the small one would add overhead on every access to its contents, because
only the pointer, not the actual data, would live in constant memory; and putting all the data into that single buffer would waste some upload time, because that data never changes during accumulation frames.

So my question: can I access another device buffer outside the global “params” struct?

Thank you.

No, a launch cannot update those parameters directly (the memory is constant, after all), but it can write to buffers pointed to by a CUdeviceptr inside that parameter block.
See here: https://devtalk.nvidia.com/default/topic/1063868/optix/pick-ray-in-optix7/post/5390375

Think of this parameter block as the replacement for all variables at OptiX context global scope in previous versions.
Note that this parameter block is also just a CUdeviceptr and you need to change the contents in there from the host with, for example, cuMemcpyHtoD(Async) when using the CUDA driver API.
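
For illustration, a minimal sketch of that host-side handling with the CUDA driver API; the Params struct and the names d_params, d_accumBuffer and stream are placeholders, not taken from the SDK or the samples:

struct Params // whatever struct your pipelineLaunchParamsVariableName refers to on the device
{
  CUdeviceptr  accumBuffer; // buffers the kernels write to sit behind CUdeviceptr fields
  unsigned int frameId;
};

CUdeviceptr d_params = 0;
CU_CHECK( cuMemAlloc(&d_params, sizeof(Params)) ); // the parameter block itself is ordinary device memory

Params params = {};
params.accumBuffer = d_accumBuffer; // allocated elsewhere with cuMemAlloc
params.frameId     = 0;
CU_CHECK( cuMemcpyHtoD(d_params, &params, sizeof(Params)) ); // change its contents from the host

// d_params and sizeof(Params) are then what you pass as pipelineParams and pipelineParamsSize to optixLaunch.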

That’s not the right approach.
The original issue, where changing variables instead of buffer contents between launches was slow, doesn’t exist in OptiX 7 at all.
Every input and output in device memory is accessed via CUdeviceptr in OptiX 7 anyway and these buffers are managed explicitly by the developer.
That means you don’t need to keep the frame ID in a separate buffer, because that constant parameter block already is a buffer, and the accumulation buffer needs to be a CUdeviceptr inside that parameter block anyway.
All your other global values need to go into that parameter block as well. (Mind the CUDA alignment rules.)

The code below shows what that would look like and how updating only the frame ID per launch works on the host:

struct SystemData
{
  // 8-byte alignment
  OptixTraversableHandle topObject;

  // Using a CUdeviceptr here to allow for different buffer formats without too many casts on host side.
  CUdeviceptr outputBuffer;
  int2        resolution; // The rendering resolution and outputBuffer dimension.

  // 4-byte alignment 
  unsigned int iterationIndex;
};

// On device: 
extern "C" __constant__ SystemData sysData;

extern "C" __global__ void __raygen__tracer()  
{
  // make_uint2() here takes the .xy components of the uint3 values OptiX returns (a small helper overload, not a built-in CUDA function).
  const uint2 theLaunchDim   = make_uint2(optixGetLaunchDimensions());
  const uint2 theLaunchIndex = make_uint2(optixGetLaunchIndex());
  ...
  // Example, assuming sysData.resolution == theLaunchDim.
  const unsigned int index = theLaunchDim.x * theLaunchIndex.y + theLaunchIndex.x;

  // 'radiance' is the float3 result of the path tracing code elided above.
  float4* buffer = reinterpret_cast<float4*>(sysData.outputBuffer);
  buffer[index] = make_float4(radiance, 1.0f);
}

  // On the host, a single iteration of your progressive rendering loop looks like this:
  ...
  // Note that this copy to one element of the parameter block requires a sync or different source pointers per iteration if the rendering is fully asynchronous!
  CU_CHECK( cuStreamSynchronize(m_cudaStream) );

  // Only updating the iterationIndex field every frame if nothing else changed.
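  // Note: m_d_systemData is declared as SystemData* but holds a device address.
  // Taking &m_d_systemData->iterationIndex never dereferences it on the host;
  // it only computes the device address of that single struct member.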
  CU_CHECK( cuMemcpyHtoDAsync(reinterpret_cast<CUdeviceptr>(&m_d_systemData->iterationIndex), &m_systemData.iterationIndex, sizeof(unsigned int), m_cudaStream) );

  OPTIX_CHECK( m_api.optixLaunch(m_pipeline, m_cudaStream, reinterpret_cast<CUdeviceptr>(m_d_systemData), sizeof(SystemData), &m_sbt, m_systemData.resolution.x, m_systemData.resolution.y, /* depth */ 1) );

...
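
And to tie this back to the original question: the big block of global data simply becomes one more CUdeviceptr field inside SystemData that is uploaded once. A rough sketch of that one-time setup under the same driver API assumptions; GlobalData, m_d_globalData and the globalData field are hypothetical names, not from the samples:

// Hypothetical container for everything that only changes on accumulation frame 0.
struct GlobalData
{
  // camera, lights, material parameters, ...
};

// Inside SystemData this is just one more 8-byte aligned field:
//   CUdeviceptr globalData;

// One-time setup:
CU_CHECK( cuMemAlloc(reinterpret_cast<CUdeviceptr*>(&m_d_systemData), sizeof(SystemData)) );
CU_CHECK( cuMemAlloc(&m_d_globalData, sizeof(GlobalData)) ); // m_d_globalData is a CUdeviceptr

// On accumulation frame 0: upload the global data and the full parameter block once.
CU_CHECK( cuMemcpyHtoDAsync(m_d_globalData, &m_globalData, sizeof(GlobalData), m_cudaStream) );
m_systemData.globalData = m_d_globalData; // hypothetical field, see comment above
CU_CHECK( cuMemcpyHtoDAsync(reinterpret_cast<CUdeviceptr>(m_d_systemData), &m_systemData, sizeof(SystemData), m_cudaStream) );

// During the following accumulation frames only the iterationIndex copy from the loop above is needed;
// the device code reaches the unchanged global data via sysData.globalData.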

Thank you very much, Detlef!

That makes sense now. I did not know it was possible to access portions of that block that way in an OptiX 7 kernel.