Recompile Question

Hi!

In my application (using OptiX 6.0.0, RTX mode ON, CUDA 10.0 on Win10PRO 64bit, GTX 1050 2GB, driver 419.17) I use a conditional branch to launch a small OptiX kernel that runs some transformations of the data on the GPU.

The input data of that kernel is the transform matrix of an object from the previous animation frame and a vertex buffer (float3) from that frame; the output data has the same data type and will be used in a different kernel later on (using the method described in the optixBuffersOfBuffers SDK sample; zero buffer IDs will be recognized).
The output is automatically destroyed (and replaced) when the animation geometry changes again.
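The zero-buffer-id check on the device side looks roughly like this (just a sketch; the variable and function names are placeholders, not the actual code):

```cpp
// Device-side sketch (OptiX 6 CUDA code): read the previous-frame vertices
// only when a valid (non-zero) buffer id was stored for this object.
#include <optix_world.h>

rtDeclareVariable(int, prevVertexBufferId, , );  // placeholder name

static __device__ float3 previousPosition(unsigned int index)
{
    if (prevVertexBufferId != RT_BUFFER_ID_NULL)  // RT_BUFFER_ID_NULL == 0
    {
        // Construct a device-side buffer accessor from the stored id.
        rtBufferId<float3, 1> prevVertices(prevVertexBufferId);
        return prevVertices[index];
    }
    return make_float3(0.0f, 0.0f, 0.0f);  // object has no previous-frame data
}
```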
All of this (geometry replace/destroy/mark dirty) runs without memory leaks.
The transformation cannot be done in the previous frame, because only the actually modified objects are needed (including those which were not even hit in the previous frame), and that set is not known during the previous frame.
UPDATE: The transformation can be done in the next frame, so the kernel is no longer required for this case.

Since validation errors occur if some parts of the context are not completely updated, I destroy the other buffers (normal/tangent/texcoord) of the GeometryTriangles instance after the small kernel has run.

The input and output buffers of that small kernel (program “VertexTransformKernel”) always change, and after the kernel has finished it makes no difference whether their variables are valid or not. Still, validation requires me to add this variable update (which works fine):

VertexTransformKernel["currentVertexBuffer"]->set(globalEmptyBuffer1Dvertex);   // globalEmptyBuffer1Dvertex has size 0
VertexTransformKernel["outputVertexBuffer"]->set(globalEmptyBuffer1Dvertex);

Otherwise validation fails. OK, it’s only a variable set, not a declaration, but is it really “free”?

Since any context modification or change to variable declarations could cause an expensive recompile in the non-RTX “megakernel” architecture, is this still the case in RTX mode?

The documentation (http://raytracing-docs.nvidia.com/optix/api/html/group__rt_context_launch.html#ga74b43a03fecf235fcb4e0c9f4d7c8fbd) says:
“[…] If the context has not yet been compiled, or if the context has been modified since the last compile, rtContextLaunch will recompile the kernel internally […]”

What modifications (other than declaring a variable) cause this recompile?

(For example, the geometry update of a light source sometimes takes much longer than on many other frames, although the light source is updated on every frame.) UPDATE: after some modifications (removing QueryVariable) I cannot reproduce this anymore.

Would running a plain CUDA kernel (using device pointers to the OptiX buffers) instead be a better architectural option for the “VertexTransformKernel”?
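What I have in mind is roughly the following sketch (transformVertices is a hypothetical __global__ kernel; the buffer names are placeholders):

```cpp
// Sketch of the CUDA-interop variant: run a plain CUDA kernel directly on
// the device pointers of the OptiX buffers, before the next rtContextLaunch.
void* d_in  = inputVertexBuffer->getDevicePointer(0);   // OptiX device ordinal 0
void* d_out = outputVertexBuffer->getDevicePointer(0);

// Hypothetical kernel transforming vertexCount float3 positions by a matrix.
transformVertices<<<(vertexCount + 255) / 256, 256>>>(
    static_cast<float3*>(d_in), static_cast<float3*>(d_out), vertexCount, matrix);

cudaDeviceSynchronize();  // make sure results are ready before the OptiX launch
```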

Thanks for any advice.

The rtContextSetUsageReportCallback output should tell you if there was a recompile.
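A minimal sketch of registering such a callback with the OptiX 6 C++ wrapper (the function names here are assumptions; recompile messages appear in the report stream):

```cpp
#include <cstdio>
#include <optixu/optixpp_namespace.h>

// Matches the RTusagereportcallback signature from the OptiX 6 API.
static void usageReportCallback(int level, const char* tag, const char* msg, void* /*cbdata*/)
{
    // Internal recompiles show up in this stream; watch for compile-related tags.
    std::printf("[%2d][%s] %s", level, tag, msg);
}

void enableUsageReport(optix::Context context)
{
    // Verbosity 0 disables reporting; 1..3 give increasing detail.
    context->setUsageReportCallback(usageReportCallback, 3, nullptr);
}
```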

Declaring new variables or buffers between launches will still be expensive, even though OptiX 6.0.0 uses separate compilation and linking plus a shader disk cache. I’m not sure about buffer reassignments.

Actually, it’s quite expensive even to just set variables between launches; there is a scene-size-dependent overhead in OptiX when doing that.
It’s faster to put all variables that are unconditionally set between launches into a user-defined struct, store that struct as a single element in an input buffer, and just update that buffer between launches!
This is true even when just updating the iteration index between launches in a Monte Carlo renderer.
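That pattern could look like the following sketch (PerLaunchData and sysLaunchData are made-up names; OptiX 6 C++ wrapper assumed):

```cpp
// Sketch: gather all per-launch variables into one struct stored as a single
// element of a user-format input buffer; update the buffer, not the variables.
struct PerLaunchData             // shared between host and device headers
{
    unsigned int iteration;      // e.g. Monte Carlo iteration index
    // ... any other values that change every launch ...
};

// Once at startup:
optix::Buffer launchData = context->createBuffer(RT_BUFFER_INPUT, RT_FORMAT_USER, 1);
launchData->setElementSize(sizeof(PerLaunchData));
context["sysLaunchData"]->set(launchData);

// Every frame, instead of setting individual context variables:
PerLaunchData* data = static_cast<PerLaunchData*>(launchData->map());
data->iteration = frameIndex;
launchData->unmap();
```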

Detlef, thank you very much for your answer. I’ll try to put all such variables into an input buffer.

I put nearly all the variables into such a buffer and update it only for the first accumulation frame. I also put the frame iteration index into an additional small buffer (as you said). The result is really great:
On a test scene (1280x720, denoised; SSS cube + MDL sphere + diffuse cube + plane + env map) it saves about 20% of the time after 500 frames are done.

Very much appreciated.