Graph Organization of Thousands of Independent Dynamic Objects

Hi,

I am trying to set up a scene with 20,000 quads that are moving independently every frame with the same Lambertian material. Based on the examples in the SDK, I can set up the organization of the node graph in 2 ways:

  1. Create a Geometry with only one primitive. Add the Geometry to a graph consisting of a Geometry Group and Geometry Instance to represent one instance. To get multiple instances that move independently, use a Transform node for each independent object and refer to the graph of geometry information. In this way, every object will be transformed independently, but the geometry information is defined once and reused. Each Transform node must be assigned to one Group node so the graph has a root where traversal can be started.

The graph for this organization looks like this:
Group → [Transform_1 Transform_2 Transform_3 . . . Transform_20000] → Geometry Group → Geometry Instance → Geometry (containing 1 Primitive)

Group and Geometry Group both use an Acceleration structure and I have tried different types of BVHs.

When the geometry moves, I get each Transform and call getMatrix(). I then update the matrix with the movement I want and call setMatrix(). After all Transforms have been updated, the Group Acceleration structure is marked dirty.

This method is very similar to how the teapot instance SDK example works.

  2. Create a Geometry with one primitive for each object instance. Add the Geometry to a graph consisting of a Geometry Group and Geometry Instance. To get each primitive instance to move independently, use variable buffers that represent the parameters to the geometric definition of each primitive. For example, if there are N sphere primitives, a buffer of N radius values and a buffer of N center positions would be needed to represent the geometry. The root of the graph is the Geometry Group where traversal can be started.

The graph for this organization looks like this:
Geometry Group → Geometry Instance → Geometry (containing 20000 Primitives)

When the geometry moves, I update the variable buffers for the quad and the Geometry Group Acceleration structure is marked dirty.

This method is very similar to how the whirligig SDK example works.

Comparing these 2 ways to organize my graph, I have seen that 1), with the Transform nodes, is very slow both to build and to draw each frame, while 2), with multiple Primitives in one Geometry and buffers of geometry information, is very fast both to build and to draw each frame. In fact, the performance difference between the 2 methods is close to a factor of 100.

Why is the performance difference so significant? Shouldn’t they be somewhat similar? Is there something I am doing incorrectly with the way I am using my Transform nodes?

Please let me know if I can provide any additional information to help you answer my questions and understand my situation.

Thanks,
Chris

That’s to be expected.
In the case with 20,000 individual Transforms, node graph traversal during rendering will dominate the cost, while the case with a single Geometry holding all the primitives will be fast. 20,000 quads is not a big geometry.
Additionally, if you change all Transform matrices per frame, that is much more data than 20,000 vertex attributes, a lot more OptiX API calls, and plenty of CPU cache misses due to scattered memory accesses. If you do not provide the inverse matrices as well, there will also be a considerable amount of matrix inversion calculation on the CPU.
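To illustrate the inverse matrix point, here is a minimal sketch that assumes the quads only rotate and translate (rigid motion). In that case the inverse can be written down directly from the forward matrix instead of running a general 4x4 inversion. The names Mat4, makeRigidZ, invertRigid and multiply are hypothetical helpers for this example, not OptiX API, and the row-major layout is an assumption that should be checked against the transpose flag you pass to setMatrix.

```cpp
#include <array>
#include <cmath>

// Row-major 4x4 matrix (assumed layout for this sketch).
using Mat4 = std::array<float, 16>;

// Hypothetical helper: rotation by `angle` radians about the z axis,
// followed by a translation (tx, ty, tz).
Mat4 makeRigidZ(float angle, float tx, float ty, float tz) {
    const float c = std::cos(angle);
    const float s = std::sin(angle);
    Mat4 m = { c,    -s,    0.0f, tx,
               s,     c,    0.0f, ty,
               0.0f,  0.0f, 1.0f, tz,
               0.0f,  0.0f, 0.0f, 1.0f };
    return m;
}

// For a rigid transform [R t; 0 1] the inverse is [R^T  -R^T t; 0 1].
// This is only valid without scale or shear.
Mat4 invertRigid(const Mat4& m) {
    Mat4 inv{};
    // Upper-left 3x3 block: transpose of the rotation.
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            inv[r * 4 + c] = m[c * 4 + r];
    // Last column: -R^T * t, where t = (m[3], m[7], m[11]).
    for (int r = 0; r < 3; ++r)
        inv[r * 4 + 3] = -(inv[r * 4 + 0] * m[3] +
                           inv[r * 4 + 1] * m[7] +
                           inv[r * 4 + 2] * m[11]);
    inv[15] = 1.0f;
    return inv;
}

// Plain 4x4 product, used here only to check the result.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 out{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            for (int k = 0; k < 4; ++k)
                out[r * 4 + c] += a[r * 4 + k] * b[k * 4 + c];
    return out;
}
```

Both arrays would then be handed to the Transform together, so that no inversion has to happen inside the API. This does not remove the per-node API calls or the traversal cost, it only avoids the extra inversion work.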

Use Lbvh or Trbvh in the second case for fastest acceleration structure builds.

In the first case, use a builder that supports the “refit” acceleration structure property on the root Group and enable it. The graph topology doesn’t change; only the bounding boxes need to be adjusted.
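For intuition about why refitting is cheaper than a rebuild, here is a simplified sketch of the idea on a tiny CPU-side hierarchy. It is not how OptiX implements it internally. The Aabb and Node structs and the merge and refit functions are hypothetical names for this example, and the sketch assumes nodes are stored so that every child has a larger index than its parent.

```cpp
#include <algorithm>
#include <vector>

struct Aabb {
    float lo[3];
    float hi[3];
};

struct Node {
    int left;   // index of left child, or -1 for a leaf
    int right;  // index of right child, or -1 for a leaf
    int prim;   // primitive index for a leaf, -1 for an inner node
    Aabb box;   // bounds of everything below this node
};

// Smallest box enclosing both inputs.
Aabb merge(const Aabb& a, const Aabb& b) {
    Aabb r;
    for (int k = 0; k < 3; ++k) {
        r.lo[k] = std::min(a.lo[k], b.lo[k]);
        r.hi[k] = std::max(a.hi[k], b.hi[k]);
    }
    return r;
}

// Refit: the tree topology (left/right/prim) is left untouched and only
// the boxes are recomputed, walking from the last node back to the root.
// Because children have larger indices than their parents, each child's
// box is already up to date when its parent is visited.
void refit(std::vector<Node>& nodes, const std::vector<Aabb>& primBoxes) {
    for (int i = static_cast<int>(nodes.size()) - 1; i >= 0; --i) {
        Node& n = nodes[i];
        if (n.left < 0)
            n.box = primBoxes[n.prim];
        else
            n.box = merge(nodes[n.left].box, nodes[n.right].box);
    }
}
```

A refit is one linear pass over the nodes, whereas a full build has to sort or partition the objects again. The trade-off is that the hierarchy quality can degrade if the objects move far from where they were when the tree was built, so an occasional full rebuild may still be worthwhile.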