Hello,
I am writing a continuous collision detection algorithm based on the
OptiX ray tracer. Each particle and each vertex of the collider has a
start point and a constant velocity over a time step dt. The collider is a
triangular mesh, so I want to find the barycentric coordinates and the
moment of collision between the segment (the path of the particle) and the
moving triangle.
The algorithm is simple. I have a ray generator program that works
pretty much like a regular ray tracer.
The result buffer format is float4, containing the moment of collision,
the triangle id and the barycentric coordinates alpha and beta.
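As a side note on that layout, here is a minimal sketch of packing such a record in plain C++. The component order (x = time, y = id, z = alpha, w = beta), the `Float4` stand-in type and the function names are assumptions made for illustration, not taken from the actual program. A float stores integers exactly only up to 2^24, which is plenty for 10k triangles, but copying the bits of the id avoids the limit altogether (CUDA device code has the `__int_as_float` / `__float_as_int` intrinsics for the same purpose).

```cpp
#include <cstdint>
#include <cstring>

// Stand-in for CUDA's float4 so that the sketch compiles on the host.
struct Float4 { float x, y, z, w; };

static_assert(sizeof(float) == sizeof(std::int32_t), "bit copy needs equal sizes");

// Assumed layout: x = collision time, y = triangle id (raw bits), z = alpha, w = beta.
inline Float4 packHit(float t, std::int32_t triId, float alpha, float beta) {
    Float4 r;
    r.x = t;
    std::memcpy(&r.y, &triId, sizeof(float));  // keep the integer bits, no conversion
    r.z = alpha;
    r.w = beta;
    return r;
}

// Recover the triangle id from the bits stored in the y component.
inline std::int32_t unpackTriangleId(const Float4& r) {
    std::int32_t id;
    std::memcpy(&id, &r.y, sizeof(id));
    return id;
}
```

Whichever encoding is used, the reader of the buffer has to apply the matching decoding.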
The bounding box program considers both the start and end points of
the vertices of each triangle, so the resulting AABB is much bigger than
in the static case.
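The swept bounding box described above can be sketched as follows, in plain host C++ rather than as an OptiX bounding box program. The `Vec3` and `Aabb` types and the function names are hypothetical. Because each vertex moves linearly, the triangle stays inside the convex hull of its three start points and three end points at every time in the step, so the box around those six points is a valid bound.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };
struct Aabb { Vec3 lo, hi; };

// Grow the box so that it contains point p.
inline void expand(Aabb& box, const Vec3& p) {
    box.lo = { std::min(box.lo.x, p.x), std::min(box.lo.y, p.y), std::min(box.lo.z, p.z) };
    box.hi = { std::max(box.hi.x, p.x), std::max(box.hi.y, p.y), std::max(box.hi.z, p.z) };
}

// Box around a triangle whose vertices move from v[i] to v[i] + vel[i] * dt.
inline Aabb sweptTriangleBounds(const std::array<Vec3, 3>& v,
                                const std::array<Vec3, 3>& vel, float dt) {
    assert(dt >= 0.0f);
    Aabb box{ v[0], v[0] };
    for (int i = 0; i < 3; ++i) {
        expand(box, v[i]);  // start point of vertex i
        expand(box, { v[i].x + vel[i].x * dt,   // end point of vertex i
                      v[i].y + vel[i].y * dt,
                      v[i].z + vel[i].z * dt });
    }
    return box;
}
```

The faster the mesh moves relative to its triangle size, the more these boxes overlap, which tends to increase the number of candidate triangles tested per ray.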
For simplicity in this thread, I will assume the triangles are
translating only. By subtracting this movement vector from the particle
path, we can perform a simple segment-triangle intersection.
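A minimal sketch of that test for the translation-only case, written as plain C++ rather than as an OptiX intersection program. It uses the standard Möller-Trumbore ray-triangle formulation on the relative displacement; the type and function names, the parallelism tolerance, and the convention that the hit point equals (1 - alpha - beta) * a + alpha * b + beta * c are all assumptions, not details of the actual implementation.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 sub(const Vec3& a, const Vec3& b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
inline Vec3 scale(const Vec3& a, float s) { return { a.x * s, a.y * s, a.z * s }; }
inline float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 cross(const Vec3& a, const Vec3& b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

struct Hit { float time, alpha, beta; };

// p0, pv: particle start and velocity. a, b, c: triangle vertices at time 0.
// tv: velocity shared by all three vertices (translation only). dt: time step.
// Returns true and fills hit if the particle crosses the triangle within [0, dt].
inline bool intersectMovingTriangle(const Vec3& p0, const Vec3& pv,
                                    const Vec3& a, const Vec3& b, const Vec3& c,
                                    const Vec3& tv, float dt, Hit& hit) {
    assert(dt > 0.0f);
    // Particle displacement over the step, seen from the triangle's frame.
    const Vec3 d = scale(sub(pv, tv), dt);
    const Vec3 e1 = sub(b, a), e2 = sub(c, a);
    const Vec3 h = cross(d, e2);
    const float det = dot(e1, h);
    if (std::fabs(det) < 1e-12f) return false;  // path parallel to the triangle plane
    const float inv = 1.0f / det;
    const Vec3 s = sub(p0, a);
    const float u = dot(s, h) * inv;
    if (u < 0.0f || u > 1.0f) return false;
    const Vec3 q = cross(s, e1);
    const float v = dot(d, q) * inv;
    if (v < 0.0f || u + v > 1.0f) return false;
    const float t = dot(e2, q) * inv;           // fraction of the step, in [0, 1] on a hit
    if (t < 0.0f || t > 1.0f) return false;
    hit = { t * dt, u, v };
    return true;
}
```

With per-vertex velocities (rotation or deformation) the relative path is no longer a straight segment, so this reduction does not carry over unchanged.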
The results so far seem OK, but the performance is below what I expected.
I also have a CPU-based collider, which performs closest point detection
and uses a binary tree to sort the triangles. It has several optimization
tricks, and I don't know whether they are implemented in OptiX, but I believe
they are. For example, shortcuts between leaves and the next tree node…
I would like to know whether this result seems correct, or whether there is
something silly that I am missing here…?
The collider has about 10k triangles and 5k vertices, and we tested with
20k and 200k particles.
With 20k, the CPU collider runs at 3 fps and the OptiX collider at 1 fps on
average.
With 200k, the fps is very low, but we did not notice any difference in
the performance ratio.
Maybe the scene is still too small? Even so, shouldn't I get at least
slightly better performance?
A few other things I am doing:
Sorting the particles to improve ray coherency.
Setting the “use_fast_math” option.
Setting the tracer to “refit” only.
Using Trbvh acceleration.
Disabling all exceptions and prints. (I have checked that there are no
exceptions being thrown)
Compacting the positions in a float3 buffer. Changing everything to
float4 buffers is a little bothersome, and I don't think it would
make much of a difference… or would it?
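On the particle sorting item in the list above: the post does not say how the sort is done, so as one common option (an assumption, not necessarily the method used here), the particles can be ordered by the Morton code of their start positions so that neighbouring rays start close together in space. The names `expandBits`, `morton3D` and `coherentOrder` and the bounding box parameters are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

struct Vec3 { float x, y, z; };

// Spread the low 10 bits of v so that two zero bits separate each original bit.
inline std::uint32_t expandBits(std::uint32_t v) {
    v = (v * 0x00010001u) & 0xFF0000FFu;
    v = (v * 0x00000101u) & 0x0F00F00Fu;
    v = (v * 0x00000011u) & 0xC30C30C3u;
    v = (v * 0x00000005u) & 0x49249249u;
    return v;
}

// 30-bit Morton code for a point with coordinates in [0, 1] (values outside are clamped).
inline std::uint32_t morton3D(float x, float y, float z) {
    auto quantize = [](float f) {
        return static_cast<std::uint32_t>(std::min(std::max(f * 1024.0f, 0.0f), 1023.0f));
    };
    return (expandBits(quantize(x)) << 2) | (expandBits(quantize(y)) << 1) | expandBits(quantize(z));
}

// Map a coordinate into [0, 1] relative to the box; a flat axis maps to 0.
inline float normalize01(float p, float lo, float hi) {
    return hi > lo ? (p - lo) / (hi - lo) : 0.0f;
}

// Returns particle indices ordered by the Morton code of their start positions.
// lo and hi are the corners of a box enclosing all start positions.
inline std::vector<std::uint32_t> coherentOrder(const std::vector<Vec3>& pos,
                                                const Vec3& lo, const Vec3& hi) {
    std::vector<std::uint32_t> codes(pos.size());
    for (std::size_t i = 0; i < pos.size(); ++i) {
        codes[i] = morton3D(normalize01(pos[i].x, lo.x, hi.x),
                            normalize01(pos[i].y, lo.y, hi.y),
                            normalize01(pos[i].z, lo.z, hi.z));
    }
    std::vector<std::uint32_t> order(pos.size());
    std::iota(order.begin(), order.end(), 0u);
    std::stable_sort(order.begin(), order.end(),
                     [&codes](std::uint32_t a, std::uint32_t b) { return codes[a] < codes[b]; });
    return order;
}
```

This orders rays by origin only. Rays that start close together but travel in very different directions can still diverge during traversal.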
Is there anything else obvious that I could try but am missing here?
Thank you.