Leveraging RTX hardware capabilities with OptiX 7.0

Hello everyone!

I’m new to OptiX in particular and GPU programming in general, so the question below might appear trivial to some ;)

Throughout one of my courses at university, I’m currently working on a direct volume renderer similar to what Wald et al. describe in their “RTX Beyond Ray Tracing […]” paper (see (1)). Due to limited access to RTX-capable hardware, I’m developing on a non-RTX graphics card (GeForce GTX 960, driver version 436.30, CUDA 10.1, OptiX 7.0). Evaluation will later take place on an RTX-capable device (the exact card is still to be determined).

For clarification: Are there any explicit steps to be taken to ensure my code makes use of RTX hardware capabilities during evaluation? To my understanding of both the docs and some discussions on this board (e.g. (2)), all I have to do is stick to the OptiX built-in triangles (i.e. not use geometry types other than OPTIX_BUILD_INPUT_TYPE_TRIANGLES). OptiX then transparently makes use of RTX capabilities where possible. Is there anything else to do?
Also, is there a way to verify the use of the RTX hardware acceleration at runtime? I am aware of querying OPTIX_DEVICE_PROPERTY_RTCORE_VERSION using optixDeviceContextGetProperty(). But this only yields the device’s capabilities, not whether they are actually being used.
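For reference, this is roughly how I query that property (a minimal sketch; error checking omitted, and `context` is assumed to be a valid OptixDeviceContext created elsewhere):

```cpp
#include <optix.h>
#include <iostream>

void printRtCoreVersion( OptixDeviceContext context )
{
    unsigned int rtCoreVersion = 0;
    optixDeviceContextGetProperty( context,
                                   OPTIX_DEVICE_PROPERTY_RTCORE_VERSION,
                                   &rtCoreVersion,
                                   sizeof( rtCoreVersion ) );
    // 0 on boards without RT cores (like my GTX 960),
    // non-zero on RTX boards.
    std::cout << "RT core version: " << rtCoreVersion << std::endl;
}
```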

Thanks for your help!
David

(1) http://www.sci.utah.edu/~wald/Publications/2019/rtxPointQueries/rtxPointQueries.pdf
(2) https://devtalk.nvidia.com/default/topic/1064480/optix/api-related-to-triangle-mesh/

That is correct.
The RT cores inside the RTX Turing boards accelerate two parts of ray tracing: the BVH traversal and the triangle intersection.

Both will run fully on the RT cores (in contrast to running on the Streaming Multiprocessors (SMs)) when the BVH hierarchy has two levels. That means a maximum of two acceleration structures (AS) from the root node to the leaf triangle geometry, which for the scene structure means an Instance AS (IAS) over a Geometry AS (GAS) with triangles. The transform in the instances is hardware accelerated as well.

That’s basically all, but OptiX also supports using just a single GAS, as well as multiple IAS levels. The latter will run the BVH traversal only partially on the RT cores. (The overall maximum traversal depth can be queried via OPTIX_DEVICE_PROPERTY_LIMIT_MAX_TRAVERSABLE_GRAPH_DEPTH.)
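Schematically, the fully accelerated IAS→GAS case looks like this (a minimal sketch, not complete code; `gasHandle` is assumed to come from optixAccelBuild() with an OPTIX_BUILD_INPUT_TYPE_TRIANGLES build input, and the upload plus the IAS build from an OPTIX_BUILD_INPUT_TYPE_INSTANCES build input are omitted):

```cpp
#include <optix.h>
#include <cstring>

OptixInstance makeInstance( OptixTraversableHandle gasHandle )
{
    OptixInstance instance = {};
    const float identity[12] = { 1.0f, 0.0f, 0.0f, 0.0f,   // row-major 3x4
                                 0.0f, 1.0f, 0.0f, 0.0f,   // object-to-world
                                 0.0f, 0.0f, 1.0f, 0.0f }; // transform
    std::memcpy( instance.transform, identity, sizeof( identity ) );
    instance.instanceId        = 0;
    instance.sbtOffset         = 0;
    instance.visibilityMask    = 255;
    instance.flags             = OPTIX_INSTANCE_FLAG_NONE;
    instance.traversableHandle = gasHandle; // triangle GAS directly below the IAS
    return instance;
}
```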
Also, whenever there is motion blur in the scene, on either the instances or the geometry, the BVH traversal becomes more expensive.
For custom geometric primitives, only the BVH traversal is hardware accelerated; it then calls back into your intersection programs running on the SMs.
There are other things which call back into the SMs, like anyhit programs.
Have a closer look at the available optixTrace() ray flags, which can control some of the program domain invocations; see the sketch below.
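For example, a device-code sketch of a ray cast that disables anyhit invocations entirely (the payload register and the `params.handle`, `rayOrigin`, `rayDirection` names are just placeholders from a typical raygen program):

```cpp
// Inside a raygen or closesthit program:
unsigned int p0 = 0; // example payload register
optixTrace( params.handle,                  // top-level traversable (the IAS)
            rayOrigin, rayDirection,        // float3 origin and direction
            0.0f,                           // tmin
            1e16f,                          // tmax
            0.0f,                           // ray time (no motion blur)
            OptixVisibilityMask( 255 ),
            OPTIX_RAY_FLAG_DISABLE_ANYHIT,  // skip anyhit invocations
            0,                              // SBT offset
            1,                              // SBT stride
            0,                              // miss SBT index
            p0 );
```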

I’d recommend watching the “OptiX Performance Tools and Tricks” presentation for more information:
https://devtalk.nvidia.com/default/topic/1062216/optix/optix-talks-from-siggraph-2019/

Not really. It happens automatically if you build the scene according to the structure above.

Mind that the GTX 960 is an entry-level board of an architecture three GPU generations older. It will be far from representative of the possible performance of even the smallest RTX board, in everything you throw at it.

Thanks for your detailed explanation, Detlef! Also, thanks for pointing out the above presentation to me! I will let you know in case I encounter any follow-up questions.

That’s true. Unfortunately, I can’t currently help it, so I’ll try to make the best of it ;)

I now do have a follow-up question. I’m still looking for some sort of evidence that my code uses the RTX hardware where available.

Does your answer include the Nsight ecosystem (e.g. Nsight VSE and Nsight Compute), Detlef? I had a look at the “OptiX Performance Tools and Tricks” and the “OptiX Profiling with Nsight Compute” talks (for the latter, see (1)). Both talks mention the Nsight ecosystem as a source of information on, casually speaking, “what one’s code does at runtime”. Unfortunately, I’m not quite sure what to look for in the tools’ quite verbose output (hoping that the tools are applicable at all). Can you please tell me whether Nsight VSE or Nsight Compute outputs any RTX-related figures which could be used to deduce the usage of the RTX hardware? Thank you!

(1) NVIDIA SIGGRAPH 2018

A happy new year, everybody!

Just wanted to re-raise the above question. Any help/feedback/… is appreciated!

Tricky. There isn’t really a way to not use the RTX hardware with OptiX 7.
At least BVH traversal will always be part of that, and triangle intersection in addition when using built-in triangle primitives.

When benchmarking an application, the resulting number of rays per second would be one of the indicators, but these numbers greatly depend on the ray divergence and the amount of time spent in your own functions, meaning everything outside the BVH traversal and triangle intersection.
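A rough rays-per-second measurement could look like this (a sketch assuming one primary ray per pixel; `pipeline`, `stream`, `paramsBuffer`, `Params`, and `sbt` are placeholders for a typical launch setup, and error checking is omitted):

```cpp
// Time one optixLaunch() with CUDA events and derive rays/second.
cudaEvent_t start, stop;
cudaEventCreate( &start );
cudaEventCreate( &stop );

cudaEventRecord( start, stream );
optixLaunch( pipeline, stream, paramsBuffer, sizeof( Params ), &sbt,
             width, height, /*depth=*/ 1 );
cudaEventRecord( stop, stream );
cudaEventSynchronize( stop );

float milliseconds = 0.0f;
cudaEventElapsedTime( &milliseconds, start, stop );

// One primary ray per pixel assumed; secondary rays would add to this.
const double raysPerSecond = double( width ) * height / ( milliseconds * 1.0e-3 );

cudaEventDestroy( start );
cudaEventDestroy( stop );
```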

You could implement a custom triangle intersection program, compare benchmarks using that versus the built-in triangles, and see how the per-function Nsight profiles change in relation.
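Schematically, such a custom intersection program would look like this (a device-code sketch; the geometry would need to be built from an OPTIX_BUILD_INPUT_TYPE_CUSTOM_PRIMITIVES (AABB) build input instead of OPTIX_BUILD_INPUT_TYPE_TRIANGLES, and the triangle fetch plus the actual intersection math are omitted):

```cpp
#include <optix.h>

// Custom triangle intersection running on the SMs, replacing the
// built-in hardware triangle intersection for comparison purposes.
extern "C" __global__ void __intersection__triangle()
{
    const float3 orig = optixGetObjectRayOrigin();
    const float3 dir  = optixGetObjectRayDirection();

    // ... fetch this primitive's triangle and intersect it here,
    //     e.g. with Moeller-Trumbore, using orig and dir ...
    float t   = 0.0f;  // hit distance along the ray
    bool  hit = false; // result of your own triangle test

    if( hit )
        optixReportIntersection( t, 0 /*hitKind*/ );
}
```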

That’s an approach I haven’t considered yet. Thanks for your input, Detlef!