(bear with me for a long introduction)
A big part of my interest in GPU computing is photon Monte Carlo (MC) simulations. We have an NIH-funded software project called mcx (http://mcx.space), which we have been optimizing and extending with new features. In the past, we also got a lot of valuable help from folks in this forum, thank you!
The core of the MC simulation is essentially a ray-tracer - a photon is first launched from a location, undergoes a series of scattering and absorption events, and then exits the domain somewhere; then we start the next one, until all photons are simulated. CUDA/GPU allows this to be done very efficiently using a massive number of threads.
However, there are some major differences between our “ray-tracer” and the typical ray-tracing in rendering tasks, namely
- each ray (a photon packet) typically experiences many (hundreds of) scattering events before exiting - using the terminology I heard in graphics talks, it is mostly performing “sub-surface scattering” (but it can certainly handle low-scattering or transparent media - in that case it behaves just like a typical graphics ray-tracer).
- the optical properties are typically associated with volumetric elements (voxels or tetrahedral elements), whereas in graphics rendering tasks optical materials are associated with surfaces (triangles).
- we have a very efficient acceleration structure - for each photon movement, the photon is either bounded within a voxel (only intersections with its 6 facets need testing) or bounded by a known tetrahedron (only ray-triangle intersections with its 4 triangles need testing).
- we need to save volumetric data along photon trajectories - either in a voxel grid or in a tetrahedral mesh - to represent light intensity (fluence/fluence rate) in 3D space. In comparison, most graphics renderers only care about the RGB values in a 2D camera pixel space.
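To make the voxel case in the list above concrete, here is a minimal sketch of the per-step intersection test. It is not the actual mcx code: the names `Vec3`, `VoxelHit` and `distanceToVoxelExit` are hypothetical, the grid is assumed to have unit-length voxels aligned with the axes, and the function is written as plain C++ so it can be checked on the host (in a CUDA build it would presumably be marked `__device__`). Because the sign of each direction component selects one facet from each opposing pair, only 3 of the 6 facets are candidates in practice.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical types for illustration only (not taken from mcx).
struct Vec3 { float x, y, z; };

struct VoxelHit {
    float dist;  // distance to the exit facet, in voxel-edge units
    int axis;    // axis normal to the exit facet: 0 = x, 1 = y, 2 = z
};

// Distance from position p (in grid units, voxel edge = 1) to the facet
// through which a photon travelling along direction d leaves its voxel.
inline VoxelHit distanceToVoxelExit(const Vec3& p, const Vec3& d) {
    const float pos[3] = {p.x, p.y, p.z};
    const float dir[3] = {d.x, d.y, d.z};
    assert(dir[0] != 0.0f || dir[1] != 0.0f || dir[2] != 0.0f);

    VoxelHit hit{std::numeric_limits<float>::infinity(), -1};
    for (int i = 0; i < 3; ++i) {
        if (dir[i] == 0.0f) continue;          // parallel to this facet pair
        const float cell = std::floor(pos[i]); // lower facet of current voxel
        // Moving in +i exits through the upper facet, otherwise the lower one.
        const float face = dir[i] > 0.0f ? cell + 1.0f : cell;
        const float t = (face - pos[i]) / dir[i];
        if (t < hit.dist) { hit.dist = t; hit.axis = i; }
    }
    return hit;
}
```

A photon sitting exactly on a facet and moving in the negative direction gets a distance of 0 from this sketch, so a real implementation would need some tie-breaking rule for that case.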
My code is written in CUDA (and separately, OpenCL). It has been working quite well, and I see hundreds- to thousands-fold speedups compared to a single CPU thread, which I am quite happy with. But the recent buzz around RTX and the ray-tracing (RT) cores from NVIDIA caught my attention again, and I keep wondering whether I can get a significant speed improvement by somehow porting my volumetric ray-tracing code to the new ray-tracing hardware.
However, from what I have read (which is extremely limited), the interface to the ray-tracing hardware seems to be limited to rendering APIs. Given the major differences above (different optical property attachment, different output format, different acceleration structure), I don’t really see a clear pathway to port my CUDA/OpenCL code to OpenGL, Vulkan or OptiX - again, my understanding of these frameworks is extremely basic.
So, my questions for everyone are:
- can my code (CUDA/OpenCL) directly benefit from the new ray-tracing hardware without major changes?
- if not, is there any way I can modify my CUDA/OpenCL code in order to use the new hardware and do the ray-tracing (but my kind of ray-tracing) more efficiently?
- if I cannot keep the CUDA/OpenCL framework and still use the real-time ray-tracing functionality, which of these models (OpenGL, Vulkan, OptiX) is likely to give me sufficient flexibility to implement my MC-like ray-tracing?
- is there a metric that I can measure, for example, ray-voxel or ray-triangle intersection tests per second, to give me an idea of how my CUDA code is doing in comparison to the reported real-time ray-tracer benchmarks? I’ve heard figures of several mega-rays per second, but I don’t really know how many scattering/reflection events each ray undergoes on average in those benchmarks.
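To illustrate the kind of metric I have in mind in the last question, here is a rough sketch. It assumes that each straight segment of a photon path (one voxel or tetrahedron crossing) is counted as one "ray query"; whether that matches how the published benchmarks count rays is exactly what I am unsure about. The names `TraceStats`, `segmentsPerSecond` and `meanSegmentsPerPhoton` are hypothetical. In a CUDA kernel the counters would presumably be kept per thread and summed afterwards, for example with `atomicAdd` on an `unsigned long long`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counters, filled in by the simulation (not part of mcx).
struct TraceStats {
    std::uint64_t photons;   // number of photon packets simulated
    std::uint64_t segments;  // straight segments traced, one per element crossing
};

// Throughput in segments (intersection queries) per second of kernel time.
inline double segmentsPerSecond(const TraceStats& s, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(s.segments) / seconds;
}

// Average number of segments one photon packet needs before it exits.
inline double meanSegmentsPerPhoton(const TraceStats& s) {
    if (s.photons == 0) return 0.0;
    return static_cast<double>(s.segments) / static_cast<double>(s.photons);
}
```

Reporting both numbers together would make the comparison clearer: the first is the figure closest to a "rays per second" rating, and the second shows how much work hides behind a single photon.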
Sorry for the long question, but I think some comments along these lines will really help me understand how feasible it is to advance MC using the new hardware resources.