[Solved] triaId in OptiX and other

Hi, I have two questions that are loosely related. I would very much appreciate an answer to the first one.

  1. I want to find out which API is most appropriate for what I want to develop (OptiX Prime or regular OptiX).

I have a mesh made of triangles only.

Looking at the SDK tutorial and samples, I found that the any_hit and closest_hit programs are very convenient for the ray operations I want to do (Monte Carlo sampling).

However, to do that I need to access the coordinates of my mesh inside the any_hit / closest_hit programs (I need uniform sampling on the surface of each triangle). I saw some examples where the vertex coordinates are passed through a buffer. However, I did not find any way to access the id of the triangle inside the any_hit / closest_hit programs.

If I do not get that information, there is no way I can access the correct vertex information. (Or am I wrong?) So my questions are:

Q1a) How can I access the id of the triangle that the ray hit in the any_hit routine?

Q1b) How can I access the id of the closest triangle hit by the ray in the closest_hit routine?

I know Q1a) and Q1b) are not relevant if I use OptiX Prime (I had a look at the prime* examples; there is a structure with the triaId), but it looks far less flexible. Since I also need to store data related to each triangle during the any_hit operation, I definitely need the triangle id.

  2. I have a question regarding the way OptiX works, and I have two ideas for implementing what I want to do.

Approach 2a): analytical sampling

  • Starting from an initial triangle, a ray intersects another triangle.
  • At the point of intersection, multiple (typically 1000) other rays are generated with less weight (along with other operations on the ray or the intersected triangle).
  • A ray stops when its weight (or depth level) is less (or more) than a given value.

This method is quite recursive.
I estimate that a ray will spawn new rays about 10 times on average, but this number will vary very significantly.

So my question is:

Q2a) If one thread needs 100 reflections, will the other threads have to wait for that thread to finish before they can all start the full sequence again?

Approach 2b): sampling using random numbers

  • Starting from an initial triangle, a ray intersects another triangle.
  • At the point of intersection, the ray is reflected in a direction drawn from a probability distribution, with less weight (along with other operations on the ray or the intersected triangle).
  • The ray stops when its weight (or depth level) is less (or more) than a given value.

This method is far less recursive.
So I assume I need a smaller stack size, which would let me launch many more threads (?), and OptiX would also be faster (?)

The advantage of method 2a) is that I can express all the reverse sample distributions I want analytically, so I do not really need random number sampling, and I will converge toward an accurate solution faster than with method 2b).

The disadvantage is that I am afraid the speed will be limited by the large stack requirement and by the fact that I can launch fewer threads at the same time on my GPU (?)

Am I correct in these assumptions? Please correct me if I am wrong.

Ultimately my question is: which of the two methods is going to be faster? I assume the only way to answer that is to implement both methods. But if there is an obvious answer I would like to know.

Regarding your first question:

The triangle id is available in the intersection program, which receives the primitive index as its argument. If you want to pass data from the intersection program to the any/closest hit program, you need to declare a variable as an attribute:

rtDeclareVariable(uint, triangleId, attribute triangleId, );

Write the triangleId within the intersection program:

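RT_PROGRAM void mesh_intersect(int index)  // index: primitive (triangle) index
{
    ...

    if (rtPotentialIntersection(t))
    {
        // Write attributes after rtPotentialIntersection succeeds and
        // before rtReportIntersection is called.
        triangleId = index;
        rtReportIntersection(material_buffer[index]);
    }
}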

You can then access triangleId inside of your any/closest hit program.
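To illustrate what a hit program could then do with triangleId, here is a minimal sketch of the lookup plus uniform sampling on the triangle. It is plain host-side C++ rather than OptiX device code, so it can be compiled without the SDK. The names Vec3, Tri and sampleTriangle, and the indexed-mesh layout, are assumptions made for this example; in OptiX the two vectors would be buffers.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical indexed-mesh layout: three vertex indices per triangle.
struct Tri { unsigned v0, v1, v2; };

// Return a point distributed uniformly over triangle `triangleId`, given two
// uniform random numbers u1, u2 in [0, 1]. Uses the square-root barycentric
// mapping: P = (1 - sqrt(u1)) A + sqrt(u1) (1 - u2) B + sqrt(u1) u2 C.
Vec3 sampleTriangle(const std::vector<Vec3>& vertices,
                    const std::vector<Tri>& triangles,
                    unsigned triangleId, float u1, float u2)
{
    assert(triangleId < triangles.size());
    const Tri& tri = triangles[triangleId];   // lookup keyed by the triangle id
    const Vec3& a = vertices[tri.v0];
    const Vec3& b = vertices[tri.v1];
    const Vec3& c = vertices[tri.v2];

    const float s  = std::sqrt(u1);
    const float wa = 1.0f - s;                // barycentric weights, sum to 1
    const float wb = s * (1.0f - u2);
    const float wc = s * u2;

    return { wa * a.x + wb * b.x + wc * c.x,
             wa * a.y + wb * b.y + wc * c.y,
             wa * a.z + wb * b.z + wc * c.z };
}
```

The square root is what makes the distribution uniform in area; using u1 and u2 directly as barycentric weights would cluster samples toward one vertex.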

OK, thank you for the clear answer.

Regarding the second part of your question, both of your solutions are equally recursive (they both have a recursion depth of 10 on average), so they both will need approximately the same stack size. Also, both will use the same number of threads, that being the number of threads indicated by your call to rtContextLaunch*D(). However, your second method will run much faster as a result of tracing far fewer rays.

The downside of the second method is that the results from a single launch will be less accurate, but you could solve that problem by shooting many primary rays from your initial triangle and averaging the results.
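To put rough numbers on that trade-off, here is a small C++ sketch that counts traced rays for both schemes. The counting model is an assumption: a fixed branching factor and a fixed depth, whereas the real depth varies per ray. The function names are made up for this example.

```cpp
#include <cassert>

// Rays traced by the branching scheme: one primary ray, and every hit spawns
// `branching` new rays for `depth` generations, i.e. the sum over
// k = 0..depth of branching^k.
double raysBranching(double branching, int depth)
{
    double total = 0.0;
    double generation = 1.0;   // number of rays in the current generation
    for (int k = 0; k <= depth; ++k) {
        total += generation;
        generation *= branching;
    }
    return total;
}

// Rays traced by the stochastic scheme: each primary ray continues as a
// single reflected ray for `depth` bounces, so a path costs depth + 1 rays.
double raysSinglePath(double primaryRays, int depth)
{
    return primaryRays * (depth + 1);
}
```

With 1000 secondary rays per hit and a depth of 10, raysBranching(1000.0, 10) is on the order of 10^30, while a million primary rays in the second scheme cost raysSinglePath(1e6, 10), about 1.1 x 10^7 rays in total.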

I am not sure I fully agree, but that may be because I was not clear enough. In the second case, when a ray leaves the triangle, there is no need to keep track of the previous event (all the information is carried in the ray: just a single intensity variable).

In the first case, however, the program needs to keep track of its position in the for loops (if I am looping over a half sphere in discrete steps, for example using this: http://corysimon.github.io/articles/uniformdistn-on-sphere/ ) so that it can discretize the space uniformly.
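For reference, a minimal C++ sketch of one common way to get directions that are uniform in solid angle over a half sphere. This is the standard mapping with cos(theta) uniform in [0, 1], not code from this thread, and the names Dir and sampleHemisphere are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Dir { float x, y, z; };

// Map (u1, u2) in [0, 1]^2 to a unit direction on the hemisphere z >= 0,
// uniform in solid angle: cos(theta) = u1 and phi = 2 * pi * u2.
Dir sampleHemisphere(float u1, float u2)
{
    const float kTwoPi = 6.28318530718f;
    const float z   = u1;                                       // cos(theta)
    const float r   = std::sqrt(std::max(0.0f, 1.0f - z * z));  // sin(theta)
    const float phi = kTwoPi * u2;
    return { r * std::cos(phi), r * std::sin(phi), z };
}
```

Feeding it a regular grid of (u1, u2) values gives an equal-solid-angle discretization, which is where the loop indices have to be tracked; feeding it two random numbers gives a single random direction with no loop state at all.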

If you still disagree, please let me know; I must have missed something then.

That definitely was my intention, as the boundary conditions at the wall are not deterministic but are defined in a probabilistic way.

More importantly, I see how to use the 2nd approach with OptiX Prime (I noticed that it can also run on the CPU, which is important for my application).

I mean that I can:

  • create a collection of rays that go from some triangles to others
  • compute the intersections
  • do some computation at each triangle using the information contained in the ray (such as the id of the source triangle as well as the id of the intersected triangle)
  • generate a new batch of rays whose origins are the impact points of the previous rays
  • continue until none of the rays carries energy any more

The downside of this is that the whole GPU stays busy until the last ray fades out.
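The batch loop described above can be sketched as follows. This is plain C++ with a stand-in for the intersection query, not OptiX Prime code: the Ray fields, the per-hit energy loss and the function name runWavefront are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical ray record. A real one would also carry origin, direction and
// the ids of the source and intersected triangles.
struct Ray { float energy; float loss; };   // loss: fraction lost per hit

// Run batches until no ray is left and return the size of each batch.
std::vector<std::size_t> runWavefront(std::vector<Ray> rays, float minEnergy)
{
    assert(minEnergy > 0.0f);
    std::vector<std::size_t> batchSizes;
    while (!rays.empty()) {
        batchSizes.push_back(rays.size());

        // Stand-in for "compute the intersections" plus the per-triangle work.
        for (Ray& r : rays) {
            assert(r.loss > 0.0f && r.loss <= 1.0f);
            r.energy *= (1.0f - r.loss);
        }

        // Compaction: drop faded rays so the next batch only holds live ones.
        rays.erase(std::remove_if(rays.begin(), rays.end(),
                                  [minEnergy](const Ray& r) { return r.energy < minEnergy; }),
                   rays.end());
    }
    return batchSizes;
}
```

In this sketch the number of batches is still set by the longest-lived ray, but because faded rays are removed between batches, the later batches are smaller rather than full-size.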

That is why I wanted to know how regular OptiX behaves, hence question Q2a) in my initial post.

It’s possible you are referring to iterative ray tracing, where the ray payload carries the new intersection and direction back to the ray generation (camera) program, and all rays are cast by that program. The key difference is where the call to rtTrace is located. If rtTrace is in the closest hit program, then the ray tracing is recursive and requires a larger stack size. If rtTrace is only in the ray generation program, then the ray tracing is iterative and can work with a smaller stack.
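A minimal C++ sketch of that difference, using a toy model in place of a real scene (every hit absorbs half of the ray's weight). The constants and function names are assumptions made for the example; hitIterative stands in for a closest hit program that only updates the payload, and the loop in rayGenIterative stands in for repeated rtTrace calls from the ray generation program.

```cpp
#include <algorithm>
#include <cassert>

// Toy surface model (hypothetical): every hit absorbs half of the weight.
const float kReflectivity = 0.5f;
const float kMinWeight    = 0.01f;   // stop once the weight drops below this

// Recursive style: the hit routine itself traces the next ray, so each bounce
// adds a call frame. `maxDepth` records the deepest nesting reached.
float hitRecursive(float weight, int depth, int& maxDepth)
{
    maxDepth = std::max(maxDepth, depth);
    const float absorbed = weight * (1.0f - kReflectivity);
    const float next     = weight * kReflectivity;
    if (next < kMinWeight)
        return absorbed;
    return absorbed + hitRecursive(next, depth + 1, maxDepth);
}

// Iterative style: the hit routine only updates the payload and returns.
struct Payload { float weight; float absorbed; bool done; };

void hitIterative(Payload& p)
{
    p.absorbed += p.weight * (1.0f - kReflectivity);
    p.weight   *= kReflectivity;
    p.done      = p.weight < kMinWeight;
}

// The loop lives in the stand-in for the ray generation program, so the
// nesting depth stays at one no matter how many bounces occur.
float rayGenIterative(int& bounces)
{
    Payload p = { 1.0f, 0.0f, false };
    bounces = 0;
    while (!p.done) {
        hitIterative(p);   // stands in for one rtTrace call
        ++bounces;
    }
    return p.absorbed;
}
```

Both versions absorb the same total weight over seven bounces; only the recursive one nests seven calls deep, which is what drives the stack requirement.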

Within a warp, yes. In the more general case, I’m not sure, but you can force this behavior with rtContextSetTimeoutCallback. Also, be aware that future OptiX versions might not behave like the current one in this regard.

I was indeed aiming to call rtTrace in the closest hit program. I actually have to do that for the 1st approach.

Thanks for the clarification