This would often result in a few hundred work items, each responsible for a few million triangle hit tests.
PID = Unique Process ID;
for (i = PID; i < totalTriangles; i += totalThreads)
    for (n = 0; n < totalRays; n++)
        // do computation
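To sanity-check that partitioning on the CPU before worrying about memory spaces, here is a minimal host-side sketch in plain C (not OpenCL kernel code). It only mirrors the strided loop above. The function names `hit_tests_for_item` and `partition_covers_each_triangle_once` are hypothetical, made up for illustration, and the counter increment is a stand-in for the real intersection test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Number of triangle/ray hit tests that work item `pid` performs under the
   strided partition: triangles pid, pid + total_threads, pid + 2*total_threads, ... */
size_t hit_tests_for_item(size_t pid, size_t total_threads,
                          size_t total_triangles, size_t total_rays)
{
    assert(total_threads > 0 && pid < total_threads);
    size_t tests = 0;
    for (size_t i = pid; i < total_triangles; i += total_threads)
        for (size_t n = 0; n < total_rays; n++)
            tests++;  /* stand-in for the real intersection test */
    return tests;
}

/* Returns 1 if every triangle is visited by exactly one work item, else 0. */
int partition_covers_each_triangle_once(size_t total_threads,
                                        size_t total_triangles)
{
    assert(total_threads > 0);
    /* Allocate at least one byte so a zero-triangle mesh is still valid. */
    unsigned char *visits = calloc(total_triangles ? total_triangles : 1, 1);
    if (!visits)
        return 0;

    for (size_t pid = 0; pid < total_threads; pid++)
        for (size_t i = pid; i < total_triangles; i += total_threads)
            visits[i]++;

    int ok = 1;
    for (size_t i = 0; i < total_triangles; i++)
        if (visits[i] != 1)
            ok = 0;

    free(visits);
    return ok;
}
```

With 10 triangles, 3 rays and 4 work items, item 0 handles triangles 0, 4 and 8, and item 3 handles triangles 3 and 7, so the load differs by at most one triangle's worth of ray tests per item.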
I've read that access to global memory is slow compared to the other memory spaces, but I don't think it is reasonable to copy the mesh data into work-group local memory before processing. Am I missing something here, or is it best to just have the work items read the triangle data directly from global memory?