This would often result in a few hundred work items, each responsible for a few million triangle hit tests.
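int pid = get_global_id(0);              // unique work-item ID
int totalThreads = get_global_size(0);   // total number of work items
for (int i = pid; i < totalTriangles; i += totalThreads) {   // triangles strided across work items
    for (int n = 0; n < totalRays; n++) {
        // hit test: ray n against triangle i
    }
}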
I've read that global memory access is slow compared to the other memory spaces, but I don't think it is reasonable to copy the mesh data into work-group local memory before processing. Am I missing something here, or is it best to just have the work items read the triangle data from global memory?
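To sanity-check the strided partition in the loop above, here is a rough CPU-side sketch in plain C (not a kernel). It simulates each work item in turn and records which triangles it claims. The names `simulate_strided_partition`, `covers_each_triangle_once`, and `visits` are made up for illustration, and the inner loop only counts hit tests instead of doing any real ray/triangle math.

```c
#include <assert.h>
#include <stddef.h>

/* CPU simulation of the strided partition: simulated work item `pid` handles
 * triangles pid, pid + total_threads, pid + 2 * total_threads, ...
 * visits[i] ends up holding how many work items touched triangle i.
 * Returns the total number of (triangle, ray) hit tests issued.
 * All names here are hypothetical, chosen only for this sketch. */
unsigned long long simulate_strided_partition(size_t total_threads,
                                              size_t total_triangles,
                                              size_t total_rays,
                                              unsigned *visits)
{
    unsigned long long hit_tests = 0;

    assert(total_threads > 0);  /* a zero stride would never advance */

    for (size_t i = 0; i < total_triangles; i++)
        visits[i] = 0;

    /* One outer iteration per simulated work item. */
    for (size_t pid = 0; pid < total_threads; pid++) {
        for (size_t i = pid; i < total_triangles; i += total_threads) {
            visits[i]++;  /* triangle i is claimed by work item pid */
            for (size_t n = 0; n < total_rays; n++)
                hit_tests++;  /* placeholder for the real hit test */
        }
    }
    return hit_tests;
}

/* Returns 1 if every triangle was visited exactly once, 0 otherwise. */
int covers_each_triangle_once(const unsigned *visits, size_t total_triangles)
{
    for (size_t i = 0; i < total_triangles; i++)
        if (visits[i] != 1)
            return 0;
    return 1;
}
```

For example, with 4 work items, 10 triangles, and 3 rays, the function should report 30 hit tests with every triangle claimed by exactly one work item. This only checks the work distribution and says nothing about global versus local memory performance, which would have to be measured on the device.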