Hi,
I’m interested in exploring the viability of using OptiX for calculating view factors as part of an enclosure radiation simulation code. I adapted the simplePrimePP example from the SDK to read a mesh file and build an OptiX Prime model from it. I then generate rays for each face and execute a closest-hit query. The results appear to be correct, but I’m concerned that I might not be getting the most out of my GPU.
In some recent presentations (http://i.imgur.com/kjqOy6k.png and http://imgur.com/Z5hRxe7), there are stats that indicate a GTX Titan (which is what I’m testing on) is capable of ~ 300M rays/sec, but my timings are coming in at about 50-60M rays/sec. I appreciate that marketing numbers are often “theoretical maxima”, but getting only ~20% of the performance that other people are claiming makes me think I might be approaching the problem in the wrong way!
Some additional information about the performance tests:
- using float3 for ray origins and directions; no double precision
- test meshes are tessellated spheres, looking similar to this: http://i.imgur.com/ioeJ6fR.png
- rays are generated to have some spatial locality (to try to minimize thread divergence while traversing the BVH). Here’s a little animation indicating how the rays are numbered on one example triangle: http://i.imgur.com/Nk3anAz.gif
- timings are best-case, not counting any transfers, just the query execution:
//
// Execute query
//
Query query = model->createQuery( RTP_QUERY_TYPE_CLOSEST );
query->setRays( rays.count(), Ray::format, rays.type(), rays.ptr() );
query->setHits( hits.count(), Hit::format, hits.type(), hits.ptr() );
cudaDeviceSynchronize();   // make sure no earlier GPU work is still pending
gettimeofday( &then, NULL );
query->execute( 0 );       // no hints passed, so the query runs synchronously
query->finish();           // block until all hit results are written
gettimeofday( &now, NULL );
double query_time = time_elapsed( then, now );
A sphere with 131,072 triangles and 1024 rays per triangle (134,217,728 rays total) takes 2.30383 seconds on this machine, which works out to roughly 58 Mrays/sec, with CUDA 7.5, OptiX 3.9, driver version 352.93, on a GTX Titan.
I’ve read through chapters 9 and 11 of the OptiX Programming Guide to see if there are any obvious ways I might be slowing things down, but nothing leaps out at me. Can anyone offer some insight on where I might look for problems, and how to effectively profile an OptiX Prime application? I have experience with CUDA, but this is my first time working with OptiX.
Thanks