(NOTE: I initially posted this on the OpenGL forum, but later thought it belonged here. Sorry!)
Hi there.
I’m coding for modern GPUs (OpenGL 4.x etc.).
The mesh data I’m sending to the graphics card is pretty much the direct output from Maya, so I’m guessing it has poor vertex-cache ordering.
I’m hoping to use a vertex-cache-optimization pre-pass on the meshes to gain some performance, as suggested here:
http://home.comcast.net/~tom_forsyth/papers/fast_vert_cache_opt.html
The meshes have 500,000+ triangles in them (rendered as triangles via an index buffer).
We also do a number of shadow passes, which puts even more pressure on vertex throughput.
The thing is…
I have tried TomF’s algorithm (as described in the link) and it actually made things slower! :( And I definitely performed all the steps, i.e.:
- index buffer re-ordering
- rebuild vertex buffers using the new index ordering to achieve near-linear access
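For reference, the second step can be sketched roughly as below. This is only a minimal illustration, not the code from either linked implementation: the `Vertex` layout and the function name `remapVerticesByFirstUse` are hypothetical, and it assumes 32-bit indices and a single interleaved vertex buffer. It walks the already-reordered index buffer and emits each vertex the first time it is referenced, so vertex memory is read in a near-linear pattern.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical interleaved vertex layout, for illustration only.
struct Vertex {
    float px, py, pz;  // position
    float nx, ny, nz;  // normal
    float u, v;        // texture coordinates
};

// Returns a copy of `vertices` reordered so that vertices appear in the order
// in which `indices` first references them, and rewrites `indices` in place to
// point at the new locations. Vertices that no index references are dropped.
std::vector<Vertex> remapVerticesByFirstUse(const std::vector<Vertex>& vertices,
                                            std::vector<std::uint32_t>& indices)
{
    const std::uint32_t kUnassigned = 0xFFFFFFFFu;
    std::vector<std::uint32_t> oldToNew(vertices.size(), kUnassigned);

    std::vector<Vertex> reordered;
    reordered.reserve(vertices.size());

    for (std::uint32_t& index : indices) {
        assert(index < vertices.size());
        if (oldToNew[index] == kUnassigned) {
            // First time this vertex is referenced: append it to the new buffer.
            oldToNew[index] = static_cast<std::uint32_t>(reordered.size());
            reordered.push_back(vertices[index]);
        }
        index = oldToNew[index];  // point the index at the new location
    }
    return reordered;
}
```

The triangle order is left untouched here; only the vertex numbering changes, so this should run after the index buffer has been reordered.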
Any idea why this would be? I’m guessing the assumptions that Tom made back in 2006 do not hold for modern GPUs?
NOTE:
I tried both of these implementations:
- http://gameangst.com/wp-content/uploads/2009/03/forsythtriangleorderoptimizer.cpp
- Google Code Archive - Long-term storage for Google Code Project Hosting.
Both produced the same slowdown (from 31 fps to 29 fps),
so I’m guessing the two implementations are consistent (and therefore hopefully correct).
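One way to check that guess, rather than relying on frame rate alone, would be to measure the average cache miss ratio (ACMR, vertex transforms per triangle) of the index buffer before and after optimization. The sketch below is an assumption-laden model, not a description of real hardware: it simulates a fixed-size FIFO post-transform cache, the function name `measureAcmr` is hypothetical, and the cache size is a parameter you would have to pick (32 is just a commonly tried value). Under this model the result is 3.0 when no vertex is ever reused and tends toward roughly 0.5 for a large, well-ordered regular mesh.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Simulates a FIFO vertex cache holding `cacheSize` entries and returns the
// number of cache misses divided by the number of triangles.
// Assumes `indices` describes a triangle list (three indices per triangle).
double measureAcmr(const std::vector<std::uint32_t>& indices, std::size_t cacheSize)
{
    assert(indices.size() % 3 == 0);
    assert(cacheSize > 0);
    if (indices.empty()) return 0.0;

    std::deque<std::uint32_t> cache;  // front = oldest entry
    std::size_t misses = 0;

    for (std::uint32_t index : indices) {
        if (std::find(cache.begin(), cache.end(), index) != cache.end()) {
            continue;  // hit: a FIFO cache does not reorder entries on a hit
        }
        ++misses;
        cache.push_back(index);
        if (cache.size() > cacheSize) {
            cache.pop_front();  // evict the oldest entry
        }
    }
    return static_cast<double>(misses) / static_cast<double>(indices.size() / 3);
}
```

If the optimized index buffer does not score clearly lower than the raw Maya output under this model, the reordering itself is suspect; if it does score lower and the frame rate still drops, the bottleneck is probably somewhere other than vertex reuse.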
My Question:
How should I approach this problem given modern-day architectures?
How should I be ordering the data to achieve the best performance on the card?
Is it worth doing anything at all?
Thanks a lot! :)
Brian