Radix Sort using Compute Shaders Bitonic vs Radix

I’m currently building a water particle system using smoothed particle hydrodynamics for a game I’m making for a school project, currently I have about 8192 particles running at around 1300 FPS (1x 580 GTX) with a basic metaball pixel shader for a blobbing effect.

I’m currently using Simon Green’s Particle Simulation using CUDA’s grid method for my optimization of the particles to great effect, however my biggest bottle neck is the sort required.

Currently I’m copying the cell data to the CPU and doing a sequential Radix sort, and to increase performance even farther I’m looking at doing a GPU sort.
I noticed the June 2010 DirectX sample has a very similar fluid simulation demo that I took a look at and saw they are using a GPU bitonic sort and matrix transform. I also found Designing Efficient Sorting Algorithms for Manycore GPUs that suggests a radix sort on the GPU.

Does anyone know if a radix sort has been done in a compute shader. I have found some good resources for how to do it in CUDA, however I’m wondering about the differences between the two.
Also is it worth trying to figure it out? or would the bitonic sort from the sample be just as good/better for a lot less effort (as there is both code I can look at and a nice explaintion on Microsofts website)?

I’m not aware of any DX Compute implementations of radix sort. I would recommend trying the bitonic sort from the Microsoft sample, this will certainly be faster than transferring the data to the CPU and doing a CPU sort!

Alternatively, if you’re running on Fermi hardware, doing grid binning with atomics is pretty fast these days.

Also watch out NVIDIA patented Radix Sort(at least a particular formulation of it) on the GPU!!! I wasnt very impressed when I noticed that:-) Not sure what sort of royalties they charge…

http://www.google.com/patents/about?id=gJzKAAAAEBAJ&dq=radix+sort+NVIDIA

David

(excuse for the out-dated reply but just for the record)

There is a DX11 DirectCompute and OpenCL radix sort in my experiments folder at github here:

You can build it under Windows by clicking on the build/vs2008.bat or vs2010.bat batch file,
and open the solution in Visual Studio project.
Check out the OpenCL_DX11_primitives_test_* and OpenCL_DX11_radixsort_benchmark_* projects

Thanks,
Erwin

Hey, hello I couldnt find the projects you mention, I did fine downloading and building by clicking the .bat, but there are 28 projects non of them called or containing the radix, did you change it? thanks in advance.

I have found the files, I’m new at github ;) thanks.