For a CUDA application I need a fast sorting algorithm to sort coordinates lexicographically. I know CUDPP provides a sorter in the form of a radix sort combined with a merge sort.
Previous GPGPU sorters were generally based on sorting networks, which have O(N log^2 N) complexity and considerable overhead.
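To make the O(N log^2 N) figure concrete, here is a minimal host-side C++ sketch of a bitonic sorting network, one common example of the sorting networks mentioned above. It is not taken from any particular GPGPU sorter; the function name `bitonic_sort` and the compare counter are illustrative assumptions, and the input length is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sorts `a` ascending with a bitonic sorting network and returns the number
// of compare-exchange operations performed.
// Assumption: a.size() is a power of two (illustrative sketch, not a library API).
std::size_t bitonic_sort(std::vector<int>& a) {
    const std::size_t n = a.size();
    assert((n & (n - 1)) == 0 && "length must be a power of two");
    std::size_t compares = 0;
    for (std::size_t k = 2; k <= n; k <<= 1) {          // size of the bitonic sequences being merged
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {  // compare distance within one merge step
            // On a GPU, each index i in this loop would typically map to one thread.
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t partner = i ^ j;
                if (partner > i) {                      // handle each pair exactly once
                    ++compares;
                    const bool ascending = (i & k) == 0;
                    if ((a[i] > a[partner]) == ascending) {
                        std::swap(a[i], a[partner]);
                    }
                }
            }
        }
    }
    return compares;
}
```

The two outer loops run log N * (log N + 1) / 2 times in total and each pass performs N/2 compare-exchanges, which is where the N log^2 N term comes from. The comparison pattern is independent of the data, so every pass is done in full even when the input is already sorted.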
What is the complexity of the CUDPP sorter? What are its advantages and disadvantages compared to a sorting network?
The best sorting algorithm depends somewhat on your application: whether you need to sort key/value pairs or just keys, the type of your data, whether you can sort incrementally, and so on.
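As an illustration of how the data type and the keys-only versus key/value choice interact, here is a hedged C++ sketch for the coordinate case. It assumes the coordinates are 2D signed 32-bit integers, which the question does not state. Under that assumption, each point can be packed into one 64-bit unsigned key whose integer order equals the lexicographic (x, then y) order, so a keys-only sort is enough. `Coord`, `pack_key`, `unpack_key` and `sort_lexicographic` are hypothetical names, and `std::sort` is only a host-side stand-in for whatever GPU radix sort is used.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical coordinate type: 2D signed 32-bit integers (an assumption).
struct Coord {
    std::int32_t x;
    std::int32_t y;
};

// Flip the sign bit so that signed order matches unsigned order.
inline std::uint32_t to_ordered(std::int32_t v) {
    return static_cast<std::uint32_t>(v) ^ 0x80000000u;
}

// Inverse of to_ordered.
inline std::int32_t from_ordered(std::uint32_t u) {
    return static_cast<std::int32_t>(u ^ 0x80000000u);
}

// x goes into the high 32 bits, y into the low 32 bits, so comparing keys
// as unsigned integers compares x first and y second.
inline std::uint64_t pack_key(Coord c) {
    return (static_cast<std::uint64_t>(to_ordered(c.x)) << 32) | to_ordered(c.y);
}

inline Coord unpack_key(std::uint64_t key) {
    return Coord{from_ordered(static_cast<std::uint32_t>(key >> 32)),
                 from_ordered(static_cast<std::uint32_t>(key & 0xFFFFFFFFu))};
}

// Sorts points lexicographically via their packed keys.
void sort_lexicographic(std::vector<Coord>& pts) {
    std::vector<std::uint64_t> keys(pts.size());
    std::transform(pts.begin(), pts.end(), keys.begin(), pack_key);
    std::sort(keys.begin(), keys.end());  // stand-in for a GPU keys-only radix sort
    std::transform(keys.begin(), keys.end(), pts.begin(), unpack_key);
}
```

If the coordinates are floating point, have more components than fit in one key, or carry a payload such as an original index, this packing no longer applies directly and a key/value sort or several sorting passes may be the better fit.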