Best sorting algorithm using CUDA Sorting networks, merge or radix sort?
For a CUDA application I'm in need for a fast sorting algorithm to sort coordinates lexicographically. I know CUDPP provides a sorter, in the form of a RADIX sort combined with a merge sort.

Previous GPGPU sorters were generally based on sorting networks, which have complexity Nlog^2 N and quite some overhead.

What complexity has the CUDPP sorter? What are the advantages and disadvantages compared to a sorting network?
For a CUDA application I'm in need for a fast sorting algorithm to sort coordinates lexicographically. I know CUDPP provides a sorter, in the form of a RADIX sort combined with a merge sort.



Previous GPGPU sorters were generally based on sorting networks, which have complexity Nlog^2 N and quite some overhead.



What complexity has the CUDPP sorter? What are the advantages and disadvantages compared to a sorting network?

#1
Posted 02/01/2008 09:55 AM   
The best sorting algorithm depends somewhat on your application - whether you need to sort key/value pairs or just keys, the type of your data, whether you can sort incrementally etc. etc.

I believe the fastest published CUDA sort for 32-bit integers is Scott LeGrand's radix sort, which is described in his GPU Gems 3 article (chapter 32):
[url="http://forums.nvidia.com/index.php?showtopic=58488"]http://forums.nvidia.com/index.php?showtopic=58488[/url]

The code for a version of this is included in the particles demo in the SDK.
The best sorting algorithm depends somewhat on your application - whether you need to sort key/value pairs or just keys, the type of your data, whether you can sort incrementally etc. etc.



I believe the fastest published CUDA sort for 32-bit integers is Scott LeGrand's radix sort, which is described in his GPU Gems 3 article (chapter 32):

http://forums.nvidia.com/index.php?showtopic=58488



The code for a version of this is included in the particles demo in the SDK.

#2
Posted 02/01/2008 01:23 PM   
Scroll To Top