Suppose I have an array of size 1 million, and the goal is to select the top 1000 largest elements in the array. I am thinking if there is any efficient way to solve it in GPU.
I’ve searched the forum and make a summary on what I find.
sort and selection. It overkill the problem since only 1000 out of 1 million elements return.
make a heap data structure in GPU, and heapify in parallel.
Solution 1 is not efficient. Regarding solution 2, my concern is how to make a heap data structure in
GPU, and heapify in parallel might also have race condition.
finding the minimum, min, within the array of 1m elements
select a pivot equal to min + value and count the number of elements less than the pivot
iterate on the pivot value until i have the number of elements less than the pivot, greater than (but not by too much) or equal to 1000
gather the elements less than the pivot
and this is an algorithm that can easily be parallelized across multiple thread blocks - i.e. you can use multiple thread blocks to simultaneously work on the problem/ algorithm