It’s a templated function. templating is a C++ concept, you can do a google search on “C++ template” to learn more about it.
The template parameter (sortDir) is something the compiler determines (needs to be able to determine) at compile time.
This particular template parameter indicates the implied direction of the underlying sorted data: ascending or descending. Most binary searches depend on the data being sorted.
It is a uint that is treated as a boolean, so if you pass 1 for the template parameter, you are indicating to the routine (binarySearchInclusive) that the underlying data is sorted in an ascending fashion, and if you pass 0 it is sorted in a descending fashion.
ascending data:
binarySearchInclusive<1>(…);
descending data:
binarySearchInclusive<0>(…);
Note that this question has essentially nothing to do with CUDA.
the “,” are negligible, just to show the sequences
and I have another array of numbers (splitters), these splitters for the 2-way merge sort
[6 18 7 23 22 55 92 290]
how can I use the previous binary search kernel to find the position of each number of the splitter in each sequence
for example the number have the positions {2,2,1,0}
18 > {5,5,2,0}
how can I do this in code ??
__global__ static void K_way_merge(int *a, int *splitters, int *splitters_indexs, int k, int seq_size){
int D = k + 1;
const int tid = threadIdx.x;
splitters[tid] = a[D*tid + k + k*(tid / (seq_size/D))]; // here k=2
__syncthreads();
int spliter_size = NumOfChuncks * (seq_size / D); // the size of each sequence, in this example size=8 //and number of chuncks is 4
//check if the splitter size is power of two and sort it
if ((spliter_size&(spliter_size - 1)) == 0){
bitonicSort << <1, spliter_size, spliter_size*sizeof(int) >> >(splitters, spliter_size); //sort splitters when its power of two
__syncthreads();
}
else{
// when splitters is not power of two, expand the size to the next power of two and sort them
int expanded_size = nextPowerOfTwo(spliter_size);
for (int i = spliter_size; i <expanded_size; i++){
splitters[i] = MAXINT;
}
__syncthreads();
bitonicSort << <1, expanded_size, spliter_size*sizeof(int) >> >(splitters, expanded_size);
__syncthreads();
}
//////////////////////////////////
//the modification is here
splitters_indexs[threadIdx.x] = binarySearchExclusive<1>(splitters[threadIdx.x], a, 8, 8); // the
}