Finding the max index of many sub-arrays in parallel

Hello !

First of all, let me describe what this function should do:

__device__ void check(int *dTarget, int *dToCheck)

dTarget = [ [x0,x1,x2], [x3,x4,x5] ]
dToCheck = [ [x0,x1,x2], [x3,x4,x5] ]

The objective is to check, for each sub-array in dToCheck, whether the index of its max value is the same as the index of the max value in the corresponding sub-array of dTarget. For example:

dTarget = [ [0,0,1], [1,0,0] ]
dToCheck = [ [0.25,1,5], [7,7,15] ]

maxIndex([0,0,1]) = 2
maxIndex([0.25,1,5]) = 2 Correct !

maxIndex([1,0,0]) = 0
maxIndex([7,7,15]) = 2 Wrong !
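
To pin down what I mean by maxIndex, here is a rough sketch. The template parameter (so the same helper works for int and float data) and the tie-breaking rule (the first index wins) are my own assumptions, not requirements of the problem.

```cuda
// Returns the index of the largest element in v[0..n-1].
// Assumption: n >= 1, and on ties the lowest index wins.
template <typename T>
__host__ __device__ int maxIndex(const T *v, int n)
{
    int best = 0;
    for (int i = 1; i < n; ++i) {
        if (v[i] > v[best]) {
            best = i;  // strictly greater, so earlier ties are kept
        }
    }
    return best;
}
```

With the values above, maxIndex returns 2 for [0,0,1], [0.25,1,5] and [7,7,15], and 0 for [1,0,0].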

Of course, there can be many more values in each sub-array, and a lot of sub-arrays (1,000 to 10,000).

My first thought was to give each thread the job of taking the corresponding sub-array in each matrix and checking whether the two max indexes match. But if I do so, with this representation I think global memory access isn't coalesced, right? The initial representation should instead be:

dTarget = [ [x0,x3], [x1,x4], [x2,x5] ]
dToCheck = [ [x0,x3], [x1,x4], [x2,x5] ]

in order to get coalesced memory access across the threads of a block, right?
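
To make the idea concrete, here is a rough sketch of the one-thread-per-sub-array version on the transposed layout. Everything in it is an assumption on my side: the names checkKernel, runCheck and dMatch are made up, I assumed dToCheck holds floats (because of the 0.25 in my example), ties go to the first index, and error checking of the CUDA calls is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: thread s handles sub-array s.
// Transposed layout: element j of sub-array s is at index j * numSub + s,
// so neighbouring threads read neighbouring addresses at each step j.
__global__ void checkKernel(const int *dTarget, const float *dToCheck,
                            int *dMatch, int numSub, int len)
{
    int s = blockIdx.x * blockDim.x + threadIdx.x;
    if (s >= numSub) return;

    int bestT = 0, bestC = 0;
    int maxT = dTarget[s];
    float maxC = dToCheck[s];
    for (int j = 1; j < len; ++j) {
        int t = dTarget[j * numSub + s];
        float c = dToCheck[j * numSub + s];
        if (t > maxT) { maxT = t; bestT = j; }
        if (c > maxC) { maxC = c; bestC = j; }
    }
    dMatch[s] = (bestT == bestC) ? 1 : 0;  // 1 = same max index
}

// Hypothetical host helper: copies the (already transposed) inputs to the
// device, launches the kernel and returns one 0/1 flag per sub-array.
std::vector<int> runCheck(const std::vector<int> &target,
                          const std::vector<float> &toCheck,
                          int numSub, int len)
{
    int *dTarget = nullptr, *dMatch = nullptr;
    float *dToCheck = nullptr;
    cudaMalloc(&dTarget, target.size() * sizeof(int));
    cudaMalloc(&dToCheck, toCheck.size() * sizeof(float));
    cudaMalloc(&dMatch, numSub * sizeof(int));
    cudaMemcpy(dTarget, target.data(), target.size() * sizeof(int),
               cudaMemcpyHostToDevice);
    cudaMemcpy(dToCheck, toCheck.data(), toCheck.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (numSub + threads - 1) / threads;
    checkKernel<<<blocks, threads>>>(dTarget, dToCheck, dMatch, numSub, len);

    std::vector<int> match(numSub);
    cudaMemcpy(match.data(), dMatch, numSub * sizeof(int),
               cudaMemcpyDeviceToHost);  // blocking copy, waits for the kernel
    cudaFree(dTarget);
    cudaFree(dToCheck);
    cudaFree(dMatch);
    return match;
}
```

For my small example, the transposed inputs would be target = {0,1, 0,0, 1,0} and toCheck = {0.25,7, 1,7, 5,15} with numSub = 2 and len = 3; the first sub-array should be flagged as matching and the second as not matching.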

Or is there a better strategy I could adopt?

PS: I don't have any real code yet; I'm still just thinking through the approach.

Thanks a lot,