I am unable to understand the following line that I read here
cudaMallocPitch() pads the allocation to get best performance for the memory subsystem of a given piece of hardware.
My question is: what is pitch linear memory (linear memory itself I understand), and how does the padding improve performance (i.e. memory bandwidth or fetch rate)?
Pitched linear memory is just a linear memory allocation sized from the 2D dimensions you provide, with padding added to the end of each row as required so that the start of every row stays correctly aligned for coalesced memory access.
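To make that concrete, here is a minimal sketch of what the padding looks like in practice. The width of 100 floats is arbitrary, chosen so the requested row size is unlikely to already match the hardware alignment:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    const size_t width  = 100;   // elements per row (arbitrary, for illustration)
    const size_t height = 64;    // number of rows

    float *devPtr = nullptr;
    size_t pitch  = 0;           // bytes per padded row, chosen by the driver

    // Each row is padded out to `pitch` bytes so that every row start
    // is aligned for coalesced access. pitch >= width * sizeof(float).
    cudaMallocPitch(&devPtr, &pitch, width * sizeof(float), height);

    printf("requested row: %zu bytes, allocated pitch: %zu bytes\n",
           width * sizeof(float), pitch);

    // Because of the padding, row r begins at byte offset r * pitch,
    // NOT at r * width * sizeof(float):
    float *row3 = (float *)((char *)devPtr + 3 * pitch);
    (void)row3;

    cudaFree(devPtr);
    return 0;
}
```

Note that `pitch` is returned in bytes, which is why the row-address arithmetic goes through a `char *` cast. Host/device copies of such an allocation should use `cudaMemcpy2D`, which takes both the source and destination pitches.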
Yes. You still want coalesced reads for optimum performance. It's just that the hardware on GT200-class GPUs makes the penalty for uncoalesced access much lower (it automatically works out the minimum set of memory transactions required to satisfy the half-warp's request). One of the key requirements for coalescing is alignment.
But after a pitched memory copy to the device, is the data still in the form of a 2D array, or just a 1D linear array? When I want to access it on the device, how can I use a 2D thread structure (threadIdx.x, threadIdx.y) to index and process the data?
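Physically the data is always a single 1D linear allocation; the pitch is what lets a kernel recover the 2D view. A hedged sketch of the usual indexing pattern (the kernel name and parameters here are illustrative, not from the original thread):

```cuda
// Each thread scales one element of a pitched 2D float array.
// `pitch` is the row stride in BYTES, as returned by cudaMallocPitch.
__global__ void scale2D(float *data, size_t pitch,
                        int width, int height, float factor)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column index
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // row index

    if (x < width && y < height) {
        // Step to row y using the byte pitch, then index the column.
        float *row = (float *)((char *)data + y * pitch);
        row[x] *= factor;
    }
}

// A matching 2D launch configuration covering the whole array:
//   dim3 block(16, 16);
//   dim3 grid((width + block.x - 1) / block.x,
//             (height + block.y - 1) / block.y);
//   scale2D<<<grid, block>>>(devPtr, pitch, width, height, 2.0f);
```

The bounds check matters because the grid is rounded up to whole blocks, so some threads fall outside the array; and because consecutive `threadIdx.x` values touch consecutive elements within an aligned row, the accesses coalesce.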