How can I allocate 2-dimensional array on the device memory?

Hi.

I’m a super :"> newbie and have just started writing short test programs in CUDA.

Now I’ve run into a problem…

I just want to allocate a 2-dimensional array in device memory and access its values, like this:

[codebox]int iArray[768][1024];

iArray[0][0] = 100;

…[/codebox]

However, as far as I know, the cudaMalloc function only provides linear memory allocation, and it is hard to map a one-dimensional array onto a two-dimensional one.

I think it’s very simple to solve, but I can’t find the solution here or in the Programming Guide.

I found the “cudaArray” structure and the cudaMallocArray function, but I’m not sure whether they are suitable for my purpose.

And I can’t access values in a cudaArray object.

Does anyone know how to solve this?

I’d really appreciate your help… ^^;

    [*]Step 1:

    [indent]CUDA memory allocation allocates a number of bytes, irrespective of the dimensions. Consider how multi-dimensional arrays are stored in any memory: they are contiguous. Therefore, simply allocate sizeof(datatype) * arrayWidth * arrayHeight bytes.

    Remember, a 2D array accessed through a double pointer needs two levels of pointers (an array of row pointers plus the data itself), so you may have to do some jiggery-pokery with the malloc.

    [/indent]

    [*]Step 2:

    [indent]Pass a double pointer to your array as a parameter to your kernel; you will then be able to access your array with double brackets. Note that both the row-pointer array and the rows it points to must live in device memory.

    [/indent]

    [*]Easy Alternative:

    [indent]Flatten your arrays and calculate the element index from gridDim, blockDim, blockIdx and threadIdx, if each thread is associated with a single element of the array. Alternatively, pass in the dimensions as parameters:

    elementIndex = arrayWidth * y + x    (row-major: y is the row, x is the column)

    Simple enough?

    [/indent]
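A minimal sketch of Step 1 plus the Easy Alternative together, assuming the 1024 x 768 int array from the original question. The names flatIndex, fillKernel and fillOnDevice, the 16 x 16 block size and the fill value x + y are made up for illustration; only cudaMalloc, cudaMemcpy, cudaFree and cudaGetLastError are real Runtime API calls.

```cuda
#include <cuda_runtime.h>

// Row-major mapping (x = column, y = row): elementIndex = width * y + x.
__host__ __device__ inline int flatIndex(int x, int y, int width)
{
    return width * y + x;
}

// One thread per element of a width x height array stored as one flat buffer.
__global__ void fillKernel(int* d_array, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)  // the last blocks may stick out past the array
        d_array[flatIndex(x, y, width)] = x + y;
}

// Step 1: allocate sizeof(int) * width * height bytes, fill on the device, copy back.
// Returns false if any CUDA call fails.
bool fillOnDevice(int* h_array, int width, int height)
{
    const size_t bytes = sizeof(int) * width * height;
    int* d_array = nullptr;
    if (cudaMalloc(&d_array, bytes) != cudaSuccess) return false;

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    fillKernel<<<grid, block>>>(d_array, width, height);

    cudaError_t err = cudaGetLastError();  // catches launch errors
    if (err == cudaSuccess)
        err = cudaMemcpy(h_array, d_array, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(d_array);
    return err == cudaSuccess;
}
```

On the host, the question’s iArray[0][0] then becomes h_array[flatIndex(0, 0, 1024)], with h_array pointing to 1024 * 768 ints.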

Thanks for your detailed reply, yumming.

And I apologize for my poor English. >.<

Since I couldn’t find a method that allocates a multi-dimensional array, I’ve been using that kind of index mapping.

But, as I mentioned before, it is too complicated for me… (sorry, again…)

Doesn’t CUDA provide a multi-dimensional memory allocation method?

Could you tell me…?

And what does the cudaArray structure do?

According to the Programming Guide, it is used by some texture operations.

Does it exist only for texture operations?

Understand that memory at the hardware level is accessed in a linear fashion; multi-dimensional access is an illusion provided by the compiler.

While you’re learning CUDA, I’d stick to the fundamentals of the Runtime API as much as possible until you’re confident with it, i.e. don’t bother with textures yet.

Practice makes perfect, get used to addressing a flattened 2D array.

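What I suggest is that you make use of shared memory, which CAN be multi-dimensional:

[codebox]
__global__ void MyKernel(int* flatArray)
{
    // Statically sized shared array: the size needs to be fixed at development time.
    // (An extern __shared__ array must be declared unsized instead, e.g. extern __shared__ int buf[];)
    __shared__ int my2DArray[32][32];

    my2DArray[threadIdx.x][threadIdx.y] = flatArray[blockDim.x * threadIdx.y + threadIdx.x];

    // Call __syncthreads() here before any thread reads elements written by other threads.
}
[/codebox]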

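This example copies elements from a flat 1D array into a 2D shared array. In this case it is important that each thread copies exactly one element. Since the index uses only threadIdx and blockDim, it also assumes a single block of at most 32 x 32 threads.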
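To make the shared-memory idea concrete, here is a sketch that actually reads the 2D shared array back in a different order (a tile transpose), which is where __syncthreads() becomes necessary. The tile size 16, the names transposeTile and transposeOnDevice, and the transpose itself are illustrative assumptions, not part of the answer above; 16 x 16 = 256 threads is chosen so one block fits even on older GPUs.

```cuda
#include <cuda_runtime.h>

#define TILE 16  // hypothetical tile size; shared array sizes must be compile-time constants

// Each thread copies one element of a TILE x TILE flat array into a 2D shared
// tile, then writes the element at the mirrored position back out (a transpose).
__global__ void transposeTile(const int* in, int* out)
{
    __shared__ int tile[TILE][TILE + 1];  // +1 column of padding avoids bank conflicts on the column read

    tile[threadIdx.y][threadIdx.x] = in[TILE * threadIdx.y + threadIdx.x];
    __syncthreads();  // every element must be in shared memory before any thread reads another's

    out[TILE * threadIdx.y + threadIdx.x] = tile[threadIdx.x][threadIdx.y];
}

// Host helper: copies TILE * TILE ints to the device, runs one TILE x TILE block,
// and copies the transposed result back. Returns false if any CUDA call fails.
bool transposeOnDevice(const int* h_in, int* h_out)
{
    const size_t bytes = sizeof(int) * TILE * TILE;
    int* d_in = nullptr;
    int* d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }

    cudaError_t err = cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        transposeTile<<<1, dim3(TILE, TILE)>>>(d_in, d_out);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel

    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess;
}
```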

I hope this helps

Well, actually you can “convert” a linear array to a multi-dimensional one if the inner dimensions are known at compile time. (GCC can also do it when they are variable, using C99 variable-length array types, but that is not valid C++, and CUDA source is compiled as C++.)

[codebox]
#include <assert.h>

void example(void)
{
    int linear[10*32];                           // linear array
    int (*twodim)[32] = (int (*)[32])linear;     // interpreted as a two-dimensional [10][32] array
    assert(&twodim[2][4] == &linear[2*32 + 4]);  // these are the same element
}
[/codebox]

Note that hardly anyone uses this syntax (except perhaps in function parameters, where you can use the more readable “int twodim[][32]”), because it is far more confusing than just doing the index arithmetic “the stupid way”.

In step 2, you mention passing a double pointer to the array as a parameter to the function, but your sample code is:
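The same pointer-to-array trick also works in CUDA code on a buffer from cudaMalloc, which gives the iArray[y][x] syntax from the original question inside a kernel. A sketch, assuming the width is a compile-time constant; WIDTH, HEIGHT, Row, asRows and fillRows are hypothetical names, and reinterpret_cast is used because CUDA source is compiled as C++.

```cuda
#include <cuda_runtime.h>

#define WIDTH  1024  // columns; must be a compile-time constant for this trick
#define HEIGHT 768   // rows, matching int iArray[768][1024] from the question

typedef int Row[WIDTH];  // one row of the 2D array

// Reinterprets a flat buffer (host or device pointer) as an array of rows,
// so that rows[y][x] refers to flat[y * WIDTH + x].
__host__ __device__ inline Row* asRows(int* flat)
{
    return reinterpret_cast<Row*>(flat);
}

// One thread per element; sets iArray[0][0] = 100 and every other element to 0.
__global__ void fillRows(int* d_flat)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < WIDTH && y < HEIGHT) {
        Row* iArray = asRows(d_flat);
        iArray[y][x] = (x == 0 && y == 0) ? 100 : 0;
    }
}
```

Allocate the buffer with cudaMalloc(&d_flat, sizeof(int) * WIDTH * HEIGHT) and launch, for example, fillRows<<<dim3(WIDTH / 16, HEIGHT / 16), dim3(16, 16)>>>(d_flat); both dimensions divide evenly by 16 here.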

[codebox]
__global__ void MyKernel(int* flatArray)
{
    __shared__ int my2DArray[32][32]; // size needs to be fixed at development time though

    my2DArray[threadIdx.x][threadIdx.y] = flatArray[blockDim.x * threadIdx.y + threadIdx.x];
}
[/codebox]

Isn’t that a single pointer in the parameter list? Why?
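For comparison with that single-pointer kernel, this is roughly what the double-pointer approach from Step 2 involves, assuming a flat device buffer d_data of width * height ints already allocated with cudaMalloc. The names makeRowTable and setViaRowPointers are hypothetical. The host builds a table of row pointers that point into device memory, copies the table to the device, and the kernel then indexes rows[y][x]; every access costs an extra memory load for the row pointer, which is one reason the flattened form is often preferred.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Kernel that indexes through a device-side table of row pointers: rows[y][x].
__global__ void setViaRowPointers(int** rows, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        rows[y][x] = width * y + x;  // two loads: first the row pointer, then the element
}

// Builds the row-pointer table for a flat device buffer d_data of width * height ints.
// On success, *d_rowsOut holds a device array of height pointers (free it with cudaFree).
bool makeRowTable(int* d_data, int width, int height, int*** d_rowsOut)
{
    std::vector<int*> h_rows(height);
    for (int y = 0; y < height; ++y)
        h_rows[y] = d_data + static_cast<size_t>(width) * y;  // device addresses: never dereference on the host

    int** d_rows = nullptr;
    const size_t bytes = height * sizeof(int*);
    if (cudaMalloc(&d_rows, bytes) != cudaSuccess) return false;
    if (cudaMemcpy(d_rows, h_rows.data(), bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d_rows);
        return false;
    }
    *d_rowsOut = d_rows;
    return true;
}
```

Freeing the table with cudaFree releases only the row pointers; d_data still has to be freed separately.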