How to assign shared memory size with variable blockDim.x, blockDim.y and blockDim.z
Can I declare shared memory like the following? Apparently not; I get error messages.

__shared__ float a_d[blockDim.x*blockDim.y*blockDim.z];

I want to make it the size of the block. Can I use these variables for the size, or do I have to use a specific number?

Any comments are welcome, and thanks in advance.
#1
Posted 09/28/2010 08:41 PM   
No, because the size must be known at compile time.

What you can do is use the dynamic shared memory allocation feature. Declare a_d as follows:

extern __shared__ float a_d[];

and pass the required size as the third configuration parameter of the kernel invocation:

my_kernel<<<gridsize, blocksize, blocksize.x*blocksize.y*blocksize.z*sizeof(float)>>>();

Note that this only works for one variable-size array.
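[Editor's note] A minimal, self-contained sketch of the pattern described above; the kernel and variable names (fill_and_sum, out_d) are hypothetical, and error checking is kept to the bare pattern:

```cuda
#include <cstdio>

// Dynamically sized shared memory: declared extern with empty brackets,
// actual size supplied at launch time via the third <<<>>> parameter.
extern __shared__ float a_d[];

__global__ void fill_and_sum(float *out)
{
    // Flatten the 3D thread index into one element per thread.
    int tid = threadIdx.x
            + threadIdx.y * blockDim.x
            + threadIdx.z * blockDim.x * blockDim.y;
    a_d[tid] = 1.0f;
    __syncthreads();
    if (tid == 0) {
        int n = blockDim.x * blockDim.y * blockDim.z;
        float s = 0.0f;
        for (int i = 0; i < n; ++i) s += a_d[i];
        *out = s;   // should equal the number of threads in the block
    }
}

int main()
{
    dim3 grid(1), block(4, 2, 2);   // 16 threads per block
    size_t shmem = block.x * block.y * block.z * sizeof(float);
    float *out_d, out_h = 0.0f;
    cudaMalloc(&out_d, sizeof(float));
    fill_and_sum<<<grid, block, shmem>>>(out_d);  // third parameter = shared bytes
    cudaMemcpy(&out_h, out_d, sizeof(float), cudaMemcpyDeviceToHost);
    printf("%f\n", out_h);
    cudaFree(out_d);
    return 0;
}
```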
Always check return codes of CUDA calls for errors. Do not use __syncthreads() in conditional code unless the condition is guaranteed to evaluate identically for all threads of each block. Run your program under cuda-memcheck to detect stray memory accesses. If your kernel dies for larger problem sizes, it might exceed the runtime limit and trigger the watchdog timer.

#3
Posted 09/29/2010 01:24 AM   
Hmmmm... continuation of the question. Something is still not right.

So I set, in my main code:

dim3 dimGrid2(129,129), dimBlock2(1,1,33);

Kernel<<<dimGrid2, dimBlock2, 33*sizeof(REAL)>>>
....

__global__ void Kernel(REAL *COEF, REAL *P, REAL *PN, REAL *RHS,
                       int imax, int jmax, int kmax, REAL *cost)
{
    extern __shared__ REAL COEFs0[];
    extern __shared__ REAL COEFs1[];  // note: all extern __shared__ arrays
    extern __shared__ REAL COEFs2[];  // alias the same memory
}

so that I can dynamically assign arrays of the same size in the kernel.

Oh, I didn't find the third-parameter usage in the book "Programming Massively Parallel Processors", but I looked it up again in the documents from NVIDIA, and now I know there is even a fourth parameter you can set... Thanks very much!

[quote name='tera' post='1123796' date='Sep 28 2010, 06:24 PM']No, because the size must be known at compile time.
What you can do is use the dynamic shared memory allocation feature. Declare a_d as follows:
extern __shared__ float a_d[];
and pass the required size as the third configuration parameter of the kernel invocation:
my_kernel<<<gridsize, blocksize, blocksize.x*blocksize.y*blocksize.z*sizeof(float)>>>();

Note that this only works for one variable-size array.[/quote]
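[Editor's note] Since dynamic shared memory gives only one variable-size array, the usual workaround for the three COEFs arrays above is to allocate one block and carve it into sub-arrays by pointer arithmetic. A sketch, assuming float for REAL and the block shape (1,1,33) from the post:

```cuda
// One extern __shared__ pool, split into three equal sub-arrays.
extern __shared__ float shared_pool[];

__global__ void Kernel3(int n)        // n = elements per sub-array
{
    float *COEFs0 = shared_pool;          // first n floats
    float *COEFs1 = shared_pool + n;      // next n floats
    float *COEFs2 = shared_pool + 2 * n;  // last n floats

    int tid = threadIdx.z;                // block is (1,1,33)
    if (tid < n) {
        COEFs0[tid] = 0.0f;
        COEFs1[tid] = 1.0f;
        COEFs2[tid] = 2.0f;
    }
}

// Launch with room for all three arrays in the third parameter:
// Kernel3<<<dimGrid2, dimBlock2, 3 * 33 * sizeof(float)>>>(33);
```

If the sub-arrays have different element types, align the offsets to the largest type to avoid misaligned accesses.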
#5
Posted 09/29/2010 03:46 PM   