free shared mem: can shared memory be freed?
I wonder whether we can free shared memory within a kernel..

#1
Posted 03/23/2012 11:59 PM   
[quote name='thanasio' date='23 March 2012 - 05:59 PM' timestamp='1332547195' post='1387081']
I wonder whether we can free shared memory within a kernel..
[/quote]

Given that you cannot allocate shared memory within a kernel, what would this be used for?

#2
Posted 03/24/2012 03:33 PM   
[quote name='seibert' date='24 March 2012 - 03:33 PM' timestamp='1332603196' post='1387317']
Given that you cannot allocate shared memory within a kernel, what would this be used for?
[/quote]

Good point..
I have a 3-pass algorithm, and each pass uses 4 shared memory buffers.
After each pass I call __syncthreads() and load data from global memory into the shared memory buffers.
I have a feeling there is a performance hit when I overwrite the shared memory, probably due to optimizations the compiler makes in the preceding passes
(i.e. caching shared memory contents in registers, etc.).

So I would like to try something like:

[u]1st pass[/u]
declare shared mem
load from global to shared
do something with the shared data
delete shared
syncthreads
[u]2nd pass[/u]
declare shared mem
load from global to shared
do something with the shared data
delete shared
syncthreads
[u]3rd pass[/u]
declare shared mem
load from global to shared
do something with the shared data
delete shared
syncthreads

#3
Posted 03/24/2012 11:31 PM   
[code]
__global__ void kernel(...)
{
{
// 1st pass
__shared__ ...
...
__syncthreads();
}
{
// 2nd pass
__shared__ ...
...
__syncthreads();
}
{
// 3rd pass
__shared__ ...
...
__syncthreads();
}
}
[/code]

IIRC the compiler will reuse the same shared memory for the three declarations. If it doesn't, or if you have a more complicated kernel structure which cannot be separated into consecutive passes, you can also use a union to overlap any variables in (shared) memory.
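
In case it helps, here is a minimal sketch of the union variant. The buffer names, element types and sizes are made up for illustration (I don't know your actual layout), and it assumes the kernel is launched with TILE threads per block:

[code]
#define TILE 128                           // assumed threads per block (illustrative)

// Overlap the per-pass shared buffers in one union, so the three passes
// reuse the same shared storage instead of each adding its own.
union PassBuffers
{
    struct { float a[TILE]; float b[TILE]; }     pass1;
    struct { int   key[TILE]; int   val[TILE]; } pass2;
    struct { float acc[2 * TILE]; }              pass3;
};

__global__ void kernel(const float *fin, const int *iin, float *out)
{
    __shared__ PassBuffers buf;            // one allocation, sized to the largest member
    int t = threadIdx.x;
    int g = blockIdx.x * blockDim.x + t;

    // 1st pass: load from global, work in buf.pass1
    buf.pass1.a[t] = fin[g];
    buf.pass1.b[t] = 2.0f * buf.pass1.a[t];
    __syncthreads();

    // 2nd pass: reload from global, work in buf.pass2 (same bytes as pass1)
    buf.pass2.key[t] = iin[g];
    buf.pass2.val[t] = buf.pass2.key[t] + 1;
    __syncthreads();

    // 3rd pass: reload from global again, work in buf.pass3
    buf.pass3.acc[t]        = fin[g];
    buf.pass3.acc[TILE + t] = (float)iin[g];
    out[g] = buf.pass3.acc[t] + buf.pass3.acc[TILE + t];
    __syncthreads();
}
[/code]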


Always check return codes of CUDA calls for errors. Do not use __syncthreads() in conditional code unless the condition is guaranteed to evaluate identically for all threads of each block. Run your program under cuda-memcheck to detect stray memory accesses. If your kernel dies for larger problem sizes, it might exceed the runtime limit and trigger the watchdog timer.
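
As a toy illustration of the __syncthreads() rule (not code from this thread; it assumes 128 threads per block):

[code]
__global__ void copy_tail_safe(const float *in, float *out, int n)
{
    __shared__ float s[128];               // assumes blockDim.x == 128
    int g = blockIdx.x * blockDim.x + threadIdx.x;

    if (g < n)                             // per-thread condition: divergent work is fine
        s[threadIdx.x] = in[g];

    __syncthreads();                       // barrier outside the if, reached by all threads
                                           // (placing it inside the if would be undefined)
    if (g < n)
        out[g] = s[threadIdx.x];
}
[/code]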

#4
Posted 03/25/2012 10:33 AM   