Do I need threadfence?
I know that __threadfence() blocks until memory writes are visible to all threads, and so on.

But consider the following scenario:
1. If each thread only writes to and reads from its own portion of shared or global memory, do I need to call threadfence before reading? (See the sketch after this list.)

2. If I try to read a global or shared variable after a (possibly different) thread has written to it using an atomic operation, do I need threadfence()?

3. I can't use atomics with volatile variables. I assume this is because atomics flush the cache anyway. Is this true?
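
For concreteness, scenario 1 is roughly this pattern (a minimal sketch; the kernel and array names are made up):

[code]
// Each thread owns exactly one slot of the output array and never touches
// another thread's slot, so there is no inter-thread communication at all.
__global__ void perThreadOwnership(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = 2.0f * i;   // write to my own slot
        float x = out[i];    // read it back later in the same thread
        out[i] = x + 1.0f;
    }
}
[/code]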

I'm under the impression that threadfence is not required in any of the above scenarios, but I have no hard data to confirm it.

Thank you.

#1
Posted 07/08/2011 08:27 PM   
[quote name='akavo' date='08 July 2011 - 12:27 PM' timestamp='1310156868' post='1262013']
1. If each thread only writes to and reads from its own portion of shared or global memory, do I need to call threadfence before reading?
[/quote]
No, because there is no contention here.

[quote]
2. If I try to read a global or shared variable after a (possibly different) thread has written to it using an atomic operation, do I need threadfence()?
[/quote]
No, because an atomic operation (read-modify-write) is stronger than a threadfence.
__threadfence() guarantees the ordering of memory operations issued by the thread that calls it; it does not affect the behaviour of OTHER threads.
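
To illustrate: the fence only orders the calling thread's own memory operations; it does not stall or wake any other thread. A minimal sketch (the kernel, the result variable, and the flag are made up):

[code]
__device__ float result;
__device__ volatile int flag = 0;

__global__ void producer()
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        result = 42.0f;   // 1) write the data
        __threadfence();  // 2) guarantee the write above becomes visible
                          //    device-wide before the write below
        flag = 1;         // 3) publish; other threads are unaffected until
                          //    they choose to poll the flag themselves
    }
}
[/code]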

Department of Mathematics, Tsing Hua University, R.O.C.
Lung Sheng Chien

#2
Posted 07/09/2011 01:02 AM   
[quote name='LSChien' date='09 July 2011 - 01:02 AM' timestamp='1310173374' post='1262089']
No, because there is no contention here.

No, because an atomic operation (read-modify-write) is stronger than a threadfence.
__threadfence() guarantees the ordering of memory operations issued by the thread that calls it; it does not affect the behaviour of OTHER threads.
[/quote]

Great, thanks.

#3
Posted 07/09/2011 11:33 PM   
[quote name='LSChien' date='09 July 2011 - 07:32 AM' timestamp='1310173374' post='1262089']

__threadfence() guarantees the ordering of memory operations issued by the thread that calls it; it does not affect the behaviour of OTHER threads.
[/quote]

Quick question here - when you say threadfence does not affect the behavior of OTHER threads, does that mean a threadfence only stalls the CALLING thread until its writes are visible to everyone else, and does not make other threads wait?

If that is the case, is there a way to implement a race-free scheme in which a group of threads (grpB) needs to read something that another group of threads (grpA) writes? Before starting its read sequence, each thread of grpB spins on a variable (flag) that ONE of the grpA threads sets - hence grpB should wait for grpA's writes to become visible.

Do CUDA's programming constructs give us this ability?
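
For reference, here is a rough sketch of the scheme I have in mind (the array, the flag, and the group split are made up; whether it is actually race-free is exactly my question):

[code]
__device__ int data[128];
__device__ volatile int ready = 0;

__global__ void twoGroups()          // assume blockDim.x == 256
{
    int tid = threadIdx.x;

    if (tid < 128) {                 // grpA: the writers
        data[tid] = tid * tid;
        __threadfence();             // order my own write before anything I do later
        if (tid == 0)
            ready = 1;               // ONE grpA thread sets the flag
    } else {                         // grpB: the readers
        while (ready == 0)           // spin until the flag is set
            ;
        int value = data[tid - 128]; // but are the OTHER grpA writes visible yet?
        (void)value;
    }
}
[/code]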

#4
Posted 04/09/2012 08:56 AM   
[quote name='sidxavier' date='09 April 2012 - 04:56 AM' timestamp='1333961814' post='1393599']
Quick question here - when you say threadfence does not affect the behavior of OTHER threads, does that mean a threadfence only stalls the CALLING thread until its writes are visible to everyone else, and does not make other threads wait?

If that is the case, is there a way to implement a race-free scheme in which a group of threads (grpB) needs to read something that another group of threads (grpA) writes? Before starting its read sequence, each thread of grpB spins on a variable (flag) that ONE of the grpA threads sets - hence grpB should wait for grpA's writes to become visible.

Do CUDA's programming constructs give us this ability?
[/quote]
It depends on whether your threads are in the same warp (or half-warp; it's architecture dependent)! To be safe you would need all your grpA threads in the same half-warp, and likewise for grpB, with grpA and grpB threads not in the same warp but in the same block! In any case it is architecture dependent, and I would not encourage you to think this way!

I would have done a quick, simple reduction to count the threads that have finished writing, to ensure they are all synchronized (especially if they are in different blocks).
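
Something along these lines, perhaps (a rough, untested sketch; the counter, buffer, and kernel names are made up, and spinning across blocks assumes all blocks are resident at once):

[code]
__device__ unsigned int writersDone = 0;    // reset to 0 before each launch

__global__ void writeThenRead(float *buf, int nWriters)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    if (tid < nWriters) {
        buf[tid] = (float)tid;              // produce my element
        __threadfence();                    // make it visible device-wide
        atomicAdd(&writersDone, 1u);        // then announce completion
    } else {
        // Readers: wait until every writer has announced completion.
        while (atomicAdd(&writersDone, 0u) < (unsigned int)nWriters)
            ;
        float x = buf[tid % nWriters];      // the produced data is visible now
        (void)x;
    }
}
[/code]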

Parallelis.com, Parallel-computing technologies and benchmarks. Current Projects: OpenCL Chess & OpenCL Benchmark

#5
Posted 04/11/2012 02:46 PM   