Racecheck and threadfence

Hi,

I have what looks like a false positive from racecheck, but I’d like to check my understanding.

I have a kernel that I have reduced to run with a single warp in each block (i.e. dimensions 32x1), and use shared variables essentially as warp-level variables. If I fence reads and writes using __threadfence_block(), racecheck prints huge numbers of warnings about hazards, but simply replacing these calls with __syncthreads() makes the warnings go away. Other than racecheck warnings, I don’t see any behaviour difference between the two approaches.

My understanding from the docs is that __threadfence_block() is sufficient to ensure that all threads have a consistent view of shared memory after it is called. Is this incorrect, or is this a false positive from racecheck?

(The normal version of this code needs to run with more than one warp per block (indexing into shared memory using warp index), so I’d like to check that using __threadfence_block() is still sufficient to fence reads and writes for all threads of a single warp. Sadly warps diverge so I cannot use __syncthreads() in this case.)

Thanks!

Hi sjb3d, in CUDA 5.0/5.5, racecheck only recognizes CTA-wide synchronization (i.e. __syncthreads()) as ensuring ordering between reads and writes. So __threadfence_block() will typically cause racecheck to report messages about hazards.

Based on your description, it sounds like you are doing warp-level programming. To distinguish inter-warp races from intra-warp races, racecheck has a built-in filtering mechanism. If you run racecheck with the “--print-level error” option, it should report only true inter-warp races and suppress the intra-warp hazard reports.

That said, __threadfence_block() should be sufficient to ensure writes to shared memory are visible to all threads in the block.
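
For reference, here is a minimal sketch of the kind of per-warp shared memory exchange described in the question. It is an illustration under stated assumptions, not the poster's actual kernel: the kernel name, the block shape of 4 warps, and the host helper run_demo() are all hypothetical. It also assumes a toolkit newer than the ones discussed in this thread (CUDA 9 or later), where __syncwarp() is available as a warp-level barrier that orders memory accesses among the participating threads. On the CUDA 5.x toolkits discussed here that call does not exist, and code of that era relied on a fence plus lock-step warp execution instead.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kWarpSize = 32;
constexpr int kWarpsPerBlock = 4;  // hypothetical block shape, chosen for illustration
constexpr int kThreads = kWarpSize * kWarpsPerBlock;

// Each warp uses its own row of shared memory as warp-private scratch space.
__global__ void rotate_within_warp(int *out)
{
    __shared__ int scratch[kWarpsPerBlock][kWarpSize];

    const int lane = threadIdx.x % kWarpSize;
    const int warp = threadIdx.x / kWarpSize;

    // Every lane publishes one value into its warp's row.
    scratch[warp][lane] = warp * 100 + lane;

    // Warp-level barrier (assumption: CUDA 9+). It synchronizes only the
    // threads of this warp, so other warps are free to be elsewhere.
    __syncwarp();

    // Read the value written by the next lane of the same warp.
    const int neighbour = (lane + 1) % kWarpSize;
    out[blockIdx.x * blockDim.x + threadIdx.x] = scratch[warp][neighbour];
}

// Hypothetical host helper: returns 0 if every thread saw its neighbour's value.
int run_demo()
{
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, kThreads * sizeof(int)) != cudaSuccess) return 1;

    rotate_within_warp<<<1, kThreads>>>(d_out);

    int h_out[kThreads];
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return 1;

    for (int i = 0; i < kThreads; ++i) {
        const int warp = i / kWarpSize;
        const int lane = i % kWarpSize;
        const int expected = warp * 100 + (lane + 1) % kWarpSize;
        if (h_out[i] != expected) return 2;
    }
    return 0;
}

int main()
{
    const int status = run_demo();
    std::printf("%s\n", status == 0 ? "ok" : "mismatch or CUDA error");
    return status;
}
```

Because each warp only touches its own row of scratch, no block-wide barrier is needed for this exchange. Any report about accesses within a single row would be an intra-warp hazard of the kind that the “--print-level error” option filters out.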

If handling of __threadfence_block() is functionality you would like to see in the future, and you are a registered developer, could you file a bug with “RFE” in the subject, along with an example application?

Hi vyas, thanks for the response and confirming that my understanding of __threadfence_block() is correct!

With “--print-level error” as suggested (using the 5.5 RC), 0 errors are reported for my case (with 2361204 warnings suppressed). It would be useful for me to use this tool to find intra-warp races, so I’ll see if I can report a proper repro case when I have the time. Thanks!