Hi everyone,
If several threads from different blocks read the same global memory location, will that cause any errors?
Thanks.
PS: While reading the SDK, I couldn't understand the following code from the particles example.
It is in the file particle_kernel.cu, around line 183; the line in question is highlighted in the code below.
[codebox]__global__ void
reorderDataAndFindCellStartD(uint2* particleHash, // particle id sorted by hash
                             float4* oldPos,
                             float4* oldVel,
                             float4* sortedPos,
                             float4* sortedVel,
                             uint* cellStart)
{
    int index = __mul24(blockIdx.x, blockDim.x) + threadIdx.x;
    uint2 sortedData = particleHash[index];

    // Load hash data into shared memory so that we can look
    // at neighboring particle's hash value without loading
    // two hash values per thread
    __shared__ uint sharedHash[257];
    sharedHash[threadIdx.x+1] = sortedData.x;
    if (index > 0 && threadIdx.x == 0)
    {
        // first thread in block must load neighbor particle hash
        volatile uint2 prevData = particleHash[index-1]; // <-- highlighted line
        sharedHash[0] = prevData.x;
    }
    __syncthreads();

    if (index == 0 || sortedData.x != sharedHash[threadIdx.x])
    {
        cellStart[sortedData.x] = index;
    }

    // Now use the sorted index to reorder the pos and vel data
    float4 pos = FETCH(oldPos, sortedData.y); // macro does either global read or texture fetch
    float4 vel = FETCH(oldVel, sortedData.y); // see particles_kernel.cuh
    sortedPos[index] = pos;
    sortedVel[index] = vel;
}[/codebox]
Why is the volatile qualifier used on that line? Is it necessary?
Thanks again.