Partial read from global memory when spin locking?

Hi everyone! I am implementing the StreamScan paper. TL;DR: each block (using one thread) spin-locks on an address in global memory until that address is updated with the value from the previous block.

I implemented the algorithm and it seems to run fine: all the tests pass every time, although I have not stress-tested it properly yet. While discussing it with a colleague, I started to wonder: is there any chance that the thread reading from global memory might read a partial value, since the read might happen at the same time the other thread is writing it?

Let me clarify the algorithm. I have an intermediate array with one element per block; block N spin-locks on index N-1 of this array. The array is initialized with a specific value; for example, I am using the maximum value of a uint32. When the value changes, so it is no longer maxInt, I read that value and return it, and the spin-lock ends. Is there any chance that I get a partial read if a read and a write happen at the same time? Can anyone elaborate a little on the matter?

I am aware of race conditions when writing, so you might not get the correct result. If two threads write to the same address in global memory at exactly the same time, say one writes the value A and the other the value B, do I get either A or B, or is there a chance I get a bit soup of half A and half B?

The mechanism you describe can certainly be made to work properly. You would probably want the global memory location to be marked volatile.
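
For illustration, here is a minimal sketch of that spin-wait with a volatile pointer. It is not the StreamScan implementation: the names chained_sum, run_chained_sum, carry, ticket and kNotReady are made up for this example, each block contributes a plain value instead of a real partial scan, and the atomic ticket counter is an assumption added so that a block only ever waits on a block that has already started.

```cuda
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

// Sentinel meaning "not written yet" (hypothetical name). A real carry value
// must never be equal to it.
constexpr uint32_t kNotReady = 0xFFFFFFFFu;

__global__ void chained_sum(const uint32_t* blockValues,
                            volatile uint32_t* carry,
                            unsigned int* ticket)
{
    if (threadIdx.x != 0) return;  // one thread per block does the waiting

    // Assumption: logical block ids are handed out in start order, so the
    // predecessor of this block is already running or has finished.
    unsigned int id = atomicAdd(ticket, 1u);

    uint32_t prefix = 0;
    if (id > 0) {
        uint32_t seen;
        // volatile forces a fresh load from memory on every iteration.
        do {
            seen = carry[id - 1];
        } while (seen == kNotReady);
        prefix = seen;
    }
    // A single aligned 32-bit store publishes this block's inclusive sum.
    carry[id] = prefix + blockValues[id];
}

// Host helper (hypothetical): every block contributes 1, so entry i of the
// carry array should end up as i + 1.
bool run_chained_sum(int numBlocks)
{
    std::vector<uint32_t> vals(numBlocks, 1u), out(numBlocks, 0u);
    uint32_t* dVals = nullptr;
    uint32_t* dCarry = nullptr;
    unsigned int* dTicket = nullptr;
    size_t bytes = numBlocks * sizeof(uint32_t);
    cudaMalloc(&dVals, bytes);
    cudaMalloc(&dCarry, bytes);
    cudaMalloc(&dTicket, sizeof(unsigned int));
    cudaMemcpy(dVals, vals.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(dCarry, 0xFF, bytes);  // every element becomes kNotReady
    cudaMemset(dTicket, 0, sizeof(unsigned int));

    chained_sum<<<numBlocks, 32>>>(dVals, dCarry, dTicket);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);
    cudaMemcpy(out.data(), dCarry, bytes, cudaMemcpyDeviceToHost);
    for (int i = 0; i < numBlocks; ++i)
        ok = ok && (out[i] == static_cast<uint32_t>(i + 1));

    cudaFree(dVals);
    cudaFree(dCarry);
    cudaFree(dTicket);
    return ok;
}
```

Calling run_chained_sum(64) from main should return true on a working device. Here the carried value is itself the signal; if a block also had to publish other data before signalling, a __threadfence() before the final store would likely be needed as well.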

When multiple threads read or write a naturally aligned location in global memory, every thread uses the same datatype for the access, and that datatype is one of the types supported for single-instruction thread access (up to 16 bytes per thread; see Programming Guide :: CUDA Toolkit Documentation), then it should work. That is, any thread that reads the location should only get a value that was written to that location, not some partial value.
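
As a rough way to check those conditions in code, the sketch below tests whether a type has one of the single-instruction access sizes (1, 2, 4, 8 or 16 bytes) with matching alignment, and whether a given pointer is naturally aligned. The helper names is_single_access_type and is_naturally_aligned are hypothetical, not part of CUDA; the checks only encode the size and alignment conditions and say nothing about whether the compiler actually emits a single load or store.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// True if T has one of the sizes that a single load/store instruction can
// move (1, 2, 4, 8 or 16 bytes) and its required alignment equals its size.
template <typename T>
constexpr bool is_single_access_type()
{
    return (sizeof(T) == 1 || sizeof(T) == 2 || sizeof(T) == 4 ||
            sizeof(T) == 8 || sizeof(T) == 16) &&
           alignof(T) == sizeof(T);
}

static_assert(is_single_access_type<uint32_t>(), "4 bytes, 4-byte aligned");
static_assert(is_single_access_type<uint2>(), "8 bytes, 8-byte aligned");
static_assert(is_single_access_type<uint4>(), "16 bytes, 16-byte aligned");
// uint3 is 12 bytes with 4-byte alignment, so it does not qualify.
static_assert(!is_single_access_type<uint3>(), "12 bytes is not a valid size");

// Runtime check that an address is a multiple of the size of T.
template <typename T>
bool is_naturally_aligned(const T* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % sizeof(T) == 0;
}
```

Pointers returned by cudaMalloc are aligned well beyond 16 bytes, so an array of uint32_t indexed by block number, as in the question, satisfies the alignment condition for every element.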

Thank you for getting back to me. Yes, the system works; I am implementing a paper where the technique is highlighted, so I expected it to work. I am interested in understanding the details well.

So, if my understanding is correct: if some conditions are met (memory alignment, etc.) so that a memory read or write becomes a single memory transaction rather than multiple ones, can I consider reads and writes of those basic data types atomic? (Not talking about increments etc., just reads and writes.)
So unless a read or write is split into multiple memory transactions due to alignment, a read cannot happen while a write is happening?