From the snippet below, can anyone please tell me why the second atomicAdd raises misalignment errors? I’m on a compute capability 6.1 GPU, so the 64-bit atomicAdd should be OK. (VS2015 SP3, Win7)
__global__ void why(
    double *dd,
    float *ff
)
{
    float thingFloat = 3.0f;
    atomicAdd(ff, thingFloat); // works fine
    double thingDouble = 3.0f;
    atomicAdd(dd, thingDouble); // raises misaligned atomic error
}
Also, I would always recommend adding a cudaDeviceSynchronize() after the kernel call in test code like that.
As written, your code will allow application tear-down to begin while the kernel is still executing. This can lead to unpredictable results.
I ran your code on Linux with CUDA 8.0.61 and can’t reproduce any issues. Other than the comment about cudaDeviceSynchronize(), I don’t believe there should be any issues with your code.
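For reference, here is a minimal sketch of the kind of test harness I mean. It is not the original poster's host code (that wasn't shown in full), so the main() function and the CHECK_CUDA macro are my own assumptions for illustration. It assumes both pointers come straight from cudaMalloc, which returns allocations aligned well beyond the 8 bytes a double atomicAdd needs, and that you compile for compute capability 6.0 or higher (for example nvcc -arch=sm_61), since the double overload of atomicAdd is not available below that.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): abort on any CUDA runtime error.
#define CHECK_CUDA(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s failed at %s:%d: %s\n", #call,         \
                    __FILE__, __LINE__, cudaGetErrorString(err_));     \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

__global__ void why(double *dd, float *ff)
{
    float thingFloat = 3.0f;
    atomicAdd(ff, thingFloat);   // 32-bit float atomic add
    double thingDouble = 3.0;
    atomicAdd(dd, thingDouble);  // 64-bit double atomic add (needs sm_60+)
}

int main()
{
    double *dd = nullptr;
    float  *ff = nullptr;

    // cudaMalloc allocations are suitably aligned for any built-in type.
    CHECK_CUDA(cudaMalloc(&dd, sizeof(double)));
    CHECK_CUDA(cudaMalloc(&ff, sizeof(float)));
    CHECK_CUDA(cudaMemset(dd, 0, sizeof(double)));
    CHECK_CUDA(cudaMemset(ff, 0, sizeof(float)));

    why<<<1, 1>>>(dd, ff);
    CHECK_CUDA(cudaGetLastError());       // catches launch-time errors
    CHECK_CUDA(cudaDeviceSynchronize());  // waits for the kernel; reports execution errors

    double hd = 0.0;
    float  hf = 0.0f;
    CHECK_CUDA(cudaMemcpy(&hd, dd, sizeof(double), cudaMemcpyDeviceToHost));
    CHECK_CUDA(cudaMemcpy(&hf, ff, sizeof(float), cudaMemcpyDeviceToHost));
    printf("dd = %f, ff = %f\n", hd, hf);

    CHECK_CUDA(cudaFree(dd));
    CHECK_CUDA(cudaFree(ff));
    return 0;
}
```

If a harness along these lines runs cleanly but your real application still reports a misaligned atomic, it may be worth checking how the double pointer is produced there, for instance whether it is an offset into a larger buffer that is not a multiple of 8 bytes. That is only a guess on my part, since I can't reproduce the error.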