The topic title says it all…
How is this possible? A 4KB object would be perfectly aligned at that address, let alone a measly 32-bit value (the mangled name in the report, atomicAddEPjj, is the unsigned int overload of atomicAdd). I don’t see how something could possibly be more aligned :-)
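The alignment arithmetic is easy to double-check. Below is a minimal host-side sketch, assuming plain C++11 or later; `is_aligned` and `kReportedAddress` are hypothetical names made up for this check, not part of CUDA or Nsight. The address is the one from the Memory Checker report.

```cpp
#include <cstdint>

// Hypothetical helper: true when addr is a multiple of alignment.
// Assumes alignment is a power of two.
constexpr bool is_aligned(std::uint64_t addr, std::uint64_t alignment) {
    return (addr & (alignment - 1)) == 0;
}

// Address taken from the Memory Checker report (0x708001000).
constexpr std::uint64_t kReportedAddress = 0x708001000ULL;

// Compile-time checks: the build fails if either one does not hold.
static_assert(is_aligned(kReportedAddress, 4),    "not 4-byte aligned");
static_assert(is_aligned(kReportedAddress, 4096), "not 4 KB aligned");
```

Both assertions compile, so the reported address really is a multiple of 4 and of 4096. This only checks the number printed in the report; it says nothing about what the kernel actually passed to atomicAdd.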
Configuration: GTX 1080, Windows 10, Visual Studio 2013, CUDA 8.0, Nsight 5.2.0.16321.
This is a debug build of the kernel.
Here’s the output:
GPU State:
Address    Size  Type      Mem  Block  Thread  blockIdx  threadIdx  PC                                                     Source
---------  ----  --------  ---  -----  ------  --------  ---------  -----------------------------------------------------  ------
708001000  4     mis atom  g    1      0       {1,0,0}   {0,0,0}    [bla bla function name]_9f58e5bb9atomicAddEPjj+000128  c:\program files\nvidia gpu computing toolkit\cuda\v8.0\include\device_functions.hpp:1564
Summary of access violations:
c:\program files\nvidia gpu computing toolkit\cuda\v8.0\include\device_functions.hpp(1564): error MemoryChecker: #misaligned=1 #invalidAddress=0
================================================================================
Memory Checker detected 1 access violations.
error = misaligned atomic (global memory)
gridid = 59
blockIdx = {1,0,0}
threadIdx = {0,0,0}
address = 0x708001000
accessSize = 4