Suppose we have some variable in constant or global memory, declared with the __constant__ and/or __device__ qualifiers. Reading the programming guide, we can find that these variables reside in device memory and can be accessed from the host through runtime functions such as cudaMemcpyFromSymbol() and cudaGetSymbolAddress().
If I’m not wrong, I understand that since these variables live on the device, the host can’t use them directly; it has to transfer the data from device memory to host memory.
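To fix the notation for the question, here is a minimal sketch of what such declarations look like (the variable names are purely illustrative):

```cuda
// Hypothetical declarations, just to make the question concrete:
__constant__ float c_coeffs[16];  // resides in constant memory
__device__   int   d_counter;     // resides in global (device) memory
```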
The first function, cudaMemcpyFromSymbol(), does exactly what I expect: it copies count bytes from symbol to dst. If symbol is a __device__ or __constant__ variable and dst is a pointer into host memory space, then dst ends up holding a copy of symbol, and I can read and work with the data.
The second one, cudaGetSymbolAddress(), seems to do the same thing while hiding the copy process: I only have to specify the symbol and a pointer, and everything works like the previous function.
So why do we have two distinct functions if they do the same thing? When should I use the first one, and when the second?
cudaGetSymbolAddress() just gives you the address of the variable; it does not copy any data. cudaMemcpyFromSymbol() corresponds to cudaGetSymbolAddress() followed by cudaMemcpy() from the returned address.
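A short sketch of that equivalence (the __device__ variable d_value is a made-up example; error checking is omitted for brevity):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical device variable, used only for illustration.
__device__ int d_value = 42;

int main() {
    int a = 0, b = 0;

    // One call: copy sizeof(int) bytes from the symbol to host memory.
    cudaMemcpyFromSymbol(&a, d_value, sizeof(int));

    // Equivalent two-step version: resolve the symbol to a device
    // address, then do an ordinary device-to-host memcpy.
    void *devPtr = nullptr;
    cudaGetSymbolAddress(&devPtr, d_value);
    cudaMemcpy(&b, devPtr, sizeof(int), cudaMemcpyDeviceToHost);

    printf("a = %d, b = %d\n", a, b);  // both paths read the same value
    return 0;
}
```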
OK, so the returned address is in the device address space, right? And what can I do with it? I can’t read the contents of the variable through this address from the host, right?
Can someone give me an example of usage of cudaGetSymbolAddress?
Generally, for device data it is preferable to use cudaMalloc to allocate memory dynamically. That way you control when memory is allocated and freed, and your code is more reusable because it isn’t tied to specific global variables. There is no way to allocate constant memory dynamically, though. However, one could argue that with the L1/L2 cache hierarchy, constant memory is rarely needed anymore.
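As for a usage example: the point of cudaGetSymbolAddress() is that it turns a statically declared symbol into an ordinary device pointer, which can then be passed to anything that expects one, such as cudaMemset, cudaMemcpy, or a kernel parameter. A sketch, with the array and kernel names being my own invention:

```cuda
#include <cuda_runtime.h>

// Hypothetical statically declared device array.
__device__ float d_buf[256];

// Kernel taking a plain device pointer, so it works equally well with
// the symbol's address or with a cudaMalloc'd buffer.
__global__ void scale(float *p, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] *= s;
}

int main() {
    // Resolve the symbol to a device pointer, then use it like any other.
    void *p = nullptr;
    cudaGetSymbolAddress(&p, d_buf);
    cudaMemset(p, 0, 256 * sizeof(float));
    scale<<<1, 256>>>(static_cast<float *>(p), 256, 2.0f);

    // The dynamic alternative: same kernel, but the buffer's lifetime is
    // under program control instead of being a fixed global symbol.
    float *q = nullptr;
    cudaMalloc(&q, 256 * sizeof(float));
    scale<<<1, 256>>>(q, 256, 2.0f);
    cudaFree(q);

    cudaDeviceSynchronize();
    return 0;
}
```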