__constant__ and __device__ memory access

Hi all,

I’m studying CUDA for my thesis and I have a little problem with the way to access constant and device memory from the host.

Basically, my problem is to well understand what is the difference between these two functions:

    [*]

    cudaError_t cudaMemcpyFromSymbol(void ∗ dst, const char ∗ symbol, size_t count, size_t offset = 0, enum cudaMemcpyKind kind = cudaMemcpyDeviceToHost)
    

    [*]

    cudaError_t cudaGetSymbolAddress(void ∗∗ devPtr, const char ∗ symbol)
    

Suppose we have some variable in constant or global memory, declared with constant and/or device qualficators. Reading the programming guide, we can find that these variable are

If I’m not wrong I’ve understand that, since these variable are on device, host can’t use them directly, but it has to transfer data from device memory to host memory.

First function does exactly what I expect: copies count byte from symbol to dst. If symbol is a device or constant variable and dst is a pointer to host memory space, in dst I have a copy of symbol and I can read and manage the data.

Second function seems to do the same thing, masking the process of copy of the data. I have only to specify the symbol and the pointer in host memory space, and all work like previous functions.

So, why we have two distinct functions that do the same thing? When I have to use the first, and when the second?

Thanks a lot!

cudaGetSymbolAddress() just gives you the address of the variable, it does not copy any data. cudaMemcpyFromSymbol() corresponds to cudaGetSymbolAddress() followed by cudaMemcpy() from the address returned.

Ok, so the address returned is in device address space right? And what can I do with this? I can’t read content of the variable through this address right?

Can someone give me an example of usage of cudaGetSymbolAddress?

Thanks.

You cannot dereference a device pointer on the host. But you can use cudaMemcpy to copy memory to/from the pointer.

__device__ float g_d_data;

.........

float *d_data;

cudaGetSymbolAddress(&d_data, "g_d_data");

cudaMemcpy(d_data, h_data, cudaMemcpyDefault);

Generally, for device data it is prefereable to use cudaMalloc to dynamically allocate the pointer. That way you can control when memory is allocated and freed and your code is more reusable because isn’t tied to specific global variables. There’s no way to dynamically allocate constant memory, though. However, one can argue that with the L1/L2 cache hierarchy, constant memory is rarely needed anymore.

Ok thanks. But I can’t figure out a scenario where is a good thing the use of cudaGetSymbolAddress…