CUDA architecture Macro
Hi. I would like to use the architecture-specifying macro, which I believe is "__CUDA_ARCH__". I tried the following code, but the macro appears not to be defined. How can I get this to work?

[code]
#if __CUDA_ARCH__ < 200
// cache/shared memory configuration not possible, no configurable cache
printf("Cache configuration not possible\n");
#else
cudaError_t errorCachePref1 = cudaFuncSetCacheConfig("...", cudaFuncCachePreferShared);
cudaError_t errorCachePref2 = cudaFuncSetCacheConfig("...", cudaFuncCachePreferShared);
printf("Cache configuration done\n");
#endif
[/code]

I read on other forums that __CUDA_ARCH__ is not defined on the host (C) compiler side, and is defined only on the CUDA device compiler side. The file I am using is a .cu file and is compiled with nvcc, so I am not sure what I should do to make this work.

Thanks

#1
Posted 04/27/2012 05:21 PM   
Use [url="http://developer.download.nvidia.com/compute/cuda/4_2/rel/toolkit/docs/online/group__CUDART__DEVICE_g5aa4f47938af8276f08074d09b7d520c.html#g5aa4f47938af8276f08074d09b7d520c"]cudaGetDeviceProperties()[/url] to query the compute capability at runtime. It is returned in the [font="Courier New"]major[/font] and [font="Courier New"]minor[/font] fields.

The compute capability cannot be known in host code at compile time because it is possible to compile fat binaries with object code for multiple GPU architectures, where the right one is selected at run time.
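
For completeness, __CUDA_ARCH__ does work as expected inside device code, where nvcc defines it separately for each device compilation pass. A minimal sketch (untested; kernel name is made up for illustration):

```cuda
#include <cstdio>

// __CUDA_ARCH__ is only defined while nvcc compiles device code,
// so test it inside a __global__ or __device__ function.
__global__ void whichArch()
{
#if __CUDA_ARCH__ >= 200
    // Fermi or later: the L1 cache / shared memory split is configurable
    // (device-side printf also requires sm_20 or later)
    printf("Compiled for sm_20 or later\n");
#else
    // pre-Fermi code path: no configurable cache
#endif
}
```

In host code the same #if always takes the "undefined" branch, which is exactly the behavior described in the question.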

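A minimal host-side sketch of that runtime query (assumes device 0; error handling abbreviated):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    // query the properties of device 0
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }
    // the compute capability is prop.major.prop.minor, e.g. 2.0 for Fermi
    if (prop.major < 2) {
        printf("Cache configuration not possible\n");
    } else {
        printf("Compute capability %d.%d: cache configuration possible\n",
               prop.major, prop.minor);
    }
    return 0;
}
```

Because this check happens at run time, it works correctly even in a fat binary that embeds code for several GPU architectures.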

#2
Posted 04/27/2012 06:28 PM   
Thanks, I believe this is the better solution. :smile:

#3
Posted 04/27/2012 08:04 PM   