The message “__CUDA_ARCH__ is undefined.” is emitted by [font=“Courier New”]cudafe++[/font] (the program that splits host and device code), not by the host compiler. To figure out where kernels and device functions end, it needs to parse the device routines completely even while it is extracting the host code. And __CUDA_ARCH__ is indeed undefined in host code.
The solution is simple, even though it may look silly at first: just protect your printf statement with [font=“Courier New”]#ifdef __CUDA_ARCH__[/font] and a matching [font=“Courier New”]#endif[/font]. It won’t change the generated code, but it allows parsing of the host code (where this statement will be dropped at a later stage anyway) to proceed.
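For illustration, here is a minimal sketch of that guard. The kernel name hello_kernel, the helper launch_hello, and the message text are hypothetical, since the original printf statement is not shown in this thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel; the original poster's code is not shown in the thread.
__global__ void hello_kernel()
{
#ifdef __CUDA_ARCH__
    // Only seen during device compilation passes, where __CUDA_ARCH__ is
    // defined (for example 750 when compiling for sm_75).
    printf("compiled for __CUDA_ARCH__ = %d\n", __CUDA_ARCH__);
#endif
}

// Hypothetical helper: launch a single thread and wait for it to finish.
cudaError_t launch_hello()
{
    hello_kernel<<<1, 1>>>();
    cudaError_t err = cudaGetLastError();   // reports launch errors
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();         // reports execution errors, flushes device printf
}

int main()
{
    cudaError_t err = launch_hello();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

The guard sits inside the kernel body, so the kernel itself is still declared identically in the host and device passes; only the statement that mentions the macro is hidden from the host pass.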
As far as I understand the compilation process, tera’s explanation is right on the money. As an addendum, one reason __CUDA_ARCH__ is undefined in host code is that, for fatbinary compilation targeting multiple device architectures, host code is compiled only once, so it can’t be associated with any particular CUDA architecture.
The recommended way to check for the CUDA architecture in device code is something like this:
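A minimal sketch of such a check, under stated assumptions: the wrapper atomic_add_double, the kernel add_ones, and the helper sum_ones are hypothetical names, and the threshold of 600 is used because native double-precision atomicAdd is available from compute capability 6.0 onward. Older targets take a compare-and-swap fallback.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical wrapper (not a CUDA API): picks an implementation per target
// architecture with a >= comparison against __CUDA_ARCH__.
__device__ double atomic_add_double(double *address, double val)
{
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 600)
    // Compute capability 6.0 and newer: native double-precision atomicAdd.
    return atomicAdd(address, val);
#else
    // Older device targets (and the host parsing pass): compare-and-swap loop.
    unsigned long long int *address_as_ull = (unsigned long long int *)address;
    unsigned long long int old = *address_as_ull;
    unsigned long long int assumed;
    do {
        assumed = old;
        double updated = val + __longlong_as_double((long long)assumed);
        old = atomicCAS(address_as_ull, assumed,
                        (unsigned long long int)__double_as_longlong(updated));
    } while (assumed != old);   // retry if another thread changed the value
    return __longlong_as_double((long long)old);
#endif
}

// Every thread adds 1.0 to the same accumulator.
__global__ void add_ones(double *sum)
{
    atomic_add_double(sum, 1.0);
}

// Hypothetical host helper: returns the accumulated sum, or -1.0 on a CUDA error.
double sum_ones(int n_threads)
{
    double *d_sum = nullptr;
    double h_sum = 0.0;
    if (cudaMalloc(&d_sum, sizeof(double)) != cudaSuccess) return -1.0;
    cudaError_t err = cudaMemcpy(d_sum, &h_sum, sizeof(double), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        add_ones<<<1, n_threads>>>(d_sum);
        // This copy waits for the kernel and surfaces any error from it.
        err = cudaMemcpy(&h_sum, d_sum, sizeof(double), cudaMemcpyDeviceToHost);
    }
    cudaFree(d_sum);
    return (err == cudaSuccess) ? h_sum : -1.0;
}

int main()
{
    printf("sum = %f\n", sum_ones(256));
    return 0;
}
```

The `defined(__CUDA_ARCH__)` test keeps the comparison explicit about the host pass, where the macro is undefined and the fallback branch is the one that gets parsed.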
In general, CUDA architecture versions follow an onion-layer model, with each newer architecture usually keeping the features of the older ones, so the use of architectural features is usually best guarded by >= comparisons against __CUDA_ARCH__.
I also downloaded the latest version of the driver and toolkit, and it still works only that way. It makes me curious what DrAnderson42 meant by having “tested it,” but in any case, it works now.
Equally off topic, to avoid misunderstandings: “right on the money” is an idiom meaning “exactly right”. I realize it might be best to avoid potentially confusing idioms when writing in a forum with an international audience.