__CUDA_ARCH__ undefined?!
What's wrong here? I'm using this line to compile:
# nvcc -gencode=arch=compute_20,code=\"sm_20,compute_20\" test.cu -o test

and I get "__CUDA_ARCH__ is undefined."

[code]
#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>

__global__ void no_op()
{
printf("%d\n", __CUDA_ARCH__);
}

int main()
{
no_op<<<1, 1>>>();
cudaThreadSynchronize();

return 0;
}

[/code]

#1
Posted 04/06/2012 03:38 PM   
Very strange. I use __CUDA_ARCH__ without any problems in a large project.
But your minimal example fails! (I tested it to make sure).

#2
Posted 04/06/2012 04:07 PM   
[quote name='DrAnderson42' date='06 April 2012 - 09:07 AM' timestamp='1333728438' post='1392694']
Very strange. I use __CUDA_ARCH__ without any problems in a large project.
But your minimal example fails! (I tested it to make sure).
[/quote]

Gee, really?!! Maybe I should be updating my toolkit? I'm getting:
[code]
# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2011 NVIDIA Corporation
Built on Sun_Mar_20_16:47:09_PDT_2011
Cuda compilation tools, release 4.0, V0.2.1221
[/code]

EDIT: it's just unreal, because everything else seems to work with this compiler! For example, inline assembly.

#3
Posted 04/06/2012 04:14 PM   
It is _sometimes_ defined, but not in device code! If I add these three compiler directives, I get the warning message!

[code]
#if __CUDA_ARCH__ == 200
#warning "hello"
#endif

__global__ void no_op()
{
printf("%d\n", __CUDA_ARCH__);
}
[/code]

#4
Posted 04/06/2012 04:39 PM   
The message "__CUDA_ARCH__ is undefined." is emitted by [font="Courier New"]cudafe++[/font] (the program that splits host and device code), not by the host compiler. In order to figure out where kernels and device functions end, it needs to completely parse the device routines even when it extracts the host code. And __CUDA_ARCH__ is indeed undefined in host code.

The solution is simple, even though it may look silly at first: just protect your printf statement with a [font="Courier New"]#ifdef __CUDA_ARCH__[/font]. It won't change the generated device code, but it allows parsing of the host code (where this statement will be dropped at a later stage anyway) to proceed.
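
A minimal sketch of that guard applied to the kernel from the first post, in CUDA C++. The [font="Courier New"]host_pass_sees_cuda_arch()[/font] helper is hypothetical and exists only to make the host-side behaviour observable; the [font="Courier New"]#ifdef __CUDACC__[/font] wrapper is likewise an assumption, added so the same file can also be fed to a plain C++ compiler.

```cpp
#include <cstdio>

#ifdef __CUDACC__  // defined by nvcc; hides kernel syntax from a plain C++ compiler
__global__ void no_op()
{
#ifdef __CUDA_ARCH__
    // Device compilation pass: the macro is defined (200 when targeting compute_20).
    printf("%d\n", __CUDA_ARCH__);
#endif
    // Host pass: the guarded statement is skipped, so the front end has nothing to reject.
}
#endif

// Hypothetical helper, not part of the original program: reports whether
// __CUDA_ARCH__ is visible while host code is being compiled.
inline bool host_pass_sees_cuda_arch()
{
#ifdef __CUDA_ARCH__
    return true;
#else
    return false;
#endif
}
```

Called from main() on the host, the helper should return false, which matches the statement that the macro is undefined in host code.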

Always check return codes of CUDA calls for errors. Do not use __syncthreads() in conditional code unless the condition is guaranteed to evaluate identically for all threads of each block. Run your program under cuda-memcheck to detect stray memory accesses. If your kernel dies for larger problem sizes, it might exceed the runtime limit and trigger the watchdog timer.

#5
Posted 04/06/2012 05:43 PM   
As far as I understand the compilation process, tera's explanation is right on the money. As an addendum, one reason __CUDA_ARCH__ is undefined in host code is that, for fatbinary compilation targeting multiple device architectures, host code is compiled only once, so it can't be associated with any particular CUDA architecture.

The recommended way to check for the CUDA architecture in device code is something like this:

[code]
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 200)
[/code]
In general, CUDA architecture versions follow an onion-layer model (each newer architecture generally keeps the features of the earlier ones), so the use of architectural features is usually best guarded by >= comparisons against __CUDA_ARCH__.
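
A hedged sketch of such a guard inside a function compiled for both host and device. [font="Courier New"]HOST_DEVICE[/font] and [font="Courier New"]fast_path_available()[/font] are hypothetical names introduced for illustration, and the macro wrapper is an assumption so that the same file also builds with a plain C++ compiler.

```cpp
// Hypothetical convenience macro: expands to the CUDA qualifiers under nvcc
// and to nothing under a plain C++ compiler.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical example function: reports whether the current compilation
// pass targets a device of compute capability 2.0 or newer.
HOST_DEVICE inline bool fast_path_available()
{
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 200)
    return true;   // device pass for compute capability 2.0 or newer
#else
    return false;  // host pass, or device pass for an older architecture
#endif
}
```

Because the comparison is >=, the same guard keeps selecting the newer path when the file is later compiled for a newer architecture.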

#6
Posted 04/06/2012 06:21 PM   
Yes, tera, njuffa, it works exactly as you say.

I also downloaded the latest version of the driver and toolkit, and it still works only that way. It makes me curious what DrAnderson42 meant by having "tested it," but in any case, it works now.

Many thanks. :)

#7
Posted 04/07/2012 02:39 AM   
Thanks njuffa, RezaRob3. No Money involved though. :) I wonder if Nvidia would offer me a job...

#8
Posted 04/07/2012 07:00 AM   
[quote name='tera' date='07 April 2012 - 12:00 AM' timestamp='1333782044' post='1392954']
Thanks njuffa, RezaRob3. No Money involved though. :) I wonder if Nvidia would offer me a job...
[/quote]
tera, I don't know you at all personally, but have seen your name pop up here often. I learn from you even when you're not responding to me directly.
Thank you. :)

EDIT: I hope you get your wish if you love Nvidia that much!
PS: My apologies for being off-topic this once.

#9
Posted 04/07/2012 08:20 AM   
Equally off topic, to avoid misunderstandings: "right on the money" is an idiom meaning "exactly right". I realize it might be best to avoid potentially confusing idioms when writing in a forum with an international audience.

#10
Posted 04/07/2012 08:22 PM   
Yes, well in my large code, all of the checks take the form of:
[code]
#if (__CUDA_ARCH__ >= 200)
[/code]

And they are all in __global__ functions (which I thought was the only requirement). Which is why I was confused when running your test.
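
A likely explanation for the difference, offered as a sketch and not as a statement about nvcc internals: inside an [font="Courier New"]#if[/font] expression the C/C++ preprocessor replaces any undefined identifier with 0, so the check quietly evaluates to false during the host pass. The printf in the first post, by contrast, uses the macro as an ordinary expression, which must actually exist. The function name [font="Courier New"]if_branch_taken()[/font] below is hypothetical.

```cpp
// Hypothetical helper: shows which branch of the unguarded #if is selected.
// With __CUDA_ARCH__ undefined (host pass, or any plain C++ compiler) the
// preprocessor evaluates the condition as (0 >= 200), which is false.
inline bool if_branch_taken()
{
#if (__CUDA_ARCH__ >= 200)
    return true;
#else
    return false;
#endif
}
```

Some compilers can warn about this silent substitution (GCC's -Wundef, for example), which is one reason to prefer the form with defined(__CUDA_ARCH__).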

#11
Posted 04/09/2012 01:56 PM   