NVCC forces C++ compilation of .cu files
I started programming CUDA a couple of months ago at my university. Our task was to optimize a video encoder, and the code we were given was written in C. This caused a lot of trouble when including .cu files, since NVCC 2.x actually compiles these files as C++. C++ compilation mangles the function symbol names in the object code, which resulted in "undefined reference" linker errors when linking .cu object code with C-compiled code (in the cases where the .c and .cu source files share headers). The situation can be remedied by putting all shared function declarations inside extern "C" {}.
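
In a shared header, that looks something like this (a minimal sketch; the file and function names are just examples):

[code]/* File: shared.h -- included from both .c and .cu sources */
#ifndef SHARED_H
#define SHARED_H

#ifdef __cplusplus
extern "C" {
#endif

/* declarations shared between the C and CUDA sides */
int shared_func(int x, int y);

#ifdef __cplusplus
}
#endif

#endif /* SHARED_H */
[/code]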

While the latest NVCC documentation includes options for steering this aspect of compilation, e.g. the "--host-compilation" flag, these are deprecated in the latest NVCC version. In my current situation I am developing on an open-source project written entirely in C, and while I am able to compile it successfully, the C++ compiler is a lot stricter than the C compiler and generates errors in the native code.

Therefore: is it still possible to generate C-compiled host code with the latest version of NVCC?

#1
Posted 11/29/2011 08:52 AM   
The CUDA runtime interface makes extensive use of C++ features, so it is not possible to compile code written for the runtime API (as opposed to the driver API) with a C compiler. Even the --host-compilation C compiler option didn't switch to a C compiler; it just adjusted a few settings so that the C++ compiler handled code in a more C-like way. The option probably is deprecated for exactly this reason: it doesn't actually do what it pretends to do.

I'd suggest moving all your CUDA (host and device) code to a separate file. That way all your non-CUDA code can still be compiled with the C compiler, and extern "C" declarations in the CUDA part can be used to get the interface right (and well-defined).

Always check return codes of CUDA calls for errors. Do not use __syncthreads() in conditional code unless the condition is guaranteed to evaluate identically for all threads of each block. Run your program under cuda-memcheck to detect stray memory accesses. If your kernel dies for larger problem sizes, it might exceed the runtime limit and trigger the watchdog timer.
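
For the first point, a minimal checking wrapper might look like this (a sketch; the macro name is made up):

[code]// Requires <stdio.h> and <stdlib.h>.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Usage: CUDA_CHECK(cudaMalloc((void **)&d_buf, size));
[/code]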

#2
Posted 11/29/2011 11:00 AM   
[quote name='tera' date='29 November 2011 - 08:00 AM' timestamp='1322564425' post='1334607']
I'd suggest moving all your CUDA (host and device) code to a separate file. That way all your non-CUDA code can still be compiled with the C compiler, and extern "C" declarations in the CUDA part can be used to get the interface right (and well-defined).
[/quote]

I was facing the same problem adapting a C project to use CUDA, just like [b]memstick[/b] reported. I'd been looking for a proper solution, and this suggestion by [b]tera[/b] did the trick for me. I just put extern "C" before the function definition in my cudaWrapper.cu file, and linking the C and CUDA object files worked fine.
To be more specific, I did this in my C file:
[code]// File: calc.c
extern int func_B(int x, int y);

int func_A(void)
{
    int a = 0, b = 0;   // placeholder inputs
    int c, result;

    // code, code, then
    c = func_B(a, b);

    // more code, then
    result = c;         // placeholder
    return result;
}
[/code]

And this, in my CUDA file:
[code]// File: cudaWrapper.cu
extern "C" int func_B(int x, int y)
{
    int result;
    // C and CUDA code, then
    result = x + y;     // placeholder
    return result;
}
[/code]

So, no more annoying linker messages about "undefined references" to func_B defined in the CUDA file.


Thank you very much. To both of you.

#3
Posted 11/29/2011 02:03 PM   
[quote name='emac' date='29 November 2011 - 06:03 AM' timestamp='1322575405' post='1334673']
I was facing the same problem adapting a C project to use CUDA, just like [b]memstick[/b] reported. I'd been looking for a proper solution, and this suggestion by [b]tera[/b] did the trick for me. I just put extern "C" before the function definition in my cudaWrapper.cu file, and linking the C and CUDA object files worked fine.
To be more specific, I did this in my C file:
[code]// File: calc.c
extern int func_B(int x, int y);

int func_A(void)
{
    int a = 0, b = 0;   // placeholder inputs
    int c, result;

    // code, code, then
    c = func_B(a, b);

    // more code, then
    result = c;         // placeholder
    return result;
}
[/code]

And this, in my CUDA file:
[code]// File: cudaWrapper.cu
extern "C" int func_B(int x, int y)
{
    int result;
    // C and CUDA code, then
    result = x + y;     // placeholder
    return result;
}
[/code]

So, no more annoying linker messages about "undefined references" to func_B defined in the CUDA file.


Thank you very much. To both of you.
[/quote]

This is a very timely thread, because I just came upon a similar problem. I've inherited a lot of legacy 'C' code, and I have to call some of its routines from my CUDA kernel. The main problem I ran into is that the developers had implemented some simulated classes and inheritance, and actually used the C++ reserved word 'this' as a parameter name (C++ itself does something like this internally). Anyway, nvcc, when given a .cu file, seems to be calling GCC in C++ mode. This only surfaced today: in the last few days we had changed the makefiles to use nvcc for everything, but there were only .c files, and everything compiled, linked, and ran fine. Now we are trying to add .cu to the SUFFIXES so that make will recognize those files, and the exact same files now come up with complaints about 'this'. I can restrict my CUDA kernel code to just a few files: some pure CUDA, and some with shared host/device code that also contain host-only code.

Is there a way to force nvcc to invoke the host compiler (gcc) as a 'C' only compiler and not a C++ compiler?

Thanks,
Mike

#4
Posted 11/29/2011 08:35 PM   
[quote name='mdowns' date='29 November 2011 - 09:35 PM' timestamp='1322598951' post='1334877']
This is a very timely thread, because I just came upon a similar problem. I've inherited a lot of legacy 'C' code, and I have to call some of its routines from my CUDA kernel. The main problem I ran into is that the developers had implemented some simulated classes and inheritance, and actually used the C++ reserved word 'this' as a parameter name (C++ itself does something like this internally). Anyway, nvcc, when given a .cu file, seems to be calling GCC in C++ mode. This only surfaced today: in the last few days we had changed the makefiles to use nvcc for everything, but there were only .c files, and everything compiled, linked, and ran fine. Now we are trying to add .cu to the SUFFIXES so that make will recognize those files, and the exact same files now come up with complaints about 'this'. I can restrict my CUDA kernel code to just a few files: some pure CUDA, and some with shared host/device code that also contain host-only code.

Is there a way to force nvcc to invoke the host compiler (gcc) as a 'C' only compiler and not a C++ compiler?

Thanks,
Mike
[/quote]

Hi Mike,

I do not think it is possible to compile .cu files as pure C with NVCC; I spent a lot of time trying to do it, without success. In the end the solution was to wrap all project header includes within the .cu files in extern "C" { <insert your headers here> }. As tera mentioned above, a trick if you're stuck in a big C project is to move all your CUDA code into a separate .cu file (no host code) and compile that (and only that) file with nvcc. The rest you can compile with gcc as normal. Just remember the extern "C" wrapper around your program's header files in the .cu file. Otherwise, as I mentioned in my first post, the functions declared there will get compiled with C++ symbol semantics (not syntax, sorry), and in the final stage the linker will not be able to link the common C and C++ functions because of the different symbol names.
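
In practice, the top of the .cu file then looks something like this (the header name is just an example):

[code]// File: cudaWrapper.cu
// Give the project's C declarations C linkage inside the C++ compile:
extern "C" {
#include "encoder.h"
}

// ... kernels and extern "C" wrapper functions follow
[/code]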

#5
Posted 11/29/2011 10:11 PM   
[quote name='memstick' date='29 November 2011 - 02:11 PM' timestamp='1322604677' post='1334937']
Hi Mike,

I do not think it is possible to compile .cu files as pure C with NVCC; I spent a lot of time trying to do it, without success. In the end the solution was to wrap all project header includes within the .cu files in extern "C" { <insert your headers here> }. As tera mentioned above, a trick if you're stuck in a big C project is to move all your CUDA code into a separate .cu file (no host code) and compile that (and only that) file with nvcc. The rest you can compile with gcc as normal. Just remember the extern "C" wrapper around your program's header files in the .cu file. Otherwise, as I mentioned in my first post, the functions declared there will get compiled with C++ symbol semantics (not syntax, sorry), and in the final stage the linker will not be able to link the common C and C++ functions because of the different symbol names.
[/quote]

Memstick, Thanks. I'll try that and if it doesn't work completely, then I guess I'll go through to make sure that everything can be compiled with a C++ compiler.

#6
Posted 11/29/2011 11:36 PM   
[quote name='mdowns' date='29 November 2011 - 04:36 PM' timestamp='1322609782' post='1334977']
Memstick, Thanks. I'll try that and if it doesn't work completely, then I guess I'll go through to make sure that everything can be compiled with a C++ compiler.
[/quote]

Another solution, which I think would be a lot easier, is to simply set up a couple of make rules so that .c files are compiled by gcc and .cu files by nvcc. E.g.

[CODE]
.SUFFIXES: .cu

# Recipe lines must start with a tab; $< is the source file.
.c.o:
	gcc $(CFLAGS) -c $<

.cu.o:
	nvcc $(NVFLAGS) -c $<
[/CODE]

Then link everything with "nvcc -o $(OBJS)...". Works for me, anyway.
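
With GNU make, pattern rules do the same job and may read more naturally (a sketch along the same lines):

[CODE]
%.o: %.c
	gcc $(CFLAGS) -c $< -o $@

%.o: %.cu
	nvcc $(NVFLAGS) -c $< -o $@
[/CODE]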

#7
Posted 12/02/2011 05:42 PM   
[quote name='jamesqf' date='02 December 2011 - 06:42 PM' timestamp='1322847777' post='1336528']
Another solution, which I think would be a lot easier, is to simply set up a couple of make rules so that .c files are compiled by gcc and .cu files by nvcc. E.g.

[CODE]
.SUFFIXES: .cu

# Recipe lines must start with a tab; $< is the source file.
.c.o:
	gcc $(CFLAGS) -c $<

.cu.o:
	nvcc $(NVFLAGS) -c $<
[/CODE]

Then link everything with "nvcc -o $(OBJS)...". Works for me, anyway.
[/quote]

Yeah, this is what I did. However the problem arises when you compile some kind of shared code in your .cu file with both nvcc and gcc (via your headers). In those cases you're likely to get some linker issues with unresolved references.

#8
Posted 12/05/2011 02:22 PM   
[quote name='memstick' date='05 December 2011 - 06:22 AM' timestamp='1323094930' post='1337862']
Yeah, this is what I did. However the problem arises when you compile some kind of shared code in your .cu file with both nvcc and gcc (via your headers). In those cases you're likely to get some linker issues with unresolved references.
[/quote]
Well, I've made all the code compile with nvcc/g++, and now I've finally started adding the CUDA kernel code. ALL of the CUDA kernel code will be shared with the host, so I added __host__ __device__ to all the shared functions in the .cu file and in the .h file. Now, when I compile, I get something like this:

source='../../../src/mpeg/video.c' object='video.o' libtool=no \
DEPDIR=.deps depmode=none /bin/sh ../../../depcomp \
nvcc -DHAVE_CONFIG_H -I. -I../../../src/mpeg -I../.. -I../../../src/support -O3 -g -arch=compute_20 -D__CUDA_ARCH__=200 -DMSI_IS_LINUX --compiler-options -Wno-format --compiler-options -Wimplicit-int --compiler-options -Wparentheses --compiler-options -Wreturn-type --compiler-options -Wsequence-point --compiler-options -Wswitch -DCUDA_PROCESSING --compiler-options -march=core2 --compiler-options -m64 --compiler-options -msse2 --compiler-options -mcx16 --compiler-options -msahf --compiler-options -pipe -D_DEBUG_ -D_DEBUG_PRINT_STDERR_NO_ -D_ARCH64_ -c ../../../src/mpeg/video.c
../../../src/mpeg/video.h:695: error: expected '=', ',', ';', 'asm' or '__attribute__' before '__device__'
../../../src/mpeg/video.h:702: error: expected '=', ',', ';', 'asm' or '__attribute__' before '__device__'
make: *** [msiVideoMp4.o] Error 1

One of the prototypes from the .h file looks like this:

__host__ __device__ status_e video_decodeIFrameMacroblocks(uint8 *buff, ...);

So, it doesn't like the __host__ __device__ directives in front of the prototypes.


There is NO way that I can separate out the CUDA code and put it in a separate .cu file, as was suggested earlier in this thread. As I said, ALL functions which will be in the kernel will be shared with the host (except, of course, for the __global__ kernelMain(...)).

Any ideas what is wrong? Is this a Linux problem?

BTW, I can build and run the entire system when built with nvcc, so there is no C-only syntax or semantics hanging around. It just doesn't like __host__ __device__ in the header files.

Thanks in advance,
Mike

#9
Posted 12/10/2011 12:08 AM   
[quote name='mdowns' date='09 December 2011 - 05:08 PM' timestamp='1323475697' post='1340190']
../../../src/mpeg/video.h:695: error: expected '=', ',', ';', 'asm' or '__attribute__' before '__device__'
../../../src/mpeg/video.h:702: error: expected '=', ',', ';', 'asm' or '__attribute__' before '__device__'
[/quote]

When I've seen errors like this, it's been an include-file problem that somehow kept the compiler from seeing the CUDA include file that defines __host__ &c. Do you have your .h files protected against being included multiple times with an #ifdef? Last time it happened to me, I'd somehow managed to use the same #ifdef NAME in two files...
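
If the header really is being fed through a plain C compile somewhere (those gcc messages look like C-mode errors, and __CUDACC__ would not be defined there), another thing to try is hiding the qualifiers behind a macro. A sketch, with HOST_DEVICE being a made-up name:

[code]/* In the shared header */
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE   /* expands to nothing for non-nvcc compiles */
#endif

HOST_DEVICE status_e video_decodeIFrameMacroblocks(uint8 *buff, ...);
[/code]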

#10
Posted 12/10/2011 03:23 AM   
All our header files are protected using the #ifndef method. Besides, wouldn't I have seen the same problem before I started the modifications for CUDA? All I had to do was change two files from .c to .cu. One file has my __global__ void kernelMain(), and the other has been around for years and contains two existing functions which are called from my kernelMain. That, and I added __host__ __device__ in front of the function signatures. The system builds with no errors and runs perfectly until I make these small changes.

#11
Posted 12/10/2011 02:23 PM   
I just wrote a really simple test from scratch, but in the same Linux/compiler environment. I have a cudaUtils.h and .cu. They contain one simple function that just adds two integer input params and returns the result. The CUDA kernel just calls this function. The 'add' function has the __device__ qualifier at the beginning of its signature.

Here's the entire output from the compilation/linking:

nvcc -v -O3 -g -arch=compute_20 -D__CUDA_ARCH__=200 -DMSI_IS_LINUX --compiler-options -Wno-format --compiler-options -Wparentheses --compiler-options -Wreturn-type --compiler-options -Wsequence-point --compiler-options -Wswitch -DCUDA_PROCESSING --compiler-options -march=core2 --compiler-options -m64 --compiler-options -msse2 --compiler-options -mcx16 --compiler-options -msahf --compiler-options -pipe -D_DEBUG_ -D_DEBUG_PRINT_STDERR_NO_ -D_ARCH64_ -I../../../src/support -I../../../src/alerts -I../../../tools/libTools -I../../../src -I../../../src/outputs -I../../../src/inputs -I../../../src/tables -I../../../src/stats -I../../../src/mpeg -DHAVE_CONFIG_H -I. -c ../../../tools/tests/cudaMain.cu
#$ _SPACE_=
#$ _CUDART_=cudart
#$ _HERE_=/opt/cuda/bin
#$ _THERE_=/opt/cuda/bin
#$ _TARGET_SIZE_=64
#$ TOP=/opt/cuda/bin/..
#$ LD_LIBRARY_PATH=/opt/cuda/bin/../lib:/opt/cuda/bin/../extools/lib:
#$ PATH=/opt/cuda/bin/../open64/bin:/opt/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/bin:/usr/x86_64-pc-linux-gnu/gcc-bin/4.3.2:/opt/cuda/bin:/var/qmail/bin
#$ INCLUDES="-I/opt/cuda/bin/../include" "-I/opt/cuda/bin/../include/cudart"
#$ LIBRARIES= "-L/opt/cuda/bin/../lib64" -lcudart
#$ CUDAFE_FLAGS=
#$ OPENCC_FLAGS=
#$ PTXAS_FLAGS=
#$ gcc -D__CUDA_ARCH__=200 -E -x c++ -DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDACC__ -C -Wno-format -Wparentheses -Wreturn-type -Wsequence-point -Wswitch -march=core2 -m64 -msse2 -mcx16 -msahf -pipe -O3 -I"../../../src/support" -I"../../../src/alerts" -I"../../../tools/libTools" -I"../../../src" -I"../../../src/outputs" -I"../../../src/inputs" -I"../../../src/tables" -I"../../../src/stats" -I"../../../src/mpeg" -I"." "-I/opt/cuda/bin/../include" "-I/opt/cuda/bin/../include/cudart" -D"__CUDA_ARCH__=200" -D"MSI_IS_LINUX" -D"CUDA_PROCESSING" -D"_DEBUG_" -D"_DEBUG_PRINT_STDERR_NO_" -D"_ARCH64_" -D"HAVE_CONFIG_H" -include "cuda_runtime.h" -m64 -g -o "/tmp/tmpxft_00000b63_00000000-4_cudaMain.cpp1.ii" "../../../tools/tests/cudaMain.cu"
#$ cudafe --m64 --gnu_version=40302 -tused --no_remove_unneeded_entities --gen_c_file_name "/tmp/tmpxft_00000b63_00000000-1_cudaMain.cudafe1.c" --stub_file_name "/tmp/tmpxft_00000b63_00000000-1_cudaMain.cudafe1.stub.c" --gen_device_file_name "/tmp/tmpxft_00000b63_00000000-1_cudaMain.cudafe1.gpu" --include_file_name "/tmp/tmpxft_00000b63_00000000-3_cudaMain.fatbin.c" "/tmp/tmpxft_00000b63_00000000-4_cudaMain.cpp1.ii"
#$ gcc -D__CUDA_ARCH__=200 -E -x c -DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDACC__ -C -Wno-format -Wparentheses -Wreturn-type -Wsequence-point -Wswitch -march=core2 -m64 -msse2 -mcx16 -msahf -pipe -O3 -D__CUDA_PREC_DIV -D__CUDA_PREC_SQRT -I"../../../src/support" -I"../../../src/alerts" -I"../../../tools/libTools" -I"../../../src" -I"../../../src/outputs" -I"../../../src/inputs" -I"../../../src/tables" -I"../../../src/stats" -I"../../../src/mpeg" -I"." "-I/opt/cuda/bin/../include" "-I/opt/cuda/bin/../include/cudart" -m64 -g -o "/tmp/tmpxft_00000b63_00000000-5_cudaMain.cpp2.i" "/tmp/tmpxft_00000b63_00000000-1_cudaMain.cudafe1.gpu"
#$ cudafe --m64 --gnu_version=40302 --c --gen_c_file_name "/tmp/tmpxft_00000b63_00000000-6_cudaMain.cudafe2.c" --stub_file_name "/tmp/tmpxft_00000b63_00000000-6_cudaMain.cudafe2.stub.c" --gen_device_file_name "/tmp/tmpxft_00000b63_00000000-6_cudaMain.cudafe2.gpu" --include_file_name "/tmp/tmpxft_00000b63_00000000-3_cudaMain.fatbin.c" "/tmp/tmpxft_00000b63_00000000-5_cudaMain.cpp2.i"
#$ gcc -D__CUDA_ARCH__=200 -E -x c -DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDABE__ -Wno-format -Wparentheses -Wreturn-type -Wsequence-point -Wswitch -march=core2 -m64 -msse2 -mcx16 -msahf -pipe -O3 -D__CUDA_PREC_DIV -D__CUDA_PREC_SQRT -I"../../../src/support" -I"../../../src/alerts" -I"../../../tools/libTools" -I"../../../src" -I"../../../src/outputs" -I"../../../src/inputs" -I"../../../src/tables" -I"../../../src/stats" -I"../../../src/mpeg" -I"." "-I/opt/cuda/bin/../include" "-I/opt/cuda/bin/../include/cudart" -m64 -g -o "/tmp/tmpxft_00000b63_00000000-7_cudaMain.cpp3.i" "/tmp/tmpxft_00000b63_00000000-6_cudaMain.cudafe2.gpu"
#$ filehash -s " " "/tmp/tmpxft_00000b63_00000000-7_cudaMain.cpp3.i" > "/tmp/tmpxft_00000b63_00000000-8_cudaMain.hash"
#$ gcc -E -x c++ -D__CUDACC__ -C -Wno-format -Wparentheses -Wreturn-type -Wsequence-point -Wswitch -march=core2 -m64 -msse2 -mcx16 -msahf -pipe -O3 -I"../../../src/support" -I"../../../src/alerts" -I"../../../tools/libTools" -I"../../../src" -I"../../../src/outputs" -I"../../../src/inputs" -I"../../../src/tables" -I"../../../src/stats" -I"../../../src/mpeg" -I"." "-I/opt/cuda/bin/../include" "-I/opt/cuda/bin/../include/cudart" -D"__CUDA_ARCH__=200" -D"MSI_IS_LINUX" -D"CUDA_PROCESSING" -D"_DEBUG_" -D"_DEBUG_PRINT_STDERR_NO_" -D"_ARCH64_" -D"HAVE_CONFIG_H" -include "cuda_runtime.h" -m64 -g -o "/tmp/tmpxft_00000b63_00000000-9_cudaMain.cpp4.ii" "../../../tools/tests/cudaMain.cu"
#$ cudafe++ --m64 --gnu_version=40302 --parse_templates --gen_c_file_name "/tmp/tmpxft_00000b63_00000000-1_cudaMain.cudafe1.cpp" --stub_file_name "/tmp/tmpxft_00000b63_00000000-1_cudaMain.cudafe1.stub.c" "/tmp/tmpxft_00000b63_00000000-9_cudaMain.cpp4.ii"
nvopencc -TARG:compute_20 -m64 -OPT:ftz=0 -CG:ftz=0 -CG:prec_div=1 -CG:prec_sqrt=1 "/tmp/tmpxft_00000b63_00000000-10_cudaMain" "/tmp/tmpxft_00000b63_00000000-7_cudaMain.cpp3.i" -o "/tmp/tmpxft_00000b63_00000000-2_cudaMain.ptx"
../../../tools/tests/cudaMain.cu(28): Error: External calls are not supported (found non-inlined call to _Z14cudaUtils_plusii)
# --error 0x2 --
make: *** [cudaMain.o] Error 2

You can see that I specify -arch=compute_20, but it is acting as if only inlined device functions are supported. I'm using CUDA 4.0. If I move the implementation to the header file and declare it 'inline', it does compile and run correctly.
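
For reference, the variant that does compile looks roughly like this (reconstructed; the signature matches the _Z14cudaUtils_plusii symbol in the error above):

[code]// File: cudaUtils.h -- definition moved into the header and marked inline,
// so the device compiler can inline the call into the kernel.
#ifndef CUDAUTILS_H
#define CUDAUTILS_H

__device__ inline int cudaUtils_plus(int a, int b)
{
    return a + b;
}

#endif /* CUDAUTILS_H */
[/code]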

Can anybody tell me what I'm doing wrong?
Thanks,
Mike

#12
Posted 12/11/2011 01:31 AM   