Build Error MSB3721 When calling object method within kernel, using compiler directives

So I have a problem with building a CUDA project.

I’m trying to parallelize a physics engine without massively rewriting my code, so I want to annotate my functions with a macro such as #define CUDA_CALLABLE_MEMBER __host__ __device__ in order to reduce the amount of code duplication.

Basically I want my .cu files to call methods from my .h headers.

For example:
test.h

	#include <iostream>

	#ifdef __CUDACC__
	#define CUDA_CALLABLE_MEMBER __host__ __device__
	#else
	#define CUDA_CALLABLE_MEMBER
	#endif

	class helloWorld
	{
	public:
		CUDA_CALLABLE_MEMBER helloWorld() {};
		CUDA_CALLABLE_MEMBER void boo();

		//__host__ __device__ helloWorld() {};
		//__host__ __device__ void boo();
	};

test.cpp

	#include "test.h"

	CUDA_CALLABLE_MEMBER void helloWorld::boo()
	{

	}

test.cuh

	#pragma once
	#include <cuda.h>
	#include "cuda_runtime.h"
	#include "device_launch_parameters.h"

	class test
	{
	private:
		int SIZE;
	public:
		test();
	};

test.cu

	#include "test.h"
	#include "test.cuh"

	__global__ void myAddKernel(helloWorld* hw, int *c, const int *a, const int *b, int n)
	{
		int i = blockIdx.x*blockDim.x + threadIdx.x;

		if (i < n)
		{
			//hw->boo();
			c[i] = a[i] + b[i];
		
		}
	}

	test::test()
	{
		SIZE = 1024;

		helloWorld* hello = new helloWorld();

	}

The line hw->boo(); produces the following error:

Googling shows my problem seems to be different from all others. If I comment out that line it compiles fine. If I rewrite it to not bother with .h/.cpp and just use .cuh/.cu then it also compiles and works.

But I would very much like to use regular C++ h/cpp files for time saving reasons.

Additionally, I found that the error goes away if boo() is defined inline in my .h file as

CUDA_CALLABLE_MEMBER void boo() { };

Then it compiles.

But if I define it ‘properly’ in the .cpp file as

CUDA_CALLABLE_MEMBER void helloWorld::boo()
{

}

then I get the same error.

Any help would be appreciated. :)

A .cpp file is by default delivered directly to the host compiler.

The host compiler doesn’t understand the __host__ __device__ syntax.

If you have a properly configured CUDA project in MS VS, you can also use the -x cu switch to cause your .cpp file to be treated as a .cu file.

It’s not clear what you want or what your objection is to naming the file with a .cu extension.

Likewise, if you annotate a .h file with __host__ __device__ and then include it in a .cpp file, it won’t compile.

If you use a method like this:

http://stackoverflow.com/questions/32014839/how-to-use-a-cuda-class-header-file-in-both-cpp-and-cuda-modules

then you can put __host__ __device__ in a header file and have it be usable in “ordinary” .cpp files (with no CUDA content) as well as .cu files.

I’m a little confused: the link you gave me uses the EXACT method I am using, and it gives that error. You did see that, right?

My objection to using the .cu extension is that I don’t want to permanently turn my project into a CUDA project (which doesn’t work with VS2015), but only temporarily, and to ignore CUDA when I don’t want to use it.

I’ll try out -x cu when I’m home though. :)

Yes, sorry, my previous comment was off-base.

  1. Your error output from VS is less than helpful. In the CUDA toolchain, VS errors consist of two parts: the actual error output from the CUDA tool (nvcc in this case) and a follow-up error message from the Visual Studio executive stating that the subtool exited with an error. You have provided the second part, but not the first. If the first part is not showing up in your VS console output, please modify the VS settings to provide more verbose output.

  2. This line of code is a function call:

hw->boo();

Since this function call originates from device code, you have two options:

A. Use ordinary (whole-program) compilation, which is how your project is set up, as indicated by the --compile switch, and provide all necessary device code in a single compilation unit. You are currently compiling this way, but since the actual boo function is in another compilation unit, it won’t work. This is the proximal reason for the error (which would be more evident with the verbose output).

B. Use relocatable device code compilation with device linking, in which case you can call and link device code in one compilation unit from/to device code in another compilation unit. This matches the way your code is currently structured, but your project is not set up to perform this type of compilation.

To enable the correct mode of compilation (called CUDA separate compilation), look for the “Generate Relocatable Device Code” option (the equivalent of the nvcc -rdc=true switch) in the VS project CUDA properties. If you’re still having trouble, please enable the verbose output from VS.

The only new error is:

But that’s from the relocatable code I imagine. Turning on verbosity to both Detailed and Diagnostic has not shown any additional error messages.

That is the error I would expect. You can make this error go away by compiling a relocatable code project instead of your current project setup.

How would I go about setting this up? This is the first I’m hearing of it.

Please re-read my comment #4 in this thread. I give a basic description of how to modify a VS project to set this up. For a more complete description, I would search for a Visual Studio sample cuda project that uses relocatable device code with separate compilation and linking, such as the simpleSeparateCompilation sample project, and refer to that for typical project settings:

[url]http://docs.nvidia.com/cuda/cuda-samples/index.html#simple-static-gpu-device-library[/url]

You can also read the nvcc manual section that pertains to separate compilation and linking:

[url]NVCC :: CUDA Toolkit Documentation[/url]

And there are plenty of questions and answers about it on various web forums if you want to search for those.

You mean this?

I clearly already said I tried this.

The example in the second link doesn’t seem to match how my code is organized, and it’s not clear which commands I’m supposed to use to achieve the result I want.

Suppose I have a regular C++ header file with my method definitions and then the implementations in my CPP file, I want a separate .cu file and kernel to be able to call methods from that object.

What you linked seems to have the declarations in a .h file but the implementations still in the .cu, which I already know works, but I dislike having to do it that way.

Here’s what worked for me in VS2013, CUDA 7.0, win7 x64.

  1. Create a new CUDA project.
  2. Project…Properties…CUDA C/C++… modify the project type to x64 and set generate relocatable device code to Yes.
  3. Set the active project to x64 Release
  4. Add your files to the project. Since the project creation in step 1 creates a default kernel.cu file, I just replaced the default code in kernel.cu with the code from your test.cu into that file and then added your other 3 files. (and un-comment the troublesome line of code you have commented out.)
  5. If you use the ordinary method to add test.cpp to the project, then after adding it you will need to change its type from an ordinary C/C++ file to a CUDA C/C++ module. Right-click on the file in the solution explorer window, go to Configuration Properties…General and change the Item Type from C++ to CUDA C/C++. Then go to the CUDA C/C++ properties on this file and add -x cu to the Command Line…Additional Options field.
  6. For the file properties for both kernel.cu (your test.cu file) and test.cpp, right click on the file in the solution explorer window, and change the CUDA C/C++…Common…Generate Relocatable Device Code entry to Yes.

That should be everything needed setup-wise. However the files you have shown are not a complete project, so I chose to add the following line:

int main() {}

at the end of your test.cpp file. With that I was able to successfully compile the project with no errors. Here is the console output from the compile command:

1>------ Rebuild All started: Project: test10, Configuration: Release x64 ------
1>  
1>  c:\Users\robertc\documents\visual studio 2013\Projects\test10\test10>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.0\bin\nvcc.exe" -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\x86_amd64"  -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.0\include"     --keep-dir x64\Release -maxrregcount=0  --machine 64 --compile      -DWIN32 -DWIN64 -DNDEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /O2 /Zi  /MD  " -o x64\Release\kernel.cu.obj "c:\Users\robertc\documents\visual studio 2013\Projects\test10\test10\kernel.cu" -clean 
1>  kernel.cu
1>  
1>  c:\Users\robertc\documents\visual studio 2013\Projects\test10\test10>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.0\bin\nvcc.exe" -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\x86_amd64"  -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.0\include"     --keep-dir x64\Release -maxrregcount=0  --machine 64 --compile  -x cu     -DWIN32 -DWIN64 -DNDEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /O2 /Zi  /MD  " -o x64\Release\test.cpp.obj "c:\Users\robertc\documents\visual studio 2013\Projects\test10\test10\test.cpp" -clean 
1>  test.cpp
1>  Compiling CUDA source file kernel.cu...
1>  
1>  c:\Users\robertc\documents\visual studio 2013\Projects\test10\test10>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.0\bin\nvcc.exe" -gencode=arch=compute_20,code=\"sm_20,compute_20\" --use-local-env --cl-version 2013 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\x86_amd64" -rdc=true -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.0\include"     --keep-dir x64\Release -maxrregcount=0  --machine 64 --compile -cudart static     -DWIN32 -DWIN64 -DNDEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /O2 /Zi  /MD  " -o x64\Release\kernel.cu.obj "c:\Users\robertc\documents\visual studio 2013\Projects\test10\test10\kernel.cu" 
1>  kernel.cu
1>  Compiling CUDA source file test.cpp...
1>  
1>  c:\Users\robertc\documents\visual studio 2013\Projects\test10\test10>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.0\bin\nvcc.exe" -gencode=arch=compute_20,code=\"sm_20,compute_20\" --use-local-env --cl-version 2013 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\x86_amd64" -rdc=true -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.0\include"     --keep-dir x64\Release -maxrregcount=0  --machine 64 --compile -cudart static -x cu     -DWIN32 -DWIN64 -DNDEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /O2 /Zi  /MD  " -o x64\Release\test.cpp.obj "c:\Users\robertc\documents\visual studio 2013\Projects\test10\test10\test.cpp" 
1>  test.cpp
1>  
1>  c:\Users\robertc\documents\visual studio 2013\Projects\test10\test10>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.0\bin\nvcc.exe" -dlink -o x64\Release\test10.device-link.obj -Xcompiler "/EHsc /W3 /nologo /O2 /Zi  /MD  " -L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.0\lib\x64" cudart.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib  -gencode=arch=compute_20,code=sm_20  --machine 64 x64\Release\kernel.cu.obj x64\Release\test.cpp.obj 
1>  cudart.lib
1>  kernel32.lib
1>  user32.lib
1>  gdi32.lib
1>  winspool.lib
1>  comdlg32.lib
1>  advapi32.lib
1>  shell32.lib
1>  ole32.lib
1>  oleaut32.lib
1>  uuid.lib
1>  odbc32.lib
1>  odbccp32.lib
1>  kernel.cu.obj
1>  test.cpp.obj
1>  LINK : /LTCG specified but no code generation required; remove /LTCG from the link command line to improve linker performance
1>  test10.vcxproj -> c:\Users\robertc\documents\visual studio 2013\Projects\test10\x64\Release\test10.exe
1>  copy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.0\bin\cudart*.dll" "c:\Users\robertc\documents\visual studio 2013\Projects\test10\x64\Release\"
1>  C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.0\bin\cudart32_70.dll
1>  C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.0\bin\cudart64_70.dll
1>          2 file(s) copied.
========== Rebuild All: 1 succeeded, 0 failed, 0 skipped ==========

Just to re-emphasize a point that I made earlier:

You have a device function that you are calling from CUDA device code (in test.cu) in a separate compilation unit (test.cpp). The only way this will work is if that compilation unit (test.cpp) is handled properly by the nvcc compiler. The default behavior of nvcc is to pass a file with a .cpp extension directly to the host compiler. The host compiler will not generate the proper device code (and your CUDA_CALLABLE_MEMBER macro will evaluate to nothing). There are at least two ways to address this. You could change the file extension to .cu, but you seemed to be opposed to that. The other option is to override the nvcc default behavior by adding -x cu to the command line. That is the option I’ve demonstrated here.