Runtime error when I add .cu file to VS2008 C++ project I cannot get past this error, even after Goo

I’m having trouble building my Cuda application in Visual Studio Express 2008, using C++.
Although I’ve been a C/C++ developer for over 25 years, this is my first attempt at Cuda
development.

Specifically, I can build a Win32 CLR app successfully, and when I add my .cu source file
to the project and do a clean/rebuild, it builds successfully. But immediately I get the
run-time crash error “Debug Assertion Failed! Expression: _CrtIsValidHeapPointer(pUserData)”
when I try to run or debug it.

I’ve installed the cuda.rules, and set what I believe are the appropriate include
and lib paths. The Cuda command line created by the cuda.rules looks like this:

"C:\CUDA\bin\nvcc.exe"
-gencode=arch=compute_10,code="sm_10,compute_10"
-gencode=arch=compute_20,code="sm_20,compute_20"
-ccbin "C:\Program Files\Microsoft Visual Studio 9.0\VC\bin"
-I"C:\CUDA\include" -I"./" -I"../../common/inc" -I"../../../shared/inc"
-Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd /GR "
-maxrregcount=32
-gencode=arch=compute_10,code="sm_10,compute_10"
-gencode=arch=compute_20,code="sm_20,compute_20"
--compile -o "Debug\CudaWin2.cu.obj"
"c:\Users\Rich\Documents\Visual Studio 2008\Projects\CF3\CudaWin2\CudaWin2.vcproj"

My linker command line is as follows:
/OUT:"C:\Users\Rich\Documents\Visual Studio 2008\Projects\CF3\Debug\CudaWin2.exe"
/INCREMENTAL /NOLOGO /LIBPATH:"C:\CUDA\lib/../lib"
/LIBPATH:"../../common/lib" /LIBPATH:"../../../shared/lib"
/MANIFEST /MANIFESTFILE:"Debug\CudaWin2.exe.intermediate.manifest"
/MANIFESTUAC:"level='asInvoker' uiAccess='false'"
/DEBUG /ASSEMBLYDEBUG
/PDB:"c:\Users\Rich\Documents\Visual Studio 2008\Projects\CF3\Debug\CudaWin2.pdb"
/SUBSYSTEM:WINDOWS /ENTRY:"main"
/DYNAMICBASE /FIXED:No /NXCOMPAT /MACHINE:X86 /ERRORREPORT:PROMPT
cudart.lib

All other project source has the Runtime Library set as /MDd, so I don’t believe there’s
an inconsistency there, as seems to be indicated when I Google the error I get.

The kernel code is at the top of my .cu file, with c++ methods at the bottom which call
the kernel functions. As far as I know this is legal, but if you can’t call a kernel
function from within a cpp function, could someone please let me know?

The run-time error described above comes up immediately as soon as I try to run the
application using either “Debug / Start Debugging” or “Debug / Start without Debugging”
from within Visual Studio C++ 2008.

The Output window reports the following when I try to run it and click Ignore, etc, when the error
occurs:

‘CudaWin2.exe’: Loaded ‘C:\Users\Rich\Documents\Visual Studio 2008\Projects\CF3\Debug\CudaWin2.exe’, Symbols loaded.
‘CudaWin2.exe’: Loaded ‘C:\Windows\System32\ntdll.dll’
‘CudaWin2.exe’: Loaded ‘C:\Windows\System32\mscoree.dll’
‘CudaWin2.exe’: Loaded ‘C:\Windows\System32\kernel32.dll’
‘CudaWin2.exe’: Loaded ‘C:\CUDA\bin\cudart32_31_9.dll’, Binary was not built with debug information.
‘CudaWin2.exe’: Loaded ‘C:\Windows\winsxs\x86_microsoft.vc90.debugcrt_1fc8b3b9a1e18e3b_9.0.30729.1_none_bb1f6aa1308c35eb\msvcr90d.dll’
‘CudaWin2.exe’: Loaded ‘C:\Windows\winsxs\x86_microsoft.vc90.debugcrt_1fc8b3b9a1e18e3b_9.0.30729.1_none_bb1f6aa1308c35eb\msvcm90d.dll’
‘CudaWin2.exe’: Loaded ‘C:\Windows\System32\ole32.dll’
‘CudaWin2.exe’: Loaded ‘C:\Windows\System32\msvcrt.dll’
‘CudaWin2.exe’: Loaded ‘C:\Windows\System32\gdi32.dll’
‘CudaWin2.exe’: Loaded ‘C:\Windows\System32\user32.dll’
‘CudaWin2.exe’: Loaded ‘C:\Windows\System32\advapi32.dll’
‘CudaWin2.exe’: Loaded ‘C:\Windows\System32\rpcrt4.dll’
‘CudaWin2.exe’: Loaded ‘C:\Windows\System32\imm32.dll’
‘CudaWin2.exe’: Loaded ‘C:\Windows\System32\msctf.dll’
‘CudaWin2.exe’: Loaded ‘C:\Windows\System32\lpk.dll’
‘CudaWin2.exe’: Loaded ‘C:\Windows\System32\usp10.dll’
‘CudaWin2.exe’: Loaded ‘C:\Windows\Microsoft.NET\Framework\v4.0.30319\mscoreei.dll’
‘CudaWin2.exe’: Loaded ‘C:\Windows\System32\shlwapi.dll’
‘CudaWin2.exe’: Loaded ‘C:\Windows\winsxs\x86_microsoft.windows.common-controls_6595b64144ccf1df_6.0.6002.18305_none_5cb72f2a088b0ed3\comctl32.dll’
‘CudaWin2.exe’: Loaded ‘C:\Windows\Microsoft.NET\Framework\v2.0.50727\mscorwks.dll’
‘CudaWin2.exe’: Loaded ‘C:\Windows\winsxs\x86_microsoft.vc80.crt_1fc8b3b9a1e18e3b_8.0.50727.4053_none_d08d7da0442a985d\msvcr80.dll’
‘CudaWin2.exe’: Loaded ‘C:\Windows\System32\shell32.dll’
‘CudaWin2.exe’: Loaded ‘C:\Windows\Microsoft.NET\Framework\v2.0.50727\Culture.dll’
‘CudaWin2.exe’: Unloaded ‘C:\Windows\Microsoft.NET\Framework\v2.0.50727\Culture.dll’
‘CudaWin2.exe’: Loaded ‘C:\Windows\assembly\NativeImages_v2.0.50727_32\mscorlib\98bbdd8c400493ad228b8283665cc9da\mscorlib.ni.dll’
‘CudaWin2.exe’ (Managed): Loaded ‘C:\Windows\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll’
‘CudaWin2.exe’: Loaded ‘C:\Windows\System32\uxtheme.dll’
‘CudaWin2.exe’ (Managed): Loaded ‘c:\Users\Rich\Documents\Visual Studio 2008\Projects\CF3\Debug\CudaWin2.exe’, Symbols loaded.
‘CudaWin2.exe’: Loaded ‘C:\Windows\Microsoft.NET\Framework\v2.0.50727\mscorjit.dll’
‘CudaWin2.exe’: Loaded ‘C:\Windows\Microsoft.NET\Framework\v2.0.50727\diasymreader.dll’
‘CudaWin2.exe’: Loaded ‘C:\Windows\System32\rsaenh.dll’
‘CudaWin2.exe’ (Managed): Loaded ‘C:\Windows\WinSxS\x86_microsoft.vc90.debugcrt_1fc8b3b9a1e18e3b_9.0.30729.1_none_bb1f6aa1308c35eb\msvcm90d.dll’
‘CudaWin2.exe’: Loaded ‘C:\Windows\assembly\NativeImages_v2.0.50727_32\System\ed6ae2749d12c4729ee43ff339de4bb8\System.ni.dll’
‘CudaWin2.exe’ (Managed): Loaded ‘C:\Windows\assembly\GAC_MSIL\System\2.0.0.0__b77a5c561934e089\System.dll’
First-chance exception at 0x770d128a in CudaWin2.exe: 0xC0000005: Access violation reading location 0xa2377ab5.
First-chance exception at 0x5ea71079 in CudaWin2.exe: 0xC0000005: Access violation reading location 0xa2377aca.
A first chance exception of type ‘System.AccessViolationException’ occurred in CudaWin2.exe
A first chance exception of type ‘.ModuleLoadException’ occurred in msvcm90d.dll
First-chance exception at 0x76d7fbae in CudaWin2.exe: Microsoft C++ exception: [rethrow] at memory location 0x00000000…
An unhandled exception of type ‘System.TypeInitializationException’ occurred in Unknown Module.

Additional information: The type initializer for ‘’ threw an exception.

The thread ‘Win32 Thread’ (0xc70) has exited with code 0 (0x0).
The thread ‘Win32 Thread’ (0x978) has exited with code 0 (0x0).
The thread ‘Main Thread’ (0xf4c) has exited with code 0 (0x0).
The program ‘[4052] CudaWin2.exe: Managed’ has exited with code 0 (0x0).
The program ‘[4052] CudaWin2.exe: Native’ has exited with code 0 (0x0).

Do I have to separate my kernels from my C++ methods?
Is there a specific runtime debug library that I need to link with instead of cudart.lib?
I ask this latter question because in the Output window I always see
“C:\CUDA\bin\cudart32_31_9.dll not built with debugging information”.

Any help would be greatly appreciated!
~Rich

Well, I’ve noticed that nearly 1000 people have viewed my post, and none have posted a comment. I’ve since changed the program to a console program, which at least gives me the ability to run the program, but I have yet to find out if it’s even possible to create a Windows Cuda app that uses /CLR, i.e. .NET support.

I find it nearly impossible to believe that since 2007 not one Cuda developer has tried to create a Windows application that uses Cuda, so I’m shocked that I did not at least get a reply of “I tried that too, but it didn’t work”, or “That’s impossible as far as I know” or “You have to take such-and-such steps to get Cuda to work inside a Windows /clr application.”

For those that end up reading this post, I’ll offer what I’ve discovered while learning how to debug Visual Studio and Cuda kernels:

  1. When debugging, make sure that all C++ and Cuda .cu files are compiled with /Od to disable optimization. If you fail to do this, most of your variables will show strange or invalid values when you hover the cursor over them.

  2. As far as I can tell, you can’t step into a kernel, at least not from VS, when you only have a single Cuda card installed. I’ve set breakpoints within them but have had no success tracing into or within them. On the other hand though, the debugger will walk through the assembly of the cuda code, which is of limited value, imho.

  3. For debugging device and global cuda routines, duplicate them in a c++ method of your class, and test the accuracy that way, looping through each thread in the method call. It at least tells you whether your algorithm is working and puts the focus of cuda crashes on the kernel calls or memcpy’s themselves.
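Point 3 can be sketched roughly as follows. This is a host-only illustration, not code from my project: the SAXPY body and the names saxpyBody, emulateSaxpyLaunch and runSaxpyEmulated are made up for the example, and the index parameters stand in for the built-in blockIdx.x, blockDim.x and threadIdx.x that the real kernel would read. It assumes a 1-D launch and a kernel with no shared memory or __syncthreads(). A serial loop like this only checks the arithmetic; it cannot show any ordering or concurrency problem.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-thread body, written the way the kernel body would be written.
// The three index arguments are hypothetical stand-ins for the CUDA
// built-ins blockIdx.x, blockDim.x and threadIdx.x.
void saxpyBody(unsigned blockIdxX, unsigned blockDimX, unsigned threadIdxX,
               std::size_t n, float a, const float* x, float* y)
{
    const std::size_t i =
        static_cast<std::size_t>(blockIdxX) * blockDimX + threadIdxX;
    if (i < n) {                      // same bounds guard the kernel needs
        y[i] = a * x[i] + y[i];
    }
}

// Serial emulation of a 1-D launch: visit every (block, thread) pair once.
void emulateSaxpyLaunch(unsigned gridDimX, unsigned blockDimX,
                        std::size_t n, float a, const float* x, float* y)
{
    for (unsigned b = 0; b < gridDimX; ++b) {
        for (unsigned t = 0; t < blockDimX; ++t) {
            saxpyBody(b, blockDimX, t, n, a, x, y);
        }
    }
}

// Small driver: x[i] = i, y[i] = 1, so the expected result is a * i + 1.
std::vector<float> runSaxpyEmulated(std::size_t n, float a)
{
    assert(n > 0);
    std::vector<float> x(n), y(n, 1.0f);
    for (std::size_t i = 0; i < n; ++i) {
        x[i] = static_cast<float>(i);
    }
    const unsigned blockDimX = 64;    // arbitrary block size for the sketch
    const unsigned gridDimX =
        static_cast<unsigned>((n + blockDimX - 1) / blockDimX);
    emulateSaxpyLaunch(gridDimX, blockDimX, n, a, &x[0], &y[0]);
    return y;
}
```

Calling runSaxpyEmulated(100, 2.0f) and comparing the result with a buffer copied back from the device tells you whether a wrong answer comes from the algorithm itself or from the launch and memcpy code around it.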

Hope this helps at least one person.

Sorry that no one could help you; I personally have never built a cuda project that wasn’t console, so I can’t really speak to your original issue.

As for some of your other questions/statements, when you ask “Do I have to separate my kernels from my C++ methods?”, the answer is… maybe. Don’t include large libraries or C++ code that NVCC can’t parse in your .cu files, but otherwise nothing prevents you from having host (regular C++) code side-by-side with your kernels in your .cu files.

As for kernel debugging, one other thing you didn’t mention was using print statements; after cuda version 3.2 (I think?) you can use printf’s directly in device code, while if you have an older version you’ll have to use cuPrintf from the SDK. I wouldn’t advise using your method #3, as you won’t be able to see any of the possible race conditions/concurrency issues if you’re executing it serially in that manner.

Maybe if you posted your issue in the General or Programming boards you’d get more assistance; the Vista board doesn’t get as much traffic.

Thank you for your response.

In my experience, the C++ source has to be in the same file as the cuda device and global kernels; otherwise nvcc complains or it doesn’t link, I can’t remember which. In any event, I forward declare the kernels and include them in the same file as my C++ methods, and it seems to work fine.

I’m using nvcc 3.1 so I can’t use the printf calls. Also, I don’t know if it’s just me or what, but I get an odd linker error at the end of a build, which doesn’t seem to have a negative effect on the program, as it runs fine, but it’s a little disconcerting. The error is:

1>c:\cuda\include\common_functions.h(73): warning: dllexport/dllimport conflict with "printf"
1>C:\Program Files\Microsoft Visual Studio 9.0\VC\INCLUDE\stdio.h(287): here; dllimport/dllexport dropped

I have no idea what is causing this error or how to fix it. Ideally I’d prefer a program that doesn’t have any errors at all, needless to say, but the error seems benign at this time.

I don’t have any problems with race issues or concurrency at this time, as all of the kernels are small and have to be performed sequentially anyway. Where the cuda shines is that it can do tens of thousands of these threads simultaneously, which is exactly what I needed it to do.

I posted the same general question in the Development page about 6-8 hours after the post to the Vista forum, as you indicated, simply because I thought it would get better exposure. Oddly enough, nobody read the Development post, and about 490 read this one. Go figure.

Thanks again for your post; at least now I know it’s not a conspiracy. ;-)