Is it possible to have be.exe compiled as 64-bit? Issue with CUDA Toolkit 4.0 64-bit

Hi to All!

This is a topic that I created on the NVIDIA developer forums, but I got no solution there…

So sorry for this kind of double post :)

Original post:

http://forums.developer.nvidia.com/devforum/discussion/6646/is-it-possible-to-have-be-exe-compiled-has-64bits

Thanks in advance for any kind of help!

Best regards

Marco Silva

A typical situation where Open64 runs out of memory is when it tries to optimize really large kernels (even kernels that look small-ish in source code can balloon in size due to inlining of all called functions). In order to get the code to compile, you could try lowering the optimization level for Open64. The default is -Xopencc -O3. I would suggest working backwards to -Xopencc -O2, -Xopencc -O1, -Xopencc -O0. Another workaround may be to split the large kernel into two or more smaller kernels.
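Both workarounds above can be sketched roughly as follows. This is only an illustration under stated assumptions: the file name kernels.cu, the stage functions, and the kernel names are hypothetical stand-ins, not code from the original project. The build lines in the leading comment use the -Xopencc flags mentioned above.

```cuda
// Hypothetical build lines (kernels.cu is an assumed file name), stepping
// down the Open64 optimization level until compilation succeeds:
//   nvcc -c kernels.cu -Xopencc -O2
//   nvcc -c kernels.cu -Xopencc -O1
//   nvcc -c kernels.cu -Xopencc -O0

#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stages standing in for large device functions that get inlined.
__device__ float stage_a(float x) { return x * 2.0f + 1.0f; }
__device__ float stage_b(float x) { return x * x; }

// Instead of one kernel that calls stage_a and stage_b back to back,
// each stage gets its own, smaller kernel. The intermediate result is
// passed between them through a buffer in global memory.
__global__ void kernel_a(const float *in, float *tmp, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tmp[i] = stage_a(in[i]);
}

__global__ void kernel_b(const float *tmp, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = stage_b(tmp[i]);
}

int main()
{
    const int n = 256;
    const size_t bytes = n * sizeof(float);
    float h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;

    float *d_in, *d_tmp, *d_out;
    cudaMalloc((void **)&d_in, bytes);
    cudaMalloc((void **)&d_tmp, bytes);
    cudaMalloc((void **)&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    // Launch the two stages one after the other on the default stream.
    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    kernel_a<<<blocks, threads>>>(d_in, d_tmp, n);
    kernel_b<<<blocks, threads>>>(d_tmp, d_out, n);

    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    printf("out[3] = %f\n", h_out[3]);

    cudaFree(d_in);
    cudaFree(d_tmp);
    cudaFree(d_out);
    return 0;
}
```

The trade-off is an extra round trip through global memory between the two launches, in exchange for smaller units of code for the compiler to optimize.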

[Later:]

The compiler team reminds me that you can also use the noinline attribute (__noinline__ in CUDA C) with device functions to limit the amount of inlining performed in the kernel. Note that due to architectural restrictions with sm_1x, noinline cannot be used with all device functions; the compiler will warn about functions that it must inline and will ignore the noinline attribute for these.
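As a minimal sketch of the noinline approach, assuming a hypothetical device function heavy_step and kernel example_kernel (neither is from the original project):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper. __noinline__ asks the compiler to keep this as a
// real function call instead of inlining its body into every caller.
// On sm_1x the compiler may warn and inline it anyway.
__device__ __noinline__ float heavy_step(float x)
{
    for (int i = 0; i < 8; ++i)
        x = x * 0.5f + 1.0f;
    return x;
}

__global__ void example_kernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = heavy_step(data[i]);
}

int main()
{
    const int n = 64;
    const size_t bytes = n * sizeof(float);
    float h[n];
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *d;
    cudaMalloc((void **)&d, bytes);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);

    example_kernel<<<1, n>>>(d, n);

    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    printf("h[0] = %f\n", h[0]);

    cudaFree(d);
    return 0;
}
```

Whether this actually reduces compiler memory use depends on how many call sites the function has and on whether the target architecture lets the compiler honor the attribute.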

Hi njuffa!

Thank you for your reply!

In the original discussion, the user Tera found the solution!
If I pass the -nvvm switch to nvcc, I can use the new LLVM-based compiler even for sm_1x code!
So after almost 2 days of compiling, the cubin file was created and worked successfully!

Best regards,
Marco Silva

It is good to hear that you found an approach that works for you, but please note that to the best of my knowledge, this is not a supported configuration, meaning use of the NVVM frontend with sm_1x is not covered by our internal testing, and it may or may not work.

That’s not good at all…

In my project I have already split the code into many kernels to work around the inlining issue.
But each kernel grows too quickly, so even multiple kernels won’t do the trick…

I will also try using the noinline attribute on some key functions to see if this lowers the memory use.

With the -nvvm switch, at least the 32-bit sm_1x build worked. Or so it seems :)
I will do some more testing (the 64-bit version is compiling at the moment) to see if anything is wrong.

When you say that it may or may not work, do you mean that the compiler can crash, or that it may produce bad code?

As there are occasional bugs that cause tested paths to terminate abnormally or generate incorrect code, that is obviously a distinct possibility for any path that is not tested. In general I tend to steer people away from unsupported compiler flags for production code. Proceed at your own risk.

OK, so this is probably not the best solution…

Are there any plans for be.exe and inline.exe to be 64-bit? :) That would be perfect!

If not, I will have to go with the noinline attribute and see what I can get…