Is it possible to have be.exe compiled as 64-bit? Issue with CUDA Toolkit 4.0 64-bit
Hi to All!

This is a topic that I created on the NVIDIA developer forums but got no solution there...
So sorry for this kind of double post :)

Original post:
http://forums.developer.nvidia.com/devforum/discussion/6646/is-it-possible-to-have-be-exe-compiled-has-64bits

[quote]
Hi All,

Before anything else, here are my test machine and CUDA version:
Win7 64-bit with 6 GB of RAM.
CUDA Toolkit 4.0, also 64-bit.

Now my problem:
I have a pretty big kernel to compile and it stops at be.exe.
After some investigation I found that be.exe stops with an out-of-memory error once it uses more than 2 GB of RAM, and that makes sense, because be.exe is a 32-bit application (even though the toolkit is the 64-bit version).
Hoping that my kernel compile wouldn't exceed 3 GB and that be.exe would be compatible with the /LARGEADDRESSAWARE flag, I set that flag on it.
But alas... The compilation quickly consumed the 3 GB and the same error occurred...
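
(For reference, I flipped the flag on be.exe with the editbin tool that ships with Visual Studio, roughly like this; the path to be.exe is just a placeholder for wherever your CUDA 4.0 install puts it:)

[code]
rem Sketch only: mark be.exe as large-address-aware (the path is a placeholder).
editbin /LARGEADDRESSAWARE "C:\CUDA\v4.0\bin\be.exe"

rem Optional: confirm the header flag was actually set.
dumpbin /headers "C:\CUDA\v4.0\bin\be.exe" | findstr /i "large"
[/code]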

Now I am stuck... The only option seems to be having be.exe compiled as a 64-bit executable, so that it can use all the RAM available (unfortunately, cutting stuff out of the kernel is not a possibility)...

Or is there a workaround that I am missing?

Best regards to all!
Marco Silva

Addendum:
I need to compile sm_1x code, so the new LLVM-based compiler won't be used...

[/quote]

Thanks in advance for any kind of help!

Best regards
Marco Silva

#1
Posted 04/10/2012 03:05 PM   
A typical situation where Open64 runs out of memory is when it tries to optimize really large kernels (even kernels that look small-ish in source code can balloon in size due to inlining of all called functions). In order to get the code to compile, you could try lowering the optimization level for Open64. The default is -Xopencc -O3. I would suggest working backwards to -Xopencc -O2, -Xopencc -O1, -Xopencc -O0. Another workaround may be to split the large kernel into two or more smaller kernels.
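
For example (assuming a single-file build here; the architecture and file names are just placeholders), the invocation could look something like this:

[code]
rem Sketch only: lower the Open64 optimization level until be.exe fits in memory.
nvcc -arch=sm_13 -Xopencc -O1 -cubin -o mykernel.cubin mykernel.cu
[/code]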

[Later:]

The compiler team reminds me that you can also use the __noinline__ attribute with device functions to limit the amount of inlining performed in the kernel. Note that due to architectural restrictions with sm_1x, __noinline__ cannot be used with all device functions; the compiler will warn about functions that it must inline and ignores the __noinline__ attribute for these.
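
As a rough sketch (the function names here are made up, and as noted, on sm_1x the compiler may still force some signatures to be inlined):

[code]
// Sketch only: keep a helper out of line so the kernel body stays smaller.
__device__ __noinline__ float saxpy_elem(float a, float x, float y)
{
    return a * x + y;
}

__global__ void bigKernel(const float *x, float *y, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = saxpy_elem(a, x[i], y[i]);
}
[/code]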

#2
Posted 04/10/2012 06:45 PM   
Hi njuffa!

Thank you for your reply!

On the original discussion, the user Tera found the solution!
If I use the -nvvm switch in nvcc, I can use the new LLVM compiler even on sm_1x code!
So after almost 2 days of compiling, the cubin file was created and worked successfully!
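
For anyone who lands here later, the invocation was roughly this (architecture and file names are just placeholders; -nvvm is the undocumented switch Tera pointed out):

[code]
rem Sketch only: force the NVVM/LLVM frontend even for an sm_1x target.
nvcc -nvvm -arch=sm_11 -cubin -o mykernel.cubin mykernel.cu
[/code]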

Best regards,
Marco Silva

#3
Posted 04/12/2012 02:26 PM   
It is good to hear that you found an approach that works for you, but please note that to the best of my knowledge, this is not a supported configuration, meaning use of the NVVM frontend with sm_1x is not covered by our internal testing, and it may or may not work.

#4
Posted 04/12/2012 05:00 PM   
That's not good at all...

In my project I already have many kernels to work around the inlining issue.
But each kernel grows too quickly, so even multiple kernels won't do the trick...

I will also try using the __noinline__ attribute on some key functions and see if this lowers the memory use.

With the -nvvm switch at least the sm_1x 32-bit build worked. Or so it seems :)
I will do some more testing (the 64-bit version is compiling ATM) to see if anything is wrong.

When you say that it may or may not work, do you mean that the compiler can crash, or that it may produce bad code?

#5
Posted 04/12/2012 06:47 PM   
As there are occasional bugs that cause tested paths to terminate abnormally or generate incorrect code, that is obviously a distinct possibility for any path that is not tested. In general I tend to steer people away from unsupported compiler flags for production code. Proceed at your own risk.

#6
Posted 04/12/2012 07:17 PM   
OK, so this is probably not the best solution...

Are there any plans for be.exe and inline.exe to become 64-bit :)? That would be perfect!

If not, I will have to go with the __noinline__ attributes and see what I can get...

#7
Posted 04/12/2012 08:55 PM   