Olimit was exceeded - Out of memory
Hi all,

I am a novice CUDA user, and I am having trouble compiling my program, which has huge kernel functions.
When I run "make" with my makefile, after waiting for a while, I get:

"Olimit was exceeded […]; will not perform function-scope optimization"
"To still perform function-scope optimization use -OPT:Olimit=0"
"Out of memory in Allocate_Large_Block"
"nvopencc INTERNAL ERROR"

I tried to modify the "common.mk" file, which is included by the makefile, by adding:
"NVCCFLAGS += --opencc-options -OPT:olimit=0",
but the compiler says something like: "Warning: --opencc-options (Xopencc) obsolete and ignored for compute_20, sm_20 or higher" and then gives me the same error.

How can I handle this? I use openSUSE 11.2 with an NVIDIA GeForce 9800 GX2.

Thanks a lot!

#1
Posted 04/25/2012 01:49 PM   
You could try compiling with -Xopencc -O0, i.e. compiling without optimization. If that does not help, I would suggest breaking up the very large kernel into several smaller kernels.
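
For anyone who wants to try the second suggestion, here is a minimal sketch of what breaking up a large kernel can look like, assuming the per-element computation can be cut into stages that communicate through an intermediate buffer. The names `stage1`, `stage2`, `stage1_kernel`, `stage2_kernel` and `run_two_stages` are hypothetical, and the formulas are placeholders rather than anything from the original program. The `HD` macro lets the same file build with a plain C++ compiler for checking on the host.

```cpp
#include <cstddef>
#include <vector>

// Expands to CUDA qualifiers under nvcc and to nothing under a host compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical stage 1: first part of a long per-element formula.
HD inline float stage1(float x) { return x * x + 1.0f; }

// Hypothetical stage 2: consumes the intermediate value from stage 1.
HD inline float stage2(float t) { return 2.0f * t - 3.0f; }

#ifdef __CUDACC__
// Under nvcc, each stage is its own small kernel, so the compiler
// optimizes two small functions instead of one huge one.
__global__ void stage1_kernel(const float* in, float* tmp, std::size_t n) {
    std::size_t i = blockIdx.x * static_cast<std::size_t>(blockDim.x) + threadIdx.x;
    if (i < n) tmp[i] = stage1(in[i]);
}

__global__ void stage2_kernel(const float* tmp, float* out, std::size_t n) {
    std::size_t i = blockIdx.x * static_cast<std::size_t>(blockDim.x) + threadIdx.x;
    if (i < n) out[i] = stage2(tmp[i]);
}
#endif

// Host reference that runs the same two stages in sequence.
std::vector<float> run_two_stages(const std::vector<float>& in) {
    std::vector<float> tmp(in.size()), out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) tmp[i] = stage1(in[i]);
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = stage2(tmp[i]);
    return out;
}
```

The cost of this approach is the extra global-memory traffic for the intermediate buffer, so it only pays off when the stages are really separable.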

#2
Posted 05/21/2012 07:10 PM   
I fixed it.

Since the kernel was dominated by recursive calls to powf and huge mathematical functions, I defined some device functions similar to powf and used USE_FAST_MATH for all the other operations.
Then I limited the number of registers assigned to each thread by using MAXRREGCOUNT, and studied how sensitive the computation time is to that limit.

Thanks anyway!

P.S. My kernel couldn't be split into several smaller ones because of the nature of the problem.
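
The post does not show what the powf-like device functions looked like. As one possible illustration, here is a minimal sketch that assumes the exponents are small non-negative integers, so powf can be replaced by repeated squaring. The name `pow_uint` is hypothetical and this is not the poster's actual code.

```cpp
// Expands to CUDA qualifiers under nvcc and to nothing under a host compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper (an assumption, not from the post): raises x to a
// non-negative integer power by repeated squaring, avoiding a general
// powf call when the exponent is known to be an integer.
HD inline float pow_uint(float x, unsigned int n) {
    float result = 1.0f;
    while (n != 0u) {
        if (n & 1u) result *= x;  // fold in the current power of x
        x *= x;                   // square for the next bit of n
        n >>= 1u;
    }
    return result;
}
```

A call such as `pow_uint(base, 3u)` then stands in for `powf(base, 3.0f)` inside the kernel. The two options named in the post presumably correspond to the nvcc flags `-use_fast_math` and `-maxrregcount`; the register limit is best chosen by timing a few values, as the poster did.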

#3
Posted 05/31/2012 11:45 AM   