Way to limit registers?
Hi,

Is there any way to limit an OpenCL kernel's register count, like we can do in CUDA using -maxrregcount?
I have a kernel that uses 57 registers and I need it to use 32 at most, or my occupancy will suck!

thx
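
To put rough numbers on the occupancy concern, here is a minimal sketch in C. It assumes a compute capability 1.2 device (the sm_12 target that shows up in the build log further down the thread), with 16384 registers and at most 32 resident warps per multiprocessor. The function names are made up for illustration, and the estimate treats registers as the only limit: it ignores block-granular register allocation, work-group size and shared memory, so take it as an upper bound, not what the driver will actually schedule.

```c
#include <assert.h>

/* Assumed per-multiprocessor limits for compute capability 1.2/1.3. */
enum {
    REGS_PER_SM      = 16384, /* 32-bit registers per multiprocessor */
    MAX_WARPS_PER_SM = 32,    /* 1024 resident threads / 32 threads per warp */
    WARP_SIZE        = 32
};

/* Hypothetical helper: upper bound on resident warps when registers are
   the only limiting resource. */
static int max_warps_by_registers(int regs_per_thread)
{
    assert(regs_per_thread > 0);
    int threads = REGS_PER_SM / regs_per_thread; /* whole threads that fit */
    int warps   = threads / WARP_SIZE;           /* round down to whole warps */
    return warps < MAX_WARPS_PER_SM ? warps : MAX_WARPS_PER_SM;
}

/* Hypothetical helper: fraction of the warp slots that can be filled. */
static double occupancy_estimate(int regs_per_thread)
{
    return (double)max_warps_by_registers(regs_per_thread) / MAX_WARPS_PER_SM;
}
```

Under those assumptions, 57 registers per thread leaves room for 8 warps (25%), while 32 registers per thread leaves room for 16 warps (50%).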

#1
Posted 06/11/2010 03:37 AM   
-cl-nv-maxrregcount <N>

Take a look at [url="http://developer.download.nvidia.com/compute/cuda/3_0/toolkit/docs/opencl_extensions/cl_nv_compiler_options.txt"]http://developer.download.nvidia.com/compu...ler_options.txt[/url]

You need to pass it in the options string when calling clBuildProgram.

#2
Posted 06/11/2010 09:15 AM   
[quote name='pplaszew' post='1071799' date='Jun 11 2010, 10:15 AM']-cl-nv-maxrregcount <N>

Take a look at [url="http://developer.download.nvidia.com/compute/cuda/3_0/toolkit/docs/opencl_extensions/cl_nv_compiler_options.txt"]http://developer.download.nvidia.com/compu...ler_options.txt[/url]

You need to pass it when calling clBuildProgram[/quote]
Thanks.

I think that doc is incorrect, though. It's not "-cl-nv-maxrregcount 32" but "-cl-nv-maxrregcount=32".
If I pass "-cl-nv-maxrregcount 32" without the equals sign, CL just crashes.
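
As a sketch of how the options string could be assembled so that the "=" form is always used: build_options below is a hypothetical helper, not part of OpenCL, and the "=" syntax is only what worked in the test described above, not something the linked doc states. The resulting string is what would be passed as the fourth argument of clBuildProgram.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes the build options into buf, using the
   "-cl-nv-maxrregcount=<N>" form. Returns 0 on success, -1 if buf is
   too small to hold the whole string. */
static int build_options(char *buf, size_t size, int max_regs)
{
    assert(buf != NULL && max_regs > 0);
    int n = snprintf(buf, size,
                     "-cl-fast-relaxed-math -cl-nv-maxrregcount=%d -cl-nv-verbose",
                     max_regs);
    /* snprintf returns the length it wanted to write; n >= size means truncation. */
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

Checking the return value avoids handing a silently truncated option string to the compiler.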

Btw, there is something strange with that option. If I pass these options:

[code]clBuildProgram ( program, 1, dev, "-cl-fast-relaxed-math -cl-nv-maxrregcount=32 -cl-nv-verbose", NULL, NULL );[/code]

the CL compiler reports that it is not reducing the register count to 32; it stays at 57, so "-cl-nv-maxrregcount=32" does not seem to take effect. Curiously, the ocg compiler does recognize the option, because it shows this:
[quote]: Retrieving binary for 'anonymous_jit_identity', for gpu='sm_12', usage mode=' --maxrregcount 32 --verbose'
: Considering profile 'compute_10' for gpu='sm_12' in 'anonymous_jit_identity'
: Control flags for 'anonymous_jit_identity' disable search path
: Ptx binary found for 'anonymous_jit_identity', architecture='compute_10'
: Ptx compilation for 'anonymous_jit_identity', for gpu='s

Compiling entry function 'myComplexKernel' for 'sm_12'.ptxas info : Used 57 registers, 480+0 bytes lmem, 40+1[/quote]
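
To check the register count programmatically instead of reading the verbose output by eye, a minimal sketch: parse_used_registers is a hypothetical helper that looks for the "Used N registers" text seen in the ptxas line above. It assumes the log has already been fetched as a string (for example with clGetProgramBuildInfo and CL_PROGRAM_BUILD_LOG), and that the wording matches the log quoted here, which may differ between driver versions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns N from the first "Used N registers" found
   in the build log, or -1 if no such text is present. */
static int parse_used_registers(const char *log)
{
    assert(log != NULL);
    const char *p = strstr(log, "Used ");
    while (p != NULL) {
        int regs = 0;
        int consumed = 0;
        /* %n is only stored if the literal " registers" also matched. */
        if (sscanf(p, "Used %d registers%n", &regs, &consumed) == 1 && consumed > 0)
            return regs;
        p = strstr(p + 1, "Used ");
    }
    return -1;
}
```

Comparing the returned value with the requested limit makes it easy to tell whether the option actually took effect for a given build.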


Also, shouldn't maxrregcount be applied per kernel function rather than to the whole program?

And for -cl-nv-opt-level <N>, what is the maximum N, please? 9?

#3
Posted 06/11/2010 04:27 PM   