Hi,
Is there any way to limit the number of registers an OpenCL kernel uses, like we can in CUDA with -maxrregcount?
My kernel uses 57 registers and I need to get it down to 32 at most, or my occupancy will suck!
Thanks.
-cl-nv-maxrregcount
Take a look at http://developer.download.nvidia.com/compute/cuda/3_0/toolkit/docs/opencl_extensions/cl_nv_compiler_options.txt
You need to pass it in the options string when calling clBuildProgram.
Thanks.
I think that doc is not correct, though. It's not “-cl-nv-maxrregcount 32” but “-cl-nv-maxrregcount=32”.
If I pass “-cl-nv-maxrregcount 32” without the equals sign, CL just crashes.
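To avoid mistyping the option, the options string can be built in code. This is only a minimal sketch in plain C: make_build_options is a hypothetical helper (not part of OpenCL), and the "=" form it emits is the one reported to work in this thread, not something confirmed by the linked doc.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenCL API): writes a build-options string
 * that uses the "=" form of -cl-nv-maxrregcount, as reported to work in
 * this thread. Returns 0 on success, -1 on bad arguments or if buf is
 * too small to hold the whole string. */
int make_build_options(char *buf, size_t size, int max_regs)
{
    if (buf == NULL || size == 0 || max_regs <= 0)
        return -1;

    int n = snprintf(buf, size,
                     "-cl-fast-relaxed-math -cl-nv-maxrregcount=%d -cl-nv-verbose",
                     max_regs);

    /* snprintf returns the length it wanted to write; reject truncation. */
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

The resulting buffer would then go in as the fourth argument of clBuildProgram, in place of the string literal.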
Btw, there is something strange with that option. If I pass these options:
clBuildProgram ( program, 1, dev, "-cl-fast-relaxed-math -cl-nv-maxrregcount=32 -cl-nv-verbose", NULL, NULL );
the CL compiler reports that it is not reducing the register count to 32. It stays at 57, so “-cl-nv-maxrregcount=32” does not seem to take effect. Curiously, the ocg compiler does recognize the option, because it shows this:
Also, wouldn't it be better if maxrregcount could be applied per kernel function instead of to the whole program?
And for -cl-nv-opt-level, what is the max “N”, please? 9?