OpenCL: are there instruction limits? opencl, instruction

thomasp · January 31, 2012, 5:16pm

Hi

I plan to code something in OpenCL, using the überKernel pattern.

It means that a given kernel would have this structure:

__kernel void my_uber_kernel(void)
{
     while(...)
     {
          if(stage==...)
          {
               device_function_0() ;
          }
          else if(stage==...)
          {
               device_function_1() ;
          }
          // etc...
          stage = stage + 1 ;
     }
}

Each device_function_X() potentially contains a substantial amount of code.

I’m wondering whether there are known limits on the number of instructions supported (per thread?) before performance is impacted.

Does splitting the processing into small device function calls help with optimization?

Or do I have to split the processing into several kernel calls (so that the device_function_X calls above become kernels)?

short · January 31, 2012, 5:54pm

Doing this in separate kernel launches adds the overhead of writing intermediate results to global memory and reading them back (hundreds of cycles). The best thing to do here is to compute everything in one kernel, keep temporary results in registers, and write the results just once.
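To make the difference concrete, here is a minimal host-side model in plain C, not OpenCL. It is only a sketch under stated assumptions: `stage0` and `stage1` are hypothetical stand-ins for device_function_0 and device_function_1, arrays play the role of global memory, a local variable plays the role of a register, and the `traffic_t` counters are an illustration device rather than anything a real device reports.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for device_function_0() / device_function_1(). */
static float stage0(float x) { return x * 2.0f; }
static float stage1(float x) { return x + 1.0f; }

/* Counts simulated accesses to "global memory" (the arrays). */
typedef struct { size_t reads, writes; } traffic_t;

/* Split version: one pass per stage, like one kernel launch per stage.
 * The intermediate result round-trips through the tmp buffer. */
static void run_split(const float *in, float *tmp, float *out,
                      size_t n, traffic_t *t)
{
    assert(in && tmp && out && t);
    for (size_t i = 0; i < n; ++i) {      /* "kernel" 0 */
        tmp[i] = stage0(in[i]);
        t->reads++; t->writes++;
    }
    for (size_t i = 0; i < n; ++i) {      /* "kernel" 1 */
        out[i] = stage1(tmp[i]);
        t->reads++; t->writes++;
    }
}

/* Fused version: same stage loop as the uber-kernel in the question.
 * The intermediate value stays in the local variable r. */
static void run_fused(const float *in, float *out, size_t n, traffic_t *t)
{
    assert(in && out && t);
    for (size_t i = 0; i < n; ++i) {      /* one iteration per work-item */
        float r = in[i];                  /* single read */
        t->reads++;
        for (int stage = 0; stage < 2; ++stage) {
            if (stage == 0)      r = stage0(r);
            else if (stage == 1) r = stage1(r);
        }
        out[i] = r;                       /* single write */
        t->writes++;
    }
}
```

With n elements and two stages, the split version in this model performs 2n reads and 2n writes, while the fused version performs n of each and needs no intermediate buffer. The model says nothing about instruction cache behaviour or register pressure, which only show up on a real device.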

thomasp · January 31, 2012, 7:09pm

The memory overhead is one reason why I chose the überkernel approach, but what if, in the end, the kernel contains something like 10,000 lines of code (with all calls inlined)?

laughingrice · February 5, 2012, 1:17pm

The maximum kernel size (the limit is per kernel, not per thread) is 2,000,000 assembly instructions (I don’t think that changed with Fermi).

The thing you may need to watch is instruction cache pollution. You don’t want too much code inside an if conditional where the block diverges, because that pollutes the instruction cache and can degrade performance. It can also cause problems if you have multiple blocks per multiprocessor and they diverge.

Whether it’s better to split into multiple kernels or use a single überKernel depends on your actual code. Going to global memory is very expensive, generally much more so than instruction cache pollution, but there are exceptions.

thomasp · February 5, 2012, 6:04pm

And do you know the order of magnitude of the program cache (instruction cache) size?
Something like 64 KB?

laughingrice · February 5, 2012, 6:46pm

The instruction cache is in the constant cache. If memory serves, it’s 8 KB.

pass · March 14, 2012, 4:22am

Best thing to do here is to compute all the stuff in one kernel.