Reducing the number of registers

Hello All,

I wrote a kernel that evaluates an equation. When I use the Visual Profiler, I find that I can't run all available threads in parallel. For example, the GPU can run 1536 threads in parallel per multiprocessor, but I only reach 1024 because of the register limit. The profiler says I use 26 registers per thread, which is high. Can anyone give me a hint on how to reduce the number of registers per thread?
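As a rough check of the numbers above, here is a simplified occupancy estimate in C. It assumes a Fermi-class GPU (compute capability 2.x) with 32768 registers and at most 1536 resident threads per multiprocessor, plus a hypothetical work-group size of 256 (the post does not state one). It ignores register allocation granularity and per-multiprocessor work-group limits, so real hardware can come out somewhat lower.

```c
#include <assert.h>

/* Simplified occupancy model: how many threads can be resident on one
   multiprocessor, given the register budget. Ignores register allocation
   granularity and the maximum number of work-groups per multiprocessor. */
static int resident_threads(int regs_per_sm, int max_threads_per_sm,
                            int regs_per_thread, int wg_size)
{
    assert(regs_per_thread > 0 && wg_size > 0);

    /* Threads the register file could hold if registers were the only limit. */
    int by_regs = regs_per_sm / regs_per_thread;

    /* The hardware thread limit caps this. */
    int limit = by_regs < max_threads_per_sm ? by_regs : max_threads_per_sm;

    /* Only whole work-groups can be resident. */
    return (limit / wg_size) * wg_size;
}
```

With 26 registers, `resident_threads(32768, 1536, 26, 256)` gives 32768 / 26 = 1260 threads, which is four whole work-groups of 256, i.e. 1024 threads, matching the profiler. In this model anything up to 21 registers per thread reaches the 1536-thread limit; allocation granularity on real hardware may push the target a little lower.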

The kernel is:

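#pragma OPENCL EXTENSION cl_khr_fp64 : enable   // double support on OpenCL 1.0/1.1

__kernel void relCalculation(const __global int* a,
                             const __global double* fProb,
                             const __global float* T,
                             __global int* Output,
                             const int max,
                             const int idx,
                             const int col,
                             const double rel,
                             const double t)
{
    const int i = get_global_id(0);

    if (i <= max)   // inclusive bound: max must be the last valid index
    {
        double GP = 1.0;

        // GP = product over j of (1 - fProb[j+idx]^a[i*col+j])^(t / T[j])
        for (int j = 0; j < col; ++j)
        {
            // pown() takes an integer exponent, matching the int data in a[]
            GP *= pow(1.0 - pown(fProb[j + idx], a[i*col + j]), t / T[j]);
        }

        Output[i] = (GP >= rel) ? 1 : 0;
    }
}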

You could try putting your constant values into constant memory instead of passing them as kernel parameters, e.g.:

__constant double t = 12345;

Constant memory is slower than private memory, but the kernel may still perform better overall because more threads can run at once.

Srdja