Problem with kernel output

Please see this kernel and its corresponding output. When I execute the kernel, it works for the first work-item, but all the others give wrong output. Can anyone help me figure out what the problem is?

__kernel void rmsCalculation(const __global float* a,
                             const __global float* C,
                             __global float* O,
                             const int col)
{
    // One work-item per row of a; row ar is read as a[ar * col + j].
    const int ar = get_global_id(0);

    float R = 0;
    float I = 0;
    float c = 0;
    bool totalSch = true;
    float sum = 0;

    for (int j = 0; j < col; ++j)
    {
        c = C[j] * a[ar * col + j];
        I = 0;

        do
        {
            R = I + c;
            I = 0;

            // NOTE: T is not among the kernel arguments shown here.
            if (R > T[j])
            {
                totalSch = false;
                break;
            }
            else
            {
                for (int k = 0; k < j; ++k)
                {
                    I = I + C[k] * a[ar * col + k];
                }
            }
        } while (I + c > R);

        sum = sum + R;

        if (totalSch == false)
        {
            break;
        }
    } // end for (j = 0..

    // One result per work-item.
    O[ar] = sum;
}

In the output, the first element of the "O" array is calculated correctly, but the other elements are wrong, as shown below:
0= 11
1= -9.99199e+18
2= -9.99199e+18
3= -9.99199e+18
4= -9.99199e+18
5= -9.99199e+18
6= -9.99199e+18
7= -9.99199e+18
8= -9.99199e+18
9= -9.99199e+18
10= -9.99199e+18
11= -9.99199e+18
12= -9.99199e+18
13= -9.99199e+18
14= -9.99199e+18
15= -9.99199e+18
16= -9.99199e+18
17= -9.99199e+18
18= -9.99199e+18
19= -9.99199e+18

So what is the problem in the code?
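
One way to narrow down a problem like this is to run the same arithmetic on the host for a single row and compare it with the matching element of "O". The following is only a rough sketch: rms_row_reference is a made-up name, and T is assumed to be a per-column float array, since the posted kernel uses T[j] without declaring it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Host-side mirror of the kernel body for one row `ar`.
   ASSUMPTION: T is a per-column array of floats; it is not declared
   in the posted kernel, so its real type and origin are unknown. */
float rms_row_reference(const float *a, const float *C, const float *T,
                        size_t ar, size_t col)
{
    float sum = 0.0f;
    bool totalSch = true;

    for (size_t j = 0; j < col; ++j) {
        float c = C[j] * a[ar * col + j];
        float I = 0.0f;
        float R = 0.0f;
        float next = 0.0f;

        do {
            R = I + c;
            I = 0.0f;

            if (R > T[j]) {
                totalSch = false;
                break;
            }

            /* Accumulate the contribution of the columns before j. */
            for (size_t k = 0; k < j; ++k) {
                I += C[k] * a[ar * col + k];
            }

            /* Store in a float so the comparison uses float precision. */
            next = I + c;
        } while (next > R);

        sum += R;

        if (!totalSch) {
            break;
        }
    }

    return sum;
}
```

If the host value for row 0 matches O[0] but the values for the other rows do not match, the kernel arithmetic is probably fine and the data reaching the device for those rows is the more likely suspect.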

I solved the problem. It was not in the kernel but in the buffer size I set for array "a": the buffer should hold row*col elements, and I had sized it for only col elements.
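
To illustrate why only the first element came out right, here is a small host-side sketch of the indexing. The helper names (a_index, a_buffer_bytes, rows_in_bounds) are made up for this example and are not part of my actual host code; rows and cols stand for the row and col counts above.

```c
#include <assert.h>
#include <stddef.h>

/* Flat index that work-item `ar` reads for column `j`,
   matching the kernel's a[ar * col + j]. */
size_t a_index(size_t ar, size_t j, size_t cols)
{
    assert(j < cols);
    return ar * cols + j;
}

/* Bytes the buffer behind `a` must hold so that every
   work-item's reads stay in bounds. */
size_t a_buffer_bytes(size_t rows, size_t cols)
{
    return rows * cols * sizeof(float);
}

/* Number of leading rows whose reads fit entirely inside a
   buffer of `allocated_elems` floats. */
size_t rows_in_bounds(size_t allocated_elems, size_t cols)
{
    return cols == 0 ? 0 : allocated_elems / cols;
}
```

With a buffer of only cols elements, rows_in_bounds returns 1, so only work-item 0 reads valid data and every other work-item reads past the end of the buffer. The value from a_buffer_bytes(rows, cols) is what would go in the size argument of clCreateBuffer for "a".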