OpenCL sqrt() precision
I got different results computing sqrt(178868) on the GPU in two different ways.

1) computing on the GPU, with 178868 given as a constant defined inside the kernel:

int a = 178868;
output[0] = sqrt( (float) a);

-> result = 422.927886962890625

2) computing on the GPU, with 178868 given as a kernel parameter:

output[1] = sqrt( (float) i); // i = 178868

-> result = 422.9278564453125


I suppose that in case 1 the OpenCL compiler precomputes sqrt(178868), but why is the result different from case 2?

I also tried computing sqrtf(178868) on the CPU: the result is identical to GPU case 1.
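For reference, a minimal host-side check (a sketch; it assumes a C compiler whose float is IEEE-754 single precision, where sqrtf is correctly rounded):

#include <math.h>
#include <stdio.h>

int main(void)
{
    int a = 178868;
    float r = sqrtf((float) a);   /* correctly rounded single-precision sqrt on the CPU */
    printf("%.15f\n", r);         /* prints 422.927886962890625, same as GPU case 1 */
    return 0;
}

(Compile with e.g. gcc test.c -lm.)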


The kernel code I use:

__kernel void testsqrt( __global float *output, int i)
{
    int a = 178868;
    output[0] = sqrt( (float) a);

    output[1] = sqrt( (float) i);
}
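A way to test the supposition above (a sketch, not verified on this setup): hide the constant behind a volatile variable so the compiler cannot fold the sqrt at compile time. If constant folding is the cause, output[0] should then match output[1].

__kernel void testsqrt_volatile( __global float *output, int i)
{
    volatile int a = 178868;        // volatile blocks compile-time evaluation (assuming the driver honors it)
    output[0] = sqrt( (float) a);   // now computed on the GPU at run time
    output[1] = sqrt( (float) i);
}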

My config:
NVIDIA GTS450, 2 GB RAM
Intel Xeon E5620
Ubuntu 11.10 64-bit
CUDA toolkit 4.1.28
NVIDIA developer driver 285.05.33

#1
Posted 03/23/2012 01:52 PM   
This is common to many compilers: when arithmetic operations are done on constants, or on variables holding compile-time constants (true constants, not data from constant memory :) ), the compiler pre-computes the result and uses it instead of compiling the computation itself.

i.e.:
int a = 178868;
b = sqrt( (float) a);
becomes, before translation to machine-language code:
b = 422.927886962890625f;

As the compiler runs on the CPU, it uses the CPU's floating-point operations, which are IEEE compliant, so the folded number is correctly rounded and naturally equal to the same operation computed independently on the same CPU (or on any other IEEE-compliant CPU).
But your GPU is *NOT* designed to be fully IEEE compliant, and several complex floating-point operations take shortcuts to be faster instead of returning correctly rounded results. This is absolutely normal, and correct behavior for a GPU.
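For what it is worth, the two results differ by exactly one unit in the last place (1 ulp), and the OpenCL 1.x specification only requires single-precision sqrt to be accurate to within 3 ulp, so the GPU result is conformant. A small host-side sketch (assuming 32-bit IEEE-754 floats) that measures the gap:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    float folded  = 422.927886962890625f;   /* case 1: result folded at compile time */
    float runtime = 422.9278564453125f;     /* case 2: result computed on the GPU */
    uint32_t bf, br;

    memcpy(&bf, &folded,  sizeof bf);       /* reinterpret the floats as raw bit patterns */
    memcpy(&br, &runtime, sizeof br);

    /* For positive floats, the difference of the bit patterns is the distance in ulps. */
    printf("ulp distance = %u\n", bf - br); /* prints 1 */
    return 0;
}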

Parallelis.com, Parallel-computing technologies and benchmarks. Current Projects: OpenCL Chess & OpenCL Benchmark

#2
Posted 03/27/2012 05:37 PM   