This is common to many compilers: when arithmetic operations are performed on constants, or on variables that hold constants (true compile-time constants, not data from constant memory :) ), they precompute the result and use it instead of compiling the computation itself.

For example:
int a = 178868;
b = sqrt( (float) a);
becomes, before the translation to machine-language code:
b = 422.927886962890625;

Since the compiler runs on the CPU, it uses the CPU's floating-point operations, which are IEEE-compliant, so the number is correctly rounded and naturally equal to the same operation computed independently on the same CPU (or on another IEEE-compliant CPU).
But your GPU is *NOT* designed to be fully IEEE-compliant: the more complex floating-point operations take shortcuts to be faster rather than returning results with full-precision correctness. This is absolutely normal, and correct behavior for a GPU.

Parallelis.com, Parallel-computing technologies and benchmarks. Current Projects: OpenCL Chess & OpenCL Benchmark

1) Computing on the GPU, with 178868 given as a constant defined inside the kernel

int a = 178868;

output[0] = sqrt( (float) a);

-> result = 422.927886962890625

2) Computing on the GPU, with 178868 given as a kernel parameter

output[1] = sqrt( (float) i); // i = 178868

-> result = 422.9278564453125

I suppose that in case 1 the OpenCL compiler computes sqrt(178868) itself, but why is the result different from case 2?

I also tried computing sqrtf(178868) on the CPU: the result is identical to GPU case 1.

The kernel code I use:

__kernel void testsqrt( __global float *output, int i)
{
    int a = 178868;
    output[0] = sqrt( (float) a);
    output[1] = sqrt( (float) i);
}

My config:

NVIDIA GTS450, 2 GB RAM
Intel Xeon E5620
Ubuntu 11.10, 64-bit
CUDA toolkit 4.1.28
NVIDIA developer driver 285.05.33
