
I was doing some tests and ran into a problem. I wrote a test kernel that does multiplications, but the result is wrong. I know it comes from the use of float, because with double precision the result is correct.

Here is my test:

[code]
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define XBLOCK 256
#define YBLOCK 1
#define XGRID  84
#define YGRID  1

/** Kernel function **/
__global__ void KernelTest()
{
    // Thread index
    int idx = (blockIdx.x * YGRID + blockIdx.y) * XBLOCK * YBLOCK
            + (threadIdx.x * YBLOCK + threadIdx.y);

    float A, result;
    A = 11877.303711f;
    result = A * A;

    if (idx == 0) {
        printf("A=%f - result=%f\n", A, result);
        /* The displayed result is 141070336.0 instead of 141070343.443334 in double */
    }
}

int main()
{
    cudaDeviceReset();

    /** Launch of the kernel **/
    dim3 blockSize(XBLOCK, 1);
    dim3 gridSize(XGRID, 1);
    KernelTest<<<gridSize, blockSize>>>();

    /* Wait for the kernel to finish so its printf output is flushed */
    cudaDeviceSynchronize();
    return 0;
}
[/code]

I would like to know whether I can get the correct result with float. I tried using __fmul_r[u,d,n](float, float), but the result is still wrong.

I am afraid there is no solution, but I am still hoping there is one.

EDIT: One idea I had is to compute the result like this:

[code]
float product1 = __fmul_rd(A, A);
float product2 = __fmul_ru(A, A);
float result   = 0.5f * (product1 + product2);

if (idx == 0) {
    printf("A=%f - product1=%f - product2=%f - result=%f\n", A, product1, product2, result);
}
[/code]

But the result is wrong. I get this:

[code]
A=11877.303711 - product1=141070336.000000 - product2=141070352.000000 - result=141070336.000000
[/code]

One last thing: what do you think about using double instead of float?

Could you give me some advice?

Thanks


Maybe this sounds stupid, but could you try using something like %26.20f in the printf format string? Rounding errors usually appear when a small number is added to a large one over and over; adding or multiplying two numbers once should not make a difference.


Hello,

[quote]
Maybe this sounds stupid, but could you try using something like %26.20f in the printf format string? Rounding errors usually appear when a small number is added to a large one over and over; adding or multiplying two numbers once should not make a difference.
[/quote]

Thank you very much for trying to help me, but it changes nothing!


Hi,

[quote]
I was doing some tests and ran into a problem. I wrote a test kernel that does multiplications, but the result is wrong. I know it comes from the use of float, because with double precision the result is correct.
[/quote]

It is not wrong. It is correct within the precision limitations of 32-bit floats. These float numbers are precise only to about 7 decimal digits: the mantissa of a 32-bit float has 23 bits, which translates to log10(2^23) ≈ 6.92 significant decimal digits.

Any further digits that you display with printf have *no* meaning; they are essentially random. ;)

People really need to understand these limitations.

Christian


The problem with your first source code is that there is actually no computation happening inside your kernel: what you see in the output is a result that was computed by the compiler, not by the GPU.

Even if you did not use a constant inside the kernel, you would still have another problem: 11877.303711 cannot be represented as a float. If you try it out, you will see that the value stored in the float variable is approximately 11877.3037.

The next problem is that even if you somehow computed the result with higher precision, you are ultimately assigning it to a float, but 141070343.443334 cannot be represented as a float either and gets rounded to 141070336.0.

So for this concrete number and computation, the result is perfectly right and the best you can get with float.

For other numbers and/or computations, the result might differ between computing in float and computing in double and then assigning the result to float.

The only way to get higher precision without actually using double is to simulate double precision using single precision... Google is your friend in that case :)

cheers

RoBiK


@RoBiK: For now, I do the computation in double. I know that it is not efficient in terms of computation time, but I have other problems I would like to fix before improving performance. I also searched Google for "simulation of double precision using single precision" but found nothing. Do you have some examples or references, please?

Thanks again,

Dext


A search on these forums for dsfun might also turn up some results.

Christian


[quote]
It is not wrong. It is correct within the precision limitations of 32-bit floats. These float numbers are precise only to about 7 decimal digits: the mantissa of a 32-bit float has 23 bits, which translates to log10(2^23) ≈ 6.92 significant decimal digits.

Any further digits that you display with printf have *no* meaning; they are essentially random. ;)

People really need to understand these limitations.

Christian
[/quote]

Thanks for the post. I started programming CUDA with almost zero knowledge of C; I practically learned C at the same time as CUDA.
