Precision issue! Wrong result for a multiplication
Hi,
I was doing some tests and ran into a problem. I wrote a test kernel that performs a multiplication, but the result is wrong. I know it comes from the use of float, because if I use double precision the result is correct.
Here is my test:

[code]
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define XBLOCK 256
#define YBLOCK 1
#define XGRID 84
#define YGRID 1

/** Kernel function **/
__global__ void KernelTest()
{
// Thread index
int idx = (blockIdx.x * YGRID + blockIdx.y) * XBLOCK * YBLOCK + (threadIdx.x * YBLOCK + threadIdx.y);

float A, result;

A = 11877.303711f;
result = A*A;

if(idx==0){
printf("A=%f - result=%f\n",A, result );

/* The displayed result is 141070336.0 instead of 141070343.443334 in double */
}
}

int main(){

cudaDeviceReset();

/** Launch of the kernel **/
dim3 blockSize(XBLOCK,1);
dim3 gridSize(XGRID,1);

KernelTest<<<gridSize, blockSize>>>();
cudaDeviceSynchronize(); /* make sure device-side printf output is flushed before exit */

return 0;
}
[/code]

I would like to know whether I can get the correct result with float. I tried __fmul_r[u,d,n](float,float) but the result is still wrong.
I am afraid there is no solution, but I am still hoping there is one.

EDIT: One idea I had was to compute the result like this:
[code]

float product1 = __fmul_rd( A,A );
float product2 = __fmul_ru( A,A );
float result = 0.5f*( product1 + product2 );
if(idx==0){
printf("A=%f - product1=%f - product2=%f - result=%f\n",A,product1,product2, result );
}
[/code]
But the result is still wrong. I get this:
[code]
A=11877.303711 - product1=141070336.000000 - product2=141070352.000000 - result=141070336.000000
[/code]


Last thing, what do you think about using double instead of float?

Could you give me some advice?
Thanks

#1
Posted 04/04/2012 01:10 PM   
Hello,

Maybe this sounds stupid, but could you try using something like %26.20f in the printf format string? Rounding errors usually appear when a small number is added to a large one over and over. Adding or multiplying two numbers once should not make a difference.

#2
Posted 04/04/2012 02:08 PM   
[quote name='pasoleatis' date='04 April 2012 - 04:08 PM' timestamp='1333548539' post='1391890']
Hello,

Maybe this sounds stupid, but could you try using something like %26.20f in the printf format string? Rounding errors usually appear when a small number is added to a large one over and over. Adding or multiplying two numbers once should not make a difference.
[/quote]

Thank you very much for trying to help me, but it changes nothing!

#3
Posted 04/04/2012 02:33 PM   
[quote name='Dext' date='04 April 2012 - 03:10 PM' timestamp='1333545006' post='1391877']
Hi,
I was doing some tests and ran into a problem. I wrote a test kernel that performs a multiplication, but the result is wrong. I know it comes from the use of float, because if I use double precision the result is correct.
[/quote]

It is not wrong. It is correct within the precision limitations of 32-bit floats. These float numbers are precise only to about 7 decimal digits: the mantissa of a 32-bit float has 23 bits, which translates to log10(2^23) = 6.92 significant decimal digits.

Any more digits that you display with printf have *no* meaning; they're essentially random numbers ;)

People really need to understand these limitations.

Christian

#4
Posted 04/04/2012 03:50 PM   
Hello,

the problem with your first source code is that no computation actually happens inside your kernel... what you see in the output is a result that was computed by the compiler (constant folding), not by the GPU.

even if you did not use a constant inside the kernel, you would still have another problem: 11877.303711 cannot be represented as a float; if you try it out, you will see that the value stored in the float variable is 11877.3037109375.
the next problem is that even if you somehow computed the result with higher precision, you are ultimately assigning it to a float, and 141070343.443334 cannot be represented as a float either, so it gets rounded to 141070336.0.
So for this concrete number and computation the result is perfectly right and the best you can get with float.
For other numbers and/or computations the result might differ between computing in float and computing in double and then assigning the result to a float.
The only way to get higher precision without actually using double is to simulate double precision using single precision... google is your friend in that case :)

cheers
RoBiK

#5
Posted 04/04/2012 04:14 PM   
@ cbuchner1 : Thanks for the explanation. It seems logical now.

@ RoBiK : Now I do the computation in double. I know it is not efficient (in terms of computation time), but I have other problems I would like to fix before improving the computation time. I also searched Google for "simulation of double precision using single precision" but found nothing. Do you have some examples or references, please?

Thanks again,
Dext

#6
Posted 04/10/2012 10:12 AM   
Google for "double single" computation, or for a CUDA port of the dsfun library (originally written in Fortran).

A search on these forums for dsfun might also turn up some results.

Christian

#7
Posted 04/10/2012 10:15 AM   
[quote name='cbuchner1' date='04 April 2012 - 04:50 PM' timestamp='1333554616' post='1391932']
It is not wrong. It is correct within the precision limitations of 32-bit floats. These float numbers are precise only to about 7 decimal digits: the mantissa of a 32-bit float has 23 bits, which translates to log10(2^23) = 6.92 significant decimal digits.

Any more digits that you display with printf have *no* meaning; they're essentially random numbers ;)

People really need to understand these limitations.

Christian
[/quote]
Thanks for the post. I started programming CUDA with almost zero knowledge of C; I practically learned C at the same time as CUDA.

#8
Posted 04/11/2012 08:12 AM   