CUSPARSE(A*x) != CUBLAS(A*x)
Hello,

While evaluating cuSPARSE and some other sparse matrix libraries, we encountered different
results for the following operation:

A * x

The following simple example, a 2x2 matrix A multiplied by the given vector x, demonstrates the problem:

[code]Matrix A:
A[0] = 0.939129173755645752;
A[1] = 0.799645721912384033;
A[2] = 0.814138710498809814;
A[3] = 0.594497263431549072;

(We are using a matrix with no zero values here; the discrepancy also appears with matrices that do contain zeros.)

Vector X:
x[0] = 0.657200932502746582;
x[1] = 0.995299935340881348;[/code]

The results of the two calls
[code]cublasSgemv (...);
cusparseScsrmv(...);[/code]

are:
[code]cuSPARSEcuBLAScompare test running..
--CUBLAS-- result of A*x: [1.427508711814880371, 1.117230892181396484]
-CUSPARSE- result of A*x: [1.427508831024169922, 1.117231011390686035][/code]
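
For anyone who wants a quick host-side reference without building the attached zip, a minimal sketch along these lines reproduces the product on the CPU, once in float and once in double (assuming column-major storage for the dense matrix, which is consistent with the printed results, and a plain CSR layout for the sparse version; this is illustrative code, not the code from the zip file):

[code]#include <stdio.h>

int main(void)
{
    /* Dense A in column-major order, as passed to cublasSgemv. */
    float A[4] = { 0.939129173755645752f, 0.799645721912384033f,
                   0.814138710498809814f, 0.594497263431549072f };
    float x[2] = { 0.657200932502746582f, 0.995299935340881348f };

    /* The same matrix in CSR form: rowPtr[i]..rowPtr[i+1] indexes the nonzeros of row i. */
    float csrVal[4]    = { A[0], A[2], A[1], A[3] };
    int   csrColInd[4] = { 0, 1, 0, 1 };
    int   csrRowPtr[3] = { 0, 2, 4 };

    for (int row = 0; row < 2; ++row) {
        float  yf = 0.0f;   /* single-precision accumulation */
        double yd = 0.0;    /* double-precision reference    */
        for (int j = csrRowPtr[row]; j < csrRowPtr[row + 1]; ++j) {
            yf += csrVal[j] * x[csrColInd[j]];
            yd += (double)csrVal[j] * (double)x[csrColInd[j]];
        }
        printf("row %d: float = %.18f   double ref = %.18f\n", row, yf, yd);
    }
    return 0;
}[/code]

Comparing both GPU results against the double-precision reference shows how far each of them is from the (nearly) exact value, which is more informative than only comparing them against each other.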


While testing the cusp library we encountered similar (maybe the same?) discrepancies.
We then implemented the sparse matrix-vector multiplication on the CPU and verified that cuBLAS works as expected.
In addition, we wrote our own sparse matrix multiplication kernel, which also confirmed that cuBLAS behaves as expected.

The discrepancy gets worse for matrices with higher dimensions.

The observed behavior was reproduced on a GTX 480 (Windows and Linux) and an 8800 GTS (Linux).

So that everyone can reproduce the behavior, I attached a zip file with the code and makefile.
To build the program without issues, it is important to have the "CUDA Toolkit 3.2 RC" installed
as well as the "GPU Computing SDK code samples". If you can build those samples without problems, copy the
source to "GPU Computing SDK code samples"/C/src/, then go to the src/cuSPARSEcuBLAScompare/ directory and run make.
(This step is necessary because our makefile uses the NVIDIA samples' common makefile.)
The resulting executable ends up in the same folder as the other sample executables.

#1
Posted 09/23/2010 08:37 PM   
How is the result surprising given that you use float variables?

Always check return codes of CUDA calls for errors. Do not use __syncthreads() in conditional code unless the condition is guaranteed to evaluate identically for all threads of each block. Run your program under cuda-memcheck to detect stray memory accesses. If your kernel dies for larger problem sizes, it might exceed the runtime limit and trigger the watchdog timer.

#3
Posted 09/23/2010 08:42 PM   
[quote name='tera' post='1121545' date='Sep 23 2010, 10:42 PM']How is the result surprising given that you use float variables?[/quote]

It's very surprising because the error is around 1e-7 even for a matrix with only 4 elements,
which shouldn't be a floating-point precision error anymore.

#5
Posted 09/23/2010 08:52 PM   
What would be a tolerable level of inaccuracy for single precision, then?

#7
Posted 09/23/2010 09:17 PM   
I now think too that the root cause of the difference is floating-point accuracy. After analyzing the cusp SpMV kernel
(which is also developed by NVIDIA and maybe used in the cusparse library?), I think the cause of the difference between cusparse and
cublas is the kernel's caching/accumulation scheme: it looks like it splits the result into several partial floats before accumulating them into the overall result for one output element.
In many cases this increases the inaccuracy.
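
To make the idea concrete, here is a sketch of the two accumulation orders (this is not the actual cusp or cusparse kernel, only an illustration of the effect):

[code]/* Hypothetical illustration: two ways of summing the products of one row.
   Neither is the real library kernel; they only show that changing the
   order of the float additions can change the last bits of the result. */
float row_sum_sequential(const float *val, const int *col, const float *x,
                         int begin, int end)
{
    float sum = 0.0f;
    for (int j = begin; j < end; ++j)        /* one running accumulator */
        sum += val[j] * x[col[j]];
    return sum;
}

float row_sum_partial(const float *val, const int *col, const float *x,
                      int begin, int end, int nparts)
{
    /* Mimics several threads each building a partial sum that is
       combined at the end, i.e. a different association order. */
    float partial[32] = { 0.0f };            /* assumes nparts <= 32 */
    for (int j = begin; j < end; ++j)
        partial[(j - begin) % nparts] += val[j] * x[col[j]];

    float sum = 0.0f;
    for (int p = 0; p < nparts; ++p)
        sum += partial[p];
    return sum;
}[/code]

For long rows the two functions regularly differ in the last bits of the result.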

#9
Posted 09/23/2010 10:37 PM   
An enhanced inaccuracy? Suppose I wish to evaluate a+b+c. Is it more accurate to evaluate it as
( a + b ) + c
or as
a + ( b + c )
?
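
For example, with one constructed set of values the two orders give visibly different answers:

[code]#include <stdio.h>

int main(void)
{
    /* Constructed example - neither order is "the right one". */
    volatile float a = 1.0e8f, b = -1.0e8f, c = 1.0f;

    printf("(a + b) + c = %f\n", (a + b) + c);  /* prints 1.000000            */
    printf("a + (b + c) = %f\n", a + (b + c));  /* prints 0.000000, c is lost */
    return 0;
}[/code]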

#11
Posted 09/23/2010 11:27 PM   
I see your point, but I think the problem is of the following nature:

You have two floats, and you could sum them up in the following way:

[code]float a = 1.0f;
float b = 1.0f;

float result = a + b;
printf("%.18f\n", result);   /* prints 2.000000000000000000 */[/code]

Or you could do it in the following way:

[code]float a = 1.0f;
float b = 1.0f;

for (int i = 0; i < 5000000; i++) {
    a = a + 1.1;
}

for (int i = 0; i < 5000000; i++) {
    a = a - 1.1;
}

float result = a + b;
printf("%.18f\n", result);   /* prints 2.171823024749755859 */[/code]

It's your decision which one you would choose; I would choose the first one.

My point is that cusparseScsrmv() is less accurate than it could be.
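
Just to show how much the accumulation strategy matters (this is only an illustration, not a suggestion about how the library kernels are implemented): if the second example accumulates in double and rounds back to float only once at the end, the drift becomes far smaller:

[code]float  b   = 1.0f;
double acc = 1.0;                  /* accumulate in double precision */

for (int i = 0; i < 5000000; i++) {
    acc = acc + 1.1;
}

for (int i = 0; i < 5000000; i++) {
    acc = acc - 1.1;
}

float result = (float)acc + b;     /* round back to float only once */
printf("%.18f\n", result);         /* drift is orders of magnitude smaller than above */[/code]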

#13
Posted 09/24/2010 07:44 AM   
Albert Einstein is reported to have once said that

[quote]a man with a watch always knows exactly what the time is, but a man with two watches is never quite sure.[/quote]

You might find it useful to study the paper at [url="http://docs.sun.com/source/806-3568/ncg_goldberg.html"]this link[/url].

#15
Posted 09/24/2010 09:26 AM   