Matrix Inversion source
Hi,

I want to share my matrix inversion. It is based on an blocked Gauß elimination Algorithm. I found this algorithm in an ppt from "Christian Heinrich". [url="http://www.scai.fraunhofer.de/fileadmin/ArbeitsgruppeTrottenberg/SS08/seminar/Seminar_Heinrich.pdf"]Gauselimination[/url]
I build the rest to invert a matrix.

This is one of my first CUDA projects, maybe there are already better implementations but i didn't found one.

I Used the VS_Project Wizard 1.2 with a custom cuda.rules (included in the zip). There is an test envirement for matlab included, so you can easily load it as a library.

The results where the expected one, i got a performance boost if the matrixsize
is bigger than 128x128.

I need this algorithm to solve an matrix equation. So if you got an faster one so tell me,please =)


I run my tests with a
Asus Gforce GTX260 (192shader@1,24GHz)
AMD x2 3800+ am2 (2,0GHz)

Results:
small matrix sizes
[attachment=7627:attachment]
larger matrix sizes
[attachment=7628:attachment]

Ch. Wagner
---
update: documentation
Hi,



I want to share my matrix inversion. It is based on an blocked Gauß elimination Algorithm. I found this algorithm in an ppt from "Christian Heinrich". Gauselimination

I build the rest to invert a matrix.



This is one of my first CUDA projects, maybe there are already better implementations but i didn't found one.



I Used the VS_Project Wizard 1.2 with a custom cuda.rules (included in the zip). There is an test envirement for matlab included, so you can easily load it as a library.



The results where the expected one, i got a performance boost if the matrixsize

is bigger than 128x128.



I need this algorithm to solve an matrix equation. So if you got an faster one so tell me,please =)





I run my tests with a

Asus Gforce GTX260 (192shader@1,24GHz)

AMD x2 3800+ am2 (2,0GHz)



Results:

small matrix sizes

[attachment=7627:attachment]

larger matrix sizes

[attachment=7628:attachment]



Ch. Wagner

---

update: documentation

#1
Posted 10/21/2008 02:01 PM   
Hey,
Nice work! Just wanted to know if you have ever written anything for blocking of 3d matrices? Any reference code u have available?

Thanks,
Vandhan

[quote name='laplasch' date='Oct 21 2008, 06:01 AM']Hi,

I want to share my matrix inversion. It is based on an blocked Gauß elimination Algorithm. I found this algorithm in an ppt from "Christian Heinrich". [url="http://www.scai.fraunhofer.de/fileadmin/ArbeitsgruppeTrottenberg/SS08/seminar/Seminar_Heinrich.pdf"]Gauselimination[/url]
I build the rest to invert a matrix.

This is one of my first CUDA projects, maybe there are already better implementations but i didn't found one.

I Used the VS_Project Wizard 1.2 with a custom cuda.rules (included in the zip). There is an test envirement for matlab included, so you can easily load it as a library.

The results where the expected one, i got a performance boost if the matrixsize
is bigger than 128x128.

I need this algorithm to solve an matrix equation. So if you got an faster one so tell me,please =)
I run my tests with a
Asus Gforce GTX260 (192shader@1,24GHz)
AMD x2 3800+ am2 (2,0GHz)

Results:
small matrix sizes
[attachment=9849:attachment]
larger matrix sizes
[attachment=9850:attachment]

Ch. Wagner
[right][snapback]454690[/snapback][/right]
[/quote]
Hey,

Nice work! Just wanted to know if you have ever written anything for blocking of 3d matrices? Any reference code u have available?



Thanks,

Vandhan



[quote name='laplasch' date='Oct 21 2008, 06:01 AM']Hi,



I want to share my matrix inversion. It is based on an blocked Gauß elimination Algorithm. I found this algorithm in an ppt from "Christian Heinrich". Gauselimination

I build the rest to invert a matrix.



This is one of my first CUDA projects, maybe there are already better implementations but i didn't found one.



I Used the VS_Project Wizard 1.2 with a custom cuda.rules (included in the zip). There is an test envirement for matlab included, so you can easily load it as a library.



The results where the expected one, i got a performance boost if the matrixsize

is bigger than 128x128.



I need this algorithm to solve an matrix equation. So if you got an faster one so tell me,please =)

I run my tests with a

Asus Gforce GTX260 (192shader@1,24GHz)

AMD x2 3800+ am2 (2,0GHz)



Results:

small matrix sizes

[attachment=9849:attachment]

larger matrix sizes

[attachment=9850:attachment]



Ch. Wagner

[snapback]454690[/snapback]


#2
Posted 10/21/2008 05:01 PM   
I'm guessing the performance charts are using timings with a straight-forward (not highly optimized) CPU implementation? Otherwise I don't follow why Matlab would be that much faster. Do you happen to have timing results with some optimized math library (along the lines of MKL, etc.)?

Paulius
I'm guessing the performance charts are using timings with a straight-forward (not highly optimized) CPU implementation? Otherwise I don't follow why Matlab would be that much faster. Do you happen to have timing results with some optimized math library (along the lines of MKL, etc.)?



Paulius

#3
Posted 10/21/2008 05:08 PM   
[quote name='callingearthling' date='Oct 21 2008, 09:01 AM']Hey,
Nice work! Just wanted to know if you have ever written anything for blocking of 3d matrices? Any reference code u have available?

Thanks,
Vandhan
[right][snapback]454743[/snapback][/right]
[/quote]
Sorry, i havent anything about solving 3D matrices. I am happy that i finished work with the 2D stuff /smile2.gif' class='bbc_emoticon' alt=':))' />

[quote name='paulius' date='Oct 21 2008, 09:08 AM']I'm guessing the performance charts are using timings with a straight-forward (not highly optimized) CPU implementation?  Otherwise I don't follow why Matlab would be that much faster.  Do you happen to have timing results with some optimized math library (along the lines of MKL, etc.)?

Paulius
[right][snapback]454748[/snapback][/right]
[/quote]
Yes the CPU implementation isn't optimzed, i used the standard MS c compiler and there are no SIMD optimisations in the code. You can take a look at the reference c code in the zip archiv. I also surpriced that the MatLab version is very fast. In the MatLab documentaion is a reference that they are using the LAPACK library for the calculation of the inverse matrix. I thought that LAPACK is a fully optimized library for algebra problems. I will complete my docu. and will add some messurement with other optimized libraries and other GPUs.
[quote name='callingearthling' date='Oct 21 2008, 09:01 AM']Hey,

Nice work! Just wanted to know if you have ever written anything for blocking of 3d matrices? Any reference code u have available?



Thanks,

Vandhan

[snapback]454743[/snapback]




Sorry, i havent anything about solving 3D matrices. I am happy that i finished work with the 2D stuff /smile2.gif' class='bbc_emoticon' alt=':))' />



[quote name='paulius' date='Oct 21 2008, 09:08 AM']I'm guessing the performance charts are using timings with a straight-forward (not highly optimized) CPU implementation?  Otherwise I don't follow why Matlab would be that much faster.  Do you happen to have timing results with some optimized math library (along the lines of MKL, etc.)?



Paulius

[snapback]454748[/snapback]




Yes the CPU implementation isn't optimzed, i used the standard MS c compiler and there are no SIMD optimisations in the code. You can take a look at the reference c code in the zip archiv. I also surpriced that the MatLab version is very fast. In the MatLab documentaion is a reference that they are using the LAPACK library for the calculation of the inverse matrix. I thought that LAPACK is a fully optimized library for algebra problems. I will complete my docu. and will add some messurement with other optimized libraries and other GPUs.

#4
Posted 10/22/2008 08:49 AM   
i tried compiling ur code but got an error "cutil32D.dll missing" even though the file is present.
do you have a code in which cpu and gpu code are in the same file? i'm not good with the project you have posted. i'm looking for a code which compute inversion on cpu and gpu in the same code.
i tried compiling ur code but got an error "cutil32D.dll missing" even though the file is present.

do you have a code in which cpu and gpu code are in the same file? i'm not good with the project you have posted. i'm looking for a code which compute inversion on cpu and gpu in the same code.

#5
Posted 03/15/2010 05:01 AM   
I got your code to compile fine, but your test example is flawed, the memory addressing wasn't correct. when fixed, the code no longer validates on the inverse of the identity matrix. I changed your matrix generation code to:

tmp = dataInput;
for(int i =0; i <size;i++) {
for (int j=0; j < size;j++) {
if (i==j) {
tmp[i*size+j]=1;
} else {
tmp[i*size+j]=0;
}
}
}

The error gets better as the number increases, but at this time it is not working correctly. I will look into your code and to compare and contrast.

Shan
I got your code to compile fine, but your test example is flawed, the memory addressing wasn't correct. when fixed, the code no longer validates on the inverse of the identity matrix. I changed your matrix generation code to:



tmp = dataInput;

for(int i =0; i <size;i++) {

for (int j=0; j < size;j++) {

if (i==j) {

tmp[i*size+j]=1;

} else {

tmp[i*size+j]=0;

}

}

}



The error gets better as the number increases, but at this time it is not working correctly. I will look into your code and to compare and contrast.



Shan

#6
Posted 02/02/2011 07:31 PM   
I found a way to "**** up" CULA in order to create inverse matrix in free edition(normally that function available in standard edition) for dense matrices. Everything is pretty simple. There is available function, that solves linear equations of type A * X = B(1) called GESV. Where is B is NOT vector, but matrix. Therefor that function solves multiple equations with different right parts, specified in columns of matrix B.
So let's get down to business. It is known that A*A^(-1) = E(2). Where is E -- matrix with ones in diagonal. Let's assume that X in formula (1) is A^(-1) and B in formula (1) is E. Voila!
Of course this is obvious thing. But maybe certain folks don't about know that :)
Source code:
[code]
# include <cula.h>

# define THREADS 512

__global__ void generateEye(float *E, int size);

int matrixInvert(float *devInput, float *devOutput, int size){
int *ipiv;
float *cache;
int size2 = size * size * sizeof(float);
int blocks = (size * size) / THREADS + 1;
cudaMalloc(&ipiv, size2);
cudaMalloc(&cache, size2);
generateEye<<<blocks, THREADS>>>(devOutput, size);
cudaMemcpy(cache, devInput, size2, cudaMemcpyDeviceToDevice);
culaInitialize();
culaDeviceSgesv(size, size, cache, size, ipiv, devOutput, size);
culaShutdown();
cudaFree(ipiv);
cudaFree(cache);
return 0;
}

__global__ void generateEye(float *E, int size){
int offset = blockIdx.x * blockDim.x + threadIdx.x;
int y = offset / size;
int x = offset - y * size;
if(offset > size * size) return;
if(x == y) E[offset] = 1;
else E[offset] = 0;
}

[/code]

CULA is fin tool. But i prefer to not pay for software. Greetings from Ukraine :)
I found a way to "**** up" CULA in order to create inverse matrix in free edition(normally that function available in standard edition) for dense matrices. Everything is pretty simple. There is available function, that solves linear equations of type A * X = B(1) called GESV. Where is B is NOT vector, but matrix. Therefor that function solves multiple equations with different right parts, specified in columns of matrix B.

So let's get down to business. It is known that A*A^(-1) = E(2). Where is E -- matrix with ones in diagonal. Let's assume that X in formula (1) is A^(-1) and B in formula (1) is E. Voila!

Of course this is obvious thing. But maybe certain folks don't about know that :)

Source code:



# include <cula.h>



# define THREADS 512



__global__ void generateEye(float *E, int size);



int matrixInvert(float *devInput, float *devOutput, int size){

int *ipiv;

float *cache;

int size2 = size * size * sizeof(float);

int blocks = (size * size) / THREADS + 1;

cudaMalloc(&ipiv, size2);

cudaMalloc(&cache, size2);

generateEye<<<blocks, THREADS>>>(devOutput, size);

cudaMemcpy(cache, devInput, size2, cudaMemcpyDeviceToDevice);

culaInitialize();

culaDeviceSgesv(size, size, cache, size, ipiv, devOutput, size);

culaShutdown();

cudaFree(ipiv);

cudaFree(cache);

return 0;

}



__global__ void generateEye(float *E, int size){

int offset = blockIdx.x * blockDim.x + threadIdx.x;

int y = offset / size;

int x = offset - y * size;

if(offset > size * size) return;

if(x == y) E[offset] = 1;

else E[offset] = 0;

}






CULA is fin tool. But i prefer to not pay for software. Greetings from Ukraine :)

#7
Posted 11/13/2011 11:21 PM   
Hi, I downloaded your method, but I'd like to implement one by myself but the doc files are written in German, Is there a way I could get a copy from the documentation file in English? And also I would like to contact you by e-mail
Hi, I downloaded your method, but I'd like to implement one by myself but the doc files are written in German, Is there a way I could get a copy from the documentation file in English? And also I would like to contact you by e-mail

#8
Posted 03/06/2012 09:58 PM   
Scroll To Top

Add Reply