GET STARTED

GET INVOLVED

Authorization Required

Not a member? Register Now

I want to share my matrix inversion. It is based on an blocked GauÃŸ elimination Algorithm. I found this algorithm in an ppt from "Christian Heinrich". [url="http://www.scai.fraunhofer.de/fileadmin/ArbeitsgruppeTrottenberg/SS08/seminar/Seminar_Heinrich.pdf"]Gauselimination[/url]

I build the rest to invert a matrix.

This is one of my first CUDA projects, maybe there are already better implementations but i didn't found one.

I Used the VS_Project Wizard 1.2 with a custom cuda.rules (included in the zip). There is an test envirement for matlab included, so you can easily load it as a library.

The results where the expected one, i got a performance boost if the matrixsize

is bigger than 128x128.

I need this algorithm to solve an matrix equation. So if you got an faster one so tell me,please =)

I run my tests with a

Asus Gforce GTX260 (192shader@1,24GHz)

AMD x2 3800+ am2 (2,0GHz)

Results:

small matrix sizes

[attachment=7627:attachment]

larger matrix sizes

[attachment=7628:attachment]

Ch. Wagner

---

update: documentation

I want to share my matrix inversion. It is based on an blocked GauÃŸ elimination Algorithm. I found this algorithm in an ppt from "Christian Heinrich". Gauselimination

I build the rest to invert a matrix.

This is one of my first CUDA projects, maybe there are already better implementations but i didn't found one.

I Used the VS_Project Wizard 1.2 with a custom cuda.rules (included in the zip). There is an test envirement for matlab included, so you can easily load it as a library.

The results where the expected one, i got a performance boost if the matrixsize

is bigger than 128x128.

I need this algorithm to solve an matrix equation. So if you got an faster one so tell me,please =)

I run my tests with a

Asus Gforce GTX260 (192shader@1,24GHz)

AMD x2 3800+ am2 (2,0GHz)

Results:

small matrix sizes

[attachment=7627:attachment]

larger matrix sizes

[attachment=7628:attachment]

Ch. Wagner

---

update: documentation

Nice work! Just wanted to know if you have ever written anything for blocking of 3d matrices? Any reference code u have available?

Thanks,

Vandhan

[quote name='laplasch' date='Oct 21 2008, 06:01 AM']Hi,

I want to share my matrix inversion. It is based on an blocked GauÃŸ elimination Algorithm. I found this algorithm in an ppt from "Christian Heinrich". [url="http://www.scai.fraunhofer.de/fileadmin/ArbeitsgruppeTrottenberg/SS08/seminar/Seminar_Heinrich.pdf"]Gauselimination[/url]

I build the rest to invert a matrix.

This is one of my first CUDA projects, maybe there are already better implementations but i didn't found one.

I Used the VS_Project Wizard 1.2 with a custom cuda.rules (included in the zip). There is an test envirement for matlab included, so you can easily load it as a library.

The results where the expected one, i got a performance boost if the matrixsize

is bigger than 128x128.

I need this algorithm to solve an matrix equation. So if you got an faster one so tell me,please =)

I run my tests with a

Asus Gforce GTX260 (192shader@1,24GHz)

AMD x2 3800+ am2 (2,0GHz)

Results:

small matrix sizes

[attachment=9849:attachment]

larger matrix sizes

[attachment=9850:attachment]

Ch. Wagner

[right][snapback]454690[/snapback][/right]

[/quote]

Nice work! Just wanted to know if you have ever written anything for blocking of 3d matrices? Any reference code u have available?

Thanks,

Vandhan

[quote name='laplasch' date='Oct 21 2008, 06:01 AM']Hi,

I want to share my matrix inversion. It is based on an blocked GauÃŸ elimination Algorithm. I found this algorithm in an ppt from "Christian Heinrich". Gauselimination

I build the rest to invert a matrix.

The results where the expected one, i got a performance boost if the matrixsize

is bigger than 128x128.

I need this algorithm to solve an matrix equation. So if you got an faster one so tell me,please =)

I run my tests with a

Asus Gforce GTX260 (192shader@1,24GHz)

AMD x2 3800+ am2 (2,0GHz)

Results:

small matrix sizes

[attachment=9849:attachment]

larger matrix sizes

[attachment=9850:attachment]

Ch. Wagner

Paulius

Paulius

Nice work! Just wanted to know if you have ever written anything for blocking of 3d matrices? Any reference code u have available?

Thanks,

Vandhan

[right][snapback]454743[/snapback][/right]

[/quote]

Sorry, i havent anything about solving 3D matrices. I am happy that i finished work with the 2D stuff /smile2.gif' class='bbc_emoticon' alt=':))' />

[quote name='paulius' date='Oct 21 2008, 09:08 AM']I'm guessing the performance charts are using timings with a straight-forward (not highly optimized) CPU implementation? Otherwise I don't follow why Matlab would be that much faster. Do you happen to have timing results with some optimized math library (along the lines of MKL, etc.)?

Paulius

[right][snapback]454748[/snapback][/right]

[/quote]

Yes the CPU implementation isn't optimzed, i used the standard MS c compiler and there are no SIMD optimisations in the code. You can take a look at the reference c code in the zip archiv. I also surpriced that the MatLab version is very fast. In the MatLab documentaion is a reference that they are using the LAPACK library for the calculation of the inverse matrix. I thought that LAPACK is a fully optimized library for algebra problems. I will complete my docu. and will add some messurement with other optimized libraries and other GPUs.

Thanks,

Vandhan

Sorry, i havent anything about solving 3D matrices. I am happy that i finished work with the 2D stuff /smile2.gif' class='bbc_emoticon' alt=':))' />

[quote name='paulius' date='Oct 21 2008, 09:08 AM']I'm guessing the performance charts are using timings with a straight-forward (not highly optimized) CPU implementation? Otherwise I don't follow why Matlab would be that much faster. Do you happen to have timing results with some optimized math library (along the lines of MKL, etc.)?

Paulius

Yes the CPU implementation isn't optimzed, i used the standard MS c compiler and there are no SIMD optimisations in the code. You can take a look at the reference c code in the zip archiv. I also surpriced that the MatLab version is very fast. In the MatLab documentaion is a reference that they are using the LAPACK library for the calculation of the inverse matrix. I thought that LAPACK is a fully optimized library for algebra problems. I will complete my docu. and will add some messurement with other optimized libraries and other GPUs.

do you have a code in which cpu and gpu code are in the same file? i'm not good with the project you have posted. i'm looking for a code which compute inversion on cpu and gpu in the same code.

do you have a code in which cpu and gpu code are in the same file? i'm not good with the project you have posted. i'm looking for a code which compute inversion on cpu and gpu in the same code.

tmp = dataInput;

for(int i =0; i <size;i++) {

for (int j=0; j < size;j++) {

if (i==j) {

tmp[i*size+j]=1;

} else {

tmp[i*size+j]=0;

}

}

}

The error gets better as the number increases, but at this time it is not working correctly. I will look into your code and to compare and contrast.

Shan

tmp = dataInput;

for(int i =0; i <size;i++) {

for (int j=0; j < size;j++) {

if (i==j) {

tmp[i*size+j]=1;

} else {

tmp[i*size+j]=0;

}

}

}

The error gets better as the number increases, but at this time it is not working correctly. I will look into your code and to compare and contrast.

Shan

So let's get down to business. It is known that A*A^(-1) = E(2). Where is E -- matrix with ones in diagonal. Let's assume that X in formula (1) is A^(-1) and B in formula (1) is E. Voila!

Of course this is obvious thing. But maybe certain folks don't about know that :)

Source code:

[code]

# include <cula.h>

# define THREADS 512

__global__ void generateEye(float *E, int size);

int matrixInvert(float *devInput, float *devOutput, int size){

int *ipiv;

float *cache;

int size2 = size * size * sizeof(float);

int blocks = (size * size) / THREADS + 1;

cudaMalloc(&ipiv, size2);

cudaMalloc(&cache, size2);

generateEye<<<blocks, THREADS>>>(devOutput, size);

cudaMemcpy(cache, devInput, size2, cudaMemcpyDeviceToDevice);

culaInitialize();

culaDeviceSgesv(size, size, cache, size, ipiv, devOutput, size);

culaShutdown();

cudaFree(ipiv);

cudaFree(cache);

return 0;

}

__global__ void generateEye(float *E, int size){

int offset = blockIdx.x * blockDim.x + threadIdx.x;

int y = offset / size;

int x = offset - y * size;

if(offset > size * size) return;

if(x == y) E[offset] = 1;

else E[offset] = 0;

}

[/code]

CULA is fin tool. But i prefer to not pay for software. Greetings from Ukraine :)

So let's get down to business. It is known that A*A^(-1) = E(2). Where is E -- matrix with ones in diagonal. Let's assume that X in formula (1) is A^(-1) and B in formula (1) is E. Voila!

Of course this is obvious thing. But maybe certain folks don't about know that :)

Source code:

CULA is fin tool. But i prefer to not pay for software. Greetings from Ukraine :)