Multi-frontal direct solver for general sparse matrix

Hi,

I'm new to the CUDA forums and, unfortunately, have never really had a chance to play with CUDA myself. However, someone on our team in Shanghai has worked with it for a while and has come up with something interesting. I don't want to spam the forums with a product announcement, but I would like to do a quick survey of what is currently available to make sure we are doing something genuinely new.

Our main application is semiconductor physics, and we do a lot of FEM modeling of very non-linear equations. Iterative solvers have never converged reliably on our badly conditioned asymmetric matrices, so we rely on direct solvers. To parallelize as much as possible, we wrote our own multi-frontal solver and recently integrated the well-known MUMPS (http://en.wikipedia.org/wiki/MUMPS) solver.
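Not our production code, but a small SciPy sketch of the behavior described above: on a badly conditioned non-symmetric sparse system, a direct (LU) factorization solves to near machine precision, while an unpreconditioned iterative solver such as GMRES may stagnate within a fixed iteration budget. The matrix here is a synthetic toy, not one of our FEM Jacobians.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 1000
# Non-symmetric tridiagonal matrix whose diagonal spans six orders of
# magnitude -- a synthetic stand-in for a badly conditioned FEM Jacobian.
diag = np.logspace(0, 6, n)
lower = -np.ones(n - 1)
upper = 0.5 * np.ones(n - 1)
A = sp.diags([lower, diag, upper], offsets=[-1, 0, 1], format="csc")

x_true = np.ones(n)
b = A @ x_true

# Direct solve: sparse LU factorization (SuperLU under the hood).
lu = spla.splu(A)
x_direct = lu.solve(b)

# Iterative solve: unpreconditioned restarted GMRES with a fixed budget.
x_iter, info = spla.gmres(A, b, restart=50, maxiter=200)

print("direct relative error:",
      np.linalg.norm(x_direct - x_true) / np.linalg.norm(x_true))
print("gmres exit code:", info)  # 0 means converged; > 0 means it stopped early
```

With a good preconditioner GMRES can of course do much better; the point is only that the direct factorization is insensitive to conditioning in a way the unpreconditioned iteration is not.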

When we started considering a CUDA version of a multi-frontal solver a few months ago, none seemed to be available. To the best of my knowledge, only direct solvers for full/banded matrices (CULA LAPACK) and iterative sparse solvers (http://www.acceleware.com/) exist. I know that ANSYS has also ported its mechanical FEM solver, but I do not know whether they are using a direct solver or an iterative one.

So what can the experienced CUDA developers out there tell me? Has anyone else ported a multi-frontal direct solver to CUDA? I'd like to believe our developer's claim that he is the first, but a little due diligence never hurt anyone.

If anyone is interested, I can release some benchmark comparisons to MUMPS.


[quote name='mfatica' date='06 December 2010 - 09:47 AM' timestamp='1291657640' post='1156838']
There are a couple of direct sparse solvers that are CUDA accelerated:

1) BCSLIB-EXT http://www.aanalytics.com/products.htm

2) http://www.grusoft.com/GSS.htm

If you can post a link to benchmark data, it will be very useful.
[/quote]

Thanks for the feedback. Here is what we have so far:

Mesh size | 180K | 214K | 230K
----------+------+------+------
MUMPS     |  687 | 1443 | 2023
GPU-MF    |  519 |  698 |  801

The times are in seconds and are the total solver time for our non-linear Newton solver. There are 3 variables per node point, so the smallest matrix is n x n with n = 0.54E6 (180K nodes x 3 variables). I don't have the number of non-zero elements on hand, but the largest matrix maxed out the 16 GB of RAM on our test machine. However, our software does more than just the matrix calculations, so this may not be a perfectly reliable benchmark.

The hardware is a single Tesla C1060 card paired with an i7 CPU. MUMPS is parallelized only across the i7 cores, vs. the hundreds of cores on the Tesla.

The sparse matrix itself is asymmetric and very badly conditioned.
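For anyone who wants the derived figures without redoing the arithmetic, here is a quick check of the numbers above (values copied from the table; the 3-variables-per-node rule gives the matrix dimension):

```python
# Benchmark figures from the table above.
mesh_sizes = [180_000, 214_000, 230_000]   # node points per mesh
mumps_s = [687.0, 1443.0, 2023.0]          # MUMPS total solver time, seconds
gpu_mf_s = [519.0, 698.0, 801.0]           # GPU multi-frontal solver, seconds

for nodes, t_cpu, t_gpu in zip(mesh_sizes, mumps_s, gpu_mf_s):
    n = 3 * nodes  # 3 variables per node point -> matrix is n x n
    print(f"mesh {nodes}: n = {n:.2e}, speedup = {t_cpu / t_gpu:.2f}x")
# -> mesh 180000: n = 5.40e+05, speedup = 1.32x
# -> mesh 214000: n = 6.42e+05, speedup = 2.07x
# -> mesh 230000: n = 6.90e+05, speedup = 2.53x
```

Note that the speedup widens as the problem grows, which is what you would hope for from the GPU.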

[quote name='njuffa' date='06 December 2010 - 10:58 AM' timestamp='1291661898' post='1156871']
The following papers may be of interest:

http://www.isi.edu/~ddavis/JESPP/2010_Papers/VECPAR10/vecpar2010.pdf
Robert F. Lucas, Gene Wagenbreth, Dan M. Davis, and Roger Grimes
Multifrontal Computations on GPUs and Their Multi-core Hosts

http://saahpc.ncsa.illinois.edu/09/papers/Krawezik_paper.pdf
Geraud P. Krawezik, Gene Poole
Accelerating the ANSYS Direct Sparse Solver with GPUs
[/quote]

Thanks for the papers. I had some of our developers take a closer look, and the key point seems to be GENERAL sparse matrices: from what they tell me, the ANSYS accelerated solver and the other GPU solvers we've seen so far all target symmetric sparse matrices.

Do you know of anyone, besides the grusoft link your colleague posted, who has worked on an accelerated matrix solver that can handle asymmetric matrices?


Regards,

Michel Lestrade

Crosslight Software
