Multi-frontal direct solver for general sparse matrix
Hi,

I'm new to the CUDA forums and, unfortunately, have never really had a chance to play with CUDA myself. However, someone from our team in Shanghai has worked with CUDA for a while and has come up with something interesting. I don't want to spam the forums with a product announcement, but I would like to do a quick survey of what is currently available to make sure we are doing something really new.

Our main application is semiconductor physics and we do a lot of FEM modeling of very non-linear equations. Because of that, iterative solvers have never been sufficiently powerful to converge reliably with our badly conditioned asymmetric matrices and we rely on direct solvers. In order to parallelize as much as possible, we had our own multi-frontal solver and recently implemented the well-known MUMPS (http://en.wikipedia.org/wiki/MUMPS) solver.
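
For readers less familiar with the distinction: a direct solver factorizes the matrix and reaches the solution in a fixed number of operations, so there is no convergence criterion that can fail on a badly conditioned system. The sketch below is only a dense toy (Gaussian elimination with partial pivoting on a small asymmetric matrix). It is not a multi-frontal method, and it is not how MUMPS or our solver is implemented; the function name and the test matrix are made up for illustration.

```python
def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (dense toy)."""
    n = len(a)
    # Work on an augmented copy [a | b] so the inputs are left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for k in range(n):
        # Partial pivoting: use the largest entry in column k at or below row k.
        pivot = max(range(k, n), key=lambda r: abs(m[r][k]))
        if m[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[pivot] = m[pivot], m[k]
        # Eliminate column k from all rows below the pivot row.
        for r in range(k + 1, n):
            factor = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= factor * m[k][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


# Small asymmetric example with a zero on the diagonal, so pivoting is needed.
a = [[0.0, 2.0, 1.0],
     [1.0, 1.0, 0.0],
     [3.0, 0.0, 1.0]]
b = [5.0, 3.0, 4.0]  # right-hand side built from the solution [1, 2, 1]
x = solve_dense(a, b)
print(x)
```

A sparse multi-frontal solver organizes the same kind of elimination into many small dense "frontal" matrices, which is what makes it a candidate for GPU acceleration.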

When we started considering using a CUDA version of a multi-frontal solver a few months ago, it seemed like none was available. To the best of my knowledge, only direct solvers for full/banded matrices (CULA LAPACK) and iterative sparse solvers (http://www.acceleware.com/) exist. I know that ANSYS has also ported its own mechanical FEM solver but I do not know if they are using a direct solver or an iterative one.

So what can the experienced CUDA developers out there tell me? Has anyone else ported a multi-frontal direct solver to CUDA? I'd like to believe our developer's claim that he is the first, but a little due diligence never hurt anyone.

If anyone is interested, I can release some benchmark comparisons to MUMPS.

Regards,

Michel Lestrade
Crosslight Software

#1
Posted 12/06/2010 05:37 PM   
To board moderators: please delete my duplicate posts. The server was giving me HTTP 500 errors ...

Michel Lestrade
Crosslight Software

#2
Posted 12/06/2010 05:39 PM   
There are a couple of direct sparse solvers that are CUDA accelerated:

1) BCSLIB-EXT http://www.aanalytics.com/products.htm

2) http://www.grusoft.com/GSS.htm

If you can post a link to benchmark data, it would be very useful.

#3
Posted 12/06/2010 05:47 PM   
The following papers may be of interest:

http://www.isi.edu/~ddavis/JESPP/2010_Papers/VECPAR10/vecpar2010.pdf
Robert F. Lucas, Gene Wagenbreth, Dan M. Davis, and Roger Grimes
Multifrontal Computations on GPUs and Their Multi-core Hosts

http://saahpc.ncsa.illinois.edu/09/papers/Krawezik_paper.pdf
Geraud P. Krawezik, Gene Poole
Accelerating the ANSYS Direct Sparse Solver with GPUs

#4
Posted 12/06/2010 06:58 PM   
[quote name='mfatica' date='06 December 2010 - 09:47 AM' timestamp='1291657640' post='1156838']
There are a couple of direct sparse solvers that are CUDA accelerated:

1)BCSLIB-EXT http://www.aanalytics.com/products.htm

2) http://www.grusoft.com/GSS.htm

If you can post a link to benchmark data, it will be very useful.
[/quote]

Thanks for the feedback. Here is what we have so far:

Mesh size (nodes) | 180K | 214K | 230K
------------------+------+------+------
MUMPS (s)         |  687 | 1443 | 2023
GPU-MF (s)        |  519 |  698 |  801

The times are in seconds and are the total solver time for our non-linear Newton solver. There are 3 variables per node point, so the smallest matrix is n*n with n=0.54E6. I don't have the number of non-zero elements on hand, but the largest matrix maxed out the 16 GB of RAM on our test machine. However, our software does more than just the matrix calculations, so these timings may not be a reliable benchmark of the solvers alone.

The hardware used is a single Tesla C1060 card with an i7 CPU. MUMPS is parallelized only on the i7 cores vs. the hundreds of cores of the Tesla.

The sparse matrix itself is asymmetric and very badly conditioned.
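
For anyone who wants the matrix dimensions and the speedups spelled out, the arithmetic can be sketched as below. This is a minimal Python sketch: the variable and function names are made up for illustration, the mesh sizes are the rounded values from the table, and the timings are total Newton solver times, so the ratios are only indicative.

```python
# Figures copied from the benchmark table; mesh sizes are rounded (180K etc.).
mesh_nodes = [180_000, 214_000, 230_000]
mumps_seconds = [687, 1443, 2023]
gpu_mf_seconds = [519, 698, 801]
VARIABLES_PER_NODE = 3


def matrix_dimension(nodes, variables_per_node=VARIABLES_PER_NODE):
    """Number of unknowns n of the n*n system matrix."""
    return nodes * variables_per_node


def speedup(cpu_seconds, gpu_seconds):
    """Ratio of total MUMPS time to total GPU-MF time."""
    return cpu_seconds / gpu_seconds


dimensions = [matrix_dimension(m) for m in mesh_nodes]
speedups = [speedup(c, g) for c, g in zip(mumps_seconds, gpu_mf_seconds)]

for nodes, n, s in zip(mesh_nodes, dimensions, speedups):
    print(f"{nodes} nodes -> n = {n}, speedup = {s:.2f}x")
```

With these numbers the ratio grows from about 1.3x on the smallest mesh to about 2.5x on the largest.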

#5
Posted 12/06/2010 07:15 PM   
[quote name='njuffa' date='06 December 2010 - 10:58 AM' timestamp='1291661898' post='1156871']
The following papers may be of interest:

http://www.isi.edu/~ddavis/JESPP/2010_Papers/VECPAR10/vecpar2010.pdf
Robert F. Lucas, Gene Wagenbreth, Dan M. Davis, and Roger Grimes
Multifrontal Computations on GPUs and Their Multi-core Hosts

http://saahpc.ncsa.illinois.edu/09/papers/Krawezik_paper.pdf
Geraud P. Krawezik, Gene Poole
Accelerating the ANSYS Direct Sparse Solver with GPUs
[/quote]

Thanks for the papers. I had some of our developers take a closer look and the key point seems to be GENERAL sparse matrices. From what they tell me, the ANSYS accelerated solver and the other GPU solvers we've seen so far are all for symmetric sparse matrices.

Do you know of anyone, besides the grusoft link your colleague put up, who has worked on an accelerated matrix solver that can handle asymmetric matrices?
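
To make the symmetric vs. general distinction concrete: a symmetric solver can assume A equals its transpose and use an LDL^T or Cholesky-type factorization on half of the matrix, while a general (asymmetric) matrix needs a full LU factorization. The sketch below shows one way to check which case a matrix falls into. It is only an illustration: the dictionary-of-coordinates storage and the function names are assumptions made for this example, not the format used by any of the solvers mentioned in this thread.

```python
def is_structurally_symmetric(entries):
    """True if every stored (i, j) has a stored (j, i); values are ignored."""
    return all((j, i) in entries for (i, j) in entries)


def is_symmetric(entries, tol=0.0):
    """True if A[i][j] equals A[j][i] within tol for every stored entry.

    entries maps (row, col) -> value for the non-zeros of a square matrix;
    missing entries are treated as zero.
    """
    return all(abs(value - entries.get((j, i), 0.0)) <= tol
               for (i, j), value in entries.items())


# Symmetric pattern and values.
sym = {(0, 0): 4.0, (0, 1): -1.0, (1, 0): -1.0, (1, 1): 3.0}
# Symmetric pattern, asymmetric values.
pattern_only = {(0, 0): 4.0, (0, 1): -1.0, (1, 0): 2.0, (1, 1): 3.0}
# Asymmetric pattern: (1, 2) is stored but (2, 1) is not.
general = {(0, 0): 4.0, (1, 1): 3.0, (1, 2): 5.0, (2, 2): 1.0}

for name, m in [("sym", sym), ("pattern_only", pattern_only), ("general", general)]:
    print(name, is_structurally_symmetric(m), is_symmetric(m))
```

Only the first matrix would be suitable for a symmetric-only solver; the other two need a solver for general matrices.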

#6
Posted 12/07/2010 05:44 PM   