The source code for an efficient solver and matrix inversion for small matrices, using partial pivoting, is now available to all registered developers.

On sm_2x GPUs (48KB shared memory), the maximum dimensions for the matrices are:
    DP          solver, non-batched matrix inverse:  2...76
    DP complex  solver, non-batched matrix inverse:  2...53

    DP          batched matrix inverse:              2...77
    DP complex  batched matrix inverse:              2...55

The code has been released under the BSD license.

It is available from the CUDA registered developer web site:
https://nvdeveloper.nvidia.com
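
For readers unfamiliar with the method, the following is a minimal host-side sketch of Gaussian elimination with partial pivoting for a single small dense system. It is not the released GPU code: the function name `solve_partial_pivot` and the row-major layout are assumptions made purely for illustration. Something along these lines can serve as a CPU reference when checking GPU results.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CPU reference, NOT the released GPU code.
// Solves A x = b for a dense n x n matrix stored row-major.
// Returns false if a pivot is exactly zero (matrix is singular).
bool solve_partial_pivot(std::vector<double> A, std::vector<double> b,
                         std::size_t n, std::vector<double>& x) {
    assert(A.size() == n * n && b.size() == n);
    for (std::size_t k = 0; k < n; ++k) {
        // Partial pivoting: pick the row with the largest |A[i][k]|, i >= k.
        std::size_t p = k;
        for (std::size_t i = k + 1; i < n; ++i)
            if (std::fabs(A[i * n + k]) > std::fabs(A[p * n + k])) p = i;
        if (A[p * n + k] == 0.0) return false;
        if (p != k) {
            for (std::size_t j = 0; j < n; ++j)
                std::swap(A[k * n + j], A[p * n + j]);
            std::swap(b[k], b[p]);
        }
        // Eliminate column k below the pivot.
        for (std::size_t i = k + 1; i < n; ++i) {
            const double f = A[i * n + k] / A[k * n + k];
            for (std::size_t j = k; j < n; ++j)
                A[i * n + j] -= f * A[k * n + j];
            b[i] -= f * b[k];
        }
    }
    // Back substitution on the upper-triangular system.
    x.assign(n, 0.0);
    for (std::size_t i = n; i-- > 0;) {
        double s = b[i];
        for (std::size_t j = i + 1; j < n; ++j) s -= A[i * n + j] * x[j];
        x[i] = s / A[i * n + i];
    }
    return true;
}
```

The matrix and right-hand side are passed by value so the caller's data is left untouched; the row swap is what distinguishes partial pivoting from plain elimination.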

Is there some simple way to make this code run using single precision complex numbers?

I'm currently working on a batched solver myself for small (dim 2-32) positive definite systems using Cholesky decomposition.
Are there any plans to extend this code to support this kind of system?
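
As a point of reference for the Cholesky approach mentioned here, this is a minimal host-side sketch that factors and solves one small symmetric positive definite system. The name `cholesky_solve` and the row-major layout are assumptions for illustration only; this is neither the poster's solver nor part of the released code. A batched GPU version would run many such factorizations concurrently, and how best to map them to threads is not covered here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU sketch for ONE small symmetric positive definite system.
// A is n x n, row-major; only the lower triangle is read.
// Returns false if A is not (numerically) positive definite.
bool cholesky_solve(std::vector<double> A, std::vector<double> b,
                    std::size_t n, std::vector<double>& x) {
    assert(A.size() == n * n && b.size() == n);
    // Factor A = L * L^T, overwriting the lower triangle of A with L.
    for (std::size_t j = 0; j < n; ++j) {
        double d = A[j * n + j];
        for (std::size_t k = 0; k < j; ++k) d -= A[j * n + k] * A[j * n + k];
        if (d <= 0.0) return false;
        A[j * n + j] = std::sqrt(d);
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = A[i * n + j];
            for (std::size_t k = 0; k < j; ++k) s -= A[i * n + k] * A[j * n + k];
            A[i * n + j] = s / A[j * n + j];
        }
    }
    // Forward substitution: L y = b (y is stored in b).
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < i; ++k) b[i] -= A[i * n + k] * b[k];
        b[i] /= A[i * n + i];
    }
    // Back substitution: L^T x = y.
    x.assign(n, 0.0);
    for (std::size_t i = n; i-- > 0;) {
        double s = b[i];
        for (std::size_t k = i + 1; k < n; ++k) s -= A[k * n + i] * x[k];
        x[i] = s / A[i * n + i];
    }
    return true;
}
```

Unlike elimination with partial pivoting, no row exchanges are needed for positive definite input, which is one reason Cholesky is attractive for small batched systems.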

Thanks, but I was thinking more about all the config optimization.
Complex float would have the same config as double, so I can just copy and paste that one.
Will the same config be optimal for non-complex float as well? With float you could fit more into shared memory, and therefore support larger matrices, so some changes to the launch config would have to be made.
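
The shared memory argument can be made concrete with a little arithmetic. The sketch below computes the largest n for which an n x n matrix alone fits into 48KB, given the element size: 4 bytes for float, 8 for double or complex float, 16 for complex double. The function name `max_dim_upper_bound` is hypothetical, and the result is only an upper bound, since a real kernel needs additional storage beyond the matrix itself.

```cpp
#include <cassert>
#include <cstddef>

// Largest n such that an n x n matrix of elements of the given size fits in
// `shared_bytes` of shared memory. This counts the matrix ONLY, so it is an
// upper bound; a real kernel needs extra storage and will support less.
std::size_t max_dim_upper_bound(std::size_t elem_bytes,
                                std::size_t shared_bytes = 48 * 1024) {
    assert(elem_bytes > 0);
    std::size_t n = 0;
    while ((n + 1) * (n + 1) * elem_bytes <= shared_bytes) ++n;
    return n;
}
```

With the default 48KB this gives 110 for float, 78 for double and complex float, and 55 for complex double. The limits quoted at the top of the thread for the batched inverse (77 and 55) are at or just below these bounds, which is consistent with complex float sharing the double limits and plain float allowing larger matrices.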

[quote name='mfatica' date='21 September 2011 - 10:29 PM' timestamp='1316615386' post='1296427']
The source code for an efficient solver and matrix inversion for small matrices using partial pivoting, is now available to all registered developers.

On sm_2x GPUs (48KB shared memory), the maximum dimensions for the matrices are:
    DP          solver, non-batched matrix inverse:  2...76
    DP complex  solver, non-batched matrix inverse:  2...53

    DP          batched matrix inverse:              2...77
    DP complex  batched matrix inverse:              2...55

The code has been released under BSD license.

It is available from the CUDA registered developer web site:
https://nvdeveloper.nvidia.com
[/quote]
Hi!
I'm a Fortran programmer. I call the batched solver through Fortran's interface, and I think it works perfectly. But I wonder if you could provide a batched solver that can solve least squares problems. That would be helpful to me. I would also like you to add single precision support; that would be perfect!

I can't view the code. I'm already registered on the NVIDIA Developer Zone, but I can't sign in at the link you provided. Where can I register?

Here is the process I followed when I signed up recently. It sounds like you already completed the first step.

1) Get a DevZone ID
a. Go to http://developer.nvidia.com. At the top right, either create a new account or, if you already have a DevZone account, just log in.

2) Once logged in, go to “My Account” at the top right
a. Complete the Basic Profile, and then complete the CUDA registered developer form.

3) You will receive email confirmations and will usually be approved within one business day.

4) Once approved, the CUDA registered developer program home page is accessible via “My Account”: the program name will be green and is a hyperlink.

[quote name='mfatica' date='22 September 2011 - 01:29 AM' timestamp='1316615386' post='1296427']
The source code for an efficient solver and matrix inversion for small matrices using partial pivoting, is now available to all registered developers.

On sm_2x GPUs (48KB shared memory), the maximum dimensions for the matrices are:
    DP          solver, non-batched matrix inverse:  2...76
    DP complex  solver, non-batched matrix inverse:  2...53

    DP          batched matrix inverse:              2...77
    DP complex  batched matrix inverse:              2...55

The code has been released under BSD license.

It is available from the CUDA registered developer web site:
https://nvdeveloper.nvidia.com
[/quote]
Hi, I have registered as a Basic Registered Developer, filled in the form, and applied for the CUDA/GPU Computing Registered Developer Program. When I click on the link given above, https://nvdeveloper.nvidia.com, I get to a login screen, but it does not recognise my login or password. After several attempts it appears I am being blocked: I can no longer see the login page, and I am now getting "The connection has timed out , The server at nvdeveloper.nvidia.com is taking too long to respond."

There are two registered developer websites, the old site and the new site. As far as I am aware, they do not share login information, so the problem may simply be a mismatch between login information and website.

To access the code via the new registered developer website:

(1) Go to http://developer.nvidia.com/
(2) On the right hand side, click on the green link "Registered Developers Website"
(3) Log in or create a new account as needed
(4) Click on the green link "CUDA/GPU Computing Registered Developer Program"
(5) Scroll down to the section "CUDA Batch Solver"
(6) Click on the green link "follow this link"
(7) Click the green "I ACCEPT" at the usage agreement
(8) Your download should start

To access the code via the old registered developer website:

(1) Go to http://partners.nvidia.com
(2) Sign in with your email address and password
(3) There is a menu on the right hand side titled "Newest Downloads"
(4) Click on the link "Batched Solver"
(5) Click on the link "download"
(6) Click the "Accept" button below the usage agreement
(7) Your download should start

Has anyone succeeded in converting this to single precision (float)?
The main problem seems to be the config class, which I'm not sure how to define for float.
Thanks.

[quote="rm9"]Has anyone succeeded in converting this to single precision (float)?
The main problem seems to be the config class, which I'm not sure how to define for float.
Thanks.[/quote]
If someone has a float/complex version, I'm also very interested!

Do you need the float/complex solver, or the float/complex matrix inverse? By the way, there are no dark secrets to finding the data needed for the config class, but it takes time to run the experiments necessary to find the best configuration.
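
The experiments described here amount to timing each candidate configuration and keeping the fastest. A minimal sketch of that loop follows. The `Config` struct, its two fields, and the `pick_fastest` helper are hypothetical names chosen for illustration; the actual contents of the config class in the released code are not shown in this thread. The timing itself is left to a caller-supplied callback.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical launch configuration; the real config class may differ.
struct Config {
    int threads_per_block;
    int matrices_per_block;
};

// Returns the candidate with the smallest measured time.
// `measure` is supplied by the caller and should run the kernel with the
// given configuration and return the elapsed time (e.g. in milliseconds).
Config pick_fastest(const std::vector<Config>& candidates,
                    const std::function<double(const Config&)>& measure) {
    assert(!candidates.empty());
    Config best = candidates.front();
    double best_time = std::numeric_limits<double>::infinity();
    for (const Config& c : candidates) {
        const double t = measure(c);
        if (t < best_time) {
            best_time = t;
            best = c;
        }
    }
    return best;
}
```

In practice this would presumably be repeated for every matrix dimension and data type of interest, which is why the search takes time even though nothing about it is secret.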

There is a version of matrix inversion that I use for convex optimization problems. I have tested it on dense matrices as large as 4,000 by 4,000, and it works well.
Libraries like CULA or MAGMA can also do this.
I'm not sure about a complex version, though.
