Here is the interface unit for [b]CuBLAS[/b] (double & single precision) and [b]CUDA[/b] (just some functions) for [b]CUDA 4.0[/b]; the older files that I have in my forum are not for 4.0.
The zipped file includes the [b]CuBlasMtxVec classes[/b] (Vector & Matrix), which use only CuBLAS functions. The solvers are CG and BiCGStab; GMRES is still pending (sorry, I will work on it), but first I had to create the CuBLAS interface unit for [b]CUDA 4.0[/b] in [b]Delphi 7.0[/b].

The file also includes:
cublas_api.pas
cudaRT_v2.pas
cublas.pas
an example

and the PDF files from which I took the information and code for CG and BiCGStab.
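
For readers who want to see what the CG solver computes, here is a minimal host-side sketch in plain Object Pascal (Delphi 7 / Free Pascal). It is not the CuBlasMtxVec code: the names TVec, TMat, MatVec, Dot and CG are hypothetical, and the comments only indicate which CuBLAS routine (cublasDgemv, cublasDdot, cublasDaxpy) would play each role on the GPU. CG assumes a symmetric positive definite matrix.

```pascal
program CGSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

type
  TVec = array of Double;
  TMat = array of TVec;   { dense matrix: A[i][j] is row i, column j }

{ y := A*x (the role of cublasDgemv on the GPU) }
procedure MatVec(const A: TMat; const x: TVec; var y: TVec);
var
  i, j: Integer;
  s: Double;
begin
  for i := 0 to High(A) do
  begin
    s := 0.0;
    for j := 0 to High(x) do
      s := s + A[i][j] * x[j];
    y[i] := s;
  end;
end;

{ dot product (the role of cublasDdot) }
function Dot(const a, b: TVec): Double;
var
  i: Integer;
begin
  Result := 0.0;
  for i := 0 to High(a) do
    Result := Result + a[i] * b[i];
end;

{ Conjugate Gradient for a symmetric positive definite A.
  x holds the initial guess on entry and the solution on exit;
  returns the number of iterations performed. }
function CG(const A: TMat; const b: TVec; var x: TVec;
            Tol: Double; MaxIter: Integer): Integer;
var
  n, i, k: Integer;
  r, p, Ap: TVec;
  rr, rrNew, alpha, beta: Double;
begin
  n := Length(b);
  SetLength(r, n);
  SetLength(p, n);
  SetLength(Ap, n);
  MatVec(A, x, Ap);
  for i := 0 to n - 1 do
  begin
    r[i] := b[i] - Ap[i];   { r0 = b - A*x0 }
    p[i] := r[i];           { first search direction p0 = r0 }
  end;
  rr := Dot(r, r);
  k := 0;
  while (k < MaxIter) and (Sqrt(rr) > Tol) do
  begin
    MatVec(A, p, Ap);
    alpha := rr / Dot(p, Ap);          { step length along p }
    for i := 0 to n - 1 do
    begin
      x[i] := x[i] + alpha * p[i];     { cublasDaxpy on the GPU }
      r[i] := r[i] - alpha * Ap[i];
    end;
    rrNew := Dot(r, r);
    beta := rrNew / rr;                { keeps the directions A-conjugate }
    for i := 0 to n - 1 do
      p[i] := r[i] + beta * p[i];
    rr := rrNew;
    Inc(k);
  end;
  Result := k;
end;

var
  A: TMat;
  b, x: TVec;
  i, iters: Integer;
begin
  { SPD test system: A*(1, 2, 3) = (6, 10, 8) }
  SetLength(A, 3, 3);
  A[0][0] := 4; A[0][1] := 1; A[0][2] := 0;
  A[1][0] := 1; A[1][1] := 3; A[1][2] := 1;
  A[2][0] := 0; A[2][1] := 1; A[2][2] := 2;
  SetLength(b, 3);
  b[0] := 6; b[1] := 10; b[2] := 8;
  SetLength(x, 3);
  for i := 0 to 2 do x[i] := 0.0;      { initial guess x0 = 0 }
  iters := CG(A, b, x, 1e-10, 100);
  WriteLn('CG iterations: ', iters);
  WriteLn('x = ', x[0]:0:6, ' ', x[1]:0:6, ' ', x[2]:0:6);
end.
```

On the GPU the loops become CuBLAS calls on device vectors, but the sequence of operations per iteration (one matrix-vector product, two dot products, three vector updates) stays the same.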

[twitter]Tavo_Nice[/twitter] Finally, here is [b]GMRES[/b] in [b]CuBLAS V2 (CUDA 4)[/b] for [b]Delphi[/b], together with the CUDA Runtime, CuBLAS and CuBLAS API interface units.
I have included the [b]Standard LU Decomposition[/b] and [b]Block LU Decomposition[/b], but without examples.
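
Since the LU routines ship without examples, here is a host-side sketch of what a standard LU decomposition computes. It uses partial pivoting (row swaps); the post does not say whether the CuBLAS version pivots, and the block variant is not shown. LUDecompose and LUSolve are hypothetical names. The factors are stored in place: L (unit diagonal) below the diagonal, U on and above it.

```pascal
program LUSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

type
  TVec = array of Double;
  TIntVec = array of Integer;
  TMat = array of TVec;

{ In-place LU with partial pivoting: P*A = L*U (no singularity check here) }
procedure LUDecompose(var A: TMat; var Piv: TIntVec);
var
  n, i, j, k, p: Integer;
  t: Double;
  row: TVec;
begin
  n := Length(A);
  SetLength(Piv, n);
  for i := 0 to n - 1 do
    Piv[i] := i;
  for k := 0 to n - 1 do
  begin
    p := k;                                   { largest pivot in column k }
    for i := k + 1 to n - 1 do
      if Abs(A[i][k]) > Abs(A[p][k]) then p := i;
    if p <> k then
    begin
      row := A[k]; A[k] := A[p]; A[p] := row; { swap whole rows, L part included }
      j := Piv[k]; Piv[k] := Piv[p]; Piv[p] := j;
    end;
    for i := k + 1 to n - 1 do
    begin
      A[i][k] := A[i][k] / A[k][k];           { multiplier l(i,k), stored in place }
      t := A[i][k];
      for j := k + 1 to n - 1 do
        A[i][j] := A[i][j] - t * A[k][j];     { update the trailing submatrix }
    end;
  end;
end;

{ Solve A*x = b with the factors produced by LUDecompose }
procedure LUSolve(const LU: TMat; const Piv: TIntVec; const b: TVec; var x: TVec);
var
  n, i, j: Integer;
  s: Double;
begin
  n := Length(LU);
  SetLength(x, n);
  for i := 0 to n - 1 do                      { forward substitution: L*y = P*b }
  begin
    s := b[Piv[i]];
    for j := 0 to i - 1 do
      s := s - LU[i][j] * x[j];
    x[i] := s;
  end;
  for i := n - 1 downto 0 do                  { back substitution: U*x = y }
  begin
    s := x[i];
    for j := i + 1 to n - 1 do
      s := s - LU[i][j] * x[j];
    x[i] := s / LU[i][i];
  end;
end;

var
  A: TMat;
  Piv: TIntVec;
  b, x: TVec;
begin
  SetLength(A, 3, 3);
  A[0][0] := 0; A[0][1] := 2; A[0][2] := 1;   { zero in (0,0): pivoting is required }
  A[1][0] := 1; A[1][1] := 1; A[1][2] := 1;
  A[2][0] := 2; A[2][1] := 1; A[2][2] := 0;
  SetLength(b, 3);
  b[0] := 7; b[1] := 6; b[2] := 4;            { b = A*(1, 2, 3) }
  LUDecompose(A, Piv);
  LUSolve(A, Piv, b, x);
  WriteLn('x = ', x[0]:0:6, ' ', x[1]:0:6, ' ', x[2]:0:6);
end.
```

The test matrix cannot be factored without a row swap because of the zero in position (0,0); the solution should come out as (1, 2, 3). Once factored, the same LU can be reused for several right-hand sides, which is the usual reason to prefer it over plain Gauss elimination.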

Visual Example

CULACuBlasMtxVec

CULA (interface unit)

CuBlas (Interface Unit)

Tested with

Delphi 7.0

CUDA Toolkit 4.0

CULA Premium R12

devdriver_4.0_winxp_32_270.81_general

PNY NVIDIA Quadro VCQ 4000 Professional Graphics Card VCQ4000-PB

Some of the copy functions don't work, so don't use them.

Any ideas to improve it are welcome.

I used the same example to run both culaDgesv (CULA) and BiCGStab (CuBLAS).

Next week I will work on GMRES.

It was tested with a Quadro 4000 and a GTX 450.
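
Unlike CG, BiCGStab does not require a symmetric matrix, so it can be run on the same general system that culaDgesv solves directly. Below is a host-side sketch of the standard (van der Vorst) BiCGStab iteration; the names are hypothetical, the nonsymmetric 3x3 test system is made up with exact solution (1, 2, 3), and the breakdown checks on rho and omega are omitted for brevity.

```pascal
program BiCGStabSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

type
  TVec = array of Double;
  TMat = array of TVec;

procedure MatVec(const A: TMat; const x: TVec; var y: TVec);   { y := A*x }
var
  i, j: Integer;
  s: Double;
begin
  for i := 0 to High(A) do
  begin
    s := 0.0;
    for j := 0 to High(x) do
      s := s + A[i][j] * x[j];
    y[i] := s;
  end;
end;

function Dot(const a, b: TVec): Double;
var
  i: Integer;
begin
  Result := 0.0;
  for i := 0 to High(a) do
    Result := Result + a[i] * b[i];
end;

{ BiCGStab for a general (nonsymmetric) A; breakdown checks omitted }
function BiCGStab(const A: TMat; const b: TVec; var x: TVec;
                  Tol: Double; MaxIter: Integer): Integer;
var
  n, i, k: Integer;
  r, rhat, p, v, s, t: TVec;
  rho, rhoNew, alpha, omega, beta: Double;
begin
  n := Length(b);
  SetLength(r, n); SetLength(rhat, n); SetLength(p, n);
  SetLength(v, n); SetLength(s, n); SetLength(t, n);
  MatVec(A, x, v);
  for i := 0 to n - 1 do
  begin
    r[i] := b[i] - v[i];      { r0 = b - A*x0 }
    rhat[i] := r[i];          { fixed shadow residual }
    p[i] := 0.0;
    v[i] := 0.0;
  end;
  rho := 1.0; alpha := 1.0; omega := 1.0;
  k := 0;
  while (k < MaxIter) and (Sqrt(Dot(r, r)) > Tol) do
  begin
    rhoNew := Dot(rhat, r);
    beta := (rhoNew / rho) * (alpha / omega);
    for i := 0 to n - 1 do
      p[i] := r[i] + beta * (p[i] - omega * v[i]);
    MatVec(A, p, v);
    alpha := rhoNew / Dot(rhat, v);
    for i := 0 to n - 1 do
      s[i] := r[i] - alpha * v[i];   { intermediate residual }
    Inc(k);
    if Sqrt(Dot(s, s)) <= Tol then
    begin                            { converged after the BiCG half step }
      for i := 0 to n - 1 do
      begin
        x[i] := x[i] + alpha * p[i];
        r[i] := s[i];
      end;
      Break;
    end;
    MatVec(A, s, t);
    omega := Dot(t, s) / Dot(t, t);  { stabilising minimal-residual step }
    for i := 0 to n - 1 do
    begin
      x[i] := x[i] + alpha * p[i] + omega * s[i];
      r[i] := s[i] - omega * t[i];
    end;
    rho := rhoNew;
  end;
  Result := k;
end;

var
  A: TMat;
  b, x: TVec;
  i, iters: Integer;
begin
  { nonsymmetric test system: A*(1, 2, 3) = (6, 15, 11) }
  SetLength(A, 3, 3);
  A[0][0] := 4; A[0][1] := 1; A[0][2] := 0;
  A[1][0] := 2; A[1][1] := 5; A[1][2] := 1;
  A[2][0] := 0; A[2][1] := 1; A[2][2] := 3;
  SetLength(b, 3);
  b[0] := 6; b[1] := 15; b[2] := 11;
  SetLength(x, 3);
  for i := 0 to 2 do x[i] := 0.0;
  iters := BiCGStab(A, b, x, 1e-10, 100);
  WriteLn('BiCGStab iterations: ', iters);
  WriteLn('x = ', x[0]:0:6, ' ', x[1]:0:6, ' ', x[2]:0:6);
end.
```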

The solution of the triangular system is also done with CuBLAS.

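Code:

{ copy the leading ii x ii block of H (leading dimension m+1) to the device (leading dimension ii) }
for col := 0 to ii - 1 do
  cudaMemcpy(@h_device[col*ii], @H[col*(m+1)], ii*SizeOf(Double), cudaMemcpyHostToDevice);
{ copy the rotated right-hand side s into the device vector y }
statuscublas := cublasSetVector(ii, SizeOf(Double), @s[0], 1, @y[0], 1);
{ solve the upper triangular system in place: uplo = 'U', trans = 'N', diag = 'N'
  (blank characters are not valid values for these flags) }
cublasDtrsv('U', 'N', 'N', ii, h_device[0], ii, y[0], 1);
{ update the solution: xx := xx + V*y }
cublasDgemv('N', N, ii, 1.0, V[0], N, y[0], 1, 1.0, xx[0], 1);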
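
For reference, here is what the last two calls compute, written as host-side Pascal loops over column-major storage (the layout cuBLAS uses: element (i, j) at index i + j*ld). TrsvUpper corresponds to an upper-triangular, non-transposed, non-unit cublasDtrsv, and GemvAdd to cublasDgemv with alpha = beta = 1. Both names are hypothetical, and the small 2x2 example is made up so the result can be checked by hand.

```pascal
program GMRESUpdateSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

type
  TVec = array of Double;

{ Solve R*y = g in place (y holds g on entry); R is n x n upper triangular,
  column-major with leading dimension ld }
procedure TrsvUpper(n: Integer; const R: TVec; ld: Integer; var y: TVec);
var
  i, j: Integer;
begin
  for i := n - 1 downto 0 do
  begin
    for j := i + 1 to n - 1 do
      y[i] := y[i] - R[i + j * ld] * y[j];   { subtract already solved terms }
    y[i] := y[i] / R[i + i * ld];
  end;
end;

{ x := x + V*y; V is nRows x nCols, column-major with leading dimension ld }
procedure GemvAdd(nRows, nCols: Integer; const V: TVec; ld: Integer;
                  const y: TVec; var x: TVec);
var
  i, j: Integer;
begin
  for j := 0 to nCols - 1 do
    for i := 0 to nRows - 1 do
      x[i] := x[i] + V[i + j * ld] * y[j];
end;

var
  R, V, y, x: TVec;
begin
  SetLength(R, 4);                  { R = [[2, 1], [0, 4]], column by column }
  R[0] := 2; R[1] := 0; R[2] := 1; R[3] := 4;
  SetLength(y, 2);                  { right-hand side g = (4, 8) }
  y[0] := 4; y[1] := 8;
  TrsvUpper(2, R, 2, y);            { y becomes (1, 2) }
  SetLength(V, 6);                  { 3 x 2 basis with columns e1 and e2 }
  V[0] := 1; V[1] := 0; V[2] := 0;
  V[3] := 0; V[4] := 1; V[5] := 0;
  SetLength(x, 3);                  { initial guess x0 }
  x[0] := 0.5; x[1] := 0.5; x[2] := 0.5;
  GemvAdd(3, 2, V, 3, y, x);        { x = x0 + V*y = (1.5, 2.5, 0.5) }
  WriteLn('y = ', y[0]:0:4, ' ', y[1]:0:4);
  WriteLn('x = ', x[0]:0:4, ' ', x[1]:0:4, ' ', x[2]:0:4);
end.
```

Only the upper triangle of R is read, which is why the strictly lower part of the copied block may contain leftover values without affecting the result.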

CuBLAS V2

CUDA 4.0

[b]Object Oriented Dense Linear Algebra for NVIDIA GPUs with Delphi 7.0[/b]

Conjugate Gradient [b]CG[/b]

BiConjugate Gradient Stabilized [b]BiCGStab[/b]

Generalized Minimal Residual with Modified Gram-Schmidt [b]GMRES[/b]

Simpler Generalized Minimal Residual with Modified Gram-Schmidt [b]SGMRES[/b]

Adaptive Simpler Generalized Minimal Residual with Modified Gram-Schmidt [b]AdaptiveSGMRES[/b]
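
All three GMRES variants above orthogonalize the Krylov basis with Modified Gram-Schmidt. Here is a host-side sketch of the Arnoldi process with MGS as used by standard GMRES: each new vector w = A*v_j is projected against the previous basis vectors one at a time, always using the already-updated w (that is the difference from classical Gram-Schmidt). ArnoldiMGS and the 4x4 test matrix are hypothetical; the program reports the largest deviation of V^T V from the identity.

```pascal
program ArnoldiMGSSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

type
  TVec = array of Double;
  TMat = array of TVec;

procedure MatVec(const A: TMat; const x: TVec; var y: TVec);   { y := A*x }
var
  i, j: Integer;
  s: Double;
begin
  for i := 0 to High(A) do
  begin
    s := 0.0;
    for j := 0 to High(x) do
      s := s + A[i][j] * x[j];
    y[i] := s;
  end;
end;

function Dot(const a, b: TVec): Double;
var
  i: Integer;
begin
  Result := 0.0;
  for i := 0 to High(a) do
    Result := Result + a[i] * b[i];
end;

{ m Arnoldi steps with MGS: V[0..m] get orthonormal basis vectors,
  H[i][j] the upper Hessenberg entries. Returns the steps taken
  (fewer than m on a "happy breakdown"). }
function ArnoldiMGS(const A: TMat; const b: TVec; m: Integer;
                    var V, H: TMat): Integer;
var
  n, i, j, k: Integer;
  w: TVec;
  beta: Double;
begin
  n := Length(b);
  SetLength(V, m + 1, n);
  SetLength(H, m + 1, m);
  SetLength(w, n);
  beta := Sqrt(Dot(b, b));
  for k := 0 to n - 1 do
    V[0][k] := b[k] / beta;            { v0 = b / ||b|| }
  Result := 0;
  for j := 0 to m - 1 do
  begin
    MatVec(A, V[j], w);                { w = A*v_j }
    for i := 0 to j do
    begin
      H[i][j] := Dot(w, V[i]);         { project the current, updated w (MGS) }
      for k := 0 to n - 1 do
        w[k] := w[k] - H[i][j] * V[i][k];
    end;
    H[j + 1][j] := Sqrt(Dot(w, w));
    Result := j + 1;
    if H[j + 1][j] < 1e-14 then
      Exit;                            { Krylov space is invariant: stop }
    for k := 0 to n - 1 do
      V[j + 1][k] := w[k] / H[j + 1][j];
  end;
end;

var
  A, V, H: TMat;
  b: TVec;
  i, j, steps: Integer;
  err, d: Double;
begin
  SetLength(A, 4, 4);
  A[0][0] := 2; A[0][1] := 1; A[0][2] := 0; A[0][3] := 0;
  A[1][0] := 0; A[1][1] := 3; A[1][2] := 1; A[1][3] := 0;
  A[2][0] := 1; A[2][1] := 0; A[2][2] := 4; A[2][3] := 1;
  A[3][0] := 0; A[3][1] := 1; A[3][2] := 0; A[3][3] := 5;
  SetLength(b, 4);
  for i := 0 to 3 do b[i] := 1.0;
  steps := ArnoldiMGS(A, b, 3, V, H);
  err := 0.0;                          { max |V^T V - I| over the computed vectors }
  for i := 0 to steps - 1 do
    for j := 0 to steps - 1 do
    begin
      d := Dot(V[i], V[j]);
      if i = j then d := d - 1.0;
      if Abs(d) > err then err := Abs(d);
    end;
  WriteLn('Arnoldi steps: ', steps);
  WriteLn('max |V^T V - I| = ', err);
end.
```

GMRES then minimizes ||beta*e1 - H*y|| over y (the Givens-rotation and triangular-solve step) and sets x = x0 + V*y.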

Object Oriented Linear Circuit Simulator using Object Oriented Dense Linear Algebra (CuBLAS wrapper for Delphi 7.0)

Example (included):

TCircuitProblem = class(TCircuit)
public
  VDC1, VDC2: TDCVSource;
  IDC1: TDCISource;
  R1, R2, R3: TResistor;
  constructor Create;
end;

...

{ TCircuitProblem }

constructor TCircuitProblem.Create;
begin
  inherited Create;
  { elements appear to be created as (node1, node2, value), with node 0 as ground }
  VDC1 := TDCVSource.Create(1, 0, 4.5);
  VDC2 := TDCVSource.Create(3, 0, 4.5);
  IDC1 := TDCISource.Create(2, 0, 2.0);
  R1 := TResistor.Create(1, 2, 4.0);
  R2 := TResistor.Create(2, 3, 4.0);
  R3 := TResistor.Create(2, 0, 3.0);
  { register every element with the circuit }
  Add(VDC1);
  Add(VDC2);
  Add(IDC1);
  Add(R1);
  Add(R2);
  Add(R3);
end;

DC Analysis:

CircuitProblem.DCAnalysis; // the solver is an LU decomposition implemented entirely with CuBLAS

In the future I will add the nonlinear version...
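
Modified Nodal Analysis turns the example circuit into a small linear system. The sketch below stamps the same six elements into an MNA matrix on the host and solves it with Gaussian elimination (partial pivoting). It is not the author's TCircuit code: StampResistor, StampVSource, StampISource and GaussSolve are hypothetical names. Two assumptions are made: node 0 is ground, and TDCISource.Create(2, 0, 2.0) follows the SPICE convention (2 A flowing from node 2 through the source to node 0). Under these assumptions V1 = V3 = 4.5 V and V2 = 0.3 V; with the opposite current direction V2 would be 5.1 V.

```pascal
program MNASketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}
const
  NNodes = 3;               { non-ground nodes 1..3; node 0 is ground }
  NVSrc  = 2;               { one branch-current unknown per voltage source }
  NUnk   = NNodes + NVSrc;  { unknowns: V1, V2, V3, I(VDC1), I(VDC2) }
type
  TVec = array of Double;
  TMat = array of TVec;
var
  G: TMat;                  { MNA matrix }
  rhs, x: TVec;             { right-hand side and solution }
  ir, ic: Integer;

{ G[r][c] += val; index -1 stands for the ground node and is skipped }
procedure AddEntry(r, c: Integer; val: Double);
begin
  if (r >= 0) and (c >= 0) then
    G[r][c] := G[r][c] + val;
end;

procedure StampResistor(n1, n2: Integer; R: Double);
begin
  AddEntry(n1 - 1, n1 - 1, 1.0 / R);
  AddEntry(n2 - 1, n2 - 1, 1.0 / R);
  AddEntry(n1 - 1, n2 - 1, -1.0 / R);
  AddEntry(n2 - 1, n1 - 1, -1.0 / R);
end;

{ V(n1) - V(n2) = E; k selects the source's branch-current unknown }
procedure StampVSource(n1, n2, k: Integer; E: Double);
var
  row: Integer;
begin
  row := NNodes + k;
  AddEntry(n1 - 1, row, 1.0);
  AddEntry(row, n1 - 1, 1.0);
  AddEntry(n2 - 1, row, -1.0);
  AddEntry(row, n2 - 1, -1.0);
  rhs[row] := E;
end;

{ ASSUMPTION (SPICE convention): Amps flow from n1 through the source to n2 }
procedure StampISource(n1, n2: Integer; Amps: Double);
begin
  if n1 > 0 then rhs[n1 - 1] := rhs[n1 - 1] - Amps;
  if n2 > 0 then rhs[n2 - 1] := rhs[n2 - 1] + Amps;
end;

{ Gaussian elimination with partial pivoting; M and b are overwritten }
procedure GaussSolve(var M: TMat; var b: TVec; var sol: TVec);
var
  n, i, j, k, p: Integer;
  f: Double;
  tmp: TVec;
begin
  n := Length(b);
  for k := 0 to n - 1 do
  begin
    p := k;                                   { pick the largest pivot }
    for i := k + 1 to n - 1 do
      if Abs(M[i][k]) > Abs(M[p][k]) then p := i;
    tmp := M[k]; M[k] := M[p]; M[p] := tmp;   { swap rows k and p }
    f := b[k]; b[k] := b[p]; b[p] := f;
    for i := k + 1 to n - 1 do                { eliminate below the pivot }
    begin
      f := M[i][k] / M[k][k];
      for j := k to n - 1 do
        M[i][j] := M[i][j] - f * M[k][j];
      b[i] := b[i] - f * b[k];
    end;
  end;
  SetLength(sol, n);
  for i := n - 1 downto 0 do                  { back substitution }
  begin
    f := b[i];
    for j := i + 1 to n - 1 do
      f := f - M[i][j] * sol[j];
    sol[i] := f / M[i][i];
  end;
end;

begin
  SetLength(G, NUnk, NUnk);
  SetLength(rhs, NUnk);
  for ir := 0 to NUnk - 1 do
  begin
    rhs[ir] := 0.0;
    for ic := 0 to NUnk - 1 do G[ir][ic] := 0.0;
  end;
  StampVSource(1, 0, 0, 4.5);   { VDC1 }
  StampVSource(3, 0, 1, 4.5);   { VDC2 }
  StampISource(2, 0, 2.0);      { IDC1 }
  StampResistor(1, 2, 4.0);     { R1 }
  StampResistor(2, 3, 4.0);     { R2 }
  StampResistor(2, 0, 3.0);     { R3 }
  GaussSolve(G, rhs, x);
  WriteLn('V1 = ', x[0]:0:4, ' V, V2 = ', x[1]:0:4, ' V, V3 = ', x[2]:0:4, ' V');
end.
```

Pivoting is not optional here: the rows added by the voltage sources have a zero on the diagonal, which is typical of MNA matrices.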

[b]Object Oriented Dense Linear Algebra for NVIDIA GPUs with Delphi 7.0 using CuBLAS[/b]

[b]Direct Solvers:[/b]

LU Decomposition Solver [b]LUSolve[/b]

Gauss Elimination Solver [b]GaussSolve[/b]

[b]Iterative Solvers:[/b]

Conjugate Gradient [b]CG[/b]

BiConjugate Gradient Stabilized [b]BiCGStab[/b]

Generalized Minimal Residual with Modified Gram-Schmidt [b]GMRES[/b]

Simpler Generalized Minimal Residual with Modified Gram-Schmidt [b]SGMRES[/b]

Adaptive Simpler Generalized Minimal Residual with Modified Gram-Schmidt [b]AdaptiveSGMRES[/b]

[b]Plus:[/b]

Add, Sub, Prod, NormL2 and Dot functions using CuBLAS (CUDA 4.0).
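
If NormL2 is built on cublasDnrm2 (an assumption; the post only says CuBLAS is used), it should be protected against overflow when squaring large entries, which the cuBLAS documentation states for nrm2. The reference BLAS dnrm2 achieves this with the classic running scale and scaled sum of squares, sketched here on the host; the function name is hypothetical, and the cuBLAS internals may differ. A naive Sqrt of the sum of squares would overflow on the test vector (3e200, 4e200).

```pascal
program NormL2Sketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

type
  TVec = array of Double;

{ Euclidean norm with a running scale (largest |x[i]| so far) and a sum of
  squares of x[i]/scale, so no element is ever squared unscaled }
function NormL2(const x: TVec): Double;
var
  i: Integer;
  scale, ssq, absxi: Double;
begin
  scale := 0.0;
  ssq := 1.0;
  for i := 0 to High(x) do
    if x[i] <> 0.0 then
    begin
      absxi := Abs(x[i]);
      if scale < absxi then
      begin
        ssq := 1.0 + ssq * Sqr(scale / absxi);   { rescale the old sum }
        scale := absxi;
      end
      else
        ssq := ssq + Sqr(absxi / scale);
    end;
  Result := scale * Sqrt(ssq);
end;

var
  x: TVec;
begin
  SetLength(x, 2);
  x[0] := 3e200;
  x[1] := 4e200;                                 { x[i]*x[i] would overflow a Double }
  WriteLn('NormL2 = ', NormL2(x));
end.
```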

[b]Examples:[/b]

Linear Algebra & Linear Circuit Simulator (DC Analysis)

[b]Linear & NonLinear Electronic Circuit Simulator[/b]

DAE Solver (Differential Algebraic Equations) with CuBLAS in Delphi 7.0 (Object Pascal)

& CuBLAS Dense Linear Algebra Matrix Vector Classes with Direct and Krylov Methods

The Circuit Analysis Technique is Modified Nodal Analysis.

Transient and DC Analysis (Damped Newton Method)
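
The damped Newton method can be illustrated on the smallest nonlinear DC problem: a source Vs feeding a diode through a resistor R. The Newton step dV = -F(V)/F'(V) is halved until the KCL residual |F| actually decreases, which stops the exponential diode model from throwing the iterate far away. This is a hypothetical one-node sketch with made-up component values, not the simulator's code; a circuit simulator presumably applies the same idea to the full MNA system with a Jacobian matrix. For these values the diode voltage should settle near 0.69 V.

```pascal
program DampedNewtonSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

const
  Vs   = 5.0;       { source voltage [V] }
  R    = 1000.0;    { series resistance [Ohm] }
  ISat = 1e-14;     { diode saturation current [A] }
  Vt   = 0.02585;   { thermal voltage near 300 K [V] }

{ KCL at the diode node: current leaving through R plus diode current }
function F(V: Double): Double;
begin
  Result := (V - Vs) / R + ISat * (Exp(V / Vt) - 1.0);
end;

{ dF/dV, the 1x1 "Jacobian" }
function dF(V: Double): Double;
begin
  Result := 1.0 / R + (ISat / Vt) * Exp(V / Vt);
end;

var
  V, dV, lambda: Double;
  iter: Integer;
begin
  V := 0.0;                                  { initial guess }
  iter := 0;
  repeat
    dV := -F(V) / dF(V);                     { full Newton step }
    lambda := 1.0;
    { damping: halve the step until the residual decreases }
    while (Abs(F(V + lambda * dV)) >= Abs(F(V))) and (lambda > 1e-6) do
      lambda := lambda / 2.0;
    V := V + lambda * dV;
    Inc(iter);
  until (Abs(lambda * dV) < 1e-12) or (iter >= 100);
  WriteLn('Diode voltage: ', V:0:6, ' V after ', iter, ' iterations');
end.
```

Without the damping loop, the first full step from V = 0 lands at about 5 V, where the diode term is astronomically large; the halving brings it back to a point where the residual has dropped.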

Examples:

Rectifier Bridge with RC Filter Load

Rectifier Bridge with DC Linear Voltage Regulator and RC Filter Load

Both have options to run half-wave or full-wave rectification and to adjust RL, CL, Freq and the AC input voltage Vi.

Integration methods include BE (Backward Euler), TR (Trapezoidal), BDF2 (second-order Backward Differentiation Formula, also known as Gear's 2nd-order method) and TRBDF2, with automatic step size and order control.
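
To show how the fixed-step formulas differ, here is a sketch applying BE, TR and BDF2 to the RC charging equation dv/dt = (Vin - v)/(RC), whose exact solution is v(t) = Vin(1 - exp(-t/RC)). TR-BDF2 and the step size and order control are not shown; Vin, RC and h are made-up values, and starting BDF2 with one BE step is a common choice, not necessarily the simulator's. With h = 0.1 s and RC = 1 s, the error at t = 1 s should be smallest for TR and largest for BE, with BDF2 in between.

```pascal
program IntegratorsSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

const
  Vin    = 1.0;    { step input [V] }
  Tau    = 1.0;    { time constant R*C [s] }
  h      = 0.1;    { fixed step [s] }
  NSteps = 10;     { integrate up to t = NSteps*h = 1 s }

var
  a, vBE, vTR, vB, vBPrev, vBNew, exact: Double;
  errBE, errTR, errBDF2: Double;
  k: Integer;
begin
  { test equation dv/dt = f(v) = (Vin - v)/Tau with v(0) = 0 }
  a := h / Tau;
  vBE := 0.0;
  vTR := 0.0;
  vBPrev := 0.0;                              { BDF2 needs v(n-1) and v(n): }
  vB := (vBPrev + a * Vin) / (1.0 + a);       { the first step is taken with BE }
  for k := 1 to NSteps do
  begin
    { BE: v(n+1) = v(n) + h*f(v(n+1)) }
    vBE := (vBE + a * Vin) / (1.0 + a);
    { TR: v(n+1) = v(n) + h/2*(f(v(n)) + f(v(n+1))) }
    vTR := (vTR * (1.0 - a / 2.0) + a * Vin) / (1.0 + a / 2.0);
    { BDF2: 3/2*v(n+1) - 2*v(n) + 1/2*v(n-1) = h*f(v(n+1)) }
    if k >= 2 then
    begin
      vBNew := (2.0 * vB - 0.5 * vBPrev + a * Vin) / (1.5 + a);
      vBPrev := vB;
      vB := vBNew;
    end;
  end;
  exact := Vin * (1.0 - Exp(-NSteps * h / Tau));
  errBE := Abs(vBE - exact);
  errTR := Abs(vTR - exact);
  errBDF2 := Abs(vB - exact);
  WriteLn('BE   error at t = 1 s: ', errBE:0:6);
  WriteLn('TR   error at t = 1 s: ', errTR:0:6);
  WriteLn('BDF2 error at t = 1 s: ', errBDF2:0:6);
end.
```

Because the equation is linear, each implicit step reduces to a closed-form division; in the circuit simulator the same formulas lead to a nonlinear system per step, solved with the damped Newton method.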
