Complete cuBLAS anytime soon?
Is there a project to complete the CUBLAS library anytime soon?
If not, would it be possible to provide the source code for one typical routine? Then I could continue my project!
I tried to port one more function to CUDA, but it does not work properly yet.

#1
Posted 10/25/2009 10:59 PM   
Since I haven't received any response, I will start porting a more complete BLAS to CUBLAS myself.

#2
Posted 10/27/2009 01:43 PM   
As of today, I have ported 80 BLAS functions (single, double, complex, and double complex) to the point where they compile with nvcc without errors.
I still have to set the grid and block dimensions and convert the for loops in all functions.

#3
Posted 11/05/2009 02:26 PM   
[quote name='jam1' post='945788' date='Nov 5 2009, 10:26 AM']As of today, I have completed 80 Blas functions(single,double,complex,double complex) up to being able to compile with nvcc without errors.
I still have to arrange the grids and blocks values and convert the for loops for all functions.[/quote]

Just a note to encourage you to keep going. I am sure there are a lot of people who will appreciate access to them.

MMB

#4
Posted 11/06/2009 01:20 AM   
CUDA 3.0 beta has a lot more BLAS functions implemented.
To check if the ones you need are in the library, you could use something similar to this:

lib64]$ nm -D libcublas.so |grep -y gemm
000000000006af40 T cublasCgemm
0000000000070a80 T cublasDgemm
000000000002ae00 T cublasSgemm
000000000006ff80 T cublasZgemm

lib64]$ nm -D libcublas.so |grep -y trsm
0000000000075590 T cublasCtrsm
0000000000079030 T cublasDtrsm
000000000004dcf0 T cublasStrsm
00000000000b6710 T cublasZtrsm
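
The same check can be scripted for a whole list of routines. The following is only a sketch: the function name `missing_routines` is made up for this illustration, and it assumes `nm -D` output in the three-column format shown above and the usual S/D/C/Z precision prefixes. It reads the symbol listing on standard input and prints every `cublas<precision><routine>` name that is not exported.

```shell
# Hypothetical helper (not part of CUDA): report CUBLAS routines that are
# missing from an `nm -D` symbol listing supplied on standard input.
# Arguments are routine base names without the precision letter, e.g. gemm trsm.
missing_routines() {
    symbols=$(cat)                      # keep the whole nm listing in memory
    for r in "$@"; do
        for p in S D C Z; do            # single, double, complex, double complex
            # A symbol line ends with the routine name, e.g. "... T cublasDgemm"
            if ! printf '%s\n' "$symbols" | grep -qi "[[:space:]]cublas${p}${r}\$"; then
                echo "cublas${p}${r}"
            fi
        done
    done
}
```

It could be used as `nm -D libcublas.so | missing_routines gemm trsm syr2k`; an empty result would mean all four precisions of each routine are present.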

#5
Posted 11/06/2009 01:30 AM   
I've been contemplating the BLAS/LAPACK problem recently. There are things happening...

One project is CULA [url="http://www.culatools.com/"]http://www.culatools.com/[/url], which has a number of BLAS and LAPACK routines available; some are free, some are not. The routine to solve the matrix equation A*X=B (LAPACK sgesv) is free. CULA seems to be a quasi-commercial operation that works closely with NVIDIA, I gather.

Another project is MAGMA [url="http://icl.cs.utk.edu/magma/"]http://icl.cs.utk.edu/magma/[/url]. This is a collaborative project involving Jack Dongarra and Vasily Volkov: "The MAGMA project aims to develop a dense linear algebra library similar to LAPACK but for heterogeneous/hybrid architectures, starting with current "Multicore+GPU" systems." MAGMA is meant to be releasing a number of new routines in a couple of weeks' time.

I've been putting CULA's sgesv routine to great use of late. But I suspect that the MAGMA routines will eventually dominate, given the people and philosophies (computing approaches, freely available routines) behind them. I'm looking forward to seeing which routines MAGMA makes available in the middle of this month.

#6
Posted 11/07/2009 09:28 PM   
[quote name='Boxed Cylon' post='947170' date='Nov 7 2009, 05:28 PM']I've been contemplating the BLAS/LAPACK problem recently. There are things happening...

One project is CULA [url="http://www.culatools.com/"]http://www.culatools.com/[/url] which has a number of BLAS and LAPACK routines available - some are free, some are not. The routine to solve the matrix equation A*X=B (lapack sgesv) is free. CULA seems to be a quasi-commercial operation that works closely with Nvidia, I gather.

Another project is MAGMA [url="http://icl.cs.utk.edu/magma/"]http://icl.cs.utk.edu/magma/[/url] - this is a collaborative project involving Jack Dongarra and Vasily Volkov: "The MAGMA project aims to develop a dense linear algebra library similar to LAPACK but for heterogeneous/hybrid architectures, starting with current "Multicore+GPU" systems." MAGMA is meant to be releasing a number of new routines in a couple of weeks time.

I've been putting CULA's sgesv routine to great use of late. But I suspect that the MAGMA routines will eventually dominate given the people and philosophies (computing approaches, freely available routines) behind it. I'm looking forward to what routines MAGMA will be making available in the middle of this month.[/quote]
One needs to distinguish between BLAS and LAPACK. I believe both CULA and MAGMA are directed at LAPACK. Presumably they use BLAS to build their LAPACK routines, but I don't see any BLAS routines offered on their websites.

MMB

#7
Posted 11/07/2009 10:12 PM   
Hi,
I just started the testing phase of the single-precision routines. I am using the original BLAS testing driver routines. No good results yet. It looks like I will have to look at the driver to see what is going on.

#8
Posted 11/08/2009 01:56 PM   
Thank you for the 3.0 CUBLAS.
Now I need to understand how to integrate the LAPACK routines with it.

#9
Posted 11/16/2009 02:00 PM   
OK, I got it to work: I integrated the LAPACK routines with CUBLAS.

I am using this script (made executable) to add the cublas_ prefix to all BLAS function names. It replaces in place, so back up your program before using it.
Check the program afterwards. Some function names can be modified twice, giving cublas_cublas_XXXXX, because a short name such as DROT also matches inside a longer one such as DROTG.

[codebox]
sed -i -e 's/ZGEMM/cublas_ZGEMM/g' *.f90
sed -i -e 's/ZGEMV/cublas_ZGEMV/g' *.f90
sed -i -e 's/ZAXPY/cublas_ZAXPY/g' *.f90
sed -i -e 's/ZCOPY/cublas_ZCOPY/g' *.f90
sed -i -e 's/ZDOTU/cublas_ZDOTU/g' *.f90
sed -i -e 's/ZSCAL/cublas_ZSCAL/g' *.f90
sed -i -e 's/ZSWAP/cublas_ZSWAP/g' *.f90
sed -i -e 's/ZTRMV/cublas_ZTRMV/g' *.f90
sed -i -e 's/ZGERU/cublas_ZGERU/g' *.f90
sed -i -e 's/ZGERC/cublas_ZGERC/g' *.f90
sed -i -e 's/ZTRMM/cublas_ZTRMM/g' *.f90
sed -i -e 's/ZSYMM/cublas_ZSYMM/g' *.f90
sed -i -e 's/ZSYRK/cublas_ZSYRK/g' *.f90
sed -i -e 's/ZHERK/cublas_ZHERK/g' *.f90
sed -i -e 's/ZTRSM/cublas_ZTRSM/g' *.f90

sed -i -e 's/IDAMAX/cublas_IDAMAX/g' *.f90
sed -i -e 's/IDAMIN/cublas_IDAMIN/g' *.f90
sed -i -e 's/DASUM/cublas_DASUM/g' *.f90
sed -i -e 's/DAXPY/cublas_DAXPY/g' *.f90
sed -i -e 's/DCOPY/cublas_DCOPY/g' *.f90
sed -i -e 's/DDOT/cublas_DDOT/g' *.f90
sed -i -e 's/DNRM2/cublas_DNRM2/g' *.f90
sed -i -e 's/DROT/cublas_DROT/g' *.f90
sed -i -e 's/DROTG/cublas_DROTG/g' *.f90
sed -i -e 's/DROTM/cublas_DROTM/g' *.f90
sed -i -e 's/DROTMG/cublas_DROTMG/g' *.f90
sed -i -e 's/DSCAL/cublas_DSCAL/g' *.f90
sed -i -e 's/DSWAP/cublas_DSWAP/g' *.f90
sed -i -e 's/DGEMV/cublas_DGEMV/g' *.f90
sed -i -e 's/DGEMM/cublas_DGEMM/g' *.f90
sed -i -e 's/DGER/cublas_DGER/g' *.f90
sed -i -e 's/DSYR/cublas_DSYR/g' *.f90
sed -i -e 's/DTRMV/cublas_DTRMV/g' *.f90
sed -i -e 's/DTRSV/cublas_DTRSV/g' *.f90
sed -i -e 's/DSYMM/cublas_DSYMM/g' *.f90
sed -i -e 's/DSYRK/cublas_DSYRK/g' *.f90
sed -i -e 's/DTRMM/cublas_DTRMM/g' *.f90
sed -i -e 's/DTRSM/cublas_DTRSM/g' *.f90
sed -i -e 's/DSYR2K/cublas_DSYR2K/g' *.f90


sed -i -e 's/ZGEMM/cublas_ZGEMM/g' *.f
sed -i -e 's/ZGEMV/cublas_ZGEMV/g' *.f
sed -i -e 's/ZAXPY/cublas_ZAXPY/g' *.f
sed -i -e 's/ZCOPY/cublas_ZCOPY/g' *.f
sed -i -e 's/ZDOTU/cublas_ZDOTU/g' *.f
sed -i -e 's/ZSCAL/cublas_ZSCAL/g' *.f
sed -i -e 's/ZSWAP/cublas_ZSWAP/g' *.f
sed -i -e 's/ZTRMV/cublas_ZTRMV/g' *.f
sed -i -e 's/ZGERU/cublas_ZGERU/g' *.f
sed -i -e 's/ZGERC/cublas_ZGERC/g' *.f
sed -i -e 's/ZTRMM/cublas_ZTRMM/g' *.f
sed -i -e 's/ZSYMM/cublas_ZSYMM/g' *.f
sed -i -e 's/ZSYRK/cublas_ZSYRK/g' *.f
sed -i -e 's/ZHERK/cublas_ZHERK/g' *.f
sed -i -e 's/ZTRSM/cublas_ZTRSM/g' *.f

sed -i -e 's/IDAMAX/cublas_IDAMAX/g' *.f
sed -i -e 's/IDAMIN/cublas_IDAMIN/g' *.f
sed -i -e 's/DASUM/cublas_DASUM/g' *.f
sed -i -e 's/DAXPY/cublas_DAXPY/g' *.f
sed -i -e 's/DCOPY/cublas_DCOPY/g' *.f
sed -i -e 's/DDOT/cublas_DDOT/g' *.f
sed -i -e 's/DNRM2/cublas_DNRM2/g' *.f
sed -i -e 's/DROT/cublas_DROT/g' *.f
sed -i -e 's/DROTG/cublas_DROTG/g' *.f
sed -i -e 's/DROTM/cublas_DROTM/g' *.f
sed -i -e 's/DROTMG/cublas_DROTMG/g' *.f
sed -i -e 's/DSCAL/cublas_DSCAL/g' *.f
sed -i -e 's/DSWAP/cublas_DSWAP/g' *.f
sed -i -e 's/DGEMV/cublas_DGEMV/g' *.f
sed -i -e 's/DGEMM/cublas_DGEMM/g' *.f
sed -i -e 's/DGER/cublas_DGER/g' *.f
sed -i -e 's/DSYR/cublas_DSYR/g' *.f
sed -i -e 's/DTRMV/cublas_DTRMV/g' *.f
sed -i -e 's/DTRSV/cublas_DTRSV/g' *.f
sed -i -e 's/DSYMM/cublas_DSYMM/g' *.f
sed -i -e 's/DSYRK/cublas_DSYRK/g' *.f
sed -i -e 's/DTRMM/cublas_DTRMM/g' *.f
sed -i -e 's/DTRSM/cublas_DTRSM/g' *.f
sed -i -e 's/DSYR2K/cublas_DSYR2K/g' *.f

[/codebox]

Another script removes the prefixes:

[codebox]
sed -i -e 's/cublas_ZGEMM/ ZGEMM/g' *.f90
sed -i -e 's/cublas_ZGEMV/ ZGEMV/g' *.f90
sed -i -e 's/cublas_ZAXPY/ ZAXPY/g' *.f90
sed -i -e 's/cublas_ZCOPY/ ZCOPY/g' *.f90
sed -i -e 's/cublas_ZDOTU/ ZDOTU/g' *.f90
sed -i -e 's/cublas_ZSCAL/ ZSCAL/g' *.f90
sed -i -e 's/cublas_ZSWAP/ ZSWAP/g' *.f90
sed -i -e 's/cublas_ZTRMV/ ZTRMV/g' *.f90
sed -i -e 's/cublas_ZGERU/ ZGERU/g' *.f90
sed -i -e 's/cublas_ZGERC/ ZGERC/g' *.f90
sed -i -e 's/cublas_ZTRMM/ ZTRMM/g' *.f90
sed -i -e 's/cublas_ZSYMM/ ZSYMM/g' *.f90
sed -i -e 's/cublas_ZSYRK/ ZSYRK/g' *.f90
sed -i -e 's/cublas_ZHERK/ ZHERK/g' *.f90
sed -i -e 's/cublas_ZTRSM/ ZTRSM/g' *.f90

sed -i -e 's/cublas_IDAMAX/ IDAMAX/g' *.f90
sed -i -e 's/cublas_IDAMIN/ IDAMIN/g' *.f90
sed -i -e 's/cublas_DASUM/ DASUM/g' *.f90
sed -i -e 's/cublas_DAXPY/ DAXPY/g' *.f90
sed -i -e 's/cublas_DCOPY/ DCOPY/g' *.f90
sed -i -e 's/cublas_DDOT/ DDOT/g' *.f90
sed -i -e 's/cublas_DNRM2/ DNRM2/g' *.f90
sed -i -e 's/cublas_DROT/ DROT/g' *.f90
sed -i -e 's/cublas_DROTG/ DROTG/g' *.f90
sed -i -e 's/cublas_DROTM/ DROTM/g' *.f90
sed -i -e 's/cublas_DROTMG/ DROTMG/g' *.f90
sed -i -e 's/cublas_DSCAL/ DSCAL/g' *.f90
sed -i -e 's/cublas_DSWAP/ DSWAP/g' *.f90
sed -i -e 's/cublas_DGEMV/ DGEMV/g' *.f90
sed -i -e 's/cublas_DGEMM/ DGEMM/g' *.f90
sed -i -e 's/cublas_DGER/ DGER/g' *.f90
sed -i -e 's/cublas_DSYR/ DSYR/g' *.f90
sed -i -e 's/cublas_DTRMV/ DTRMV/g' *.f90
sed -i -e 's/cublas_DTRSV/ DTRSV/g' *.f90
sed -i -e 's/cublas_DSYMM/ DSYMM/g' *.f90
sed -i -e 's/cublas_DSYRK/ DSYRK/g' *.f90
sed -i -e 's/cublas_DTRMM/ DTRMM/g' *.f90
sed -i -e 's/cublas_DTRSM/ DTRSM/g' *.f90
sed -i -e 's/cublas_DSYR2K/ DSYR2K/g' *.f90

sed -i -e 's/cublas_ZGEMM/ ZGEMM/g' *.f
sed -i -e 's/cublas_ZGEMV/ ZGEMV/g' *.f
sed -i -e 's/cublas_ZAXPY/ ZAXPY/g' *.f
sed -i -e 's/cublas_ZCOPY/ ZCOPY/g' *.f
sed -i -e 's/cublas_ZDOTU/ ZDOTU/g' *.f
sed -i -e 's/cublas_ZSCAL/ ZSCAL/g' *.f
sed -i -e 's/cublas_ZSWAP/ ZSWAP/g' *.f
sed -i -e 's/cublas_ZTRMV/ ZTRMV/g' *.f
sed -i -e 's/cublas_ZGERU/ ZGERU/g' *.f
sed -i -e 's/cublas_ZGERC/ ZGERC/g' *.f
sed -i -e 's/cublas_ZTRMM/ ZTRMM/g' *.f
sed -i -e 's/cublas_ZSYMM/ ZSYMM/g' *.f
sed -i -e 's/cublas_ZSYRK/ ZSYRK/g' *.f
sed -i -e 's/cublas_ZHERK/ ZHERK/g' *.f
sed -i -e 's/cublas_ZTRSM/ ZTRSM/g' *.f

sed -i -e 's/cublas_IDAMAX/ IDAMAX/g' *.f
sed -i -e 's/cublas_IDAMIN/ IDAMIN/g' *.f
sed -i -e 's/cublas_DASUM/ DASUM/g' *.f
sed -i -e 's/cublas_DAXPY/ DAXPY/g' *.f
sed -i -e 's/cublas_DCOPY/ DCOPY/g' *.f
sed -i -e 's/cublas_DDOT/ DDOT/g' *.f
sed -i -e 's/cublas_DNRM2/ DNRM2/g' *.f
sed -i -e 's/cublas_DROT/ DROT/g' *.f
sed -i -e 's/cublas_DROTG/ DROTG/g' *.f
sed -i -e 's/cublas_DROTM/ DROTM/g' *.f
sed -i -e 's/cublas_DROTMG/ DROTMG/g' *.f
sed -i -e 's/cublas_DSCAL/ DSCAL/g' *.f
sed -i -e 's/cublas_DSWAP/ DSWAP/g' *.f
sed -i -e 's/cublas_DGEMV/ DGEMV/g' *.f
sed -i -e 's/cublas_DGEMM/ DGEMM/g' *.f
sed -i -e 's/cublas_DGER/ DGER/g' *.f
sed -i -e 's/cublas_DSYR/ DSYR/g' *.f
sed -i -e 's/cublas_DTRMV/ DTRMV/g' *.f
sed -i -e 's/cublas_DTRSV/ DTRSV/g' *.f
sed -i -e 's/cublas_DSYMM/ DSYMM/g' *.f
sed -i -e 's/cublas_DSYRK/ DSYRK/g' *.f
sed -i -e 's/cublas_DTRMM/ DTRMM/g' *.f
sed -i -e 's/cublas_DTRSM/ DTRSM/g' *.f
sed -i -e 's/cublas_DSYR2K/ DSYR2K/g' *.f

[/codebox]

Compile fortran.c with -DCUBLAS_USE_THUNKING.

Link your program with fortran.o -lcublas -lcudart,
and possibly -lblas to take care of the functions that were not translated.
Have fun.
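
The double-prefix problem comes from plain substring matching. One possible way around it, shown here only as a sketch, is to match whole words. It assumes GNU sed (for `\b` and `-E`); the function names `add_prefix` and `strip_prefix` and the shortened `ROUTINES` list are made up for this illustration and are not part of the scripts above. Because `_` counts as a word character, a name that already carries `cublas_` is not matched again, so running the filter twice does no harm.

```shell
# Illustrative subset of the routine names used in the scripts above;
# extend the list as needed. Names are joined with | for use in a regex.
ROUTINES='ZGEMM|ZGEMV|DROT|DROTG|DROTM|DROTMG|DGER|DSYR|DSYRK|DSYR2K|DGEMM'

# Add the cublas_ prefix to whole-word matches only (stdin -> stdout).
# \b requires a word boundary, so DROT does not match inside DROTG,
# and DROTG does not match inside cublas_DROTG.
add_prefix() {
    sed -E "s/\\b(${ROUTINES})\\b/cublas_\\1/g"
}

# Remove the prefix again, restoring the plain BLAS names.
strip_prefix() {
    sed -E "s/\\bcublas_(${ROUTINES})\\b/\\1/g"
}
```

For in-place use on source files, the same expression could be passed to `sed -i -E` with `*.f *.f90`, again after making a backup. Like the original scripts, this sketch only matches upper-case names and would also rename a variable that happens to share a routine's name, so the result still needs checking.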

#10
Posted 11/18/2009 08:54 PM   