Complete cuBLAS anytime soon?

Is there a project to complete the CUBLAS library anytime soon?
If not, would it be possible to provide the source code for one typical routine? Then I can continue my project!
I did try to port one more function to CUDA, but it does not work properly yet.

Since I haven’t received any response, I will port the (more complete) BLAS to CUBLAS myself.

As of today, I have ported 80 BLAS functions (single, double, complex, double complex) to the point where they compile with nvcc without errors.
I still have to set the grid and block dimensions and convert the for loops in all of the functions.

Just a note to encourage you to keep going. I am sure there are a lot of people who will appreciate access to them.

MMB

CUDA 3.0 beta has a lot more BLAS functions implemented.
To check if the ones you need are in the library, you could use something similar to this:

[codebox]
lib64]$ nm -D libcublas.so | grep -i gemm
000000000006af40 T cublasCgemm
0000000000070a80 T cublasDgemm
000000000002ae00 T cublasSgemm
000000000006ff80 T cublasZgemm

lib64]$ nm -D libcublas.so | grep -i trsm
0000000000075590 T cublasCtrsm
0000000000079030 T cublasDtrsm
000000000004dcf0 T cublasStrsm
00000000000b6710 T cublasZtrsm
[/codebox]
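If you need to check a whole list of routines at once, a small shell function along these lines might help. It is only a sketch: it assumes the `nm -D` output format shown above (address, type `T`, symbol name) and the `cublas` + routine-name naming seen there. `missing_routines` is a made-up helper name, and the library path in the usage line is just an example.

```shell
#!/bin/sh
# Sketch: print the requested cuBLAS routines that a library does NOT export.
# Reads `nm -D` output on stdin; routine names are given without the
# "cublas" prefix and are matched case-insensitively.
missing_routines() {
    # Keep only defined code symbols: nm prints "<address> T <name>" for them.
    syms=$(awk '$2 == "T" { print $3 }')
    for r in "$@"; do
        # -q: quiet, -i: ignore case, -x: the whole symbol must match
        if ! printf '%s\n' "$syms" | grep -qix "cublas$r"; then
            echo "$r"
        fi
    done
}

# Usage (example path; use your toolkit's lib or lib64 directory):
#   nm -D /usr/local/cuda/lib64/libcublas.so | missing_routines Sgemm Dtrsm Zherk
```

Only the names that are missing get printed, so an empty result means everything you asked for is in the library.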

I’ve been contemplating the BLAS/LAPACK problem recently. There are things happening…

One project is CULA (http://www.culatools.com/), which has a number of BLAS and LAPACK routines available; some are free, some are not. The routine that solves the matrix equation A*X = B (LAPACK sgesv) is free. CULA seems to be a quasi-commercial operation that works closely with Nvidia, I gather.

Another project is MAGMA (http://icl.cs.utk.edu/magma/), a collaborative project involving Jack Dongarra and Vasily Volkov: “The MAGMA project aims to develop a dense linear algebra library similar to LAPACK but for heterogeneous/hybrid architectures, starting with current ‘Multicore+GPU’ systems.” MAGMA is expected to release a number of new routines in a couple of weeks’ time.

I’ve been putting CULA’s sgesv routine to great use of late. But I suspect that the MAGMA routines will eventually dominate, given the people and philosophies (computing approaches, freely available routines) behind the project. I’m looking forward to seeing which routines MAGMA makes available in the middle of this month.

One needs to distinguish between BLAS and LAPACK. I believe both CULA and MAGMA are directed at LAPACK. They presumably use BLAS internally to build their LAPACK routines, but I don’t see any BLAS routines offered on their websites.

MMB

Hi,
I have just started testing the single-precision routines, using the original BLAS test driver routines. No good results yet; it looks like I will have to dig into the driver to see what is going on.

Thank you for the CUBLAS in CUDA 3.0.
Now I need to understand how to integrate LAPACK routines with it.

OK, I got it to work: I integrated the LAPACK routines with CUBLAS.

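I use the script below (saved as an executable shell script) to add the prefix cublas_ to the BLAS function names listed in it. The replacement is done in place, so back up your program before using it.

Check the program afterwards: some names can be modified twice, giving cublas_cublas_XXXXX, because a pattern such as DROT also matches DROTG, DROTM and DROTMG (and DSYR also matches DSYRK and DSYR2K). Note also that the patterns only match upper-case names.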

[codebox]
sed -i -e 's/ZGEMM/cublas_ZGEMM/g' *.f90 *.f
sed -i -e 's/ZGEMV/cublas_ZGEMV/g' *.f90 *.f
sed -i -e 's/ZAXPY/cublas_ZAXPY/g' *.f90 *.f
sed -i -e 's/ZCOPY/cublas_ZCOPY/g' *.f90 *.f
sed -i -e 's/ZDOTU/cublas_ZDOTU/g' *.f90 *.f
sed -i -e 's/ZSCAL/cublas_ZSCAL/g' *.f90 *.f
sed -i -e 's/ZSWAP/cublas_ZSWAP/g' *.f90 *.f
sed -i -e 's/ZTRMV/cublas_ZTRMV/g' *.f90 *.f
sed -i -e 's/ZGERU/cublas_ZGERU/g' *.f90 *.f
sed -i -e 's/ZGERC/cublas_ZGERC/g' *.f90 *.f
sed -i -e 's/ZTRMM/cublas_ZTRMM/g' *.f90 *.f
sed -i -e 's/ZSYMM/cublas_ZSYMM/g' *.f90 *.f
sed -i -e 's/ZSYRK/cublas_ZSYRK/g' *.f90 *.f
sed -i -e 's/ZHERK/cublas_ZHERK/g' *.f90 *.f
sed -i -e 's/ZTRSM/cublas_ZTRSM/g' *.f90 *.f
sed -i -e 's/IDAMAX/cublas_IDAMAX/g' *.f90 *.f
sed -i -e 's/IDAMIN/cublas_IDAMIN/g' *.f90 *.f
sed -i -e 's/DASUM/cublas_DASUM/g' *.f90 *.f
sed -i -e 's/DAXPY/cublas_DAXPY/g' *.f90 *.f
sed -i -e 's/DCOPY/cublas_DCOPY/g' *.f90 *.f
sed -i -e 's/DDOT/cublas_DDOT/g' *.f90 *.f
sed -i -e 's/DNRM2/cublas_DNRM2/g' *.f90 *.f
sed -i -e 's/DROT/cublas_DROT/g' *.f90 *.f
sed -i -e 's/DROTG/cublas_DROTG/g' *.f90 *.f
sed -i -e 's/DROTM/cublas_DROTM/g' *.f90 *.f
sed -i -e 's/DROTMG/cublas_DROTMG/g' *.f90 *.f
sed -i -e 's/DSCAL/cublas_DSCAL/g' *.f90 *.f
sed -i -e 's/DSWAP/cublas_DSWAP/g' *.f90 *.f
sed -i -e 's/DGEMV/cublas_DGEMV/g' *.f90 *.f
sed -i -e 's/DGEMM/cublas_DGEMM/g' *.f90 *.f
sed -i -e 's/DGER/cublas_DGER/g' *.f90 *.f
sed -i -e 's/DSYR/cublas_DSYR/g' *.f90 *.f
sed -i -e 's/DTRMV/cublas_DTRMV/g' *.f90 *.f
sed -i -e 's/DTRSV/cublas_DTRSV/g' *.f90 *.f
sed -i -e 's/DSYMM/cublas_DSYMM/g' *.f90 *.f
sed -i -e 's/DSYRK/cublas_DSYRK/g' *.f90 *.f
sed -i -e 's/DTRMM/cublas_DTRMM/g' *.f90 *.f
sed -i -e 's/DTRSM/cublas_DTRSM/g' *.f90 *.f
sed -i -e 's/DSYR2K/cublas_DSYR2K/g' *.f90 *.f
[/codebox]
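A variant of the same idea that matches whole words only, so names are never prefixed twice and lower-case calls are caught too. This is a sketch assuming GNU sed (`\b` and the `I` flag are GNU extensions); `add_prefix` is a hypothetical helper name. As with the script above, comments and string literals containing these names are rewritten as well. It also collapses any cublas_cublas_... left over from an earlier run of the plain script.

```shell
#!/bin/sh
# Sketch: add the cublas_ prefix to BLAS calls in Fortran sources, in place.
# Assumes GNU sed (\b word boundaries, I = case-insensitive flag).
# Back up your sources before running it.
ROUTINES="ZGEMM ZGEMV ZAXPY ZCOPY ZDOTU ZSCAL ZSWAP ZTRMV ZGERU ZGERC
ZTRMM ZSYMM ZSYRK ZHERK ZTRSM IDAMAX IDAMIN DASUM DAXPY DCOPY DDOT
DNRM2 DROT DROTG DROTM DROTMG DSCAL DSWAP DGEMV DGEMM DGER DSYR
DTRMV DTRSV DSYMM DSYRK DTRMM DTRSM DSYR2K"

add_prefix() {
    for f in "$@"; do
        [ -f "$f" ] || continue     # skip unmatched globs such as a literal '*.f'
        # Repair names prefixed two or more times by an earlier run.
        sed -i -e 's/\(cublas_\)\{2,\}/cublas_/gI' "$f"
        for r in $ROUTINES; do
            # \b on both sides matches whole names only: DROT no longer hits
            # DROTG, and an existing cublas_DROT is skipped because '_' is a
            # word character, so running the script again changes nothing.
            sed -i -e "s/\b$r\b/cublas_$r/gI" "$f"
        done
    done
}

# Usage: add_prefix *.f90 *.f
```

Lower-case calls come out as e.g. cublas_DROT, which makes no difference to a Fortran compiler.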

Another script removes the prefixes again:

[codebox]
sed -i -e 's/cublas_ZGEMM/ ZGEMM/g' *.f90 *.f
sed -i -e 's/cublas_ZGEMV/ ZGEMV/g' *.f90 *.f
sed -i -e 's/cublas_ZAXPY/ ZAXPY/g' *.f90 *.f
sed -i -e 's/cublas_ZCOPY/ ZCOPY/g' *.f90 *.f
sed -i -e 's/cublas_ZDOTU/ ZDOTU/g' *.f90 *.f
sed -i -e 's/cublas_ZSCAL/ ZSCAL/g' *.f90 *.f
sed -i -e 's/cublas_ZSWAP/ ZSWAP/g' *.f90 *.f
sed -i -e 's/cublas_ZTRMV/ ZTRMV/g' *.f90 *.f
sed -i -e 's/cublas_ZGERU/ ZGERU/g' *.f90 *.f
sed -i -e 's/cublas_ZGERC/ ZGERC/g' *.f90 *.f
sed -i -e 's/cublas_ZTRMM/ ZTRMM/g' *.f90 *.f
sed -i -e 's/cublas_ZSYMM/ ZSYMM/g' *.f90 *.f
sed -i -e 's/cublas_ZSYRK/ ZSYRK/g' *.f90 *.f
sed -i -e 's/cublas_ZHERK/ ZHERK/g' *.f90 *.f
sed -i -e 's/cublas_ZTRSM/ ZTRSM/g' *.f90 *.f
sed -i -e 's/cublas_IDAMAX/ IDAMAX/g' *.f90 *.f
sed -i -e 's/cublas_IDAMIN/ IDAMIN/g' *.f90 *.f
sed -i -e 's/cublas_DASUM/ DASUM/g' *.f90 *.f
sed -i -e 's/cublas_DAXPY/ DAXPY/g' *.f90 *.f
sed -i -e 's/cublas_DCOPY/ DCOPY/g' *.f90 *.f
sed -i -e 's/cublas_DDOT/ DDOT/g' *.f90 *.f
sed -i -e 's/cublas_DNRM2/ DNRM2/g' *.f90 *.f
sed -i -e 's/cublas_DROT/ DROT/g' *.f90 *.f
sed -i -e 's/cublas_DROTG/ DROTG/g' *.f90 *.f
sed -i -e 's/cublas_DROTM/ DROTM/g' *.f90 *.f
sed -i -e 's/cublas_DROTMG/ DROTMG/g' *.f90 *.f
sed -i -e 's/cublas_DSCAL/ DSCAL/g' *.f90 *.f
sed -i -e 's/cublas_DSWAP/ DSWAP/g' *.f90 *.f
sed -i -e 's/cublas_DGEMV/ DGEMV/g' *.f90 *.f
sed -i -e 's/cublas_DGEMM/ DGEMM/g' *.f90 *.f
sed -i -e 's/cublas_DGER/ DGER/g' *.f90 *.f
sed -i -e 's/cublas_DSYR/ DSYR/g' *.f90 *.f
sed -i -e 's/cublas_DTRMV/ DTRMV/g' *.f90 *.f
sed -i -e 's/cublas_DTRSV/ DTRSV/g' *.f90 *.f
sed -i -e 's/cublas_DSYMM/ DSYMM/g' *.f90 *.f
sed -i -e 's/cublas_DSYRK/ DSYRK/g' *.f90 *.f
sed -i -e 's/cublas_DTRMM/ DTRMM/g' *.f90 *.f
sed -i -e 's/cublas_DTRSM/ DTRSM/g' *.f90 *.f
sed -i -e 's/cublas_DSYR2K/ DSYR2K/g' *.f90 *.f
[/codebox]
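A shorter way to undo the renaming, again assuming GNU sed: instead of one line per routine, a single pattern strips cublas_ from any name beginning with D, Z, ID or IZ. That covers every routine in the list above while leaving other cublas_ names (for example a cublas_INIT call) untouched. Single-precision and single-complex names (S, C, IS, IC) are not handled, just as in the original list, and unlike the original it does not leave a space where the prefix was. `remove_prefix` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: strip the cublas_ prefix from double / double-complex BLAS names,
# in place. Assumes GNU sed (\b word boundaries, I = case-insensitive flag).
remove_prefix() {
    for f in "$@"; do
        [ -f "$f" ] || continue     # skip unmatched globs such as a literal '*.f'
        # Optional I, then D or Z, then the rest of the name; \1 keeps the
        # name exactly as written and drops only the prefix.
        sed -i -e 's/\bcublas_\([Ii]\{0,1\}[DdZz][A-Za-z0-9]*\)\b/\1/gI' "$f"
    done
}

# Usage: remove_prefix *.f90 *.f
```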

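Compile fortran.c (the Fortran wrapper source that ships with the CUDA toolkit) with -DCUBLAS_USE_THUNKING.

Then link your program with fortran.o -lcublas -lcudart (object files before the libraries), and possibly -lblas to take care of the functions that were not translated.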
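The two build steps might look roughly like this in script form. Everything here is an assumption to adapt: the toolkit location (`CUDA_HOME`, defaulting to `/usr/local/cuda`), fortran.c living in `$CUDA_HOME/src`, the 64-bit library directory, gcc/gfortran as compilers, and the thunking macro being spelled `CUBLAS_USE_THUNKING` (check the `#if` in your copy of fortran.c). `run`, `build` and `DRY_RUN` are made-up names.

```shell
#!/bin/sh
# Sketch of the two build steps. Assumptions, adjust to your installation:
#   - CUDA_HOME is the toolkit root, fortran.c is in $CUDA_HOME/src and the
#     CUBLAS / CUDA runtime libraries are in $CUDA_HOME/lib64
#   - gcc and gfortran are the C and Fortran compilers
#   - a reference BLAS is installed as libblas
# If DRY_RUN is non-empty, the commands are printed instead of executed.
CUDA_HOME=${CUDA_HOME:-/usr/local/cuda}
CC=${CC:-gcc}
FC=${FC:-gfortran}

run() {
    if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

build() {
    prog=$1
    # Thunking wrappers: each call copies the host arrays to the GPU,
    # runs CUBLAS and copies the results back.
    run "$CC" -O2 -DCUBLAS_USE_THUNKING -I"$CUDA_HOME/include" \
        -c "$CUDA_HOME/src/fortran.c" -o fortran.o || return
    # Objects before libraries; -lblas resolves calls that were not renamed.
    run "$FC" -O2 "$prog" fortran.o -L"$CUDA_HOME/lib64" \
        -lcublas -lcudart -lblas -o "${prog%.*}"
}
```

After sourcing the file, `DRY_RUN=1; build myprog.f90` shows the commands; unset `DRY_RUN` and call `build myprog.f90` again to actually compile and link.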

Have fun.