nvcc + MKL functions

loloasb · August 24, 2011, 8:16am

Hi,

I wrote some code that uses MKL and CUBLAS functions.
The MKL functions used are geqrf and larft.

The problem is as follows:

When I compile with icc, the geqrf call takes 4062 ms, whereas with nvcc it takes 61959 ms, about 15x longer.
The larft call takes 3522 ms with icc and 8104 ms with nvcc.

I need to use this function. I know there is a CULA version of geqrf, but only for single precision.

I would like to test my code in double precision, and so use dgeqrf from MKL.

Maybe MKL’s functions aren’t optimized when the code is built with nvcc?

Does anyone have any ideas?

Here is my Makefile:

CC=nvcc
CFLAG=-O3
LIBS=-lcuda -lcudart -lcula -lcublas -m64
INCLUDE_CULA=/usr/local/cula//include
LIB_CULA=/usr/local/cula//lib64
INCLUDE_MKL=/opt/intel/mkl/include

build 64:
$(CC) $(CFLAG) -DReal=float qrComplet.cu $(LIBS) -I$(INCLUDE_CULA) -L$(LIB_CULA) -I$(INCLUDE_MKL) --linker-options /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.a,/opt/intel/mkl/lib/intel64/libmkl_sequential.a,/opt/intel/mkl/lib/intel64/libmkl_core.a,-lpthread -o qrComplet

Thank you.

fcs · August 24, 2011, 8:30am

The host (C) part of your CUDA file is compiled by gcc by default, so it may not be optimized the way you expect.
If you want it to be compiled with icc, you have to pass the “-ccbin=icc” option to nvcc.

If you haven’t applied the patch to the Intel math.h, you will probably encounter compilation errors.
And if you use double-complex CUBLAS functions, you will get errors because gcc-based code (CUBLAS is compiled with gcc) and icc-based code interpret 16-byte-aligned pointers differently.

Maybe the MAGMA project (http://icl.cs.utk.edu/magma/software/index.html) will provide the hybrid implementation of the LAPACK functions you need…

Good luck!

loloasb · August 24, 2011, 8:46am

Thank you for your answer.

With the -ccbin=icc option, I get this error:

/usr/local/cuda/bin/…/include/host_config.h(108): catastrophic error: #error directive: -- unsupported ICC configuration! Only ICC 11.1 on Linux x86_64 is supported!
#error -- unsupported ICC configuration! Only ICC 11.1 on Linux x86_64 is supported!
^

make: *** [build] Error 4

Is this the error you told me about?

I’ve included mkl.h in my code.

Thanks

fcs · August 24, 2011, 9:11am

This error seems quite explicit: your version of icc is not one that this CUDA release supports (it accepts only ICC 11.1).

Another workaround I didn’t mention is to compile everything with icc.
The nvcc compiler is mandatory only for kernel definitions and kernel launches.
If you only use CUDA API and CUBLAS functions, you can compile without nvcc.
You will have to include “cuda_runtime.h” and “cublas.h” in your C file, specify the include dir and lib dir, and link with -lcublas -lcudart -lcuda

avidday · August 24, 2011, 9:20am

If you are using CUBLAS and MKL, why are you compiling with nvcc at all? nvcc is not required to use CUBLAS.

If you have actual device code which needs to be compiled, put it in a separate .cu file containing a C/C++ wrapper function to access the code, and compile that with nvcc, then link the resulting object file with icc. People have been using MKL and CUBLAS together forever without a problem (all those TOP500 Linpack results, for example).

loloasb · August 24, 2011, 9:24am

I have icc version 12.0.

I get the same error when I compile with icc.

This is my Makefile:

CC=icc
CFLAG=-O3
LIBS=-lcuda -lcudart -lcublas -m64
LIB_CUDA=/usr/local/cuda/lib64
INCLUDE=/usr/local/cuda/include

build 64:
$(CC) $(CFLAG) -DReal=float qrCompletGPU.c -I$(INCLUDE) -L$(LIB_CUDA) $(LIBS) -lpthread -o qrComplet

…

loloasb · August 24, 2011, 9:26am

I compile with nvcc because I use a CUDA kernel in my code.

avidday · August 24, 2011, 9:39am

So take the kernel out of the compilation unit it shares with the MKL and CUBLAS calls, compile the CUDA code separately with nvcc, then link them afterwards. Problem solved.

loloasb · August 24, 2011, 9:43am

Even if I do that, I get the same error:

/usr/local/cuda/include/host_config.h(108): catastrophic error: #error directive: -- unsupported ICC configuration! Only ICC 11.1 on Linux x86_64 is supported!

#error -- unsupported ICC configuration! Only ICC 11.1 on Linux x86_64 is supported!

avidday · August 24, 2011, 9:48am

As has been said twice already, don’t use icc with nvcc. You have an unsupported version of icc. But that doesn’t matter. Just compile the device code with nvcc+gcc, and the rest of your code with icc. Then link the device-code object with the icc output, MKL and CUBLAS, and you are done.

loloasb · August 24, 2011, 9:59am

I didn’t use icc with nvcc. I compile my code, which contains the MKL and CUBLAS functions, with icc only.

avidday · August 24, 2011, 10:03am

The error message clearly says you are trying to compile CUDA code with icc. It is being generated by a macro inside a CUDA system header. So what have you included in that code that is bringing CUDA headers into the compilation? To use CUBLAS you need to include cublas.h and nothing else.

loloasb · August 24, 2011, 11:59am

OK thanks, I understand now. The code works; the error came from including “cuda.h”.

But I have to put some CUDA kernels in my code, and I haven’t understood how to compile the “device code” and the “host code” (MKL + CUBLAS) separately.

Could you explain it again?

Thanks.

loloasb · August 24, 2011, 12:05pm

The kernel I use is a transposition kernel.
I call this kernel inside loops, so I don’t understand how I could compile it separately.

avidday · August 24, 2011, 12:19pm

Put the kernel in a .cu file together with a host “wrapper” function that launches it, something like this:

__global__ void kernel(arg1, arg2)
{
    ....
}

extern "C" int callkernel(arg1, arg2, .....)
{
    ....
    kernel<<< ... >>>(arg1, arg2);
    ....
}

Then, in your icc-compiled code, call callkernel to launch the kernel, and link the object file produced by nvcc with the icc code. That is all there is to it.
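
For the host side, a minimal sketch of calling such a wrapper from inside a loop might look like the following. All names here (call_transpose, transpose_batch, the HAVE_CUDA_WRAPPER macro) are made up for illustration and are not from the code discussed above. The wrapper body shown is only a plain CPU stand-in so that the snippet builds without nvcc; in the real build that symbol would come from the .cu file.

```c
#include <assert.h>
#include <stddef.h>

/* Prototype of the wrapper. In the real build the definition lives in the
 * .cu file, marked extern "C", and is compiled by nvcc. The name and the
 * argument list are assumptions made for this sketch. */
int call_transpose(const double *in, double *out, int rows, int cols);

#ifndef HAVE_CUDA_WRAPPER
/* CPU stand-in for the nvcc-compiled wrapper (hypothetical): transposes a
 * row-major rows x cols matrix into a row-major cols x rows matrix.
 * Returns 0 on success, -1 on invalid dimensions. */
int call_transpose(const double *in, double *out, int rows, int cols)
{
    assert(in != NULL && out != NULL);
    if (rows <= 0 || cols <= 0)
        return -1;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            out[(size_t)c * rows + r] = in[(size_t)r * cols + c];
    return 0;
}
#endif

/* Host code: calls the wrapper once per matrix. The matrices are stored
 * back to back in `in`, and the results are written back to back to `out`.
 * Stops at the first failing call and returns its status. */
int transpose_batch(const double *in, double *out,
                    int rows, int cols, int count)
{
    size_t stride = (size_t)rows * (size_t)cols;
    for (int i = 0; i < count; ++i) {
        int status = call_transpose(in + i * stride, out + i * stride,
                                    rows, cols);
        if (status != 0)
            return status;
    }
    return 0;
}
```

The host file only ever sees an ordinary C prototype, so it needs no CUDA headers and icc can compile it. In a real build the stand-in would be dropped (here by defining the hypothetical HAVE_CUDA_WRAPPER macro), the .cu file would be compiled to an object file with nvcc -c, and that object would be linked with the icc-compiled objects along with the CUDA and MKL libraries.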

loloasb · August 24, 2011, 12:24pm

OK, thank you very much, I’m going to try it.

loloasb · August 25, 2011, 8:24am

The code compiles and it works.
Thank you very much.