nvcc + MKL functions
Hi,

I wrote a code that uses MKL and CUBLAS functions.
The MKL functions used are geqrf and larft.

The problem is as follows:

When I compile with icc, the geqrf call takes 4062 ms, whereas with nvcc it takes 61959 ms, about 15x longer ...
For the larft function, it takes 3522 ms with icc and 8104 ms with nvcc.

I need to use this function. I know there is a CULA version of geqrf, but only in single precision.

I would like to test my code in double precision, and so use dgeqrf from MKL ...

Maybe MKL's functions aren't optimized with nvcc ... ?

Does anyone have any ideas?

Here is my Makefile:

CC=nvcc
CFLAG=-O3
LIBS=-lcuda -lcudart -lcula -lcublas -m64
INCLUDE_CULA=/usr/local/cula//include
LIB_CULA=/usr/local/cula//lib64
INCLUDE_MKL=/opt/intel/mkl/include

build 64:
$(CC) $(CFLAG) -DReal=float qrComplet.cu $(LIBS) -I$(INCLUDE_CULA) -L$(LIB_CULA) -I$(INCLUDE_MKL) --linker-options /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.a,/opt/intel/mkl/lib/intel64/libmkl_sequential.a,/opt/intel/mkl/lib/intel64/libmkl_core.a,-lpthread -o qrComplet

Thank you.
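
The post does not say how the timings were taken. A minimal sketch of one way to time only the library call, so that the icc and nvcc builds are compared on the same region of code, assuming a POSIX system with clock_gettime. The helper names elapsed_ms and time_call_ms are hypothetical and not part of the original code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* exposes clock_gettime under strict C modes */
#endif
#include <time.h>

/* Milliseconds elapsed between two CLOCK_MONOTONIC readings. */
double elapsed_ms(struct timespec start, struct timespec end)
{
    return (end.tv_sec - start.tv_sec) * 1e3
         + (end.tv_nsec - start.tv_nsec) / 1e6;
}

/* Wall-clock time of one call to fn(arg), in milliseconds.
   fn is a hypothetical callback that would wrap the dgeqrf or larft call. */
double time_call_ms(void (*fn)(void *), void *arg)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_ms(t0, t1);
}
```

Timing the call in isolation this way helps separate the cost of the MKL routine itself from data transfers or setup around it.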

#1
Posted 08/24/2011 08:16 AM   
The host C code in your CUDA file is compiled by gcc by default, so it may not be optimized the way you want.
If you want it to be compiled with icc, you have to pass the "-ccbin=icc" option to nvcc.

If you haven't applied the patch to the Intel math.h, you will probably encounter compilation errors.
And if you use double complex CUBLAS functions, you will get errors because of a difference in how 16-byte-aligned pointers are interpreted between gcc-based code (CUBLAS is compiled with gcc) and icc-based code.

Maybe the MAGMA project (http://icl.cs.utk.edu/magma/software/index.html) will provide the hybrid implementation of the LAPACK functions you need...

Good luck!

#2
Posted 08/24/2011 08:30 AM   
Thank you for your answer.

With the -ccbin=icc option, I get this error:

/usr/local/cuda/bin/../include/host_config.h(108): catastrophic error: #error directive: -- unsupported ICC configuration! Only ICC 11.1 on Linux x86_64 is supported!
#error -- unsupported ICC configuration! Only ICC 11.1 on Linux x86_64 is supported!
^

make: *** [build] Error 4

Is this the error you were referring to?

I've included mkl.h in my code.

Thanks

#3
Posted 08/24/2011 08:46 AM   
This error seems quite explicit: your version of icc is not supported by this CUDA release (the header accepts only ICC 11.1).

Another workaround I didn't mention is to compile everything with icc.
The nvcc compiler is mandatory only for kernel definitions and kernel launches.
If you have only CUDA API and CUBLAS functions, you can compile without nvcc.
You will have to include "cuda_runtime.h" and "cublas.h" in your C file, specify the include directory and the library directory, and link with -lcublas -lcudart -lcuda.

#4
Posted 08/24/2011 09:11 AM   
If you are using CUBLAS and MKL, why are you compiling with nvcc at all? nvcc is not required to use CUBLAS.

If you have actual device code which needs to be compiled, put it in a separate .cu file containing a C/C++ wrapper function to access the code, and compile that with nvcc, then link the resulting object file with icc. People have been using MKL and CUBLAS together forever without a problem (all those TOP500 Linpack results, for example).

#5
Posted 08/24/2011 09:20 AM   
I have version 12.0 of icc.

I get the same error when I compile with icc.

This is my makefile:

CC=icc
CFLAG=-O3
LIBS=-lcuda -lcudart -lcublas -m64
LIB_CUDA=/usr/local/cuda/lib64
INCLUDE=/usr/local/cuda/include

build 64:
$(CC) $(CFLAG) -DReal=float qrCompletGPU.c -I$(INCLUDE) -L$(LIB_CUDA) $(LIBS) -lpthread -o qrComplet

...

#6
Posted 08/24/2011 09:24 AM   
[quote name='avidday' date='24 August 2011 - 09:20 AM' timestamp='1314177620' post='1283492']
If you are using CUBLAS and MKL, why are you compiling with nvcc at all? nvcc is not required to use CUBLAS.

If you have actual device code which needs to be compiled, put it in a separate .cu file containing a C/C++ wrapper function to access the code, and compile that with nvcc, then link the resulting object file with icc. People have been using MKL and CUBLAS together forever without a problem (all those TOP500 Linpack results, for example).
[/quote]

I compile with nvcc because I use a CUDA kernel in my code ...

#7
Posted 08/24/2011 09:26 AM   
[quote name='loloasb' date='24 August 2011 - 12:26 PM' timestamp='1314178014' post='1283496']
I compile with nvcc because I use Cuda kernel in my code ...
[/quote]
So take the kernel out of the compilation unit it shares with the MKL and CUBLAS calls, compile the CUDA code separately with nvcc, then link them afterwards. Problem solved.

#8
Posted 08/24/2011 09:39 AM   
[quote name='avidday' date='24 August 2011 - 09:39 AM' timestamp='1314178750' post='1283505']
So take the kernel out of the compilation unit shared the mkl and cublas calls, compile the CUDA code separately with nvcc, then link them afterwards. Problem solved.
[/quote]

Even if I do that, I get the same error:

/usr/local/cuda/include/host_config.h(108): catastrophic error: #error directive: -- unsupported ICC configuration! Only ICC 11.1 on Linux x86_64 is supported!
#error -- unsupported ICC configuration! Only ICC 11.1 on Linux x86_64 is supported!

#9
Posted 08/24/2011 09:43 AM   
As has been said twice already, don't use icc with nvcc. You have an unsupported version of icc. But that doesn't matter. Just compile the **device code** with nvcc+gcc, and the rest of your code with icc. Link your device code with the icc output, MKL, and CUBLAS, and you are done.

#10
Posted 08/24/2011 09:48 AM   
[quote name='avidday' date='24 August 2011 - 09:48 AM' timestamp='1314179297' post='1283508']
As has been said twice already don't use icc with nvcc. You have an unsupported version of icc. But that doesn't matter. Just compile the **device code** with nvcc+gcc, and the rest of your code with icc. Link your device code with the icc output and mkl and cublas and you are done.
[/quote]

I didn't use icc with nvcc. I compile my code, which contains MKL and CUBLAS functions, with icc only.

#11
Posted 08/24/2011 09:59 AM   
[quote name='loloasb' date='24 August 2011 - 12:59 PM' timestamp='1314179965' post='1283510']
I didn't use icc with nvcc. I compile my code that contains MKL ans CUBLAS functions just with icc.
[/quote]
The error message clearly says you are trying to compile CUDA code with icc. It is being generated by a macro inside a CUDA system header. So what have you included in that code that is bringing CUDA headers into the compilation? To use CUBLAS you need to include cublas.h and nothing else.

#12
Posted 08/24/2011 10:03 AM   
[quote name='avidday' date='24 August 2011 - 10:03 AM' timestamp='1314180237' post='1283514']
The error message clearly says you are trying to compile CUDA code with icc. It is being generated by a macro inside a CUDA system header. So what have you included into that code that is bring CUDA headers into the compilation? To use cublas you need to include cublas.h and nothing else.
[/quote]

OK thanks, I understand now. The code works; the problem was that I had included "cuda.h"...

But I have to put some CUDA kernels in my code, and I haven't understood how to compile the "device code" and the "host code" (MKL + CUBLAS) separately.

Could you explain it again?

Thanks.

#13
Posted 08/24/2011 11:59 AM   
The kernel I use is a transposition kernel.
I call this kernel inside loops, so I don't understand how I could compile it separately ...

#14
Posted 08/24/2011 12:05 PM   
[quote name='loloasb' date='24 August 2011 - 03:05 PM' timestamp='1314187514' post='1283560']
The kernel I use is the transposition kernel.
I call this kernel inside loops, so I don't understand how could I compile separatly ...
[/quote]
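Make a "wrapper" host function in the same .cu file as the kernel code, something like this:

[code]/* Device kernel: a kernel must be declared with a void return type. */
__global__ void kernel(arg1, arg2)
{
....
}

/* Host wrapper with C linkage, so code built by a C compiler can call it. */
extern "C" int callkernel(arg1, arg2, .....)
{
....
....
....

/* The kernel launch stays inside the nvcc-compiled file. */
kernel<<< ... >>>(arg1, arg2);

....
}[/code]

Then, in your icc-compiled code, call callkernel to launch the kernel, and link the resulting object from nvcc with the icc code. That is all there is to it.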
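
A minimal sketch of what the calling side could look like in the icc-compiled C file, for the transposition case mentioned earlier in the thread. The names call_transpose and transpose_blocks, the signature, the row-major layout, and the USE_CUDA_WRAPPER macro are all assumptions made for illustration. The CPU stand-in is there only so the sketch builds and runs without the CUDA object file, and it assumes the real wrapper would take host pointers and handle the device copies itself.

```c
#include <stddef.h>

/* Prototype of the wrapper defined in the nvcc-compiled .cu file
   (hypothetical name and signature). Because the definition there is
   marked extern "C", this plain C declaration links against it. */
int call_transpose(const double *in, double *out, int rows, int cols);

#ifndef USE_CUDA_WRAPPER
/* CPU stand-in for the wrapper: transposes a row-major rows x cols matrix.
   Define USE_CUDA_WRAPPER and link the nvcc object to use the real one. */
int call_transpose(const double *in, double *out, int rows, int cols)
{
    if (in == NULL || out == NULL || rows <= 0 || cols <= 0)
        return -1;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            out[(size_t)c * rows + r] = in[(size_t)r * cols + c];
    return 0;
}
#endif

/* Host-side loop compiled with icc. It only sees an ordinary C function,
   so no CUDA header and no <<< >>> launch syntax appears here. */
int transpose_blocks(const double *in, double *out,
                     int nblocks, int rows, int cols)
{
    size_t stride = (size_t)rows * cols;  /* elements per block */
    for (int b = 0; b < nblocks; ++b) {
        int status = call_transpose(in + b * stride, out + b * stride,
                                    rows, cols);
        if (status != 0)
            return status;  /* stop at the first failing call */
    }
    return 0;
}
```

Assuming the kernel and wrapper live in a hypothetical kernels.cu and the code above in host.c, the build could be `nvcc -c kernels.cu -o kernels.o`, then `icc -c -DUSE_CUDA_WRAPPER host.c -o host.o`, then a final icc link of both objects against the MKL libraries plus `-L/usr/local/cuda/lib64 -lcublas -lcudart` (and possibly `-lstdc++`, since nvcc compiles host code as C++).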

#15
Posted 08/24/2011 12:19 PM   