Matlab & CUDA: Cuda scripts executed from Matlab

jasonroger · January 20, 2012, 9:04am

Good morning to everyone,

I am a student who is not yet very experienced in either Matlab or Cuda. My question is pretty basic. At the moment, I have a Cuda script which reads a matrix from a binary file. To be more precise, the matrix comes from previous calculations in Matlab, so I save it to a binary file and then the Cuda script reads it. Is there a way, ideally an open source solution, to launch the Cuda script directly from Matlab and avoid writing the binary file?
I use Linux (CentOS), Matlab 7.12.0 (R2011a) and Cuda 4.0. I do not own the Parallel Computing Toolbox (PCT) for Matlab. I have also read a bit about a Cuda plugin for Matlab, but it does not seem to be supported by Cuda any longer, does it?

Many thanks for your kind attention and my best regards to you all!

Jason.

Gaszton · January 20, 2012, 9:14am

Hi, with the Parallel Computing Toolbox you can call cuda kernels (compiled to ptx) from matlab.
It is very easy and convenient to use ptx cuda kernels from matlab.
Without the toolbox, I don't know of any other way.

jasonroger · January 20, 2012, 10:13am

Thank you very much Gaszton,

I was also thinking about another possible solution. What if I wrote the Cuda script in a separate file (.cu), compiled it with nvcc, and finally linked it, as an external library, to a mex file which runs the non-Cuda part of my application? Do you think it could work?

Thank you all for your kind attention and my best regards,

Jason.

melonakos · January 21, 2012, 5:57pm

Jason, you have 3 options:

1. Yes, you can do what you want to do in calling CUDA code as an external library. The upside is that you don’t have to buy anything. The downside is that you’re stuck with maintaining low-level code, and you’ll burn a lot of time hassling with it. The article you reference is the right place to get started on that and there are tons of posts in these forums from people who struggled to get that stuff to work.
2. You can buy PCT from MathWorks. But it is slower than the CPU for most problems and likely lacks the functions you need anyway.
3. You can buy Jacket from AccelerEyes (that’s me). You have to pay for it ($350 for academic).

For a comparison of 2 vs 3, see http://accelereyes.com/compare

Assuming that #1 is the path that you’ll go, you’re welcome to post on our forums if you have any specific MATLAB integration questions: http://forums.accelereyes.com

Cheers!

jasonroger · January 23, 2012, 10:48am

Thank you very much for your clear and detailed answer, J.Melonakos!
Ok, I will give a try at solution number 1 and then I will see what happens, thank you very much again!

short · January 23, 2012, 8:53pm

Hi,

You can avoid dumping your results to a binary file and instead use ArrayFire (which is free) through Matlab’s MEX interface. This way, you can move your data from CPU memory to the GPU by calling the array class constructor, and then use ArrayFire functions for anything from simple to complex operations such as FFTs, convolutions, image processing, etc.

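#include <string.h>
#include <arrayfire.h>
#include "mex.h"

using namespace af;

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    // The input must already be single precision; MATLAB doubles cannot be
    // reinterpreted as float by a pointer cast.
    if (nrhs < 1 || !mxIsSingle(prhs[0]))
        mexErrMsgTxt("Expected one single-precision matrix as input.");

    float *data_cpu = (float *)mxGetData(prhs[0]);

    // Take the matrix size from the input instead of hard-coding it
    int M = (int)mxGetM(prhs[0]), N = (int)mxGetN(prhs[0]);

    // data_gpu is in GPU memory
    array data_gpu = array(M, N, data_cpu);

    // Do basic to complex math on data_gpu
    array res = fft(data_gpu);
    print(res);
}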

You can also integrate custom CUDA code with ArrayFire… See pi_cuda example!!

Mcewen · September 2, 2015, 9:39pm

Hi,

This seems to be the right place to post my similar question. I am quite new to CUDA computing and I mostly use Matlab. What is the best way to use cuda code in Matlab? Basically, we can use PTX files or MEX files. Like everybody, I am looking for the fastest way to compute…
So far, I am using ptx files compiled with nvcc. This is not convenient for debugging or for using CUDA libraries. I have not tried MEX files yet.
It seems that the main advantage of a MEX implementation is the possibility of using CUDA libraries (for FFT, etc.)? What about the computation cost?

Thanks in advance!

Ewen

CudaaduC · September 2, 2015, 11:23pm

I work with CUDA mex files regularly and that is the best method of calling CUDA code.

This link goes through the process:

http://www.orangeowlsolutions.com/archives/498

and there are lots of examples of CUDA mex files on that site.

Since MATLAB stores arrays in contiguous column-major (Fortran-style) format, which is also the layout cuBLAS expects, it is very easy to pass pointers in either direction.
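To make the column-major layout concrete, here is a minimal C++ sketch of how a zero-based (row, col) pair maps to an offset in the flat buffer that a mex file receives. The helper names `colMajorIndex` and `elementAt` are made up for illustration and are not part of the MEX API.

```cpp
#include <cstddef>
#include <vector>

// Linear offset of element (row, col) in a column-major array with numRows rows.
// Indices are zero-based here (MATLAB itself is one-based).
inline std::size_t colMajorIndex(std::size_t row, std::size_t col, std::size_t numRows)
{
    return row + col * numRows;
}

// Read one element from a flat buffer laid out the way MATLAB stores a matrix.
inline double elementAt(const std::vector<double>& data,
                        std::size_t row, std::size_t col, std::size_t numRows)
{
    return data[colMajorIndex(row, col, numRows)];
}
```

The same index arithmetic applies inside a CUDA kernel that works directly on the buffer, so no transpose or copy is needed when the data crosses the MATLAB boundary.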

Keep in mind that MATLAB uses 64-bit doubles by default, so make sure you cast to single/float when using GPU-accelerated code unless you have a GPU with high double-precision (DP) performance.
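One way to do that cast on the C++ side is sketched below, assuming the mex file received a double buffer and wants a float copy to upload to the GPU. The function name `toSingle` is hypothetical; calling `single(A)` in MATLAB before invoking the mex file achieves the same thing without this step.

```cpp
#include <cstddef>
#include <vector>

// Copy a double-precision buffer into a new single-precision buffer.
// Each value is narrowed individually; a plain pointer cast would be wrong.
std::vector<float> toSingle(const double* src, std::size_t count)
{
    std::vector<float> dst(count);
    for (std::size_t i = 0; i < count; ++i)
        dst[i] = static_cast<float>(src[i]);
    return dst;
}
```

The resulting vector's `data()` pointer can then be handed to `cudaMemcpy` as the host source.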

There is very little overhead with mex files (which are essentially dlls), other than some MATLAB specific overhead the first time you call.

Here is an example of a cpp file for a mex version of sparse group lasso using both cuBLAS and cuSPARSE:

https://github.com/OlegKonings/BCI_EEG_blk_diag_admm_multi_lambda/blob/master/GroupMextest/GroupMextest/GLmex.cpp

Mcewen · September 3, 2015, 8:07am

Thanks CudaaduC!

I will try using MEX files and keep you posted.
It seems that we can compile the files from either Visual Studio or Matlab. Which is the better option?

Best.

CudaaduC · September 3, 2015, 7:03pm

I have used both methods, but prefer to compile from Visual Studio.

Initially Visual Studio can be a pain because of its default CUDA settings. In particular, debug mode sets the -G flag, which throws off some first-time users because CUDA code runs much more slowly with -G than in release mode with optimizations applied.

Either way make sure you compile for the highest possible arch/code generation for your project and try toggling the “use_fast_math” flag as that can make a huge performance difference if you are willing to theoretically lose some precision.

In my limited tests over time comparing results with and without the fast math flag (against MATLAB 64-bit calculations for the same set of computations), I found little difference in accuracy between the two compile settings. Your results may vary, and I suggest you examine the CUDA math documentation:

http://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__SINGLE.html#group__CUDA__MATH__SINGLE
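A comparison like the one described above could be quantified with something along these lines. This is only a sketch: the name `maxRelativeError` is made up, and it assumes the single-precision GPU results and the double-precision MATLAB reference have already been copied into host vectors.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest relative error of single-precision results against a double reference.
// Reference entries that are exactly zero fall back to absolute error.
double maxRelativeError(const std::vector<float>& result,
                        const std::vector<double>& reference)
{
    double worst = 0.0;
    const std::size_t n = std::min(result.size(), reference.size());
    for (std::size_t i = 0; i < n; ++i) {
        const double diff = std::fabs(static_cast<double>(result[i]) - reference[i]);
        const double scale = std::fabs(reference[i]);
        worst = std::max(worst, scale > 0.0 ? diff / scale : diff);
    }
    return worst;
}
```

Running it once on the output built with the fast math flag and once without gives two numbers that can be compared directly.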

Mcewen · September 7, 2015, 3:53pm

Hi,

I have tried compiling mex files both from Visual Studio and Matlab, and I had troubles in both cases…

First of all, I noticed there is a difference in philosophy. With Matlab, the .cu file is compiled directly, whereas with VS I have to work with a .cpp function that contains a wrapper calling the .cu file. I find that compiling a single .cu into a .ptx file, called directly from Matlab, is much simpler!

Using Matlab, I first tried mexcuda as described on the MathWorks page "Compile MEX-function for GPU computation - MATLAB mexcuda". I get this error:

>> mexcuda mexGPUExample.cu
Undefined function 'mexcuda' for input arguments of type
'char'.

I have also tried to use the mex call by:

>> mex mexGPUExample.cu
Building with 'NVIDIA CUDA Compiler'.
Error using mex
nvcc fatal   : Unsupported gpu architecture 'compute_13'

What is the proper way to compile .cu files in Matlab (mex or mexcuda)?

With VS, I followed the procedure given in your links (http://www.orangeowlsolutions.com/archives/498) and I get these errors:

MEX_test_double_cudaWrapper.obj : error LNK2019: unresolved external symbol mexPrintf referenced in function mexFunction
...Debug\VS_compile_MEX.mexw64 : fatal error LNK1120: 1 unresolved externals

Any help solving either of these issues is greatly appreciated!
Thanks!

CudaaduC · September 7, 2015, 6:49pm

Something is not configured or linked correctly. These links show screenshots of what the properties in Visual Studio should look like:

Imgur: The magic of the Internet

What GPU are you using and did you install CUDA after Visual Studio was already installed on your PC?

njuffa · September 7, 2015, 9:31pm

Recent versions of CUDA (7.x) no longer support GPUs with compute capability < 2.0. Here, compilation for compute capability 1.3 was attempted. What GPU is in your system? Set the nvcc flags to produce code for the appropriate compute capability, either with the -arch switch or the -gencode switch.
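As an illustration of the -gencode syntax, the small C++ helper below assembles the option string from a compute capability. The function name `gencodeFlag` is hypothetical and exists only to show the format; for compute capability 3.0 the simpler equivalent with the other switch is `-arch=sm_30`.

```cpp
#include <string>

// Build an nvcc -gencode option for a given compute capability,
// e.g. major = 3, minor = 0 for a compute capability 3.0 device.
std::string gencodeFlag(int major, int minor)
{
    const std::string cc = std::to_string(major) + std::to_string(minor);
    return "-gencode arch=compute_" + cc + ",code=sm_" + cc;
}
```

For a 3.0 device this yields `-gencode arch=compute_30,code=sm_30`, which is the string to place in the nvcc options in place of the unsupported compute_13 target.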

Mcewen · September 8, 2015, 9:52am

Thank you for your replies and the screenshots. It’s working now with Visual Studio! I will do some tests and try double/float implementations.

I have a Quadro K2100M (CC=3.0) with CUDA 7.0, Matlab R2014a and VS2010. I have installed CUDA after VS.

Still no success with Matlab but I am OK to use VS…

Thanks!

Mcewen · September 9, 2015, 2:29pm

Hey guys,

My basic mex compilation with only doubles was OK.
I now have problems when adding float, int, and scalar inputs.

MEX_..._cudaWrapper.obj : error LNK2019: unresolved external symbol _mxCreateDoubleMatrix_730 referenced in function _mexFunction
1>MEX_..._cudaWrapper.obj : error LNK2019: unresolved external symbol _mxGetPr referenced in function _mexFunction
1>MEX_..._cudaWrapper.obj : error LNK2019: unresolved external symbol _mxGetScalar referenced in function _mexFunction

I have tried including matrix.h but nothing changes… Do I have to include it?

I have put the following in the linker inputs:

C:\Program Files\MATLAB\R2014a\extern\lib\win64\microsoft\libmex.lib
C:\Program Files\MATLAB\R2014a\extern\lib\win64\microsoft\libmx.lib
$(CudaToolkitLibdir)\cudart.lib
curand.lib

and the additional include directory is

C:\Program Files\MATLAB\R2014a\extern\include

BTW, is this the proper way to read an int input?

int Fs = (int)mxGetScalar(prhs[1]);
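Since mxGetScalar returns a double, a plain cast like the one above silently truncates non-integer values and is undefined for values outside the int range. A more defensive conversion might look like the sketch below. The helper `scalarToInt` is hypothetical, and inside a real mex file the errors would more likely be reported with mexErrMsgTxt than with C++ exceptions.

```cpp
#include <cmath>
#include <limits>
#include <stdexcept>

// Convert a double (as returned by mxGetScalar) to int, rejecting
// values that are not whole numbers or do not fit in an int.
int scalarToInt(double value)
{
    if (std::isnan(value) || value != std::floor(value))
        throw std::invalid_argument("expected an integer-valued scalar");
    if (value < std::numeric_limits<int>::min() ||
        value > std::numeric_limits<int>::max())
        throw std::out_of_range("scalar does not fit in int");
    return static_cast<int>(value);
}
```

With this in place the line would read `int Fs = scalarToInt(mxGetScalar(prhs[1]));`.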

Thanks in advance for your help!