I am a student and not yet very experienced in either MATLAB or CUDA, so my question is pretty basic. At the moment, I have a CUDA script that reads a matrix from a binary file. More precisely, the matrix is the result of earlier calculations in MATLAB, so I save it to a binary file and then the CUDA script reads it. Is there a way, ideally an open-source solution, to launch the CUDA script directly from MATLAB and avoid writing the binary file?
I use Linux (CentOS), MATLAB 7.12.0 (R2011a), and CUDA 4.0. I do not own the Parallel Computing Toolbox (PCT) for MATLAB. I have also read a bit about a CUDA plugin for MATLAB, but it no longer seems to be supported. Is that right?
Many thanks for your kind attention and my best regards to you all!
Hi, with the Parallel Computing Toolbox you can call CUDA kernels (compiled to PTX) from MATLAB.
It is very easy and convenient to use PTX CUDA kernels from MATLAB.
Without the toolbox, I don't know of any other way.
I was also thinking about another possible solution. What if I wrote the CUDA code in a separate file (.cu), compiled it with nvcc, and finally linked it, as an external library, to a MEX file that runs the non-CUDA part of my application? Do you think that could work?
Thank you all for your kind attention and my best regards,
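To make the split described in the question above concrete, here is a minimal sketch of the interface it usually relies on. The function names scale_array and run_gateway and their signatures are hypothetical. The important detail is extern "C": nvcc compiles .cu files as C++, so without it the exported name is C++-mangled and a gateway written in plain C will not find it at link time. To keep the sketch buildable without a GPU or MATLAB, a CPU loop stands in for the kernel launch that the real .cu file would contain.

```cpp
#include <cassert>
#include <cstddef>

// gpu_lib.h (hypothetical): the only declaration the MEX gateway needs.
// extern "C" turns off C++ name mangling, so the symbol keeps the plain
// name "scale_array" whether it is built by nvcc or by the host compiler.
extern "C" int scale_array(float *data, std::size_t n, float factor);

// gpu_lib.cu would normally provide this definition, compiled by nvcc
// and launching a kernel. A CPU stand-in is used here (assumption) so
// the sketch builds on its own.
extern "C" int scale_array(float *data, std::size_t n, float factor)
{
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i)
        data[i] *= factor;
    return 0;  // 0 = success, mimicking a status-code style interface
}

// Gateway side (hypothetical): in the real MEX file this call would sit
// inside mexFunction, with data taken from mxGetData(prhs[0]).
int run_gateway(float *data, std::size_t n)
{
    return scale_array(data, n, 2.0f);
}
```

With this layout the .cu file is built once by nvcc into an object or shared library, and the MEX build only has to link against it.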
1. Yes, you can do what you want by calling your CUDA code as an external library. The upside is that you don't have to buy anything. The downside is that you're stuck maintaining low-level code, and you'll burn a lot of time wrestling with it. The article you reference is the right place to get started, and there are tons of posts in these forums from people who struggled to get that to work.
2. You can buy PCT from MathWorks, but it is slower than the CPU for most problems and likely lacks the functions you need anyway.
3. You can buy Jacket from AccelerEyes (that's me). You have to pay for it ($350 for academic use).
Assuming that #1 is the path you'll take, you're welcome to post on our forums if you have any specific MATLAB integration questions: http://forums.accelereyes.com
Thank you very much for your clear and detailed answer, J.Melonakos!
OK, I will give solution number 1 a try and see what happens. Thank you very much again!
You can avoid dumping your results to a binary file and instead use ArrayFire (which is free) through MATLAB's MEX interface. This way, you can move your data from CPU memory to the GPU by calling the array class constructor, and then use ArrayFire functions for simple to complex operations such as FFTs, convolutions, image processing, etc.
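#include <arrayfire.h>  // assumes the ArrayFire 3.x C++ API
#include "mex.h"

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    // MATLAB passes doubles by default; casting mxGetPr's double* to float* is a bug
    if (nrhs != 1 || !mxIsSingle(prhs[0]) || mxIsComplex(prhs[0]))
        mexErrMsgTxt("Expected one real single-precision matrix, e.g. single(A).");

    // Take the size from the input instead of hard-coding 100 x 100
    const dim_t M = (dim_t)mxGetM(prhs[0]);
    const dim_t N = (dim_t)mxGetN(prhs[0]);
    const float *data_cpu = (const float *)mxGetData(prhs[0]);

    // data_gpu is in GPU memory (MATLAB and ArrayFire are both column-major)
    af::array data_gpu(M, N, data_cpu);
    // Do basic to complex math on data_gpu
    af::array res = af::fft(data_gpu);
    af_print(res);
}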
You can also integrate custom CUDA code with ArrayFire… See pi_cuda example!!
This seems to be the right place to post my similar question. I am quite new to CUDA computing and mostly use MATLAB. What is the best way to use CUDA code in MATLAB? Basically, we can use PTX files or MEX files. Like everybody, I am looking for the fastest way to compute…
So far, I have been using PTX files compiled with nvcc. That approach is not convenient for debugging or for using CUDA libraries. I have not tried MEX files yet.
It seems that the main advantage of a MEX implementation is the ability to use CUDA libraries (for FFT, etc.). What about the computational cost?
and there are lots of examples of CUDA mex files on that site.
Since MATLAB stores arrays as plain contiguous buffers in column-major order (the opposite of C's row-major convention for 2D arrays), it is very easy to pass pointers in either direction.
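The offset arithmetic behind this is simple. As a sketch (the helper names colmajor_offset and to_row_major are hypothetical), element (i, j) of an M-by-N MATLAB matrix sits at offset i + j*M in the buffer returned by mxGetPr or mxGetData, using 0-based indices. cuBLAS uses the same column-major convention, so such a pointer can be handed to it without reordering; code written for row-major 2D arrays needs the indices swapped or the data reordered, as below.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element (i, j) of a column-major M-by-N matrix, 0-based indices
// (MATLAB's own indices are 1-based).
inline std::size_t colmajor_offset(std::size_t i, std::size_t j, std::size_t M)
{
    assert(i < M);
    return i + j * M;
}

// Copy a column-major buffer (e.g. from mxGetPr) into row-major order,
// for code that expects C-style row-major 2D data.
std::vector<double> to_row_major(const double *A, std::size_t M, std::size_t N)
{
    std::vector<double> out(M * N);
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t j = 0; j < N; ++j)
            out[i * N + j] = A[colmajor_offset(i, j, M)];
    return out;
}
```

For example, the MATLAB matrix [1 2 3; 4 5 6] arrives in memory as {1, 4, 2, 5, 3, 6}, and to_row_major turns it into {1, 2, 3, 4, 5, 6}.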
Keep in mind that MATLAB uses 64-bit doubles by default, so make sure you cast to single/float when using GPU-accelerated code, unless you have a GPU with high double-precision performance.
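A minimal sketch of that cast on the C/C++ side of a MEX file, assuming the data arrives as MATLAB's default double class (to_single is a hypothetical helper name). The alternative is to call single(A) in MATLAB before passing A in, and to check mxIsSingle in the gateway.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Convert double data (e.g. from mxGetPr) into a single-precision buffer
// before copying it to the GPU. Each value is rounded to the nearest float.
std::vector<float> to_single(const double *src, std::size_t n)
{
    assert(src != nullptr || n == 0);
    std::vector<float> dst(n);
    for (std::size_t k = 0; k < n; ++k)
        dst[k] = static_cast<float>(src[k]);
    return dst;
}
```

Values such as 0.1 are not exactly representable in either precision, so the float copy can differ slightly from the double original; that is the precision trade-off mentioned above.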
There is very little overhead with MEX files (which are essentially DLLs), other than some MATLAB-specific overhead the first time you call one.
Here is an example .cpp file for a MEX version of sparse group lasso using both cuBLAS and cuSPARSE:
I have used both methods, but prefer to compile from Visual Studio.
Initially, Visual Studio can be a pain because of the default CUDA settings, such as the -G flag in Debug mode. This throws off some first-time users, because CUDA code built with -G runs much more slowly than a Release build with optimizations applied.
Either way, make sure you compile for the highest arch/code generation your GPU supports, and try toggling the -use_fast_math flag, as it can make a huge performance difference if you are willing to (theoretically) lose some precision.
In my limited tests comparing results with and without the fast-math flag (against MATLAB's 64-bit results for the same set of computations), I found little difference in accuracy between the two compile settings. Your results may vary, and I suggest you examine the CUDA math documentation:
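One way to run the kind of comparison described above, sketched under the assumption that the GPU result has been copied back to the host as float and the MATLAB reference is available as double. The name max_rel_error and the min_denom cutoff are hypothetical choices, not part of any CUDA or MATLAB API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Largest relative difference between a single-precision result (e.g.
// copied back from the GPU) and a double-precision MATLAB reference.
// Reference values smaller than min_denom in magnitude are divided by
// min_denom instead, to avoid dividing by (near) zero.
double max_rel_error(const float *gpu, const double *ref, std::size_t n,
                     double min_denom = 1e-12)
{
    assert(min_denom > 0.0);
    double worst = 0.0;
    for (std::size_t k = 0; k < n; ++k) {
        double denom = std::fmax(std::fabs(ref[k]), min_denom);
        double err = std::fabs(static_cast<double>(gpu[k]) - ref[k]) / denom;
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

Building the same kernel once with and once without -use_fast_math and comparing the two max_rel_error values against the same reference gives a concrete number to weigh against the speedup.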
I have tried compiling MEX files both from Visual Studio and from MATLAB, and I ran into trouble in both cases…
First of all, I noticed a difference in philosophy. With MATLAB, the .cu file is compiled directly, whereas with VS I have to work with a .cpp file containing a wrapper that calls the .cu code. I find that compiling a single .cu file into a .ptx file and calling it directly from MATLAB is much simpler!
Recent versions of CUDA (7.x) no longer support GPUs with compute capability below 2.0. Here, compilation for compute capability 1.3 was attempted. Which GPU is in your system? Set the nvcc flags to generate code for the appropriate compute capability, using either the -arch switch or the -gencode switch.
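For reference, the -gencode switch names a virtual architecture (compute_XY) and a real one (sm_XY). The helper below is a hypothetical illustration that builds that switch from the major/minor compute capability reported by the deviceQuery sample or by cudaGetDeviceProperties. -arch=sm_XY is a shorter alternative for a single target; it additionally embeds PTX for that architecture.

```cpp
#include <cassert>
#include <string>

// Build the nvcc switch targeting one compute capability, e.g.
// (3, 5) -> "-gencode arch=compute_35,code=sm_35".
// Returns an empty string below 2.0, which CUDA 7.x can no longer target.
std::string gencode_flag(int major, int minor)
{
    assert(major >= 0 && minor >= 0 && minor <= 9);
    if (major < 2)
        return "";
    const std::string cc = std::to_string(major) + std::to_string(minor);
    return "-gencode arch=compute_" + cc + ",code=sm_" + cc;
}
```

An empty result is the case from this thread: a 1.3 device simply cannot be targeted with CUDA 7.x, whatever flags are used.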
My basic MEX compilation with only doubles worked.
Now I have problems when adding floats, ints, and scalars:
MEX_..._cudaWrapper.obj : error LNK2019: unresolved external symbol _mxCreateDoubleMatrix_730 referenced in function _mexFunction
1>MEX_..._cudaWrapper.obj : error LNK2019: unresolved external symbol _mxGetPr referenced in function _mexFunction
1>MEX_..._cudaWrapper.obj : error LNK2019: unresolved external symbol _mxGetScalar referenced in function _mexFunction
I have tried including matrix.h, but nothing changes… Do I have to include it?