Matlab and CUDA: A Tutorial
Very basic, zero-order introduction
As a result of my past couple of weeks' work with CUDA (a lot of :argh:), I've written up my notes in a very basic 22-page tutorial, using example code. Much of the material is on these fora, but rather scattered around. Perhaps people will find the tutorial useful.

The Nvidia Matlab package, while impressive, seems to me to rather miss the mark as a basic introduction to CUDA on Matlab.

Happy to hear back from people with corrections and suggestions; it's meant to be an evolving document.

(Tutorial revised 6/26/08 - cleanup, corrections, and modest additions)

(Tutorial revised again 8/19/08 - minor additions)

Changed to external link 9/16/09: http://faculty.washington.edu/dushaw/epubs/Matlab_CUDA_Tutorial_8_08.pdf
(Nvidia attachment seems to have been lost, alas.)

(Tutorial revised again 2/12/10 - minor additions)

http://faculty.washington.edu/dushaw/epubs/Matlab_CUDA_Tutorial_2_10.pdf

#1
Posted 06/25/2008 12:12 AM   
May God Bless You!

Ignorance Rules; Knowledge Liberates!

#2
Posted 06/25/2008 04:03 AM   
I've cleaned up the tutorial document a bit: made editorial corrections, clarified things here and there, and added a new section on array dimensioning conventions. The file for download in the entry that started this thread has been updated. Cheers!

#3
Posted 06/26/2008 11:30 PM   
Excellent tutorial, but I have to disagree on your motivation; I guess you underestimate the importance of MPI here. As long as you are on a single machine, you're right (replace MPI with OpenMP or hand-crafted pthreads). But there is no way of using CUDA instead of MPI for distributed memory clusters.

#4
Posted 06/26/2008 11:57 PM   
[quote name='Dominik Göddeke' date='Jun 26 2008, 04:57 PM']Excellent tutorial, but I have to disagree on your motivation; I guess you underestimate the importance of MPI here. As long as you are on a single machine, you're right (replace MPI with OpenMP or hand-crafted pthreads). But there is no way of using CUDA instead of MPI for distributed memory clusters.
[/quote]

Thanks for the reply. I may not have written that paragraph quite right, but I think we are in agreement. One needs to use the proper tool for the job - in many ways CUDA and MPI are complementary, yin and yang, etc. That was what I intended to say.

#5
Posted 06/27/2008 12:23 AM   
Great tutorial!

I just wanted to point out that we are in the process of building a full CUDA engine for MATLAB programs (named Jacket) that may be of interest to people who read this thread. We just launched a free beta release that you can grab at:

http://www.accelereyes.com

We would love to hear your thoughts regarding Jacket and insights on how it can be improved to make MATLAB GPU Computing as beneficial to the community as possible.

Best,

John Melonakos

John Melonakos (john.melonakos@accelereyes.com)

#6
Posted 07/06/2008 12:25 AM   
I'll add here a few notes that I may include in the next version of the tutorial (call them a "patch" to the tutorial for now, if you like, or a TO DO list):

-> On the other mail list, I inquired about using cudaMallocHost() in a mex file to speed up host-device communications, that is, to use "pinned memory". Matlab is fairly memory sensitive, so it is not generally possible to allocate memory this way. I suspect, however, that one could sometimes get away with using cudaMallocHost() in a mex file - it is a matter of avoiding any calls to Matlab. So one could cudaMallocHost(), compute away, and then clear out the CUDA variables before Matlab is aware of what is going on and complains or crashes (it's the "don't ask, don't tell" memory policy).

-> I was puzzled over why my "surf" command wasn't working. Well, it would run, but nothing would appear in the figure. It turns out that surf does not work on single precision data! (at least for me) If A is single precision, then one needs to do "surf(X,Y,double(A))" to see the data. This qualifies as a bug in Matlab, if you ask me.

-> C and CUDA are "row major" in how data are organized in arrays; Matlab and CUBLAS are "column major" (nomenclature).

-> I think I need to say something about the grids/blocks/threads/warps and kernel efficiency, but I don't think I altogether understand those quite yet...as integral to CUDA as they are...
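
To make the row-major/column-major note above concrete, here is a minimal C sketch. It is only an illustration under my own assumptions, not code from the tutorial: the helper names col_major_index, row_major_index and col_to_row_major are hypothetical, and indices are zero-based as in C (Matlab itself counts from 1).

```c
#include <assert.h>
#include <stddef.h>

/* Linear offset of element (row, col) in a column-major array with
 * nrows rows, i.e. the layout Matlab and CUBLAS use: consecutive
 * elements of one column are adjacent in memory. Zero-based indices. */
size_t col_major_index(size_t row, size_t col, size_t nrows)
{
    assert(row < nrows);
    return row + col * nrows;
}

/* Linear offset of the same element in a row-major array with ncols
 * columns, i.e. the layout of an ordinary C 2-D array: consecutive
 * elements of one row are adjacent in memory. */
size_t row_major_index(size_t row, size_t col, size_t ncols)
{
    assert(col < ncols);
    return row * ncols + col;
}

/* Copy an nrows x ncols matrix from a column-major buffer (as a mex
 * file receives it from Matlab) into a row-major buffer. The two
 * buffers must not overlap and must each hold nrows * ncols floats. */
void col_to_row_major(const float *src, float *dst,
                      size_t nrows, size_t ncols)
{
    for (size_t c = 0; c < ncols; ++c) {
        for (size_t r = 0; r < nrows; ++r) {
            dst[row_major_index(r, c, ncols)] =
                src[col_major_index(r, c, nrows)];
        }
    }
}
```

For a 2 x 3 matrix that Matlab holds as [1 3 5; 2 4 6], the column-major buffer is {1, 2, 3, 4, 5, 6}, while the row-major copy is {1, 3, 5, 2, 4, 6}. In practice one would often skip the copy and simply index the Matlab data in column-major fashion inside the kernel.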

#7
Posted 07/09/2008 01:10 AM   
Hi
When I try to compile the nvmex example, I get this error:
[quote]System error: Can't locate [b]mexutils.pm[/b] in @INC (@INC contains:
D:/MATLAB701/sys/perl/win32/lib D:/MATLAB701/sys/perl/win32/site/lib . D:\MATLAB~1\bin\ D:\MATLAB701\bin\win32) at d:\MATLAB701\bin\nvmex.pl line 165.
BEGIN failed--compilation aborted at d:\MATLAB701\bin\nvmex.pl line 165.
Command executed: perl d:\MATLAB701\bin\nvmex.pl -f nvmexopts.bat Szeta.cu -IC:\cuda\include -LC:\cuda\lib
[/quote]

Where can I find this file?
Thanks

#8
Posted 07/17/2008 09:03 AM   
[quote name='amirk' date='Jul 17 2008, 01:03 AM']Hi
when I try to compile nvmex example, I get this error:
Where can I find this file?
Thanks
[/quote]

I have exactly the same problem! :argh:

#9
Posted 08/09/2008 10:56 PM   
mexutils.pm ...

I don't know if this is any help, but such a file does not exist on my Linux machine. I also do not have an nvmex.pl; rather, I have nvmex, a bash script (rather than this Perl script). mexutils.pm is a Perl module, I believe. I suspect you will have to look to Matlab/MathWorks for some sort of Perl package on Windows? (I presume you've searched your machine for this file already, so it is not a matter of having the right search path.)

----------------------------
Is this link any help?

http://www.mathworks.com/support/solutions/data/1-1TNK6Y.html?solution=1-1TNK6Y

#10
Posted 08/10/2008 07:13 PM   
[quote name='Boxed Cylon' date='Aug 10 2008, 11:13 AM']mexutils.pm ...

I don't know if this is any help, but such a file does not exist on my Linux machine. I also do not have an nvmex.pl; rather, I have nvmex, a bash script (rather than this Perl script). mexutils.pm is a Perl module, I believe. I suspect you will have to look to Matlab/MathWorks for some sort of Perl package on Windows? (I presume you've searched your machine for this file already, so it is not a matter of having the right search path.)

----------------------------
This link any help?:

http://www.mathworks.com/support/solutions/data/1-1TNK6Y.html?solution=1-1TNK6Y
[/quote]

It's fine, thank you. I solved this bug: mexutils.pm is a Perl module supplied by Matlab. Apparently my installation was corrupted and rather old. Upgrading it put the mexutils.pm file back in matlab/bin. My bad, sorry.

#11
Posted 08/10/2008 09:41 PM   
I've developed my small HOWTO a little bit according to the notes above. The main addition is the discussion of multiple processors, warps and threads - I am still a little uncertain about those topics, so I'd be happy to hear of any corrections to misperceptions/poor discussion in the document. I aim for clarity above all things...

I've added the link to the latest version above in the first entry of this thread, but here it is also:

http://forums.nvidia.com/index.php?act=Attach&type=post&id=9257

Other than to make any corrections that anyone might suggest, I think I am done with this document for the foreseeable future. I hope people find it useful.

#12
Posted 08/20/2008 06:40 AM   
Thanks for your work!
It seems to be very useful for a CUDA noob like me!

#13
Posted 01/23/2009 04:58 PM   
It's not sticky yet ? :D

#14
Posted 07/16/2009 03:38 PM   
I am bumping this thread to say that I've updated this tutorial somewhat:

* Mention of Fermi and Tesla
* Mention of CULA & MAGMA
* Mention of AccelerEyes and GP-You
* Memory management discussion
* Profiler no longer a separate download/install
* Some reorganization and update of CUDA distribution file names

http://faculty.washington.edu/dushaw/epubs/Matlab_CUDA_Tutorial_2_10.pdf (also listed at the top of the thread)

As always, happy to hear of suggestions of things to correct, add, or develop (insofar as I can).

#15
Posted 02/12/2010 02:09 AM   