Getting started with parallel programming: suggested reading
One question I've been asked a lot is how to get started with parallel programming. I asked around internally at NVIDIA and got some good suggestions, so I'm posting the responses here and encouraging people to comment and/or add their own. Does anybody have a favorite textbook they want to share? It doesn't have to be specific to CUDA.

[list]
[*] From HPC course at UNC which also covers CUDA. [url="http://www.cs.unc.edu/~prins/Classes/633/"]http://www.cs.unc.edu/~prins/Classes/633/[/url]
[*] From Mark Harris: It’s not a textbook, but I always recommend these course notes on PRAM algorithms by Sid Chatterjee & Jan Prins (the CRCW PRAM model maps very closely to CUDA, especially within a thread block using shared memory). They are concise and provide good examples for reductions, scan, Brent’s Theorem, etc.; see the shared-memory reduction sketch after this list. [url="http://www.cs.unc.edu/~prins/Classes/633/Handouts/pram.pdf"]http://www.cs.unc.edu/~prins/Classes/633/Handouts/pram.pdf[/url]
[*] He also requires reading from Kumar et al. Introduction to Parallel Computing: Design and Analysis of Algorithms.
[*] Designing and Building Parallel Programs, I. Foster, Addison-Wesley, 1995. [url="http://www-unix.mcs.anl.gov/dbpp/"]http://www-unix.mcs.anl.gov/dbpp/[/url]
[*] IBM’s redbook “RS/6000 SP: Practical MPI Programming” is very well known among MPI users. It has rich content on parallel approaches even though it was published 10 years ago. [url="http://www.redbooks.ibm.com/abstracts/sg245380.html"]http://www.redbooks.ibm.com/abstracts/sg245380.html[/url]
[*] Parallel Programming in C with MPI and OpenMP by Michael J. Quinn (Author) is good for beginners.
[*] Parallel Programming with MPI by Peter Pacheco
[*] Parallel and Distributed Computation: Numerical Methods (Optimization and Neural Computation) by Dimitri P. Bertsekas
[*] Multiple people suggest: The Art of Multiprocessor Programming by Maurice Herlihy (Author), Nir Shavit (Author) [url="http://www.amazon.com/Art-Multiprocessor-Programming-Maurice-Herlihy/dp/0123705916"]http://www.amazon.com/Art-Multiprocessor-P...y/dp/0123705916[/url]
[*] Using OpenMP: Portable Shared Memory Parallel Programming (Scientific and Engineering Computation) by Barbara Chapman (Author), Gabriele Jost (Author), Ruud van der Pas (Author), David J. Kuck (Foreword)
[*] Using MPI - 2nd Edition: Portable Parallel Programming with the Message Passing Interface (Scientific and Engineering Computation) by William Gropp (Author), Ewing Lusk (Author), Anthony Skjellum (Author)
[*] From David Kirk: We use Tim Mattson's book as a companion to the CUDA material.
[*] David Kirk & Wen-mei Hwu's CUDA textbook is available on the course website. It is one revision out of date, but pretty close. [url="http://courses.ece.illinois.edu/ece498/al/"]http://courses.ece.illinois.edu/ece498/al/[/url]
[*] From Paulius Micikevicius: My personal favorite book on parallel algorithms is "Introduction to Parallel Computing" by Grama et al. It covers basic interconnect topologies, algorithms, analysis, MPI and OpenMP. [url="http://www.amazon.com/Introduction-Parallel-Computing-Ananth-Grama/dp/0201648652"]http://www.amazon.com/Introduction-Paralle...a/dp/0201648652[/url]
[*] If one is leaning slightly more towards the theoretical side of parallel algorithms, then "Introduction to Parallel Algorithms" by Joseph JaJa is a good source. It contains a more thorough treatment of algorithms based on prefix sums (things like various tree and graph algorithms).
[*] If one wants to go completely to the theoretical side (P-completeness, etc.), then "Limits to Parallel Computation: P-Completeness Theory" by Ray Greenlaw is an excellent book. It's certainly not applicable to introductory courses, in the same way that NP-completeness isn't applicable to introductory algorithms courses.
[/list]
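To give a flavor of how the PRAM-style reductions in the notes above map onto a thread block, here is a minimal sketch of a block-level sum reduction in CUDA shared memory. It is just an illustration, not taken from any of the books above; the kernel name blockReduceSum is made up, and it assumes blockDim.x is a power of two.
[code]// Sketch: block-wide sum reduction in shared memory (one partial sum per block).
// Assumes blockDim.x is a power of two; kernel and variable names are illustrative.
__global__ void blockReduceSum( const float *in, float *out, int n )
{
    extern __shared__ float sdata[];

    unsigned int tid = threadIdx.x;
    unsigned int i   = blockIdx.x * blockDim.x + threadIdx.x;

    // Each thread loads one element (0 if out of range), then the block synchronizes.
    sdata[ tid ] = ( i < n ) ? in[ i ] : 0.0f;
    __syncthreads();

    // Tree reduction: halve the number of active threads at each step.
    for ( unsigned int s = blockDim.x / 2; s > 0; s >>= 1 )
    {
        if ( tid < s )
            sdata[ tid ] += sdata[ tid + s ];
        __syncthreads();
    }

    // Thread 0 writes this block's partial sum.
    if ( tid == 0 )
        out[ blockIdx.x ] = sdata[ 0 ];
}[/code]
A launch would look something like blockReduceSum<<< numBlocks, threadsPerBlock, threadsPerBlock * sizeof(float) >>>( d_in, d_out, n ), followed by a second pass (or a host-side loop) over the per-block partial sums. The CUDA SDK's reduction and scan samples walk through progressively more optimized versions of the same idea.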

#1
Posted 09/24/2009 04:41 PM   
[quote name='jocohen' post='591697' date='Sep 24 2009, 12:41 PM']One question that I've been asked a lot is how to get started with parallel programming. ... Does anybody have a favorite textbook they want to share? It doesn't have to be specific to CUDA.[/quote]

This may be the same book referred to as "Tim Mattson's book" above. The title is "Patterns for Parallel Programming" by Mattson, Sanders & Massingill. Addison-Wesley.

#2
Posted 09/24/2009 05:47 PM   
[quote name='jocohen' post='591697' date='Sep 24 2009, 07:41 PM']One question that I've been asked a lot is how to get started with parallel programming. I asked around internally at NVIDIA, and got some good suggestions. So I'm posting the responses and I want to encourage people to comment and/or add their own suggestions here. Does anybody have a favorite textbook they want to share? It doesn't have to be specific to CUDA.[/quote]
Thanks for the list - I do have a question, though. What does NVIDIA suggest for serial code that needs to be ported to the GPU? I have two such kernels that were ported to the GPU, one successfully; for the other I only got a ~4x speedup (which is not enough).

Such code would look like this:
[code]for ( int iSample = 0; iSample < 1000; iSample++ )
{
    for ( int i = -val; i < val; i++ )
    {
        pRes[ iSample + i ] += someValue * i;   // (**)
    }
}[/code]
To make my life harder, the line marked with (**) might also look like this:
[code]pRes[ ( rand() % 1000 ) + i ] += someValue * i;[/code]

This is real production code; the main reason the code is so "nice and user friendly" is that the algorithm does some sort of averaging.

Hey - I didn't write the algorithm... some mad scientist wrote it... ;)

thanks
eyal

#3
Posted 09/24/2009 07:51 PM   
This link is good for beginners... (I referred to it myself when I ventured into parallel programming):
[url="https://computing.llnl.gov/tutorials/parallel_comp/"]https://computing.llnl.gov/tutorials/parallel_comp/[/url]

#4
Posted 09/25/2009 06:39 AM   
This list would be useful to sticky.

#5
Posted 09/25/2009 06:33 PM   
[quote name='jgoffeney' post='592188' date='Sep 25 2009, 11:33 AM']This list would be useful to sticky.[/quote]
I agree!

#6
Posted 09/25/2009 06:39 PM   
[code]for ( int iSample = 0; iSample < 1000; iSample++ )
{
    for ( int i = -val; i < val; i++ )
    {
        pRes[ iSample + i ] += someValue * i;   // (**)
    }
}[/code]

You can always find parallelism within a "2*val" interval... and then do a sliding window...


[code] pRes[ ( rand() % 1000 ) + i ] += someValue * i;[/code]

Do the same here... Any result that deviates because of a race condition can be explained as equivalent to a sequential algorithm that ran with a different random seed...
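For illustration, a rough sketch of that sliding-window idea in CUDA could look like the following. The kernel name accumulatePhase, NUM_SAMPLES, and the host-side helper are made up; it assumes 2*val fits within one thread block and that, as in the original loop, iSample + i always stays inside the bounds of pRes.
[code]// Sketch only: process the samples in 2*val phases. Within one phase the active
// samples are spaced 2*val apart, so their write ranges [iSample - val, iSample + val)
// never overlap and no atomics are needed.
#define NUM_SAMPLES 1000

__global__ void accumulatePhase( float *pRes, float someValue, int val, int phase )
{
    int iSample = phase + blockIdx.x * 2 * val;   // samples 2*val apart in this phase
    int i       = (int)threadIdx.x - val;         // threads cover i in [-val, val)

    if ( iSample < NUM_SAMPLES && i < val )
        pRes[ iSample + i ] += someValue * i;     // assumed in bounds, as in the original
}

// Host side: slide the window by launching one phase at a time (launches on the same
// stream are ordered, so different phases never race with each other).
void accumulate( float *d_pRes, float someValue, int val )
{
    for ( int phase = 0; phase < 2 * val; phase++ )
    {
        int numBlocks = ( NUM_SAMPLES - phase + 2 * val - 1 ) / ( 2 * val );
        accumulatePhase<<< numBlocks, 2 * val >>>( d_pRes, someValue, val, phase );
    }
}[/code]
For the rand() variant, one would either accept the nondeterminism and use atomicAdd (float atomics need compute capability 2.0 or later), following the "different random seed" argument above, or precompute the random offsets on the host so the writes can be scheduled without conflicts.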


#7
Posted 02/12/2010 10:20 AM   