One question I’ve been asked a lot is how to get started with parallel programming. I asked around internally at NVIDIA and got some good suggestions, so I’m posting the responses here, and I encourage people to comment and/or add their own suggestions. Does anybody have a favorite textbook they want to share? It doesn’t have to be specific to CUDA.
[*] From the HPC course at UNC, which also covers CUDA: http://www.cs.unc.edu/~prins/Classes/633/
[*] From Mark Harris: It’s not a textbook, but I always recommend these course notes on PRAM algorithms by Sid Chatterjee & Jan Prins (the CRCW PRAM model maps very closely to CUDA, especially within a thread block using shared memory). They are concise and provide good examples for reductions, scan, Brent’s Theorem, etc. http://www.cs.unc.edu/~prins/Classes/633/Handouts/pram.pdf
[*] The UNC course also requires reading from Kumar et al., Introduction to Parallel Computing: Design and Analysis of Algorithms.
[*] Designing and Building Parallel Programs, I. Foster, Addison-Wesley, 1995. http://www-unix.mcs.anl.gov/dbpp/
[*] IBM’s Redbook “RS/6000 SP: Practical MPI Programming” is well known among MPI users. It has rich content on parallel approaches even though it was published 10 years ago. http://www.redbooks.ibm.com/abstracts/sg245380.html
[*] Parallel Programming in C with MPI and OpenMP by Michael J. Quinn is good for beginners.
[*] Parallel Programming with MPI by Peter Pacheco
[*] Parallel and Distributed Computation: Numerical Methods (Optimization and Neural Computation) by Dimitri P. Bertsekas
[*] Multiple people suggest: The Art of Multiprocessor Programming by Maurice Herlihy and Nir Shavit. http://www.amazon.com/Art-Multiprocessor-P…y/dp/0123705916
[*] Using OpenMP: Portable Shared Memory Parallel Programming (Scientific and Engineering Computation) by Barbara Chapman, Gabriele Jost, and Ruud van der Pas, with a foreword by David J. Kuck.
[*] Using MPI, 2nd Edition: Portable Parallel Programming with the Message Passing Interface (Scientific and Engineering Computation) by William Gropp, Ewing Lusk, and Anthony Skjellum.
[*] From David Kirk: We use Tim Mattson’s book as a companion to the CUDA material.
[*] David Kirk and Wen-mei Hwu’s CUDA textbook is available on the course website. That version is one revision out of date, but pretty close. http://courses.ece.illinois.edu/ece498/al/
[*] From Paulius Micikevicius: My personal favorite book on parallel algorithms is “Introduction to Parallel Computing” by Grama et al. It covers basic interconnect topologies, algorithms, analysis, MPI and OpenMP. http://www.amazon.com/Introduction-Paralle…a/dp/0201648652
[*] If one is leaning slightly more towards the theoretical side of parallel algorithms, then “Introduction to Parallel Algorithms” by Joseph Jaja is a good source. It contains a more thorough treatment of algorithms based on prefix sums (for example, various tree and graph algorithms).
[*] If one wants to go completely to the theoretical side (P-completeness, etc.), then “Limits to Parallel Computation: P-Completeness Theory” by Ray Greenlaw is an excellent book. It’s certainly not applicable to introductory courses, in the same way that NP-completeness isn’t applicable to introductory algorithms courses.
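Since several of these recommendations (the PRAM notes and Jaja’s book in particular) lean heavily on prefix sums, here is a small illustration of what “PRAM-style” scan means. This is my own minimal sketch, not code from any of the books: a Hillis-Steele inclusive scan written as a sequence of lock-step rounds. Each list comprehension stands in for one synchronous PRAM step in which every processor i reads from the previous round’s array, and replacing the array stands in for the barrier between rounds (roughly what __syncthreads() provides inside a CUDA thread block). The function name pram_inclusive_scan is hypothetical.

```python
def pram_inclusive_scan(values):
    """Simulate a Hillis-Steele inclusive scan as lock-step PRAM rounds.

    Returns (prefix_sums, number_of_rounds). Sequential simulation only:
    each round would be one parallel step on a PRAM or in a CUDA block.
    """
    x = list(values)
    n = len(x)
    offset = 1
    rounds = 0
    while offset < n:
        # "Processor" i reads x[i] and x[i - offset] from the previous round
        # and writes into a fresh array, so no processor sees a half-updated
        # state (this models the simultaneous reads/writes of one PRAM step).
        nxt = [x[i] + (x[i - offset] if i >= offset else 0) for i in range(n)]
        x = nxt  # swapping arrays plays the role of a barrier between rounds
        offset *= 2
        rounds += 1
    return x, rounds


if __name__ == "__main__":
    result, rounds = pram_inclusive_scan([3, 1, 7, 0, 4, 1, 6, 3])
    print(result, rounds)  # [3, 4, 11, 11, 15, 16, 22, 25] 3
```

This version finishes in about log2(n) rounds but performs O(n log n) additions in total, versus n - 1 for a plain sequential loop. That trade-off between step count and total work is exactly the kind of thing Brent’s Theorem, covered in the PRAM notes above, helps you reason about.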