---
layout: page
title: Stanford CME 213/ME 339 Spring 2021 homepage
---
Introduction to parallel computing using MPI, openMP, and CUDA
This is the website for CME 213 Introduction to parallel computing using MPI, openMP, and CUDA. This material was created by Eric Darve, with the help of course staff and students.
Extensions can be requested in advance for exceptional circumstances (e.g., travel, sickness, injury, COVID-related issues) and for OAE-approved accommodations.
Submissions received up to two days (48 hours) after the deadline will be accepted with a 10% penalty. No submissions will be accepted more than two days after the deadline.
See Gradescope for all the current assignments and their due dates. Post on Slack if you cannot access the Gradescope class page. The 6-letter code to join the class is given on Canvas.
Datasheet on the Quadro RTX 6000
Final project instructions and starter code:
Slides and videos explaining the final project:
- Overview of the final project; [Slides](Lecture Slides/Lecture_14.pdf)
- 33 Final Project 1, Overview; Video
- 34 Final Project 2, Regularization; Video
- 35 Final Project 3, CUDA GEMM and MPI; Video
See also the Module 8 videos on MPI.
CME 213 First Live Lecture; Video, [Slides](Lecture Slides/Lecture_01.pdf)
- [Tutorial slides](Lecture Slides/cpp tutorial/Tutorial_01.pdf)
- [Tutorial code](Lecture Slides/cpp tutorial/code.zip)
- [Slides](Lecture Slides/Lecture_02.pdf)
- 01 Homework 1; Video
- 02 Why Parallel Computing; Video
- 03 Top 500; Video
- 04 Example of Parallel Computation; Video
- 05 Shared memory processor; Video
- [Reading assignment 1](Reading Assignments/Introduction_Parallel_Computing)
- Homework 1; starter code
- C++ threads; [Slides](Lecture Slides/Lecture_03.pdf); Code
- Introduction to OpenMP; [Slides](Lecture Slides/Lecture_04.pdf); Code
- 06 C++ threads; Video
- 07 Promise and future; Video
- 08 mutex; Video
- 09 Introduction to OpenMP; Video
- 10 OpenMP Hello World; Video
- 11 OpenMP for loop; Video
- 12 OpenMP clause; Video
- [Reading assignment 2](Reading Assignments/OpenMP)
- OpenMP, for loops, advanced OpenMP; [Slides](Lecture Slides/Lecture_05.pdf); Code
- OpenMP, sorting algorithms; [Slides](Lecture Slides/Lecture_06.pdf); Code
- 13 OpenMP tasks; Video
- 14 OpenMP depend; Video
- 15 OpenMP synchronization; Video
- 16 Sorting algorithms Quicksort Mergesort; Video
- 17 Sorting Algorithms Bitonic Sort; Video
- 18 Bitonic Sort Exercise; Video
- [Reading assignment 3](Reading Assignments/OpenMP_advanced)
- Homework 2; starter code; radix sort tutorial
- Introduction to GPU computing; [Slides](Lecture Slides/Lecture_07.pdf)
- Introduction to CUDA and `nvcc`; [Slides](Lecture Slides/Lecture_08.pdf); Code
- 19 GPU computing introduction; Video
- 20 Graphics Processing Units; Video
- 21 Introduction to GPU programming; Video
- 22 icme-gpu; Video
- 23 a First CUDA program; Video
- 23 b First CUDA program part 2; Video
- 24 nvcc CUDA compiler; Video
- [Reading assignment 4](Reading Assignments/CUDA_intro)
- Homework 3; starter code
- GPU memory and matrix transpose; [Slides](Lecture Slides/Lecture_09.pdf); Code
- CUDA occupancy, branching, homework 4; [Slides](Lecture Slides/Lecture_10.pdf)
- 25 GPU memory; Video
- 26 Matrix transpose; Video
- 27 Latency, concurrency, and occupancy; Video
- 28 CUDA branching; Video
- 29 Homework 4; Video
- [Reading assignment 5](Reading Assignments/GPU_performance)
- Homework 4; starter code
- 30 NVIDIA guest lecture, OpenACC; Video; [Slides](Lecture Slides/CME213_2021_OpenACC.pdf)
- 31 NVIDIA guest lecture, CUDA optimization; Video; [Slides](Lecture Slides/CME213_2021_Optimization.pdf)
- [Reading assignment 6](Reading Assignments/NVIDIA_openACC_optimization)
- 32 NVIDIA guest lecture, CUDA profiling; Video; [Slides](Lecture Slides/CME213_2021_CUDA_Profiling.pdf)
- [Reading assignment 7](Reading Assignments/NVIDIA_CUDA_profiling)
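The bitonic sort covered in videos 17 and 18 maps naturally onto an OpenMP `parallel for`. The block below is a minimal sketch, not the course's reference code: the function name `bitonic_sort` is our own, and it assumes the input length is a power of two. Within one compare-exchange step, each index `i` is paired with a distinct partner `i ^ j`, and only the smaller index of each pair performs the swap, so the iterations of the inner loop are independent.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Sorts `a` in ascending order using a bitonic sorting network.
// Assumption of this sketch: a.size() is zero or a power of two.
void bitonic_sort(std::vector<int>& a) {
    const long n = static_cast<long>(a.size());
    assert(n == 0 || (n & (n - 1)) == 0);

    // k: length of the bitonic sequences merged at this stage.
    for (long k = 2; k <= n; k *= 2) {
        // j: distance between the two elements compared in this step.
        for (long j = k / 2; j > 0; j /= 2) {
            // Iterations touch disjoint pairs (i, i ^ j), so they can run
            // concurrently; the step loops above must stay sequential.
            #pragma omp parallel for
            for (long i = 0; i < n; ++i) {
                const long l = i ^ j;
                if (l > i) {
                    // Bit k of i decides whether this block is sorted
                    // ascending or descending at the current stage.
                    const bool ascending = (i & k) == 0;
                    if ((ascending && a[i] > a[l]) || (!ascending && a[i] < a[l])) {
                        std::swap(a[i], a[l]);
                    }
                }
            }
        }
    }
}
```

For example, calling `bitonic_sort` on `{7, 3, 6, 8, 1, 5, 2, 4}` leaves `{1, 2, 3, 4, 5, 6, 7, 8}`. Compile with `-fopenmp` (GCC or Clang) to enable the pragma; without that flag the pragma is ignored and the same code runs serially.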
The slides and videos below are needed for the final project.
- Introduction to MPI; [Slides](Lecture Slides/Lecture_16.pdf); Code
- 37 MPI Introduction; Video
- 38 MPI Hello World; Video
- 39 MPI Send Recv; Video
- 40 MPI Collective Communications; Video
Material for the May 17 group activity:
- generate_sequence.cpp
- 36 Instructions for Monday, May 17 group activity; Video; [Slides](Lecture Slides/Lecture_15.pdf)
- MPI Advanced Send and Recv; [Slides](Lecture Slides/Lecture_17.pdf); Code
- 41 MPI Process Mapping; Video
- 42 MPI Buffering; Video
- 43 MPI Send Recv Deadlocks; Video
- 44 MPI Non-blocking; Video
- 45 MPI Send Modes; Video
- Parallel efficiency and MPI communicators; [Slides](Lecture Slides/Lecture_18.pdf); Code
- 46 MPI Matrix-vector product 1D schemes; Video
- 47 MPI Matrix vector product 2D scheme; Video
- 48 Parallel Speed-up; Video
- 49 Isoefficiency; Video
- 50 MPI Communicators; Video
- [Reading assignment 8](Reading Assignments/MPI)
- Parallel Programming Models by Elliott Slaughter; [Slides](Lecture Slides/CME213_2021_Legion.pdf); Video
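The parallel speed-up and isoefficiency topics (videos 48 and 49) are usually stated with the definitions below. The notation here is the common textbook convention and may differ from the slides: $W$ is the serial work (the run time $T_1$ on one process), $T_p$ is the run time on $p$ processes, and $T_o = p\,T_p - W$ is the total parallel overhead.

```latex
S = \frac{W}{T_p}, \qquad
E = \frac{S}{p} = \frac{W}{p\,T_p} = \frac{W}{W + T_o} = \frac{1}{1 + T_o / W}

% Holding the efficiency E fixed and solving for W gives the isoefficiency relation:
E\,(W + T_o) = W
\;\Longrightarrow\;
W = \frac{E}{1 - E}\, T_o(W, p) = K\, T_o(W, p), \qquad K = \frac{E}{1 - E}
```

Read this way, an algorithm scales well when the problem size $W$ needs to grow only slowly with $p$ (for instance, close to linearly) to keep $E$ constant.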
Lawrence Livermore National Lab Resources
- LLNL Tutorial and Training Materials
- LLNL Introduction to Parallel Computing tutorial
- LLNL POSIX threads programming
- LLNL OpenMP tutorial
- LLNL MPI tutorial
- LLNL Advanced MPI slides
- OpenMP LLNL guide
- OpenMP guide by Yliluoma
- OpenMP 5.0 Reference Guide
- OpenMP API Specification
- Tutorials
- CUDA Programming Guides and References
- CUDA C++ Programming Guide
- CUDA C++ Best Practices Guide
- CUDA occupancy calculator
- CUDA compiler driver NVCC
- OpenACC
- OpenACC Programming and Best Practices Guide
- OpenACC 2.7 API Reference Card
- Compilers that support OpenACC
- OpenACC Specification (Version 3.0)
Open MPI hwloc documentation
- Legion and Regent
- StarPU
- Charm++
- PaRSEC
- Chapel
- X10
- TaskTorrent and documentation