For the task of the exercise, please refer to task/exercise2.md
📂 ex2B/
│
├── 📂 apps/
│ └── 📄 main.c
│
├── 📂 include/
│ └── 📄 hybrid_qsort.h
│
├── 📂 report/
│ └── 📝 FANTUZZI_ex2_report.pdf
│
├── 📂 scalability/
│ ├── 🌐 scalability_analysis.html
│ └── 🔎 scalability_analysis.Rmd
│
├── 📂 slides/
│ └── 📄 HPC_ex2_slides.pdf
│
├── 📂 source/
│ └── 📄 hybrid_qsort.c
│
├── 📂 timings/
│ ├── ⏳ getdata_omp.sh
│ ├── ⏳ getdata_strong.sh
│ ├── ⏳ getdata_weak.sh
│ ├── 📊 omp_L1_640M.csv
│ ├── 📊 omp_standard_640M.csv
│ ├── 📊 StrongScalability_160M_1-64.csv
│ ├── 📊 StrongScalability_160M_65-128.csv
│ ├── 📊 WeakScalability_160M_1-64.csv
│ └── 📊 WeakScalability_160M_65-128.csv
│
├── 📂 task/
│ └── 📄 exercise2.md
│
├── 🏗️ build.sh
│
├── 📝 CMakeLists.txt
│
└── 📰 README.md
I provide a CMakeLists.txt to simplify the compilation of the C sources.
To build on ORFEO, you simply need to run the command:
sbatch build.sh
If you want to build on your own machine instead, run:
mkdir build && cd build
cmake ..
make
Note: this second method assumes that the MPI and OpenMP libraries are already installed on your machine!
If the build phase was successful (hopefully it should be 😅), you will find the executable main.x inside the folder apps/.
Once the executable is built, the first thing to do is set the desired OpenMP options (since the code is hybrid, you will most likely want to do that!). For example, to specify the number of threads to use:
export OMP_NUM_THREADS=...
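As a slightly fuller illustration, the thread count can be combined with the standard OpenMP affinity variables. This is only a sketch: the values below are hypothetical and are not the settings used for the timings in this repository.

```shell
# Hypothetical values: adapt them to the cores available to each MPI process.
export OMP_NUM_THREADS=4     # threads spawned by each MPI process
export OMP_PLACES=cores      # one place per core
export OMP_PROC_BIND=close   # keep the threads of a team on neighbouring places

echo "OpenMP: $OMP_NUM_THREADS threads, places=$OMP_PLACES, bind=$OMP_PROC_BIND"
```

OMP_PLACES and OMP_PROC_BIND are optional; if they are left unset, thread placement is up to the OpenMP runtime.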
Once this preliminary step is done, the code can be run with mpirun using the following syntax:
mpirun -np P --map-by socket /path/to/main.x N
where:
- P is the number of MPI processes to use
- N is the overall size of the array to sort (if not specified, the default will be N=100000)
- --map-by socket is the optimal allocation type in relation to the algorithm's structure (but feel free to try other configurations)
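Putting the pieces together, a complete launch might look like the sketch below. The values of P, N and THREADS and the path in EXE are hypothetical placeholders, not recommended settings; the run is skipped when mpirun or the executable is not available.

```shell
# Hypothetical example values; adjust them to your allocation.
P=4                  # number of MPI processes
N=1000000            # overall size of the array to sort
THREADS=8            # OpenMP threads per MPI process
EXE=./apps/main.x    # assumed path to the executable, relative to the repository root

export OMP_NUM_THREADS=$THREADS

# A hybrid run uses up to P * OMP_NUM_THREADS cores in total.
TOTAL_CORES=$((P * OMP_NUM_THREADS))
echo "Up to $TOTAL_CORES cores: $P processes x $OMP_NUM_THREADS threads"

# Launch only if both mpirun and the executable are available.
if command -v mpirun >/dev/null 2>&1 && [ -x "$EXE" ]; then
    mpirun -np "$P" --map-by socket "$EXE" "$N"
else
    echo "mpirun or $EXE not found; skipping the run"
fi
```

Checking the product of processes and threads against the cores you actually requested helps avoid oversubscribing the node.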
Since the primary goal of the exercise was to benchmark the code in order to analyze its scalability (both weak and strong), I prepared some bash scripts to automate the timing-collection phase. You can find them in the folder timings/, and their names should make clear what each one measures.
As an example, consider the script timings/getdata_strong.sh (the same considerations apply to the other scripts). It collects data about strong scalability, varying the number of processes from 1 to 128. It can be launched with the sbatch command, followed by the name of the CSV file that will be created (and filled with numbers!):
sbatch getdata_strong.sh yourcsv_name.csv
Given the time limitations on the ORFEO cluster, I needed to run it twice (one loop from 1 to 64 processes and another from 65 to 128). The script is now set to perform the whole loop, but feel free to change it according to your preferences!
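One way to split the sweep into two jobs is to make the loop bounds explicit. The sketch below is hypothetical and does not reproduce the actual content of getdata_strong.sh: the function print_runs, the placeholder size N and the path EXE are all assumptions, and the commands are only printed (a dry run) rather than executed.

```shell
# Hypothetical placeholders, not taken from the real script.
N=1000000            # array size passed to the executable
EXE=../apps/main.x   # assumed path to the executable, relative to timings/

# Print one mpirun command for each process count in [start, end].
print_runs() {
    local start=$1 end=$2 p
    for p in $(seq "$start" "$end"); do
        echo "mpirun -np $p --map-by socket $EXE $N"
    done
}

# First job: 1 to 64 processes; second job: 65 to 128.
print_runs 1 64
print_runs 65 128
```

In a real job, each echo would be replaced by the actual run plus whatever logic appends the timing to the CSV file.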
Note: the script timings/getdata_omp.sh was used to compare the standard OpenMP version with the "L1-optimized" one. If you run it as is, you will get timings for the "L1-optimized" version. To get timings for the original version, you need to change the source code slightly and then rebuild the executable. More precisely, open the file source/hybrid_qsort.c, uncomment lines 208 and 391, and comment out lines 207 and 390.