A program developed using MPI for distributed computation of histograms over large data sets, with performance analysis on multi-core systems. The program is written in C and follows the SPMD model: mpiexec launches identical processes that cooperate to compute the histogram. There are two implementations:
histo.c => Uses point-to-point communication (MPI_Send, MPI_Recv) between processes
mpi_histo.c => Uses collective communication (MPI_Bcast, MPI_Scatter, and MPI_Gather) between processes; a sketch of this approach is shown below
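The following is a minimal sketch of the collective version, not the repository's exact code; variable names such as `local_bins` and the fixed random seed are illustrative. Rank 0 parses the parameters and broadcasts them with MPI_Bcast, MPI_Scatter hands each process an equal slice of the data (assuming data_count is divisible by the number of processes), and MPI_Gather collects the partial histograms for rank 0 to sum:

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double params[4];               /* bin_count, data_min, data_max, data_count */
    if (rank == 0)
        for (int i = 0; i < 4; i++) params[i] = atof(argv[i + 1]);
    MPI_Bcast(params, 4, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    int bin_count = (int)params[0];
    double data_min = params[1], data_max = params[2];
    int data_count = (int)params[3];
    int local_n = data_count / nprocs;   /* assumes an even split */

    double *data = NULL;
    if (rank == 0) {                /* rank 0 populates the random data */
        data = malloc(data_count * sizeof(double));
        srand(1);                   /* illustrative fixed seed */
        for (int i = 0; i < data_count; i++)
            data[i] = data_min + (data_max - data_min) * (rand() / (double)RAND_MAX);
    }

    double *local_data = malloc(local_n * sizeof(double));
    MPI_Scatter(data, local_n, MPI_DOUBLE,
                local_data, local_n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* each process bins only its own slice */
    int *local_bins = calloc(bin_count, sizeof(int));
    double bin_width = (data_max - data_min) / bin_count;
    for (int i = 0; i < local_n; i++) {
        int b = (int)((local_data[i] - data_min) / bin_width);
        if (b >= bin_count) b = bin_count - 1;  /* clamp values equal to data_max */
        local_bins[b]++;
    }

    /* gather the partial histograms and sum them on rank 0 */
    int *all_bins = NULL;
    if (rank == 0) all_bins = malloc(nprocs * bin_count * sizeof(int));
    MPI_Gather(local_bins, bin_count, MPI_INT,
               all_bins, bin_count, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int b = 0; b < bin_count; b++) {
            int total = 0;
            for (int p = 0; p < nprocs; p++) total += all_bins[p * bin_count + b];
            printf("bin %d: %d\n", b, total);
        }
        free(data); free(all_bins);
    }
    free(local_data); free(local_bins);
    MPI_Finalize();
    return 0;
}
```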
The programs are compiled using mpicc and run using mpiexec. The compilation and execution instructions are shown below:
Compilation:
mpicc mpi_histo.c -o <executable_name>
Execution:
mpiexec -n <no_processes> <executable_name> <bin_count> <data_min> <data_max> <data_count>
Command-line arguments:
no_processes : Total number of processes to start.
bin_count : Number of bins in the histogram.
data_min : Minimum value of the generated data.
data_max : Maximum value of the generated data.
data_count : Number of random data points to generate.
By default the program creates <data_count> random data points between <data_min> and <data_max> and computes a histogram with <bin_count> bins. The code can be modified to read a large data file instead and compute the histogram on real data, as sketched below.
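As a hedged sketch of that modification (the filename data.txt and the one-value-per-line format are assumptions, not part of the repo), the rank-0 random-data block in the sketch above could be replaced with:

```c
/* Replaces the random-data block on rank 0 in the sketch above.
 * Assumes data.txt holds at least data_count whitespace-separated doubles. */
if (rank == 0) {
    FILE *fp = fopen("data.txt", "r");           /* assumed filename */
    if (!fp) MPI_Abort(MPI_COMM_WORLD, 1);
    data = malloc(data_count * sizeof(double));
    for (int i = 0; i < data_count; i++)
        if (fscanf(fp, "%lf", &data[i]) != 1)    /* file shorter than expected */
            MPI_Abort(MPI_COMM_WORLD, 1);
    fclose(fp);
}
```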
Example:
Compilation: mpicc mpi_histo.c -o mpi_histo
Execution: mpiexec -n 2 ./mpi_histo 10 1 100 10
The histogram computation was done for 50000000, 100000000, and 200000000 data points using 1 to 64 cores on a 4-node CHPC kingspeak cluster.
The results obtained are described here.
Total Execution Time: the time taken by the program to complete from start to finish.
Application Time: the time taken to create and distribute the data and to compute the histogram.
Histogram Time: the maximum time taken by any single process to compute the histogram on its portion of the scattered data.
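A sketch of how these three timings can be taken in the program above (variable names are illustrative; this follows the common MPI_Wtime pattern rather than the repo's exact code):

```c
double t_total = MPI_Wtime();   /* immediately after MPI_Init */

double t_app = MPI_Wtime();     /* start of data creation + distribution */
/* ... rank 0 populates data; MPI_Bcast / MPI_Scatter ... */

double t_h = MPI_Wtime();       /* start of the local binning loop */
/* ... local binning loop ... */
double my_histo = MPI_Wtime() - t_h;

/* ... MPI_Gather of partial histograms ... */

/* histogram time = slowest rank's binning time */
double histo_time;
MPI_Reduce(&my_histo, &histo_time, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

if (rank == 0)
    printf("total %.6f  app %.6f  histogram %.6f\n",
           MPI_Wtime() - t_total, MPI_Wtime() - t_app, histo_time);
```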
From the figures, for the serial run (1 process) the time to compute the histogram grows as the number of data elements increases. As the number of processes increases, the total time to compute the histogram decreases because the work is divided among the processes. The total time and the application time are both greater than the histogram time: they include the time taken to scatter the data among the processes, whereas the histogram time accounts only for the maximum time any single process spends computing the histogram.
The speedup of a parallel program is given as S(n, p) = Tserial(n) / Tparallel(n, p). From the figures, the speedup of the total time is lower than that of the application time and the histogram time. As with the timing results, the total execution time includes the time process 0 takes to populate and scatter the data; the histogram time, by contrast, achieves a drastically higher, nearly linear speedup.
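As a concrete check against the table below: for 200000000 data points on 64 processes, the histogram-time speedup is 294.842392 / 4.63593 ≈ 63.6, close to linear, while the total-time speedup is only 298.528679 / 12.687857 ≈ 23.5.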
Note: As the number of processes increases, communication overhead also increases, which limits how much further the execution time can be reduced.
| Processes | Total Time (s) | Application Time (s) | Histogram Time (s) | Data Count |
| --- | --- | --- | --- | --- |
| 1 | 74.423788 | 73.756131 | 73.569383 | 50000000 |
| 4 | 19.308111 | 18.636843 | 18.439549 | 50000000 |
| 16 | 5.504118 | 4.754371 | 4.615857 | 50000000 |
| 32 | 3.926105 | 3.230799 | 2.302202 | 50000000 |
| 64 | 3.436093 | 2.481364 | 1.150515 | 50000000 |
| 1 | 149.605372 | 148.257896 | 147.896947 | 100000000 |
| 4 | 38.644788 | 37.317932 | 36.934477 | 100000000 |
| 16 | 10.867961 | 9.537738 | 9.233756 | 100000000 |
| 32 | 7.780161 | 6.458228 | 4.580765 | 100000000 |
| 64 | 6.301298 | 4.971914 | 2.294652 | 100000000 |
| 1 | 298.528679 | 295.804794 | 294.842392 | 200000000 |
| 4 | 77.303529 | 74.629462 | 73.983486 | 200000000 |
| 16 | 21.717287 | 19.049401 | 18.434128 | 200000000 |
| 32 | 15.902424 | 13.048625 | 9.19137 | 200000000 |
| 64 | 12.687857 | 9.89958 | 4.63593 | 200000000 |