Comparision of OpenMP and CUDA-based, Monolithic and Levelwise Dynamic PageRank algorithms.
This experiment (cuda-on-fixed) was for comparing performance between:
- Find static pagerank of updated graph using nvGraph.
- Find dynamic pagerank of updated graph using nvGraph.
- Find static monolithic CUDA based pagerank of updated graph.
- Find dynamic monolithic CUDA based pagerank of updated graph.
- Find static levelwise CUDA based pagerank of updated graph.
- Find dynamic levelwise CUDA based pagerank of updated graph.
Each approach was attempted on a number of graphs, running each with multiple
batch sizes (1
, 5
, 10
, 50
, ...). Each batch size was run with 5
different updates to graph, and each specific update was run 5 times for each
approach to get a good time measure. Levelwise pagerank is the
[STIC-D algorithm], without ICD optimizations. Indeed, dynamic levelwise
pagerank is faster than the static approach for many batch sizes. In
order to measure error, nvGraph pagerank is taken as a reference.
This experiment (compare-monolithic) was for comparing performance between:
- Find dynamic pagerank of updated graph using Monolithic PageRank.
- Find dynamic pagerank of updated graph using Levelwise PageRank.
- Find dynamic pagerank of updated graph using pure CPU/GPU HyPR.
- Find static pagerank of updated graph using plain STIC-D PageRank (CPU only).
- Find incremental pagerank of updated using nvGraph PageRank (GPU only).
This study was carried out to extend the Levelwise strategy of PageRank computation in the STIC-D paper to perform dynamic PageRank on the CPU as well as the GPU. This levelwise computation of PageRank is computationally more efficient because it processes SCCs in topological order, avoiding unnecessary recomputation of SCCs that are dependent upon ranks of vertices in other SCCs which have not yet converged. We have compared it to our monolithic CPU/GPU implementations, along with other state-of-the art implementations, such as nvGraph and HyPR in batch sizes of 500, 1000, 2000, 5000, and 10000 edges. We also contrasted the performance of a batched update to a series of single edge updates (cumulative).
Results indicate that Levelwise approach is in general faster than Monolithic PageRank on the CPU, and the opposite is true on the GPU. This likely has to do with the fact that processing a large number of small levels is inefficient. Hence with Levelwise PageRank, smaller levels/components should be combined and processed at a time in order to help improve GPU usage efficiency.
- STIC-D: algorithmic techniques for efficient parallel pagerank computation on real-world graphs
- PageRank Algorithm, Mining massive Datasets (CS246), Stanford University
- SuiteSparse Matrix Collection