Platform driver for the LDBC Graphalytics benchmark using SuiteSparse:GraphBLAS and LAGraph.
To execute the Graphalytics benchmark on GraphBLAS, follow the steps in the Graphalytics tutorial on Running Benchmark, together with the GraphBLAS-specific instructions listed below.
This project implements the GraphBLAS platform driver for the LDBC Graphalytics benchmark. It consists of the following components:
- The platform driver is written in Java and implements the classes required by the ldbc_graphalytics framework.
- The algorithms (BFS, PR, etc.) are implemented in C in the LAGraph library (currently on the dev branch).
- The C++ wrapper for LAGraph is defined in the src/main/c directory.
- The Java driver uses shell scripts to run the benchmark; see e.g. the execute-job.sh script, which invokes the binary program for a given algorithm.
- The graphs (stored in .v and .e files) are converted to a vertex relabelling file (.vtx) and a matrix stored in Matrix Market format (.mtx). The vertex relabelling is a bijective mapping from the sparse UINT64 IDs in the original data to a dense, contiguous set of IDs between 1 and |V|. The mapping is implemented in the relabel.py Python script, which internally uses DuckDB.
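
The relabelling idea can be sketched in a few lines of plain Python. This is an illustration only, not the relabel.py script (which uses DuckDB): the function names, the sample IDs, and the choice of an unweighted "pattern" matrix are assumptions made for the example.

```python
from typing import Dict, Iterable, List, Tuple


def build_relabelling(vertex_ids: Iterable[int]) -> Dict[int, int]:
    """Map each original (sparse) vertex ID to a dense ID in 1..|V|."""
    # Sorting makes the mapping deterministic; any fixed order would do.
    unique_ids = sorted(set(vertex_ids))
    return {original: dense for dense, original in enumerate(unique_ids, start=1)}


def to_matrix_market(edges: List[Tuple[int, int]], mapping: Dict[int, int]) -> List[str]:
    """Render relabelled edges as the lines of a Matrix Market coordinate file."""
    n = len(mapping)
    # Assumption: an unweighted graph, stored as a 'pattern' matrix.
    lines = [
        "%%MatrixMarket matrix coordinate pattern general",
        f"{n} {n} {len(edges)}",  # rows, columns, number of stored entries
    ]
    lines += [f"{mapping[src]} {mapping[dst]}" for src, dst in edges]
    return lines


# Hypothetical input: three sparse 64-bit IDs and two directed edges.
vertices = [1099511627776, 42, 7_000_000_000]
edges = [(42, 1099511627776), (7_000_000_000, 42)]

mapping = build_relabelling(vertices)
# The mapping is bijective, so it can be inverted to report results
# in terms of the original IDs.
inverse = {dense: original for original, dense in mapping.items()}
mtx_lines = to_matrix_market(edges, mapping)
print("\n".join(mtx_lines))
```

Because Matrix Market indices are 1-based, dense IDs in the range 1 to |V| can be used directly as row and column indices.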
On Debian/Fedora-based Linux distributions, you may install the prerequisite packages and dependencies listed below using a single command:
scripts/install-dependencies.sh
Make sure you have the following software packages installed:
- Apache Maven 3+
- CMake 3.10+
- C++ compiler: GCC, Clang, or ICC
- Python 3.8+
- DuckDB Python package (duckdb)
On Linux, you may use the following script to install these dependencies:
scripts/install-prerequisites.sh
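
To see quickly which of the prerequisites above are already present, a small check along the following lines may help. It is a sketch, not part of the project: the executable names (mvn, cmake, g++, clang++, icpc) are the usual ones but are assumptions here, and the script only tests for presence, not for the minimum versions listed above (apart from Python itself).

```python
import importlib.util
import shutil
import sys
from typing import Callable, Dict, List, Optional

# Assumed executable names for each prerequisite; adjust for your system.
REQUIRED_TOOLS: Dict[str, List[str]] = {
    "Apache Maven": ["mvn"],
    "CMake": ["cmake"],
    "C++ compiler": ["g++", "clang++", "icpc"],
}


def find_tools(
    required: Dict[str, List[str]] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, Optional[str]]:
    """Return the path of the first candidate found on PATH for each tool, or None."""
    found: Dict[str, Optional[str]] = {}
    for name, candidates in required.items():
        found[name] = next((path for path in map(which, candidates) if path), None)
    return found


report = find_tools()
python_ok = sys.version_info >= (3, 8)
duckdb_ok = importlib.util.find_spec("duckdb") is not None

for name, path in report.items():
    print(f"{name}: {path or 'NOT FOUND'}")
print(f"Python 3.8+: {'ok' if python_ok else 'too old'}")
print(f"duckdb package: {'ok' if duckdb_ok else 'NOT FOUND'}")
```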
The implementation depends on two C libraries, SuiteSparse:GraphBLAS and LAGraph. We require very recent versions of these libraries, so it is best to compile them from their source code.
- To install SuiteSparse:GraphBLAS v8.0.0+, run:
  scripts/install-graphblas.sh
- To install LAGraph (dev branch), run:
  scripts/install-lagraph.sh
To only build the C++ wrapper (for quick test builds), run the following script:
bin/sh/build-wrapper-only.sh
- To initialize the benchmark package, run:
  scripts/init.sh ${GRAPHS_DIR} ${MATRICES_DIR}
  where
  - GRAPHS_DIR is the directory of the graphs and the validation data. The argument is optional and its default value is ~/graphs.
  - MATRICES_DIR is the directory of the pre-generated matrix files (in Matrix Market format). The argument is optional and its default value is ~/matrices.

  This script creates a Maven package (graphalytics-${GRAPHALYTICS_VERSION}-graphblas-${PROJECT_VERSION}.tar.gz). Then, it decompresses the package, initializes a configuration directory config (based on the content of the config-template directory), and sets default values of the directories (see above) and the number of threads.

  Note that the project uses the Build Number Maven plug-in to ensure reproducibility. Hence, builds fail if the local Git repository contains uncommitted changes. To build it regardless (for testing), run it as follows:
  scripts/init-for-testing.sh ${GRAPHS_DIR} ${MATRICES_DIR}
- Navigate to the directory created by the init.sh script:
  cd graphalytics-*-graphblas-*/
- Edit the configuration files (e.g. graphs to be included in the benchmark) in the config directory.
  - To conduct benchmark runs, edit the config/benchmark.properties file and replace the include = benchmarks/custom.properties line to select the dataset size you wish to use, e.g. include = benchmarks/xl.properties
  - Inspect config/platform.properties and check whether the value of platform.graphblas.num-threads was set correctly.
- Run the benchmark with the following command:
  bin/sh/run-benchmark.sh
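
Before starting a long run, the thread setting mentioned in the configuration step can be sanity-checked with a short script. The sketch below is not part of the driver: the parser handles only simple key = value lines (a simplification of the Java properties format), the sample text is a hypothetical excerpt, and treating "between 1 and the number of cores" as a sensible range is an assumption.

```python
import os
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple 'key = value' lines, skipping blanks and comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' separator
            props[key.strip()] = value.strip()
    return props


def thread_setting_is_sane(props: Dict[str, str], available_cores: int) -> bool:
    """True if num-threads is a positive integer no larger than available_cores."""
    value = props.get("platform.graphblas.num-threads", "")
    return value.isdigit() and 1 <= int(value) <= available_cores


# Hypothetical excerpt; in practice, read the text of config/platform.properties.
sample = """
# platform settings
platform.graphblas.num-threads = 4
"""

props = parse_properties(sample)
cores = os.cpu_count() or 1
print("num-threads:", props.get("platform.graphblas.num-threads"))
print("within available cores:", thread_setting_is_sane(props, cores))
```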
The workflow of the GraphBLAS implementation is illustrated in the following figure. Note that the "raw graph files" and the "configuration" are provided by the user, while the rest of the data artifacts (intermediate data sets, outputs, etc.) are created automatically by the framework.