cuBool is a linear Boolean algebra library: primitives and operations for working with sparse matrices, implemented on the NVIDIA CUDA platform. The primary goal of the library is the implementation, testing, and profiling of algorithms for solving formal-language-constrained problems, such as context-free and regular path queries with various semantics for graph databases. The library provides a C-compatible API written in the GraphBLAS style.
The library ships with the python package pycubool, a wrapper for the cuBool C API. This package exports the library features and primitives in a high-level form with automated resource management and syntax sugar.
The primary library primitives are a sparse matrix and a sparse vector of Boolean values. The library provides the most popular operations for matrix manipulation, such as construction from values, transposition, sub-matrix/sub-vector extraction, matrix-to-vector reduction, element-wise addition, matrix-matrix, matrix-vector, and vector-matrix multiplication, and the Kronecker product.
As a fallback, the library provides a sequential backend for the operations mentioned above for CPU-only computations. This backend is selected automatically if no CUDA-compatible device is present in the system. This can be quite handy for prototyping algorithms on a local computer before running them on a powerful server.
The PyPI package web page is available at the following link.
- C API for performance-critical computations
- Python package for every-day tasks
- CUDA backend for computations
- CPU backend for computations
- Matrix/vector creation (empty, from data, with random data)
- Matrix-matrix operations (multiplication, element-wise addition, element-wise multiplication, Kronecker product)
- Matrix-vector operations (matrix-vector and vector-matrix multiplication)
- Vector-vector operations (element-wise addition, element-wise multiplication)
- Matrix operations (equality, transpose, reduce to vector, extract sub-matrix)
- Vector operations (equality, reduce to value, extract sub-vector)
- Matrix/vector data extraction (as lists, as list of pairs)
- Matrix/vector syntax sugar (pretty string printing, slicing, iterating through non-zero values)
- IO (import/export matrix from/to the `.mtx` file format)
- GraphViz (export single matrix or set of matrices as a graph with custom color and label settings)
- Debug (matrix string debug markers, logging)
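The Kronecker product from the feature list above has simple coordinate semantics for Boolean matrices: each pair of non-zero entries (i, j) in A and (k, l) in B yields a non-zero at (i·p + k, j·q + l), where p×q is the shape of B. Below is a plain-Python sketch of these semantics on coordinate sets, not the library's GPU implementation:

```python
def kronecker(a_coords, a_shape, b_coords, b_shape):
    """Boolean Kronecker product over coordinate sets.

    a_coords/b_coords are sets of (row, col) pairs of non-zero entries;
    a_shape/b_shape are (rows, cols) tuples. Result shape would be
    (a_rows * b_rows, a_cols * b_cols).
    """
    p, q = b_shape
    return {(i * p + k, j * q + l)
            for (i, j) in a_coords
            for (k, l) in b_coords}

# 2x2 identity (x) 2x2 identity = 4x4 identity
eye2 = {(0, 0), (1, 1)}
print(sorted(kronecker(eye2, (2, 2), eye2, (2, 2))))
# [(0, 0), (1, 1), (2, 2), (3, 3)]
```

The same coordinate formula underlies Kronecker-product implementations for any sparse storage format, since it never has to materialize zero entries.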
- Linux-based OS (tested on Ubuntu 20.04)
- Windows (coming soon)
- macOS (coming soon)
Create sparse matrices, compute matrix-matrix product and print the result to the output:
import pycubool as cb
a = cb.Matrix.empty(shape=(2, 3))
a[0, 0] = True
a[1, 2] = True
b = cb.Matrix.empty(shape=(3, 4))
b[0, 1] = True
b[0, 2] = True
b[1, 3] = True
b[2, 1] = True
print(a, b, a.mxm(b), sep="\n")
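Over the Boolean semiring the product uses logical AND as multiplication and OR as addition, so an output cell (i, j) is set whenever some k has both a[i, k] and b[k, j]. A plain-Python sketch of these semantics applied to the non-zero coordinates of the matrices above (not the library's GPU implementation):

```python
def bool_mxm(a_coords, b_coords):
    """Boolean matrix product over sets of (row, col) non-zero coordinates."""
    return {(i, j)
            for (i, k) in a_coords
            for (kk, j) in b_coords
            if k == kk}  # AND of the two entries; set union acts as OR

a_coords = {(0, 0), (1, 2)}                  # non-zeros of a above
b_coords = {(0, 1), (0, 2), (1, 3), (2, 1)}  # non-zeros of b above
print(sorted(bool_mxm(a_coords, b_coords)))
# [(0, 1), (0, 2), (1, 1)]
```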
Sparse Boolean matrix-matrix multiplication evaluation results are listed below. Machine configuration: PC with Ubuntu 20.04, Intel Core i7-6700 3.40GHz CPU, DDR4 64Gb RAM, GeForce GTX 1070 GPU with 8Gb VRAM.
The matrix data is selected from the SuiteSparse Matrix Collection link.
Matrix name | # Rows | Nnz M | Nnz/row | Max Nnz/row | Nnz M^2 |
---|---|---|---|---|---|
SNAP/amazon0312 | 400,727 | 3,200,440 | 7.9 | 10 | 14,390,544 |
LAW/amazon-2008 | 735,323 | 5,158,388 | 7.0 | 10 | 25,366,745 |
SNAP/web-Google | 916,428 | 5,105,039 | 5.5 | 456 | 29,710,164 |
SNAP/roadNet-PA | 1,090,920 | 3,083,796 | 2.8 | 9 | 7,238,920 |
SNAP/roadNet-TX | 1,393,383 | 3,843,320 | 2.7 | 12 | 8,903,897 |
SNAP/roadNet-CA | 1,971,281 | 5,533,214 | 2.8 | 12 | 12,908,450 |
DIMACS10/netherlands_osm | 2,216,688 | 4,882,476 | 2.2 | 7 | 8,755,758 |
A detailed comparison is available in the full paper text at the following link.
If you are running a Linux-based OS (tested on Ubuntu 20.04), you can download the official PyPI pycubool python package, which includes the compiled library with CUDA and sequential computations support. The installation process requires only python3 to be installed on your machine. Python can be installed as described at the following link.
If all requirements are satisfied, run the following command to install the PyPI package:
$ python3 -m pip install pycubool
The following links give a brief start guide for new users:
- C API example - complete C++ application to compute transitive closure of an example graph
- Python API usage - complete and detailed set of the python API usage examples for all lib features
This section gives instructions to build the library from sources. These steps are required if you want to build the library for your specific platform with custom build settings.
- Linux-based OS (tested on Ubuntu 20.04)
- CMake Version 3.15 or higher
- CUDA Compatible GPU device (to run Cuda computations)
- GCC Compiler
- NVIDIA CUDA toolkit (to build Cuda backend)
- Python 3 (for the `pycubool` library)
- Git (to get the source code)
Skip this section if you want to build the library with only the sequential backend, without CUDA backend support.
Before the CUDA setup process, validate your system's NVIDIA driver with the `nvidia-smi` command. Install the required driver via the `ubuntu-drivers devices` and `apt install <driver>` commands respectively.
The following commands install the required GCC compilers for CC and CXX compilation respectively. The CUDA toolkit shipped in the default Ubuntu package manager has version 10 and supports only GCC version 8.4 or lower.
$ sudo apt update
$ sudo apt install gcc-8 g++-8
$ sudo apt install nvidia-cuda-toolkit
$ sudo apt install nvidia-cuda-dev
$ nvcc --version
If everything is installed successfully, the last command will output something like this:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
Bonus step: in order to have CUDA support in the CLion IDE, you will have to overwrite the global aliases for the `gcc` and `g++` compilers:
$ sudo rm /usr/bin/gcc
$ sudo rm /usr/bin/g++
$ sudo ln -s /usr/bin/gcc-8 /usr/bin/gcc
$ sudo ln -s /usr/bin/g++-8 /usr/bin/g++
This step can be easily undone by removing the old aliases and creating new ones for the desired GCC version on your machine. You can also safely omit this step if you want to build the library from the command line only.
Useful links:
- NVIDIA Drivers installation Ubuntu
- CUDA Linux installation guide
- CUDA Hello world program
- CUDA CMake tutorial
Run the following commands in the command shell to download the repository, make the build directory, configure the cmake build, and run the compilation process.
First of all, get the source code and project dependencies:
$ git clone https://github.com/JetBrains-Research/cuBool.git
$ cd cuBool
$ git submodule update --init --recursive
Make the build directory and go into it:
$ mkdir build
$ cd build
Configure the build in Release mode with tests and run the actual compilation process:
$ cmake .. -DCMAKE_BUILD_TYPE=Release -DCUBOOL_BUILD_TESTS=ON
$ cmake --build . --target all -j `nproc`
$ bash ./scripts/run_tests_all.sh
By default, the following cmake options will be automatically enabled:
- `CUBOOL_WITH_CUDA` - build the library with the actual CUDA backend
- `CUBOOL_WITH_SEQUENTIAL` - build the library with the CPU-based sequential backend
- `CUBOOL_WITH_TESTS` - build the library unit-tests collection
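For example, to build only the sequential CPU backend on a machine without a CUDA device, the CUDA option can presumably be disabled at configure time. This is a sketch, assuming the options listed above are ordinary cmake cache variables that can be overridden with `-D`:

```shell
$ cmake .. -DCMAKE_BUILD_TYPE=Release -DCUBOOL_WITH_CUDA=OFF
$ cmake --build . --target all -j `nproc`
```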
Note: in order to provide the correct GCC version for compiling CUDA sources, you will have to provide custom paths to the CC and CXX compilers before the actual compilation process as follows:
$ export CC=/usr/bin/gcc-8
$ export CXX=/usr/bin/g++-8
$ export CUDAHOSTCXX=/usr/bin/g++-8
Export the environment variable PYTHONPATH="/build_dir_path/python/:$PYTHONPATH" if you want to use pycubool without installing it into the default python packages directory. This variable helps python find the package when you import it as `import pycubool` in your python scripts.
To run regression tests within your build directory, open the folder /build_dir_path/python and run the following commands:
$ export PYTHONPATH="`pwd`:$PYTHONPATH"
$ cd tests
$ python3 -m unittest discover -v
Note: after the build process, the shared library object will be placed inside the build directory in the folder with the python wrapper, python/pycubool/. Thus, the wrapper will be able to automatically locate the required library file.
The pycubool python package can be configured before import with the following environment variables:
- `CUBOOL_PATH` - custom path to the compiled cuBool library object. Without this variable, the package will by default try to find the library in the package folder.
- `CUBOOL_BACKEND` - selects the backend for execution. By default, the library selects CUDA if present. Pass the value `cpu` to force CPU computations, even if the CUDA backend is present and supported for selection.
- `CUBOOL_MEM` - type of memory to use for computations with the CUDA backend. By default, the library uses device memory. If the value `managed` is passed, the backend will be configured to use CUDA managed memory for resource allocation.
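These variables must be set before the package is first imported, either in the shell or from python itself. A minimal sketch (the actual import, commented out below, assumes pycubool is installed on the machine):

```python
import os

# Configure pycubool before the first import: force the CPU backend
# and request CUDA managed memory (the latter only matters on CUDA).
os.environ["CUBOOL_BACKEND"] = "cpu"
os.environ["CUBOOL_MEM"] = "managed"

# import pycubool as cb   # must come only after the variables are set
print(os.environ["CUBOOL_BACKEND"])  # cpu
```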
The following C++ code snippet demonstrates how library functions and primitives can be used for the transitive closure evaluation of a directed graph, represented as an adjacency matrix with Boolean values. The transitive closure provides info about the reachable vertices in the graph:
/**
* Performs transitive closure for directed graph
*
* @param A Adjacency matrix of the graph
* @param T Reference to the handle where to allocate and store result
*
* @return Status on this operation
*/
cuBool_Status TransitiveClosure(cuBool_Matrix A, cuBool_Matrix* T) {
cuBool_Matrix_Duplicate(A, T); /* Duplicate A to result T */
cuBool_Index total = 0;
cuBool_Index current;
cuBool_Matrix_Nvals(*T, &current); /* Query current nvals value */
while (current != total) { /* Iterate, while new values are added */
total = current;
cuBool_MxM(*T, *T, *T, CUBOOL_HINT_ACCUMULATE); /* T += T x T */
cuBool_Matrix_Nvals(*T, &current);
}
return CUBOOL_STATUS_SUCCESS;
}
The following Python code snippet demonstrates how the library's python wrapper can be used to compute the same transitive closure problem for a directed graph within a python environment:
import pycubool as cb
def transitive_closure(a: cb.Matrix):
"""
Evaluates transitive closure for the provided
adjacency matrix of the graph.
:param a: Adjacency matrix of the graph
:return: The transitive closure adjacency matrix
"""
t = a.dup() # Duplicate matrix where to store result
total = 0 # Current number of values
while total != t.nvals:
total = t.nvals
t.mxm(t, out=t, accumulate=True) # t += t * t
return t
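The fixed-point loop above can be cross-checked with a straightforward reachability computation on plain edge sets. Here is a small plain-Python sketch, independent of pycubool, that mirrors the same iteration:

```python
def transitive_closure_edges(edges):
    """Transitive closure of a directed graph given as a set of (u, v) edges.

    Mirrors the fixed-point loop above: repeatedly accumulate the edges
    of t x t into t until the edge count stops growing.
    """
    t = set(edges)
    total = 0
    while total != len(t):
        total = len(t)
        # t += t x t over the Boolean semiring
        t |= {(u, w) for (u, v) in t for (vv, w) in t if v == vv}
    return t

closure = transitive_closure_edges({(0, 1), (1, 2), (2, 3)})
print(sorted(closure))
# [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

Because each pass at least doubles the length of the paths discovered, the loop needs only O(log n) Boolean matrix products to converge.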
cuBool
├── .github - GitHub Actions CI setup
├── docs - documents, text files and various helpful stuff
├── scripts - short utility programs
├── cubool - library core source code
│ ├── include - library public C API
│ ├── sources - source-code for implementation
│ │ ├── core - library core and state management
│ │ ├── io - logging and i/o stuff
│ │ ├── utils - auxiliary classes shared among modules
│ │ ├── backend - common interfaces
│ │ ├── cuda - cuda backend
│ │ └── sequential - fallback cpu backend
│ ├── utils - testing utilities
│ └── tests - gtest-based unit-tests collection
├── python - pycubool related sources
│ ├── pycubool - cubool library wrapper for python (similar to pygraphblas)
│ ├── tests - regression tests for python wrapper
│ ├── examples - short script files with python api usage examples
│ ├── data - generated data for pycubool regression tests
├── deps - project dependencies
│ ├── cub - cuda utility, required for nsparse
│ ├── gtest - google test framework for unit testing
│ ├── naive - GEMM implementation for square dense boolean matrices
│ ├── nsparse - SpGEMM implementation for csr matrices
│ └── nsparse-um - SpGEMM implementation for csr matrices with unified memory (configurable)
└── CMakeLists.txt - library cmake config, add this as sub-directory to your project
- Egor Orachyov (Github: EgorOrachyov)
- Pavel Alimov (Github: Krekep)
- Semyon Grigorev (Github: gsvgit)
@MISC{cuBool,
author = {Orachyov, Egor and Alimov, Pavel and Grigorev, Semyon},
title = {cuBool: sparse Boolean linear algebra for Nvidia Cuda},
year = 2021,
url = {https://github.com/JetBrains-Research/cuBool},
note = {Version 1.2.0}
}
This section lists all the related papers that were used as an algorithmic foundation for the implementation of the sparse linear Boolean algebra operations (sparse matrix-matrix multiplication, sparse matrix-vector multiplication, sparse vector-matrix multiplication, element-wise matrix-matrix addition, etc.):
- High-performance and Memory-saving Sparse General Matrix-Matrix Multiplication for NVIDIA Pascal GPU, Yusuke Nagasaka, Akira Nukada, Satoshi Matsuoka
- GPU Merge Path - A GPU Merging Algorithm, Oded Green, Robert McColl, David A. Bader
- Efficient Sparse Matrix-Vector Multiplication on GPUs using the CSR Storage Format, Joseph L. Greathouse, Mayank Daga
- Atomic Reduction Based Sparse Matrix-Transpose Vector Multiplication on GPUs, Yuan Tao, Yangdong Deng, Shuai Mu, Mingfa Zhu, Limin Xiao, Li Ruan, Zhibin Huang
This project is licensed under MIT License. License text can be found in the license file.
This is a research project of the Programming Languages and Tools Laboratory at JetBrains-Research. Laboratory website link.
The name of the library is formed by a combination of the words Cuda and Boolean, which literally means Cuda with Boolean and sounds very similar to the name of the programming language COBOL.