IMIB is a benchmark framework for evaluating the lookup speed, maintenance cost, and memory consumption of single-attribute in-memory secondary indexes. It contains benchmark cases that measure the latency of equality and range lookups, inserts, deletes, and bulk operations (bulk inserts and bulk loads) for the included index implementations. In addition, the benchmark cases for inserts, bulk inserts, and bulk loads measure the memory allocated by the corresponding index implementation.
Implementation | Data structure | C++ class |
---|---|---|
Unsync ART | radix tree | `ART_unsynchronized::Tree` |
MP Judy | radix tree | `judyLArray` |
TLX B+ Tree | B+ tree | `tlx::btree_map` |
Abseil B-Tree | B-tree | `absl::btree_map` |
BB-Tree | k-ary search tree | `BBTree` |
PG Skip List | skip list | `goodliffe::multi_skip_list` |
RH Flat Map | hash map | `robin_hood::unordered_flat_map` |
RH Node Map | hash map | `robin_hood::unordered_node_map` |
TSL Robin Map | hash map | `tsl::robin_map` |
TSL Sparse Map | hash map | `tsl::sparse_map` |
STD Hash Map | hash map | `std::unordered_map` |
The Unsync ART additionally requires the `tbb` library.
Build the project:

```shell
git submodule update --init --recursive
mkdir build
cd build/
cmake .. -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -GNinja
ninja
```
Run the test suite:
```shell
./imiTest
```
Run the benchmark:
```shell
./imiBench <key type> <iterations> <data binary file> <equality lookup file> <range lookup file> <result file prefix>
```
Generate unsigned integer datasets:
```shell
./scripts/generate_uint_data.py <data size> <subset count> <equality lookup count> <range lookup selectivities> <range lookup count>
```
Generate the datasets from the project root directory:
```shell
./scripts/generate_uint_data.py 10000000 5 1000000 0.0001 1000000
```
The above command generates five index entry datasets with 2, 4, 6, 8, and 10 million entries, the largest containing 10 million. It also generates a dataset of 1 million equality lookup search keys and a dataset of 1 million search key ranges for range lookups; each range lookup has a selectivity of 0.0001 (0.01%).
Run the benchmark script to execute the `imiBench` binary with the various dataset sizes:

```shell
cd build/
../scripts/bench_sparse_dense.py
```
The benchmark script runs the `imiBench` binary for the datasets present in `resources/data`. For example, if only index entries with dense 64-bit unsigned integer keys are to be used, only files with the prefix `dense_uint64` should be present in this directory.
Visualize the Results:
If multiple JSON result files are generated for all four combinations of sparse/dense and ascending/shuffled datasets, graphs covering all datasets and the four data characteristic combinations can be created with the following script. It must be executed in the directory containing the JSON files (the `build` directory in this example):

```shell
../scripts/plot_multi_json_results.py subplots
```
If you use this benchmark framework in your work, please cite us.
```bibtex
@inproceedings{Weisgut21ExperimentalIndexEvaluation,
  author    = {Marcel Weisgut},
  title     = {Experimental Index Evaluation for Partial Indexes in Horizontally
               Partitioned In-Memory Databases},
  booktitle = {Proceedings of the 32nd GI-Workshop on Foundations of Databases
               (Grundlagen von Datenbanken)},
  series    = {{CEUR} Workshop Proceedings},
  year      = {2021},
}
```