Skip to content

Latest commit

 

History

History
80 lines (52 loc) · 3.48 KB

README.md

File metadata and controls

80 lines (52 loc) · 3.48 KB

ddsketch

Build Status Coverage Status Codacy Badge License

This repo contains a C++14 port of the implementation for the distributed quantile sketch algorithm DDSketch [1].

The port is based on the Python implementation - reference-commit

DDSketch has relative-error guarantees for any quantile q in [0, 1]. That is if the true value of the qth-quantile is x then DDSketch returns a value y such that |x-y| / x < e where e is the relative error parameter. (The default here is set to 0.01)

DDSketch is also fully mergeable, meaning that multiple sketches from distributed systems can be combined in a central node.

The original implementation can be found here:

Installation

The ddsketch.h header needs to be copied and included into the application you are building.

As the implementation uses some Modern C++ features, you will need to compile your application with -std=c++14

Usage

#include "ddsketch.h"

constexpr auto kDesiredRelativeAccuracy = 0.01;
ddsketch::DDSketch sketch(kDesiredRelativeAccuracy);

for (auto value = 1; value <= 100; ++value) {
    sketch.add(value);
}

const auto quantiles = {
    0.01, 0.05, 0.10, 0.20, 0.25,
    0.40, 0.50, 0.60, 0.75, 0.85,
    0.95, 0.96, 0.97, 0.98, 0.99
};

std::cout.precision(std::numeric_limits<double>::max_digits10);

for (const auto quantile : quantiles) {
    const auto computed_quantile = sketch.get_quantile_value(quantile);

    std::cout << "Quantile: " << quantile << "\n"
              << "Computed Quantile Value: " << computed_quantile << "\n";
}

Build

The build system uses CMake. You can compile the example application by running the following commands:

mkdir build && cd build

cmake ../ -DBUILD_EXAMPLES=ON
cmake --build .

examples/DDSketch_Examples

Performance

Below, we will attempt to benchmark the insertion rate of the algorithm

Given the fact that ddsketch only computes a logarithm for every input value, the insert operations are quite fast.

CPU Avg Insert Rate Ns per Insert
Intel i7-7820HQ (Virtual-Machine) 8.5 million 117
AMD EPYC 11.4 million 84

License

Apache 2.0

References

Charles Masson and Jee E Rim and Homin K. Lee. DDSketch: A fast and fully-mergeable quantile sketch with relative-error guarantees. PVLDB, 12(12): 2195-2205, 2019