DuckDB requires CMake and a C++11-compliant compiler. GCC 4.9 and newer, Clang 3.9 and newer, and Visual Studio 2017 are tested on each revision.
Run `make` in the root directory to compile the sources. For development, use `make debug` to build a non-optimized debug version. After making changes, run `make unit` and `make allunit` to verify that your build still works properly.
A command line utility based on `sqlite3` can be found in either `build/release/tools/shell/shell` (release, the default) or `build/debug/tools/shell/shell` (debug).
As DuckDB is an embedded database, there is no database server to launch and no client that connects to a running server. Instead, the database is embedded directly into an application using the C or C++ bindings. The main build process creates the shared library `build/release/src/libduckdb.[so|dylib|dll]`, which applications can link against. A static library is built as well.
For examples of how to embed DuckDB into your application, see the examples folder.
After compiling, run the benchmarks from the root directory by executing `./build/release/benchmark/benchmark_runner`.
DuckDB is implemented in C++11, should compile with GCC and Clang, and uses CMake for building and Catch2 for testing. In addition, we use Jenkins as a CI platform. DuckDB uses components from various open-source databases and draws inspiration from scientific publications. Here is an overview:
- Parser: We use the PostgreSQL parser that was repackaged as a stand-alone library. The translation to our own parse tree is inspired by Peloton.
- Shell: We have adapted the SQLite shell to work with DuckDB.
- Tests: We use the SQL Logic Tests from SQLite to test DuckDB.
- Query fuzzing: We use SQLsmith to generate random queries for additional testing.
- Date Math: We use the date math component from MonetDB.
- SQL Window Functions: DuckDB's window functions implementation uses Segment Tree Aggregation as described in the paper "Efficient Processing of Window Functions in Analytical SQL Queries" by Viktor Leis, Kan Kundhikanjana, Alfons Kemper and Thomas Neumann.
- Execution engine: The vectorized execution engine is inspired by the paper "MonetDB/X100: Hyper-Pipelining Query Execution" by Peter Boncz, Marcin Zukowski and Niels Nes.
- Optimizer: DuckDB's optimizer draws inspiration from the papers "Dynamic Programming Strikes Back" by Guido Moerkotte and Thomas Neumann as well as "Unnesting Arbitrary Queries" by Thomas Neumann and Alfons Kemper.
- Concurrency control: Our MVCC implementation is inspired by the paper "Fast Serializable Multi-Version Concurrency Control for Main-Memory Database Systems" by Thomas Neumann, Tobias Mühlbauer and Alfons Kemper.
- Storage: DuckDB uses DataBlocks for persistent storage as described in the paper "Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation" by Harald Lang, Tobias Mühlbauer, Florian Funke, Peter Boncz, Thomas Neumann and Alfons Kemper.
- Regular Expression: DuckDB uses Google's RE2 regular expression engine.
- Continuous Benchmarking (CB™): Runs TPC-H, TPC-DS and some microbenchmarks on every commit.
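
The Segment Tree Aggregation idea named in the SQL Window Functions item above can be illustrated with a small, self-contained sketch. This is not DuckDB's implementation: the class `SumSegmentTree` and the helper `SlidingSum` are hypothetical names introduced here, the aggregate is fixed to SUM over 64-bit integers, and the frame is assumed to be a simple "p rows preceding, f rows following" window. The point it shows is that once partial aggregates are stored in a tree, each window frame can be answered in O(log n) instead of re-scanning the frame.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration type; not part of the DuckDB code base.
// Stores partial sums in an implicit binary tree laid out in one array.
class SumSegmentTree {
public:
    explicit SumSegmentTree(const std::vector<int64_t> &values)
        : n_(values.size()), tree_(2 * values.size(), 0) {
        // Leaves occupy tree_[n_ .. 2 * n_ - 1].
        std::copy(values.begin(), values.end(), tree_.begin() + n_);
        // Inner node i holds the sum of its children 2i and 2i + 1.
        for (std::size_t i = n_; i-- > 1;) {
            tree_[i] = tree_[2 * i] + tree_[2 * i + 1];
        }
    }

    // Sum of values[begin, end), computed bottom-up in O(log n).
    int64_t RangeSum(std::size_t begin, std::size_t end) const {
        assert(begin <= end && end <= n_);
        int64_t sum = 0;
        for (std::size_t l = begin + n_, r = end + n_; l < r; l /= 2, r /= 2) {
            if (l & 1) sum += tree_[l++];  // l is a right child: take it, move right
            if (r & 1) sum += tree_[--r];  // r is exclusive: take its left sibling
        }
        return sum;
    }

private:
    std::size_t n_;
    std::vector<int64_t> tree_;
};

// Hypothetical helper: for every row i, sums the frame that spans
// `preceding` rows before i through `following` rows after i,
// clamped to the bounds of the input.
std::vector<int64_t> SlidingSum(const std::vector<int64_t> &values,
                                std::size_t preceding, std::size_t following) {
    SumSegmentTree tree(values);
    std::vector<int64_t> out;
    out.reserve(values.size());
    for (std::size_t i = 0; i < values.size(); ++i) {
        std::size_t begin = i >= preceding ? i - preceding : 0;
        std::size_t end = std::min(values.size(), i + following + 1);
        out.push_back(tree.RangeSum(begin, end));
    }
    return out;
}
```

Calling `SlidingSum({1, 2, 3, 4, 5}, 1, 1)` yields `{3, 6, 9, 12, 9}`. The bottom-up array layout is one common way to write a segment tree for a commutative aggregate such as SUM; the paper cited above covers the general technique for window functions.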