The C++ Abstraction Layer

Rahul Iyer edited this page Jul 1, 2015 · 10 revisions

Writing Code for the MADlib® C++ Abstraction Layer

Preamble

The MADlib C++ Abstraction Layer provides a means of writing platform-independent user-defined functions in C++. It abstracts the database backend completely, performs all necessary type checking, and uses the Eigen C++ library to provide an intuitive, clean interface to high-performance linear-algebra functions (LAPACK).

Example:

AnyValue student_t_cdf(AbstractDBInterface &db, AnyValue args) {
    AnyValue::iterator arg(args);

    // Arguments from SQL call
    const int64_t nu = *arg++;
    const double t = *arg;

    /* We want to ensure nu > 0 */
    if (nu <= 0)
        throw std::domain_error("Student-t distribution undefined for "
            "degree of freedom <= 0");

    return studentT_cdf(nu, t);
}

Features

  • Performs type checking of function arguments
    • Lossless conversions of pass-by-value arguments are performed implicitly (e.g., from uint32_t to uint64_t)
    • An attempted implicit lossy conversion throws an exception (e.g., from uint64_t to uint32_t)
  • Supports pass-by-reference whenever possible (performance!). However, if the user code asks for a mutable object but the database prohibits direct modification, a copy is automatically created.
    • Well-behaved/non-hacky user code cannot accidentally corrupt database data
  • The only supported means for user code to communicate with the DBMS backend is through the interface provided by AbstractDBInterface/AbstractAllocator (truly platform-independent!).
  • Integration of Eigen for linear algebra operations (earlier versions of this layer used Armadillo, a C++ wrapper for LAPACK). This allows for intuitive math notation in C++ code.
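
The conversion rule in the type-checking bullets above can be illustrated with a small standalone sketch. This is not MADlib code: checked_int_cast, is_negative and example_usage are hypothetical names introduced here, and the sketch only covers integer types. It assumes that "lossy" means "the value is not representable in the target type".

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Hypothetical helpers (not part of MADlib): sign test that avoids
// comparing an unsigned value against zero.
template <typename T>
bool is_negative(T value, std::true_type /* signed */) { return value < 0; }
template <typename T>
bool is_negative(T, std::false_type /* unsigned */) { return false; }

// Hypothetical helper: convert between integer types and throw
// std::range_error if the value does not fit into the target type.
template <typename To, typename From>
To checked_int_cast(From value) {
    static_assert(std::is_integral<To>::value && std::is_integral<From>::value,
                  "this sketch handles integer types only");
    if (is_negative(value, std::is_signed<From>())) {
        // A negative value fits only into a signed target with a small
        // enough minimum.
        if (!std::is_signed<To>::value ||
            static_cast<std::intmax_t>(value) <
                static_cast<std::intmax_t>(std::numeric_limits<To>::min()))
            throw std::range_error("lossy integer conversion");
    } else if (static_cast<std::uintmax_t>(value) >
               static_cast<std::uintmax_t>(std::numeric_limits<To>::max())) {
        // A non-negative value must not exceed the target's maximum.
        throw std::range_error("lossy integer conversion");
    }
    return static_cast<To>(value);
}

// Usage: widening succeeds, narrowing a value that does not fit throws.
inline void example_usage() {
    assert(checked_int_cast<std::uint64_t>(std::uint32_t(7)) == 7u);
    bool threw = false;
    try {
        (void) checked_int_cast<std::uint32_t>(std::uint64_t(1) << 40);
    } catch (const std::range_error&) {
        threw = true;
    }
    assert(threw);
}
```

In this sketch the check happens at run time on the actual value, so a uint64_t that happens to hold a small number still converts to uint32_t. Whether the abstraction layer checks values or types is not stated above.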

Interesting Implementation Details

PostgreSQL port

  • Overloads the global throw/nothrow variants of operator new and operator delete to use palloc/pfree
  • All memory allocation is funneled through the PGAllocator class
  • All callbacks into the backend (in particular palloc/pfree) occur within PG_TRY/PG_CATCH blocks. This ensures that any PostgreSQL error raised by ereport returns control to the calling C++ function. There we throw a C++ exception, which is caught just above the C/C++ boundary. From there the PostgreSQL error is rethrown. This procedure ensures that the C++ stack is always unwound properly (otherwise the longjmp performed by ereport would lead to behavior that is undefined by the C++ standard).
  • Callbacks into the DBMS backend where no exceptions are permitted (operator new (std::nothrow)) disable interrupt processing for the duration of the callback (interrupts are, of course, still properly recorded while processing is disabled). This ensures that no database signals get lost. Otherwise, the caller of a failed operator new (std::nothrow) could not tell whether the NULL pointer is due to a SIGINT or to memory exhaustion. With interrupts disabled, the SIGINT is preserved and dealt with at an appropriate later point.
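
The error-translation pattern described above (catch the backend's longjmp close to the callback, then continue with a C++ exception so that destructors run) can be sketched without PostgreSQL headers. Everything in the following block is a stand-in invented for illustration: backend_alloc plays the role of palloc, g_error_target plays the role of the jump-buffer bookkeeping that PG_TRY/PG_CATCH perform, and BackendError, guarded_alloc, Tracker and allocate_with_cleanup are hypothetical names. It is a minimal simulation under those assumptions, not the layer's actual code.

```cpp
#include <csetjmp>
#include <cstddef>
#include <stdexcept>

// Stand-in for the backend's current error-recovery target.
static std::jmp_buf* g_error_target = nullptr;

// Stand-in for a C backend routine that reports failure via longjmp
// (as ereport does) instead of returning an error code.
extern "C" void* backend_alloc(std::size_t size) {
    static char buffer[64];
    if (size > sizeof buffer)
        std::longjmp(*g_error_target, 1);  // "raise" a backend error
    return buffer;
}

// Hypothetical C++ exception type carrying the backend failure.
struct BackendError : std::runtime_error {
    BackendError() : std::runtime_error("backend reported an error") {}
};

// Wrapper around the callback. No object with a non-trivial destructor
// lives between setjmp and the possible longjmp, so the jump itself
// skips no C++ cleanup.
void* guarded_alloc(std::size_t size) {
    std::jmp_buf local;
    std::jmp_buf* const saved = g_error_target;
    g_error_target = &local;
    if (setjmp(local) != 0) {
        // Reached via longjmp: restore state, continue as a C++ exception.
        g_error_target = saved;
        throw BackendError();
    }
    void* result = backend_alloc(size);
    g_error_target = saved;
    return result;
}

// RAII object used to observe that the C++ stack is unwound.
struct Tracker {
    int& count;
    explicit Tracker(int& c) : count(c) {}
    ~Tracker() { ++count; }
};

// Plays the role of the code just above the C/C++ boundary. A real port
// would re-raise the backend error in the catch block instead of
// returning false.
bool allocate_with_cleanup(std::size_t size, int& destructor_runs) {
    try {
        Tracker tracker(destructor_runs);
        guarded_alloc(size);
        return true;
    } catch (const BackendError&) {
        return false;
    }
}
```

Calling allocate_with_cleanup(1024, n) makes the stand-in backend fail. The Tracker destructor still runs because the longjmp ends inside guarded_alloc and the rest of the way up the stack is ordinary exception unwinding.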