MADlib Installer Notes (v0.2beta)

agorajek edited this page May 20, 2011 · 2 revisions

Contents:

  1. Overview
  2. Config files
  3. Adding a Method to MADlib Package
  4. Adding a DB Port to MADlib Package

1. Overview

The MADlib installer framework consists of the following scripts and config files:

./madpack
    madpack.py         # Main madpack executable 

./config/              # Default config dir   
    Methods.yml            # list of all MADlib methods
    Ports.yml              # list of supported database ports (example: postgres, greenplum)
    Version.yml            # MADlib version info

./madpack/              # General Python code directory   
    configyml.py           # yaml files helper
    ...

./ports/               # Port specific files
    <db_port>/             # General dir for db_port specific code and config
        config/                # Config dir for this port
            Methods.yml            # list of all MADlib methods for this port (example: greenplum)
        ...

2. Config files

2.1 Ports.yml

List of database ports supported by MADlib. Attributes:

  • name : descriptive name of the new DB port
  • id : ID of this port (used in the port specs file .yml)
  • dbapi2 : Python database module for this port (must be DB-API 2.0 compliant: http://www.python.org/dev/peps/pep-0249/)

Example:

ports:
    - name:   PostgreSQL
      id:     postgres
      dbapi2: pygresql.pgdb
    - name:   Greenplum DB
      id:     greenplum
      dbapi2: pygresql.pgdb
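
The dbapi2 attribute can be turned into a live module at run time. The following is a minimal sketch, not the madpack implementation: the PORTS dictionary stands for what a YAML loader would return for the example above, and find_port and load_dbapi2 are hypothetical helper names.

```python
import importlib

# Parsed form of the Ports.yml example above (assumed output of a YAML loader).
PORTS = {
    "ports": [
        {"name": "PostgreSQL", "id": "postgres", "dbapi2": "pygresql.pgdb"},
        {"name": "Greenplum DB", "id": "greenplum", "dbapi2": "pygresql.pgdb"},
    ]
}


def find_port(config, port_id):
    """Return the port entry whose id matches port_id (hypothetical helper)."""
    for port in config["ports"]:
        if port["id"] == port_id:
            return port
    raise KeyError("unknown port id: %s" % port_id)


def load_dbapi2(port):
    """Import the module named by the entry's dbapi2 attribute (hypothetical helper)."""
    return importlib.import_module(port["dbapi2"])
```

Calling load_dbapi2(find_port(PORTS, "greenplum")) would import the module named in the entry, provided that module is installed on the host.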

2.2 Version.yml

The single source of the MADlib version ID. It must be updated manually with each release.

Example:

version: v0.1.1beta
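
Code that needs to compare versions could split this string into its parts. The sketch below assumes the version always has the shape of the example (a leading "v", dot-separated numbers, and an optional alphabetic tag); that format and the parse_version name are assumptions, not something these notes specify.

```python
import re

# Assumed shape: "v", dot-separated integers, optional alphabetic tag (e.g. "beta").
_VERSION_RE = re.compile(r"^v(\d+(?:\.\d+)*)([A-Za-z]*)$")


def parse_version(text):
    """Split a version string into (tuple of ints, tag). Hypothetical helper."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError("unrecognized version string: %r" % text)
    numbers = tuple(int(part) for part in match.group(1).split("."))
    return numbers, match.group(2)


print(parse_version("v0.1.1beta"))  # prints ((0, 1, 1), 'beta')
```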

2.3 Methods.yml

List of all MADlib methods and their dependencies. The default version of this file is located in the ./config directory and is used if no port-specific Methods.yml exists. The port-specific version, if it exists, is located in the ./ports/PORTID/config directory.

Example:

methods:
    - name:    sketch
    - name:    bayes
    - name:    k-means
      depends: ['svec']
    - name:    svec
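
The depends attribute implies an ordering: a method should be handled after the methods it depends on. One way to derive such an order is a depth-first topological sort. This is a sketch under that assumption, not necessarily how madpack orders methods; METHODS stands for the parsed form of the example above and install_order is a hypothetical name.

```python
# Parsed form of the Methods.yml example above (assumed output of a YAML loader).
METHODS = {
    "methods": [
        {"name": "sketch"},
        {"name": "bayes"},
        {"name": "k-means", "depends": ["svec"]},
        {"name": "svec"},
    ]
}


def install_order(config):
    """Return method names so that each method follows its dependencies."""
    deps = {m["name"]: list(m.get("depends", [])) for m in config["methods"]}
    order = []
    state = {}  # 1 = visit in progress, 2 = finished

    def visit(name):
        if state.get(name) == 2:
            return
        if state.get(name) == 1:
            raise ValueError("dependency cycle at %s" % name)
        if name not in deps:
            raise KeyError("unknown method: %s" % name)
        state[name] = 1
        for dep in deps[name]:
            visit(dep)
        state[name] = 2
        order.append(name)

    for name in deps:
        visit(name)
    return order


print(install_order(METHODS))  # prints ['sketch', 'bayes', 'svec', 'k-means']
```

The sort also reports a dependency that names an unlisted method, or a cycle, instead of silently producing a wrong order.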

3. Adding a Method to MADlib Package

Once a new method has been developed according to (...), reviewed, and is ready for inclusion in MADlib, all relevant Methods.yml files must be edited:

  • default: ./config/Methods.yml
  • port-specific (if they exist): ./ports/.../config/Methods.yml
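
Both locations matter because, as described in section 2.3, a port-specific Methods.yml is used in place of the default when it exists. A minimal sketch of that lookup follows; the methods_file name and the root parameter (the top of the MADlib tree) are assumptions made for illustration.

```python
import os


def methods_file(root, port_id):
    """Return the port-specific Methods.yml if present, else the default one."""
    port_specific = os.path.join(root, "ports", port_id, "config", "Methods.yml")
    if os.path.isfile(port_specific):
        return port_specific
    # Fall back to the default list shared by all ports.
    return os.path.join(root, "config", "Methods.yml")
```

A method added only to the default file would therefore be invisible to any port that ships its own Methods.yml.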

4. Adding a DB Port to MADlib Package

Adding support for a new database platform involves two things: adjusting the MADlib code to work with the platform (or creating a separate code set for it), and registering the platform in the ./config/Ports.yml file.
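
A new entry in Ports.yml can be sanity-checked before use. The sketch below assumes that the three attributes from section 2.1 are all required and that port ids must be unique; neither rule is stated in these notes, and check_ports is a hypothetical helper operating on an already parsed file.

```python
# Attributes listed in section 2.1; treating all three as required is an assumption.
REQUIRED_KEYS = ("name", "id", "dbapi2")


def check_ports(config):
    """Return a list of problems found in a parsed Ports.yml structure."""
    problems = []
    seen = set()
    for index, port in enumerate(config.get("ports", [])):
        missing = [key for key in REQUIRED_KEYS if key not in port]
        if missing:
            problems.append("entry %d: missing %s" % (index, ", ".join(missing)))
        port_id = port.get("id")
        if port_id in seen:
            problems.append("entry %d: duplicate id %s" % (index, port_id))
        elif port_id is not None:
            seen.add(port_id)
    return problems
```

An empty list means the file passed these checks; it does not show that the named dbapi2 module is installed or that the port's code works.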