MADlib Installer Notes (v0.1alpha)

agorajek edited this page May 13, 2011 · 1 revision

These instructions assume you have Greenplum in $GPHOME, and a gpadmin user account with sudo privileges.

  1. Make sure you have Python setuptools installed. If you're using Greenplum, you'll need to do this manually. Log in to your gpadmin account, make sure you've sourced the Greenplum paths, and run the following:
     cd /tmp
     wget http://pypi.python.org/packages/2.6/s/setuptools/setuptools-0.6c11-py2.6.egg#md5=bfa92100bd772d5a213eedd356d64086
     sh setuptools-0.6c11-py2.6.egg
  1. Install the python libraries used by the madlib installer.
     export CFLAGS="-L$GPHOME/ext/python/lib/ -L $GPHOME/lib/"
     $GPHOME/ext/python/bin/easy_install argparse hashlib pyyaml sqlparse psycopg2
  1. Change directory into the madlib-contrib root. You now have two choices: build an rpm for distribution and install it, or simply install the Python code you have. The former is closer to how madlib will eventually be distributed; the latter is easier for madlib developers.

    a. Option 1: Build an rpm and install it.

     python setup.py bdist_rpm
     cd dist
     sudo rpm -Uvh madlib-0.01-1.noarch.rpm
     sudo chown -R gpadmin $GPHOME/ext/python/lib/python2.6/site-packages/mad*
    b. Option 2: Install directly from the repo.
     python setup.py install
  1. In your newly-installed madpy extension, use vi (or substitute your favorite editor) to edit the madpy/Config.yml file to reflect your information. You'll likely only need to change the connect_args, but you may want to change the other fields as well.
     vi $GPHOME/ext/python/lib/python2.6/site-packages/madpy/Config.yml
  1. Make sure you have already defined the appropriate madlib schema in the appropriate database (the schema and database specified in your madpy/Config.yml in the previous step) and that the PL/pgSQL and PL/Python (plpythonu) languages are installed in your database:
     CREATE SCHEMA <your_madlib_schema>;    
     CREATE LANGUAGE plpgsql;
     CREATE LANGUAGE plpythonu;
  1. Now that the python libraries are installed in the filesystem, it's time to build the database extensions and install them.
     madpack install
  1. To undo the installation, uninstall the extensions from the database and, if you installed from the rpm, remove it as well.
     madpack uninstall
     sudo rpm -e madlib
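
Before running madpack, it can help to confirm that the libraries from the easy_install step above are importable. The following is a minimal sketch and not part of the madlib installer; the helper name missing_modules is made up for illustration. Note that the pyyaml package is imported under the name yaml.

```python
# Import names for the packages installed with easy_install in the steps above.
# pyyaml is imported as "yaml"; the others share their package name.
REQUIRED = ["argparse", "hashlib", "yaml", "sqlparse", "psycopg2"]


def missing_modules(names):
    """Return the subset of names that cannot be imported (hypothetical helper)."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All installer dependencies are importable.")
```

Run it with the same interpreter that easy_install belongs to (presumably the python binary under $GPHOME/ext/python/bin), since a different Python on the path may see a different site-packages directory.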

Adding methods to the Package Manager config

Information about packages is stored in two places.

  1. Installation configuration is in madpy/Config.yml. The format is fairly straightforward: you specify a unique name for your method (which should be the directory name under methods in the repo) and a desired port to install (which should be the directory name under <yourmethod>/src). If you like, you can also place a Config.yml file in another directory in your filesystem and run madpack -c /<path-to-dir> install.
  2. Each port directory should have an Install.yml file that specifies SQL scripts to roll "forward" (fw) and "backward" (bw). A module key is also required to hold the module name (but is unused as of now so this may change). See sketch/src/extended_sql/pg_gp/Install.yml for an example. The depends key holds a list of modules that this one depends on, which will be installed before this package is attempted. See profile/src/extended_sql/pg_gp/Install.yml for an example.
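
The effect of the depends key can be pictured as a dependency ordering: every module is installed only after the modules it lists. The sketch below shows one way to compute such an order with a depth-first traversal. It is an illustration under assumptions, not madpack's actual code: the function name install_order is hypothetical, and the dependency of profile on sketch is assumed for the example rather than read from the real Install.yml files.

```python
def install_order(depends):
    """Order modules so that each one comes after the modules it depends on.

    depends maps a module name to the list held by its depends key.
    """
    order = []
    state = {}  # module name -> "visiting" or "done"

    def visit(module):
        if state.get(module) == "done":
            return
        if state.get(module) == "visiting":
            raise ValueError("circular dependency involving " + module)
        state[module] = "visiting"
        for dep in depends.get(module, []):
            visit(dep)
        state[module] = "done"
        order.append(module)

    for module in sorted(depends):
        visit(module)
    return order


# Hypothetical data: the module names appear in the examples above, but the
# dependency of profile on sketch is assumed for illustration.
example = {"profile": ["sketch"], "sketch": []}
print(install_order(example))  # prints ['sketch', 'profile']
```

A circular depends chain has no valid install order, so the sketch raises an error rather than looping forever.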

Note: the madpack script will attempt to run make install in the port directory, which you can use to generate appropriate SQL install directory references via the use of pgxs. This requires you to keep the following in mind:

  1. The Makefile for your method should end with the line
     include config.mk
     Do not create this file yourself; it will be autogenerated (and deleted) during the madpack installation process. See sketch/src/extended_sql/pg_gp/Makefile for an example.
  2. SQL scripts should use the string MADLIB_SCHEMA as the schema before any function or table names; this will be replaced by the value of the target_schema in Config.yml. See sketch/src/extended_sql/pg_gp/sketches.sql.in for an example.
  3. SQL scripts are now passed through the m4 preprocessor, which allows you to place conditional text into your SQL. For example, in sketches.sql.in (https://github.com/madlib/madlib-contrib/blob/master/methods/sketch/src/extended_sql/pg_gp/sketches.sql.in) I have:

CREATE AGGREGATE MADLIB_SCHEMA.fmcount(anyelement)
(
    sfunc = MADLIB_SCHEMA.fmsketch_trans,
    stype = bytea, 
    finalfunc = MADLIB_SCHEMA.fmsketch_getcount,
    ifdef(`GREENPLUM',`prefunc = MADLIB_SCHEMA.fmsketch_merge,')
    initcond = '' 
);

You can now register macro definitions in the Config.yml file via the prep_flags key, for example "prep_flags: -DMADLIB -DGREENPLUM". Each -D flag defines a macro that ifdef can test.
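
To make the two substitutions concrete, here is a rough Python sketch of what happens to the ifdef line and the MADLIB_SCHEMA placeholder in the aggregate definition above. It is not how madpack or m4 is implemented: the preprocess function and its regular expression are made up for illustration and handle only the simple two-argument ifdef form shown in the example.

```python
import re

# Matches only the two-argument form ifdef(`NAME',`text') used in the example
# above; real m4 supports much more (nesting, an optional third argument).
IFDEF = re.compile(r"ifdef\(`(\w+)',`([^']*)'\)")


def preprocess(sql, target_schema, defined):
    """Approximate the preprocessing of a .sql.in file (illustrative only)."""
    def expand(match):
        name, text = match.group(1), match.group(2)
        # Keep the quoted text only when the macro was defined with -DNAME.
        return text if name in defined else ""

    expanded = IFDEF.sub(expand, sql)
    return expanded.replace("MADLIB_SCHEMA", target_schema)


template = (
    "CREATE AGGREGATE MADLIB_SCHEMA.fmcount(anyelement)\n"
    "(\n"
    "    sfunc = MADLIB_SCHEMA.fmsketch_trans,\n"
    "    ifdef(`GREENPLUM',`prefunc = MADLIB_SCHEMA.fmsketch_merge,')\n"
    "    initcond = ''\n"
    ");\n"
)

# With GREENPLUM defined the prefunc line is kept; without it the line is empty.
print(preprocess(template, "madlib", {"MADLIB", "GREENPLUM"}))
print(preprocess(template, "madlib", {"MADLIB"}))
```

The schema name madlib stands in for whatever target_schema is set to in Config.yml.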

More information?

Try running madpack -h, which provides fairly extensive help.

If you run into trouble, here are some manual steps to clean things out. I assume here that you specified a schema called madlib. If not, substitute your schema name in the commands below:

% psql <your database name>

psql (8.2.13)
Type "help" for help.

<your database name>=# drop schema madlib cascade;
<your database name>=# ^D\q
% rm -rf $GPHOME/ext/python/lib/python2.6/site-packages/mad*
%
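
Because rm -rf with a wildcard is unforgiving, you may want to see what the mad* pattern matches before deleting anything. The following is a small sketch for that purpose; the helper name find_madlib_paths is hypothetical, and the script assumes GPHOME is set and that the python2.6 directory layout used in these notes applies to your installation.

```python
import glob
import os


def find_madlib_paths(site_packages):
    """List the entries under site_packages that the shell pattern mad* matches."""
    return sorted(glob.glob(os.path.join(site_packages, "mad*")))


if __name__ == "__main__":
    # Assumes GPHOME is set and the python2.6 layout used in these notes.
    site_packages = os.path.join(
        os.environ.get("GPHOME", ""),
        "ext", "python", "lib", "python2.6", "site-packages",
    )
    for path in find_madlib_paths(site_packages):
        print(path)
```

The script only lists paths; it removes nothing.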