
MADlib v1.6

@iyerr3 iyerr3 released this 22 Mar 02:29

Release Date: 2014-June-30

New features:

  • Added a new unified 'margins' function that computes marginal effects for
    linear, logistic, multilogistic, and Cox proportional hazards regression. The
    new function also introduces support for interaction terms in the independent
    array.
  • Updated the convergence test for 'elastic_net_train' to check the change in
    the log-likelihood instead of the L2 norm of the change in coefficients. This
    allows for faster convergence in problems with multiple optimal solutions.
    The default threshold for convergence has been reduced from 1e-4 to 1e-6.
  • Added a new helper function to convert categorical variables to indicator
    variables that can be used directly in regression methods. The function
    currently supports only dummy encoding.
  • Improved performance for Cox proportional hazards: average improvement of
    20-fold on GPDB and 2.5-fold on HAWQ.
  • Improved performance of ARIMA by 30%.
  • Added new functionality to export linear and logistic regression models as a
    PMML object. The new module relies on PyXB to create PMML elements.
  • Added a function ('array_scalar_add') to add a scalar to an array.
  • Added 'numeric' type support for all functions that take 'anyarray' as
    an argument.
  • Made usability and aesthetic enhancements to documentation.
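
The 'elastic_net_train' convergence change above can be illustrated with a
minimal Python sketch. This is not MADlib's implementation: the helper names,
the unit-variance Gaussian log-likelihood, and the use of an absolute (rather
than relative) change are assumptions made for illustration. With two perfectly
collinear features, many coefficient vectors fit equally well, so the
coefficients can still move between iterations while the log-likelihood has
already stopped changing.

```python
import math

def converged_by_coefficients(old, new, tol=1e-6):
    """Older-style test: L2 norm of the change in coefficients is below tol."""
    change = math.sqrt(sum((n - o) ** 2 for o, n in zip(old, new)))
    return change < tol

def converged_by_loglik(ll_old, ll_new, tol=1e-6):
    """Newer-style test: absolute change in log-likelihood is below tol.
    (Absolute vs. relative change is an assumption of this sketch.)"""
    return abs(ll_new - ll_old) < tol

def gaussian_loglik(coef, rows, targets):
    """Unit-variance Gaussian log-likelihood, up to an additive constant."""
    total = 0.0
    for row, y in zip(rows, targets):
        prediction = sum(c * x for c, x in zip(coef, row))
        total += (y - prediction) ** 2
    return -0.5 * total

# Two identical (collinear) features: any (a, b) with a + b == 1 fits exactly.
rows = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
targets = [1.0, 2.0, 3.0]

old_coef = (1.0, 0.0)   # hypothetical coefficients at iteration k
new_coef = (0.6, 0.4)   # hypothetical coefficients at iteration k + 1

ll_old = gaussian_loglik(old_coef, rows, targets)
ll_new = gaussian_loglik(new_coef, rows, targets)

print("coefficient test converged:", converged_by_coefficients(old_coef, new_coef))
print("log-likelihood test converged:", converged_by_loglik(ll_old, ll_new))
```

In this example the coefficient-based test keeps iterating even though both
coefficient vectors are equally good fits, while the log-likelihood test stops.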

Bug Fixes:

  • Prepended the Python module name to sys.path before executing a MADlib
    function to avoid conflicts with user-defined modules.
  • Added a check in K-Means to ensure that all data points have the same
    dimensionality and that it equals the dimensionality of any provided initial
    centroids (MADLIB-713, MADLIB-789).
  • Added a check in multinomial regression to quit early and cleanly if model
    size is greater than the maximum permissible memory (MADLIB-667).
  • Fixed a minor bug with incorrect column names in the decision trees module
    (MADLIB-763).
  • Fixed a bug in K-Means that resulted in an incorrect number of centroids for
    particular datasets (MADLIB-857).
  • Fixed a bug that occurred when grouping columns have the same name as one of
    the output table column names (MADLIB-833).
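
The kind of K-Means input validation described above (MADLIB-713, MADLIB-789)
might look like the following Python sketch. The function name
'check_dimensions' and the error messages are hypothetical and are not taken
from MADlib; the sketch only shows the idea of rejecting inputs whose points
or initial centroids disagree in dimensionality.

```python
def check_dimensions(points, centroids=None):
    """Return the common dimensionality of points and optional centroids.

    Raises ValueError if the points are empty, if any point differs in length
    from the first point, or if any centroid differs from that length.
    """
    if not points:
        raise ValueError("no data points provided")
    dim = len(points[0])
    for i, point in enumerate(points):
        if len(point) != dim:
            raise ValueError(
                "point %d has dimension %d, expected %d" % (i, len(point), dim))
    for j, centroid in enumerate(centroids or []):
        if len(centroid) != dim:
            raise ValueError(
                "centroid %d has dimension %d, expected %d" % (j, len(centroid), dim))
    return dim

# Consistent input: two-dimensional points and one two-dimensional centroid.
print(check_dimensions([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0]]))

# Inconsistent input: the centroid has three coordinates.
try:
    check_dimensions([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0, 0.0]])
except ValueError as err:
    print("rejected:", err)
```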

Deprecated Functions:

  • Modules 'profile' and 'quantile' have been deprecated in favor of the
    'summary' function.
  • Module 'svd_mf' has been deprecated in favor of the improved 'svd' function.
  • Functions 'margins_logregr' and 'margins_mlogregr' have been deprecated in
    favor of the 'margins' function.