
MADlib v1.4

Released by @iyerr3 on 22 Mar 19:00

Release Date: 2013-Nov-25

New Features:

  • Improved interface for Multinomial logistic regression:
    • Added a new interface that accepts an 'output_table' parameter and
      stores the model details in the output table instead of returning them as
      a struct data type. The updated function also builds a summary table that
      includes all parameters and meta-parameters used during model training.
    • The output table has been reformatted to present the model coefficients
      and related metrics for each category in a separate row. This replaces the
      old output format, which combined the model statistics for all categories
      in a single array.
  • Variance Estimators
    • Added a Robust Variance estimator for Cox PH models (Lin and Wei, 1989).
      It is useful for calculating variances in datasets with potentially
      noisy outliers. Namely, the coefficient estimates remain asymptotically
      normal, with consistently estimated standard errors, even if the model
      is misspecified due to outliers.
    • Added a Clustered Variance estimator for Cox PH models. It is used
      when the data contains clustering information in addition to the
      covariates, and it produces asymptotically normal estimates.
  • NULL Handling:
    • Modified the behavior of regression modules to 'omit' rows containing NULL
      values in any of the dependent or independent variables. The number of
      rows skipped is reported in the output table.
      This release includes NULL handling for the following modules:
      • Linear, Logistic, and Multinomial logistic regression, as well as
        Cox Proportional Hazards
      • Huber-White sandwich estimators for linear, logistic, and multinomial
        logistic regression as well as Cox Proportional Hazards
      • Clustered variance estimators for linear, logistic, and multinomial
        logistic regression as well as Cox Proportional Hazards
      • Marginal effects for logistic and multinomial logistic regression
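The row-omission rule can be sketched as follows. This is a minimal Python illustration under stated assumptions, not MADlib's implementation: rows are assumed to be dicts with None standing in for SQL NULL, and omit_null_rows is a hypothetical helper. It drops a row when the dependent variable or any independent variable is NULL and counts the dropped rows, mirroring the "rows skipped" figure reported in the output table.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical row representation: column name -> value, with None standing in for SQL NULL.
Row = Dict[str, Any]


def omit_null_rows(
    rows: Sequence[Row], dependent: str, independents: Sequence[str]
) -> Tuple[List[Row], int]:
    """Drop every row with a NULL in the dependent variable or in any
    independent variable, and report how many rows were skipped."""
    kept: List[Row] = []
    skipped = 0
    for row in rows:
        values = [row[dependent]] + [row[name] for name in independents]
        if any(value is None for value in values):
            skipped += 1  # row is left out of training but still counted
        else:
            kept.append(row)
    return kept, skipped


rows = [
    {"y": 1, "x1": 0.5, "x2": 2.0},
    {"y": None, "x1": 1.0, "x2": 3.0},  # NULL dependent variable
    {"y": 0, "x1": None, "x2": 1.0},    # NULL independent variable
]
kept, skipped = omit_null_rows(rows, "y", ["x1", "x2"])
print(len(kept), skipped)  # prints: 1 2
```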
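The two Cox PH variance estimators described above both follow the general sandwich form V = A^-1 B A^-1. Here A is the information matrix of the partial likelihood, and B sums the outer products of score residuals. The robust estimator forms one term per observation. The clustered estimator first sums the residuals within each cluster and then forms one term per cluster. The minimal Python sketch below assumes that the inverse information matrix and the per-observation score residuals have already been computed, and it does not show how they are derived from the Cox model. The helpers sandwich_variance and matmul are hypothetical and are not MADlib functions.

```python
from typing import Dict, Hashable, List, Optional, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x m) and an (m x p) matrix."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def sandwich_variance(
    bread_inv: Matrix,
    scores: Sequence[Sequence[float]],
    clusters: Optional[Sequence[Hashable]] = None,
) -> Matrix:
    """Return bread_inv * meat * bread_inv, where meat = sum over groups of g g^T.

    bread_inv: inverse information matrix (p x p), assumed precomputed.
    scores:    one score-residual vector of length p per observation.
    clusters:  optional cluster id per observation; if given, residuals are
               summed within each cluster first (clustered variance).
    """
    if clusters is None:
        groups = [list(s) for s in scores]  # robust: one term per observation
    else:
        totals: Dict[Hashable, List[float]] = {}
        for cluster_id, s in zip(clusters, scores):
            acc = totals.setdefault(cluster_id, [0.0] * len(s))
            for j, value in enumerate(s):
                acc[j] += value
        groups = list(totals.values())  # clustered: one term per cluster
    p = len(bread_inv)
    meat = [[sum(g[i] * g[j] for g in groups) for j in range(p)] for i in range(p)]
    return matmul(matmul(bread_inv, meat), bread_inv)


identity = [[1.0, 0.0], [0.0, 1.0]]
residuals = [[1.0, 0.0], [0.0, 1.0]]
print(sandwich_variance(identity, residuals))              # prints: [[1.0, 0.0], [0.0, 1.0]]
print(sandwich_variance(identity, residuals, ["a", "a"]))  # prints: [[1.0, 1.0], [1.0, 1.0]]
```

When both observations share one cluster, their residuals are summed before the outer product is formed. This introduces the off-diagonal covariance terms that the per-observation version lacks.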

Deprecated functions:
- The multinomial logistic regression function has been renamed to
'mlogregr_train'. The old function ('mlogregr') has been deprecated
and will be removed in the next major version update.

- For all multinomial logistic regression estimator functions (listed below),
the optimizer-specific arguments have been collated into a single string
argument. An example of the new optimizer parameter is
'max_iter=20, optimizer=irls, precision=0.0001'.
This is in contrast to the original argument list, which contained three
separate arguments: 'max_iter', 'optimizer', and 'precision'. This change
allows new optimizer-specific parameters to be added without changing the
argument list.
Affected functions:
    - robust_variance_mlogregr
    - clustered_variance_mlogregr
    - margins_mlogregr
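A parser for the new combined optimizer string could look like the following minimal sketch. The comma-separated key=value syntax comes from the example above. The name parse_optimizer_params and its int/float/string conversion rules are assumptions made for illustration. This is not MADlib's actual parsing code.

```python
from typing import Dict, Union

Value = Union[int, float, str]


def parse_optimizer_params(text: str) -> Dict[str, Value]:
    """Split 'key=value, key=value' into a dict, converting numeric values."""
    params: Dict[str, Value] = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate an empty string or trailing commas
        key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        key, raw = key.strip(), raw.strip()
        value: Value
        try:
            value = int(raw)  # e.g. max_iter=20
        except ValueError:
            try:
                value = float(raw)  # e.g. precision=0.0001
            except ValueError:
                value = raw  # e.g. optimizer=irls
        params[key] = value
    return params


params = parse_optimizer_params("max_iter=20, optimizer=irls, precision=0.0001")
print(params)  # prints: {'max_iter': 20, 'optimizer': 'irls', 'precision': 0.0001}
```

Because the parser accepts any key, a new optimizer-specific option becomes one more key in the string rather than a new positional argument in every affected function's signature.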

Bug Fixes:
- Fixed an overflow problem in LDA by using INT64 instead of INT32.
- Fixed an integer-to-boolean cast bug in the clustered variance estimator
for logistic regression. After this fix, integer columns are accepted as the
binary dependent variable, using the 'integer to bool' cast rules.
- Fixed two bugs in SVD:
    - The 'example' option for online help has been fixed.
    - Column names for sparse input tables in the 'svd_sparse' and
      'svd_sparse_native' functions are no longer restricted to 'row_id',
      'col_id' and 'value'.