MADlib v1.4
Release Date: 2013-Nov-25
New Features:
- Improved interface for Multinomial logistic regression:
- Added a new interface that accepts an 'output_table' parameter and
stores the model details in the output table instead of returning as a struct
data type. The updated function also builds a summary table that includes
all parameters and meta-parameters used during model training.
- The output table has been reformatted to present the model coefficients
and related metrics for each category in a separate row. This replaces the
old output format of model stats for all categories combined in a
single array.
- Variance Estimators:
- Added Robust Variance estimator for Cox PH models (Lin and Wei, 1989).
It is useful for calculating variances in a dataset with potentially
noisy outliers. Namely, the standard errors are asymptotically normal even
if the model is wrong due to outliers.
- Added Clustered Variance estimator for Cox PH models. It is used
when the data contains extra clustering information besides covariates;
the resulting estimates are asymptotically normal.
- NULL Handling:
- Modified behavior of regression modules to 'omit' rows containing NULL
values for any of the dependent and independent variables. The number of
rows skipped is provided as part of the output table.
This release includes NULL handling for the following modules:
- Linear, Logistic, and Multinomial logistic regression, as well as
Cox Proportional Hazards
- Huber-White sandwich estimators for linear, logistic, and multinomial
logistic regression as well as Cox Proportional Hazards
- Clustered variance estimators for linear, logistic, and multinomial
logistic regression as well as Cox Proportional Hazards
- Marginal effects for logistic and multinomial logistic regression
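The 'omit' behavior described under NULL Handling can be sketched in a few
lines of Python. This is an illustration only, not MADlib code: the function
name 'omit_null_rows' and the row layout (dependent value, list of independent
values) are assumptions made for the example, with Python's None standing in
for SQL NULL.

```python
# Hypothetical illustration of 'omit' NULL handling; not part of MADlib.

def omit_null_rows(rows):
    """Drop rows that have a NULL (None) in the dependent variable or in
    any independent variable.

    Each row is assumed to be a (dependent, independents) pair.
    Returns the kept rows and the number of rows skipped.
    """
    kept = []
    skipped = 0
    for dependent, independents in rows:
        if dependent is None or any(x is None for x in independents):
            skipped += 1
        else:
            kept.append((dependent, independents))
    return kept, skipped


if __name__ == "__main__":
    rows = [
        (1.0, [0.5, 2.0]),
        (None, [0.1, 3.0]),  # NULL dependent variable: skipped
        (0.0, [None, 1.0]),  # NULL independent variable: skipped
        (1.0, [0.7, 4.0]),
    ]
    kept, skipped = omit_null_rows(rows)
    print(len(kept), skipped)
```

The skipped count plays the role of the "number of rows skipped" value that
the release note says is reported in the output table.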
Deprecated Functions:
- The multinomial logistic regression function has been renamed to
'mlogregr_train'. The old function ('mlogregr') is deprecated
and will be removed in the next major version update.
- For all multinomial regression estimator functions (listed below),
the argument list has changed to collate all optimizer-specific
arguments into a single string. An example of the new optimizer parameter is
'max_iter=20, optimizer=irls, precision=0.0001'.
This is in contrast to the original argument list, which contained three
separate arguments: 'max_iter', 'optimizer', and 'precision'. This change
allows adding new optimizer-specific parameters without changing the
argument list.
Affected functions:
- robust_variance_mlogregr
- clustered_variance_mlogregr
- margins_mlogregr
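As a rough illustration of the single-string optimizer parameter described
above, the following Python sketch builds and parses a string of the form
'max_iter=20, optimizer=irls, precision=0.0001'. The helper names
'format_optimizer_params' and 'parse_optimizer_params' are hypothetical and
are not MADlib functions; MADlib's own parsing rules may differ.

```python
# Hypothetical helpers for a 'key=value, key=value' parameter string.
# They only illustrate the format shown in the release note.

def format_optimizer_params(**params):
    """Join keyword arguments into a single 'key=value, ...' string."""
    return ", ".join(f"{key}={value}" for key, value in params.items())


def parse_optimizer_params(text):
    """Split a 'key=value, ...' string into a dict of string values."""
    result = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty segments such as a trailing comma
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected 'key=value', got {item!r}")
        result[key.strip()] = value.strip()
    return result


if __name__ == "__main__":
    params = format_optimizer_params(
        max_iter=20, optimizer="irls", precision=0.0001
    )
    print(params)
    print(parse_optimizer_params(params))
```

Because the parameters travel as one string, a new optimizer-specific key can
be added without changing the number of function arguments, which is the
motivation given in the note.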
Bug Fixes:
- Fixed an overflow problem in LDA by using INT64 instead of INT32.
- Fixed an integer-to-boolean cast bug in clustered variance for logistic
regression. After this fix, integer columns are accepted for the binary
dependent variable using the 'integer to bool' cast rules.
- Fixed two bugs in SVD:
- The 'example' option for online help has been fixed.
- Column names for sparse input tables in the 'svd_sparse' and
'svd_sparse_native' functions are no longer restricted to 'row_id',
'col_id' and 'value'.
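The LDA overflow fix listed under Bug Fixes replaces INT32 with INT64. The
release note does not say which quantity overflowed, so the Python sketch
below is only a generic illustration of why a 32-bit signed integer wraps
around where a 64-bit one does not. The helper names are hypothetical.

```python
# Generic illustration of 32-bit vs 64-bit signed integer range.
# This does not reproduce the LDA code; it only shows the wraparound.
import ctypes

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def add_int32(a, b):
    """Add two integers and truncate the result to a signed 32-bit value."""
    return ctypes.c_int32(a + b).value


def add_int64(a, b):
    """Add two integers and truncate the result to a signed 64-bit value."""
    return ctypes.c_int64(a + b).value


if __name__ == "__main__":
    print(add_int32(INT32_MAX, 1))  # wraps to -2147483648
    print(add_int64(INT32_MAX, 1))  # 2147483648
```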