Extends boosted decision trees to multivariate, longitudinal, and hierarchically clustered data. Additionally, functions are provided for easy tuning by cross-validated grid search over n.trees, shrinkage, interaction.depth, and n.minobsinnode.
The package depends on the most recent version of gbm, which includes multi-threaded tree fitting. That version can be installed from GitHub (it will eventually be deprecated):
devtools::install_github("patr1ckm/gbm")
The package can be installed as follows:
devtools::install_github("patr1ckm/mvtboost")
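After either install, loading the package and checking its version is a quick way to confirm it built correctly (plain base R, nothing package-specific):
library(mvtboost)
packageVersion("mvtboost")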
2017-07-22
For Mac OSX, clang++ (from clang4) is required to compile gbm with openmp multithreading. For R 3.4.0, the instructions are taken from http://thecoatlessprofessor.com/programming/openmp-in-r-on-os-x/#after-3-4-0.
- Make sure Xtools is installed
- Download and install clang4: https://uofi.app.box.com/v/r-macos-clang-pkg
- Verify that a file ~/.R/Makevars has been created and contains the following (if it hasn't been, create it):

CC=/usr/local/clang4/bin/clang
CXX=/usr/local/clang4/bin/clang++
LDFLAGS=-L/usr/local/clang4/lib
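A quick way to confirm the file from within R (base R only):
file.exists("~/.R/Makevars")   # should be TRUE
readLines("~/.R/Makevars")     # should print the three lines above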
Tree boosting for multivariate outcomes. Estimates a multivariate additive model of decision trees by iteratively selecting predictors that explain covariance in the outcomes.
library(dplyr)
data("mpg", package = "ggplot2")
Y <- mpg %>% select(cty, hwy)
X <- mpg %>% select(-cty, -hwy) %>%
  mutate_if(is.character, as.factor)

out <- mvtb(Y = Y, X = X,              # data
            n.trees = 1000,            # number of trees
            shrinkage = .01,           # shrinkage or learning rate
            interaction.depth = 3)     # tree or interaction depth
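The fitted object can then be inspected; this sketch assumes the package's summary(), mvtb.ri() (relative influence), and predict() methods:
summary(out)                        # trees selected and covariance explained
mvtb.ri(out)                        # relative influence of each predictor on each outcome
yhat <- predict(out, newdata = X)   # predicted values for cty and hwy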
Mixed effects tree boosting, useful for longitudinal or hierarchically clustered data. At each iteration, the terminal node means of each tree are forced to vary by group and are shrunk proportional to group size using lme4::lmer (a sketch of this shrinkage step follows the example below). Tuning is done by passing vectors of meta-parameters as arguments.
library(dplyr)
data("mpg", package = "ggplot2")
y <- mpg$cty
X <- mpg %>% select(-cty, -hwy) %>%
  mutate_if(is.character, as.factor)

out <- metb(y = y, X = X, id = "manufacturer",
            n.trees = 100,
            shrinkage = .01,
            interaction.depth = 3,
            num_threads = 8)
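To see the shrinkage idea in isolation, here is a minimal random-intercept sketch with lme4 (not the metb internals, just the kind of fit applied to the terminal-node means): group effects are pulled toward the grand mean, with less pooling for larger groups.
library(lme4)
# Random intercept per manufacturer: each group's mean city mpg is
# shrunk toward the grand mean, more strongly for groups with less data
fit <- lmer(cty ~ 1 + (1 | manufacturer), data = ggplot2::mpg)
ranef(fit)$manufacturer   # shrunken group-level deviations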
New functions are provided that allow easy grid tuning by cross validation: gbm.cverr, mvtb_grid, and lmerboost. The grid is defined as expand.grid(1:cv.folds, ...), where ... contains vectors of candidate meta-parameter values passed to n.trees, shrinkage, interaction.depth, and n.minobsinnode.
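For instance, a grid with 2 folds and two candidate values each for shrinkage and interaction.depth looks like this (plain base R, just to show the layout):
grid <- expand.grid(fold = 1:2,
                    n.trees = 1000,
                    shrinkage = c(.01, .1),
                    interaction.depth = c(1, 3),
                    n.minobsinnode = 10)
nrow(grid)   # 8 rows: every fold crossed with every meta-parameter combination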
With gbm.cverr, tuning the number of trees can be carried out by adding trees until 1) the cross-validation error is minimized or 2) a maximum amount of computation time is reached. This avoids including too few trees, or fitting more trees than necessary.
out <- gbm.cverr(x = X, y = y,
                 distribution = 'gaussian',
                 cv.folds = 2,
                 nt.start = 100,
                 nt.inc = 100,
                 max.time = 1,
                 seed = 12345,
                 interaction.depth = c(1, 5),
                 shrinkage = 0.01,
                 n.minobsinnode = c(5, 50),
                 verbose = TRUE)
out$gbm.fit
summary(out$gbm.fit)
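Since out$gbm.fit is an ordinary gbm object, predictions can be drawn from it with gbm's predict method; using the object's own n.trees component as the number of trees is an assumption here:
# Predict from the refit gbm model; gbm's predict() requires n.trees
yhat <- predict(out$gbm.fit, newdata = X, n.trees = out$gbm.fit$n.trees)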
Currently limited to continuous outcomes (generalized outcomes will be added in the future).
The package is experimental, and the interface is subject to change until version 1.0, but it usually maintains the original gbm interface (2.1.1 and below).
vignette("mvtboost_wellbeing")
Miller P.J., Lubke G.H., McArtor D.B., & Bergeman C.S. (2015). Finding structure in data: A data mining alternative to multivariate multiple regression. Psychological Methods. arXiv
Miller P.J., McArtor D.B., & Lubke G.H. (2017). Abstract: A gradient boosting machine for hierarchically clustered data. Multivariate Behavioral Research. arXiv