From b0b5d4dbb1f886049411ce8e7cc230177ef9fe38 Mon Sep 17 00:00:00 2001 From: be-green Date: Fri, 24 Jan 2020 18:11:31 -0600 Subject: [PATCH 01/12] warnings vignette --- vignettes/baggr_model_convergence.Rmd | 78 +++++++++++++++++++++++++++ 1 file changed, 78 insertions(+) create mode 100644 vignettes/baggr_model_convergence.Rmd diff --git a/vignettes/baggr_model_convergence.Rmd b/vignettes/baggr_model_convergence.Rmd new file mode 100644 index 00000000..a904c581 --- /dev/null +++ b/vignettes/baggr_model_convergence.Rmd @@ -0,0 +1,78 @@ +--- +title: "Understanding Model Convergence" +author: "Brice Green" +date: "`r Sys.Date()`" +output: + rmarkdown::html_vignette: + toc: true +vignette: > + %\VignetteIndexEntry{baggr} +%\VignetteEngine{knitr::rmarkdown} +%\VignetteEncoding{UTF-8} +bibliography: baggr.bib +--- + +The `baggr` package makes running Bayesian meta-analysis models very easy, but that ease can also obscure some things that go on behind the scenes. Especially with low-data settings, say with only a few measurements, you may get cryptic error messages when you run your models, like + +``` +Warning: There were 110 divergent transitions after warmup. Increasing adapt_delta above 0.9 may help. See +http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup +``` + +This vignette will walk through the basic intuition for what those mean, whether to be concerned about them, and how to tune the parameters underneath the hood. There are already a number of good resources that go more in depth on the topic so first I'll point you there: + +[Visual MCMC Diagnostics Using the Bayesplot Package](https://mc-stan.org/bayesplot/articles/visual-mcmc-diagnostics.html) + +[Scale Reduction in the Stan Manual](https://mc-stan.org/docs/2_21/reference-manual/notation-for-samples-chains-and-draws.html) + +[Divergent Transitions in the Stan Manual](https://mc-stan.org/docs/2_21/reference-manual/divergent-transitions.html) + +These go more in depth than this discussion is going to, but if you find yourself needing a deeper discussion then these would be good starting points. + +# Behind the scenes: basics of MCMC + +The baggr package uses fully Bayesian methods to estimate meta-analysis models, meaning that it takes a prior (which if not specified is assigned by default), and jointly with the data uses that prior to estimate a posterior according to Bayes' rule + + +$$P(\theta | X) = \dfrac{P(X | \theta) P(\theta)}{P(X)}$$ + +In order to calculate $P(X)$, we need some computationally expensive machinery that can calculate + +$$P(X) = \int P(X | \theta) d\theta $$ + +which is (in general) really difficult except for a few specific situations. In order to make these types of models easy to run, we use a method that approximates this integral to a high degree of numerical accuracy. + +MCMC methods leverage an accept/reject proposal scheme by generating a potential value for $\theta$, which is some set of parmeters, and using the relative likelihoods ($P(\theta|X)$) at that new value to determine the probability of whether to accept or reject those proposals. The simplest of these methods, Metropolis-Hastings, pretty much ends there. But for meta-analysis models (and other models with of high dimension), the spaces that the random walk has to explore get really complicated. + +So instead of plain-old Metropolis-Hastings, we use Hamiltonian Monte Carlo with No-U-Turn sampling, powered on the back end by Stan. 
This algorithm leverages the gradient, or the rate of change in the posterior density at a given parameter value, in order to more efficiently generate new parameter proposals. + +# Warnings like "divergent transitions" + +The gradient is passed into a Hamiltonian equation, which simulates a path for where it thinks the posterior is headed. But this trajectory is based on the local behavior of the posterior, and depending on the geometry, if we move too far away from that point we might get more and more off base. Stan, on the back end, uses the `adapt_delta` parameter to determine how often it needs to check its simulated paths, re-evaluating the trajectory it is using. + +If you make `adapt_delta` really big, you end up moving really slowly, but you also are less likely to fly off the true trajectory of the posterior distribution. But if you make it small, you go faster, etc. You get the idea. How high you should set the parameter depends on the extent to which the algorithm is properly modeling the paths. So if you encounter divergent transitions, it is often good to raise `adapt_delta`. This discussion is really oversimplifying things to build the intuition, but you can find an (accessible) deeper discussion [here](https://arxiv.org/abs/1701.02434) +if it is of interest! + +When you get a warning that there are a certain number of "divergent transitions," it means that the samples have gone far away from the anticipated path, and (at worse) are reduced to a random walk. This means that they may not be fully exploring the posterior distribution, and so you may need more iterations to get a good approximation of the distribution. + +If you get this warning and want to re-estimate your model with a higher `adapt_delta` setting, you can change the argument in the call to the `baggr` function like so: + +```{r} +# the ... parameters in the call to baggr() are +# passed to rstan::sampling +# for documentation see ?rstan::sampling + +baggr(schools, model = "rubin", + pooling = "partial", + prior_hypermean = normal(0, 10), + prior_hypersd = cauchy(0, 3), + control = list( + adapt_delta = 0.9 # 0.8 by default + )) +``` + +If you still get a lot of divergent transitions after raising the `adapt_delta` really high, it may indicate an issue with your model! This is what is commonly referred to as the [folk theorem of statistical computing](https://statmodeling.stat.columbia.edu/2008/05/13/the_folk_theore/). + +# R hat being too low + +The other warning you get, that r_hat statistics are too low, indicates that the From 6c145795cd2a2f53469ac529c6106058b208828e Mon Sep 17 00:00:00 2001 From: be-green Date: Sat, 25 Jan 2020 10:59:26 -0600 Subject: [PATCH 02/12] update convergence vignette, link rstan::sampling in the ... 
args for baggr --- R/baggr.R | 2 +- man/baggr.Rd | 190 -------- man/convert_inputs.Rd | 64 --- man/forest_plot.Rd | 51 -- vignettes/baggr_model_convergence.R | 26 ++ vignettes/baggr_model_convergence.Rmd | 41 +- vignettes/baggr_model_convergence.html | 614 +++++++++++++++++++++++++ 7 files changed, 671 insertions(+), 317 deletions(-) delete mode 100644 man/baggr.Rd delete mode 100644 man/convert_inputs.Rd delete mode 100644 man/forest_plot.Rd create mode 100644 vignettes/baggr_model_convergence.R create mode 100644 vignettes/baggr_model_convergence.html diff --git a/R/baggr.R b/R/baggr.R index 68cb4cb1..29513fd3 100644 --- a/R/baggr.R +++ b/R/baggr.R @@ -57,7 +57,7 @@ #' with the same columns as `data` argument #' @param warn print an additional warning if Rhat exceeds 1.05 #' @param ... extra options passed to Stan function, e.g. \code{control = list(adapt_delta = 0.99)}, -#' number of iterations etc. +#' number of iterations etc. For more details see [rstan::sampling()] #' @return `baggr` class structure: a list including Stan model fit #' alongside input data, pooling metrics, various model properties. #' If test data is used, mean value of -2*lpd is reported as `mean_lpd` diff --git a/man/baggr.Rd b/man/baggr.Rd deleted file mode 100644 index e63d8a8d..00000000 --- a/man/baggr.Rd +++ /dev/null @@ -1,190 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/baggr.R -\name{baggr} -\alias{baggr} -\title{Bayesian aggregate treatment effects model} -\usage{ -baggr( - data, - model = NULL, - pooling = "partial", - effect = NULL, - covariates = c(), - prior_hypermean = NULL, - prior_hypersd = NULL, - prior_hypercor = NULL, - prior_beta = NULL, - prior = NULL, - ppd = FALSE, - test_data = NULL, - quantiles = seq(0.05, 0.95, 0.1), - outcome = "outcome", - group = "group", - treatment = "treatment", - warn = TRUE, - ... -) -} -\arguments{ -\item{data}{data frame with summary or individual level data to meta-analyse} - -\item{model}{if \code{NULL}, detected automatically from input data -otherwise choose from -\code{"rubin"}, \code{"mutau"}, \code{"individual"}, \code{"quantiles"} -(see Details).} - -\item{pooling}{Type of pooling; -choose from \code{"none"}, \code{"partial"} (default) and \code{"full"}. -If you are not familiar with the terms, consult the vignette; -"partial" can be understood as random effects and "full" as fixed effects} - -\item{effect}{Label for effect. Will default to "mean" in most cases, "log OR" in logistic model, -quantiles in \code{quantiles} model etc. -These labels are used in various print and plot outputs. -Comparable models (e.g. in \link{baggr_compare}) should have same \code{effect}.} - -\item{covariates}{Character vector with column names in \code{data}. The corresponding columns are used as -covariates (fixed effects) in the meta-regression model (in case of aggregate data). -In the case of individual level data the model does not differentiate between group-level -variables (same values of the covariate for all rows related to a given group) and -individual-level covariates.} - -\item{prior_hypermean}{prior distribution for hypermean; you can use "plain text" notation like -\code{prior_hypermean=normal(0,100)} or \code{uniform(-10, 10)}. -See Details:Priors below for more possible specifications. 
-If unspecified, the priors will be derived automatically based on data -(and printed out in the console).} - -\item{prior_hypersd}{prior for hyper-standard deviation, used -by Rubin and \verb{"mutau"`` models; same rules apply as for }_hypermean`;} - -\item{prior_hypercor}{prior for hypercorrelation matrix, used by the \code{"mutau"} model} - -\item{prior_beta}{prior for regression coefficients if \code{covariates} are specified; will default to -experimental normal(0, 10^2) distribution} - -\item{prior}{alternative way to specify all priors as a named list with \code{hypermean}, -\code{hypersd}, \code{hypercor}, \code{beta}, analogous to \code{prior_} arguments above, -e.g. \code{prior = list(hypermean = normal(0,10), beta = uniform(-50, 50))}} - -\item{ppd}{logical; use prior predictive distribution? (\emph{p.p.d.}) Default is no. -If \code{ppd=TRUE}, Stan model will sample from the prior distributions -and ignore \code{data} in inference. However, \code{data} argument might still -be used to infer the correct model and to set the default priors.} - -\item{test_data}{data for cross-validation; NULL for no validation, otherwise a data frame -with the same columns as \code{data} argument} - -\item{quantiles}{if \code{model = "quantiles"}, a vector indicating which quantiles of data to use -(with values between 0 and 1)} - -\item{outcome}{character; column name in (individual-level) -\code{data} with outcome variable values} - -\item{group}{character; column name in \code{data} with grouping factor; -it's necessary for individual-level data, for summarised data -it will be used as labels for groups when displaying results} - -\item{treatment}{character; column name in (individual-level) \code{data} with treatment factor;} - -\item{warn}{print an additional warning if Rhat exceeds 1.05} - -\item{...}{extra options passed to Stan function, e.g. \code{control = list(adapt_delta = 0.99)}, -number of iterations etc.} -} -\value{ -\code{baggr} class structure: a list including Stan model fit -alongside input data, pooling metrics, various model properties. -If test data is used, mean value of -2*lpd is reported as \code{mean_lpd} -} -\description{ -Bayesian inference on parameters of an average treatment effects model -that's appropriate to the supplied -individual- or group-level data, using Hamiltonian Monte Carlo in Stan. -(For overall package help file see \link{baggr-package}) -} -\details{ -Running \code{baggr} requires 1/ data preparation, 2/ choice of model, 3/ choice of priors. -All three are discussed in depth in the package vignette (\code{vignette("baggr")}). - -\strong{Data.} For aggregate data models you need a data frame with columns -\code{tau} and \code{se} or \code{tau}, \code{mu}, \code{se.tau}, \code{se.mu}. -An additional column can be used to provide labels for each group -(by default column \code{group} is used if available, but this can be -customised -- see the example below). -For individual level data three columns are needed: outcome, treatment, group. These -are identified by using the \code{outcome}, \code{treatment} and \code{group} arguments. - -Both aggregate and individual-level data can include extra \code{covariates} columns -(specified as a character vector of column names) to be used in regression (for -individual data) or -\href{https://handbook-5-1.cochrane.org/chapter_9/9_6_4_meta_regression.htm}{meta-regression} -(for summary data). We also refer to impact of these covariates as \emph{fixed effects}. 
- -Many data preparation steps can be done through a helper function \link{prepare_ma}. -It can convert individual to summary-level data, calculate -odds/risk ratios (with/without corrections) in binary data, standardise variables and more. -Using it will automatically format data inputs to work with \code{baggr()}. - -\strong{Models.} Available models are: -\itemize{ -\item for the \strong{continuous variable} means: -\code{"rubin"} model for average treatment effect, \code{"mutau"} -version which takes into account means of control groups, \code{"full"}, -which works with individual-level data -\item for \strong{continuous variable quantiles}: `"quantiles"`` model -(see Meager, 2019 in references) -\item for \strong{binary data}: \code{"logit"} model can be used on individual-level data; -you can also analyse continuous statistics such as -log odds ratios and logs risk ratios using the models listed above; -see \code{vignette("baggr_binary")} for tutorial with examples -} - -If no model is specified, the function tries to infer the appropriate -model automatically. -Additionally, the user must specify type of pooling. -The default is always partial pooling. - -\strong{Priors.} It is optional to specify priors yourself, -as the package will try propose an appropriate -prior for the input data if you do not pass a \code{prior} argument. -To set the priors yourself, use \code{prior_} arguments. For specifying many priors at once -(or re-using between models), a single \code{prior = list(...)} argument can be used instead. -Appropriate examples are given in \code{vignette("baggr")}. - -\strong{Outputs.} Standard functions for analysing the \code{baggr} object are -\itemize{ -\item \link{treatment_effect} for distribution of hyperparameters -\item \link{group_effects} for distributions of group-specific parameters -\item \link{fixed_effects} for coefficients in (meta-)regression -\item \link{effect_draw} and \link{effect_plot} for posterior predictive distributions -\item \link{baggr_compare} for comparing multiple \code{baggr} models -\item \link{loocv} for cross-validation -} -} -\examples{ -df_pooled <- data.frame("tau" = c(1, -1, .5, -.5, .7, -.7, 1.3, -1.3), - "se" = rep(1, 8), - "state" = datasets::state.name[1:8]) -baggr(df_pooled) #baggr automatically detects the input data -# correct labels, different pooling & passing some options to Stan -baggr(df_pooled, group = "state", pooling = "full", iter = 500) -#change the priors: -baggr(df_pooled, prior_hypermean = normal(5,5)) - -# "mu & tau" model, using a built-in dataset -# prepare_ma() can summarise individual-level data -\donttest{ -microcredit_summary_data <- prepare_ma(microcredit_simplified, - outcome = "consumerdurables") -baggr(microcredit_summary_data, model = "mutau", - pooling = "partial", prior_hypercor = lkj(1), - prior_hypersd = normal(0,10), - prior_hypermean = multinormal(c(0,0),matrix(c(10,3,3,10),2,2))) -} - - -} -\author{ -Witold Wiecek, Rachael Meager -} diff --git a/man/convert_inputs.Rd b/man/convert_inputs.Rd deleted file mode 100644 index 7de3f1d6..00000000 --- a/man/convert_inputs.Rd +++ /dev/null @@ -1,64 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/convert_inputs.R -\name{convert_inputs} -\alias{convert_inputs} -\title{Convert inputs for baggr models} -\usage{ -convert_inputs( - data, - model, - quantiles, - group = "group", - outcome = "outcome", - treatment = "treatment", - covariates = c(), - test_data = NULL -) -} -\arguments{ -\item{data}{`data.frame`` with 
desired modelling input} - -\item{model}{valid model name used by baggr; -see \link{baggr} for allowed models -if \code{model = NULL}, this function will try to find appropriate model -automatically} - -\item{quantiles}{vector of quantiles to use (only applicable if \code{model = "quantiles"})} - -\item{group}{name of the column with grouping variable} - -\item{outcome}{name of column with outcome variable (designated as string)} - -\item{treatment}{name of column with treatment variable} - -\item{covariates}{Character vector with column names in \code{data}. -The corresponding columns are used as -covariates (fixed effects) in the meta-regression model.} - -\item{test_data}{same format as \code{data} argument, gets left aside for -testing purposes (see \link{baggr})} -} -\value{ -R structure that's appropriate for use by \link{baggr} Stan models; -\code{group_label}, \code{model} and \code{n_groups} are included as attributes -and are necessary for \link{baggr} to work correctly -} -\description{ -Converts data to Stan inputs, checks integrity of data -and suggests default model if needed. Typically all of this is -done automatically by \link{baggr}, \strong{this function is only for debugging} -or running models "by hand". -} -\details{ -Typically this function is only called within \link{baggr} and you do -not need to use it yourself. It can be useful to understand inputs -or to run models which you modified yourself. -} -\examples{ -# simple meta-analysis example, -# this is the formatted input for Stan models in baggr(): -convert_inputs(schools, "rubin") -} -\author{ -Witold Wiecek -} diff --git a/man/forest_plot.Rd b/man/forest_plot.Rd deleted file mode 100644 index 033012f0..00000000 --- a/man/forest_plot.Rd +++ /dev/null @@ -1,51 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/forest_plot.R -\name{forest_plot} -\alias{forest_plot} -\title{Draw a forest plot for a baggr model} -\usage{ -forest_plot( - bg, - show = c("inputs", "posterior", "both", "covariates"), - print = show, - prob = 0.95, - digits = 3, - ... -) -} -\arguments{ -\item{bg}{a \link{baggr} class object} - -\item{show}{if \code{"inputs"}, then plotted points and lines correspond to raw inputs for each group; -if \code{"posterior"} -- to posterior distribution; -you can also plot \code{"both"} inputs and posteriors; -if \code{"covariates"}, then fixed effect coefficients are plotted} - -\item{print}{which values to print next to the plot: values of \code{"inputs"} -or \code{"posterior"} means? -(if \code{show="covariates"}, it must be \code{"posterior"})} - -\item{prob}{width of the intervals (lines) for the plot} - -\item{digits}{number of digits to display when printing out mean and SD -in the plot} - -\item{...}{other arguments passed to \link{forestplot}} -} -\description{ -The forest plot functionality in \emph{baggr} is a simple interface for -calling the \link{forestplot} function. By default the forest plot -displays raw (unpooled) estimates for groups and the treatment effect -estimate underneath. This behaviour can be modified to display pooled -group estimates. 
-} -\examples{ -bg <- baggr(schools, iter = 500) -forest_plot(bg) -forest_plot(bg, show = "posterior", print = "inputs", digits = 2) - -} -\seealso{ -\link{forestplot} function and its associated vignette for examples; -\link{effect_plot} and \link{baggr_plot} for non-forest plots of baggr results -} diff --git a/vignettes/baggr_model_convergence.R b/vignettes/baggr_model_convergence.R new file mode 100644 index 00000000..96d598a3 --- /dev/null +++ b/vignettes/baggr_model_convergence.R @@ -0,0 +1,26 @@ +## ------------------------------------------------------------------------ +# the ... parameters in the call to baggr() are +# passed to rstan::sampling +# for documentation see ?rstan::sampling +library(baggr) + +fit <- baggr(schools, model = "rubin", + pooling = "partial", + prior_hypermean = normal(0, 10), + prior_hypersd = cauchy(0, 3), + control = list( + adapt_delta = 0.9 # 0.8 by default + )) + +fit + +## ------------------------------------------------------------------------ +# runs for 10,000 iterations per chain instead of 2,000 +fit <- baggr(schools, model = "rubin", pooling = "partial", + prior_hypermean = normal(0,1), prior_hypersd = cauchy(0,2), + iter = 10000, control = list( + adapt_delta = 0.95 # like above, to address divergences + )) + +fit + diff --git a/vignettes/baggr_model_convergence.Rmd b/vignettes/baggr_model_convergence.Rmd index a904c581..88730969 100644 --- a/vignettes/baggr_model_convergence.Rmd +++ b/vignettes/baggr_model_convergence.Rmd @@ -1,14 +1,14 @@ --- -title: "Understanding Model Convergence" + title: "Understanding Model Convergence" author: "Brice Green" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true vignette: > - %\VignetteIndexEntry{baggr} -%\VignetteEngine{knitr::rmarkdown} -%\VignetteEncoding{UTF-8} + %\VignetteIndexEntry{baggr_model_convergence} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} bibliography: baggr.bib --- @@ -27,11 +27,11 @@ This vignette will walk through the basic intuition for what those mean, whether [Divergent Transitions in the Stan Manual](https://mc-stan.org/docs/2_21/reference-manual/divergent-transitions.html) -These go more in depth than this discussion is going to, but if you find yourself needing a deeper discussion then these would be good starting points. +These go more in depth than this discussion is going to, but if you find yourself needing a stronger treatment of the topics then these would be good starting points. # Behind the scenes: basics of MCMC -The baggr package uses fully Bayesian methods to estimate meta-analysis models, meaning that it takes a prior (which if not specified is assigned by default), and jointly with the data uses that prior to estimate a posterior according to Bayes' rule +The baggr package uses fully Bayesian methods to estimate meta-analysis models, meaning that it takes a prior (which if not specified by the user is assigned by default), and jointly with the data uses that prior to estimate a posterior according to Bayes' rule $$P(\theta | X) = \dfrac{P(X | \theta) P(\theta)}{P(X)}$$ @@ -48,7 +48,7 @@ So instead of plain-old Metropolis-Hastings, we use Hamiltonian Monte Carlo with # Warnings like "divergent transitions" -The gradient is passed into a Hamiltonian equation, which simulates a path for where it thinks the posterior is headed. But this trajectory is based on the local behavior of the posterior, and depending on the geometry, if we move too far away from that point we might get more and more off base. 
Stan, on the back end, uses the `adapt_delta` parameter to determine how often it needs to check its simulated paths, re-evaluating the trajectory it is using. +The gradient is passed into a Hamiltonian equation, which simulates a path for where it thinks the posterior is headed. But this trajectory is based on the local behavior of the posterior, and depending on the geometry, if we move too far away from that point we might get more and more off base. Stan, on the back end, uses the `adapt_delta` parameter to determine how often it needs to check its simulated paths, re-evaluating the trajectory it is using. If you make `adapt_delta` really big, you end up moving really slowly, but you also are less likely to fly off the true trajectory of the posterior distribution. But if you make it small, you go faster, etc. You get the idea. How high you should set the parameter depends on the extent to which the algorithm is properly modeling the paths. So if you encounter divergent transitions, it is often good to raise `adapt_delta`. This discussion is really oversimplifying things to build the intuition, but you can find an (accessible) deeper discussion [here](https://arxiv.org/abs/1701.02434) if it is of interest! @@ -61,18 +61,37 @@ If you get this warning and want to re-estimate your model with a higher `adapt_ # the ... parameters in the call to baggr() are # passed to rstan::sampling # for documentation see ?rstan::sampling - -baggr(schools, model = "rubin", +fit <- baggr(schools, model = "rubin", pooling = "partial", + refresh = 0, # silence printing MCMC draws prior_hypermean = normal(0, 10), prior_hypersd = cauchy(0, 3), control = list( adapt_delta = 0.9 # 0.8 by default )) + +fit ``` If you still get a lot of divergent transitions after raising the `adapt_delta` really high, it may indicate an issue with your model! This is what is commonly referred to as the [folk theorem of statistical computing](https://statmodeling.stat.columbia.edu/2008/05/13/the_folk_theore/). -# R hat being too low +# $\widehat{R}$ being too low + +The other warning you get, that “r hat” statistics are too high, indicates that the effective sample size (ESS) of your draws are too low and your chains have not converged. Because MCMC is not a deterministic algorithm, we run multiple chains of samples in order to check that each has converged to a good enough answer. + +We want to check that each chain is “mixing well,” meaning that we have enough effective draws after accounting for the correlation between samples. We also want to check that markov chains initialized at different values get to similar places. ESS gives us a measure of the first, and $\widehat{R}$ gives a measure of the second. + +If you have a low ESS or high $\widehat{R}$, you are best off running each chain for more iterations until you get a higher ESS/lower $\widehat{R}$. To run chains for longer, you change the iter parameter in the `...` arguments to your `baggr()` call. 
-The other warning you get, that r_hat statistics are too low, indicates that the +```{r} +# runs for 10,000 iterations per chain instead of 2,000 +fit <- baggr(schools, model = "rubin", pooling = "partial", + prior_hypermean = normal(0,1), prior_hypersd = cauchy(0,2), + refresh = 0, # don't print sampling + iter = 10000, + control = list( + adapt_delta = 0.95 # like above, to address divergences + )) + +fit +``` diff --git a/vignettes/baggr_model_convergence.html b/vignettes/baggr_model_convergence.html new file mode 100644 index 00000000..15c36c20 --- /dev/null +++ b/vignettes/baggr_model_convergence.html @@ -0,0 +1,614 @@ + + + + + + + + + + + + + + + + + +Understanding Model Convergence + + + + + + + + + + + + + + + + + + + + + +

Understanding Model Convergence

+

Brice Green

+

2020-01-25

+ + + +

The baggr package makes running Bayesian meta-analysis models very easy, but that ease can also obscure some things that go on behind the scenes. Especially with low-data settings, say with only a few measurements, you may get cryptic error messages when you run your models, like

+
Warning: There were 110 divergent transitions after warmup. Increasing adapt_delta above 0.9 may help. See
+http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
+

This vignette will walk through the basic intuition for what those mean, whether to be concerned about them, and how to tune the parameters underneath the hood. There are already a number of good resources that go more in depth on the topic so first I’ll point you there:

+

Visual MCMC Diagnostics Using the Bayesplot Package

+

Scale Reduction in the Stan Manual

+

Divergent Transitions in the Stan Manual

+

These go more in depth than this discussion is going to, but if you find yourself needing a deeper discussion then these would be good starting points.

+
+

Behind the scenes: basics of MCMC

+

The baggr package uses fully Bayesian methods to estimate meta-analysis models, meaning that it takes a prior (which if not specified is assigned by default), and jointly with the data uses that prior to estimate a posterior according to Bayes’ rule

+

\[P(\theta | X) = \dfrac{P(X | \theta) P(\theta)}{P(X)}\]

+

In order to calculate \(P(X)\), we need some computationally expensive machinery that can calculate

+

\[P(X) = \int P(X | \theta) d\theta \]

+

which is (in general) really difficult except for a few specific situations. In order to make these types of models easy to run, we use a method that approximates this integral to a high degree of numerical accuracy.

+

MCMC methods leverage an accept/reject proposal scheme by generating a potential value for \(\theta\), which is some set of parameters, and using the relative likelihoods (\(P(\theta|X)\)) at that new value to determine the probability of whether to accept or reject those proposals. The simplest of these methods, Metropolis-Hastings, pretty much ends there. But for meta-analysis models (and other models of high dimension), the spaces that the random walk has to explore get really complicated.

+

So instead of plain-old Metropolis-Hastings, we use Hamiltonian Monte Carlo with No-U-Turn sampling, powered on the back end by Stan. This algorithm leverages the gradient, or the rate of change in the posterior density at a given parameter value, in order to more efficiently generate new parameter proposals.

+
+
+

Warnings like “divergent transitions”

+

The gradient is passed into a Hamiltonian equation, which simulates a path for where it thinks the posterior is headed. But this trajectory is based on the local behavior of the posterior, and depending on the geometry, if we move too far away from that point we might get more and more off base. Stan, on the back end, uses the adapt_delta parameter to determine how often it needs to check its simulated paths, re-evaluating the trajectory it is using.

+

If you make adapt_delta really big, you end up moving really slowly, but you also are less likely to fly off the true trajectory of the posterior distribution. But if you make it small, you go faster, etc. You get the idea. How high you should set the parameter depends on the extent to which the algorithm is properly modeling the paths. So if you encounter divergent transitions, it is often good to raise adapt_delta. This discussion is really oversimplifying things to build the intuition, but you can find an (accessible) deeper discussion here if it is of interest!

+

When you get a warning that there are a certain number of “divergent transitions,” it means that the samples have gone far away from the anticipated path, and (at worst) are reduced to a random walk. This means that they may not be fully exploring the posterior distribution, and so you may need more iterations to get a good approximation of the distribution.

+

If you get this warning and want to re-estimate your model with a higher adapt_delta setting, you can change the argument in the call to the baggr function like so:

+ +
## 
+## SAMPLING FOR MODEL 'rubin' NOW (CHAIN 1).
+## Chain 1: 
+## Chain 1: Gradient evaluation took 0 seconds
+## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
+## Chain 1: Adjust your expectations accordingly!
+## Chain 1: 
+## Chain 1: 
+## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
+## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
+## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
+## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
+## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
+## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
+## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
+## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
+## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
+## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
+## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
+## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
+## Chain 1: 
+## Chain 1:  Elapsed Time: 0.098 seconds (Warm-up)
+## Chain 1:                0.063 seconds (Sampling)
+## Chain 1:                0.161 seconds (Total)
+## Chain 1: 
+## 
+## SAMPLING FOR MODEL 'rubin' NOW (CHAIN 2).
+## Chain 2: 
+## Chain 2: Gradient evaluation took 0 seconds
+## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
+## Chain 2: Adjust your expectations accordingly!
+## Chain 2: 
+## Chain 2: 
+## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
+## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
+## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
+## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
+## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
+## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
+## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
+## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
+## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
+## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
+## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
+## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
+## Chain 2: 
+## Chain 2:  Elapsed Time: 0.132 seconds (Warm-up)
+## Chain 2:                0.062 seconds (Sampling)
+## Chain 2:                0.194 seconds (Total)
+## Chain 2: 
+## 
+## SAMPLING FOR MODEL 'rubin' NOW (CHAIN 3).
+## Chain 3: 
+## Chain 3: Gradient evaluation took 0 seconds
+## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
+## Chain 3: Adjust your expectations accordingly!
+## Chain 3: 
+## Chain 3: 
+## Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
+## Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
+## Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
+## Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
+## Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
+## Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
+## Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
+## Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
+## Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
+## Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
+## Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
+## Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
+## Chain 3: 
+## Chain 3:  Elapsed Time: 0.064 seconds (Warm-up)
+## Chain 3:                0.075 seconds (Sampling)
+## Chain 3:                0.139 seconds (Total)
+## Chain 3: 
+## 
+## SAMPLING FOR MODEL 'rubin' NOW (CHAIN 4).
+## Chain 4: 
+## Chain 4: Gradient evaluation took 0 seconds
+## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
+## Chain 4: Adjust your expectations accordingly!
+## Chain 4: 
+## Chain 4: 
+## Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
+## Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
+## Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
+## Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
+## Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
+## Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
+## Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
+## Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
+## Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
+## Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
+## Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
+## Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
+## Chain 4: 
+## Chain 4:  Elapsed Time: 0.078 seconds (Warm-up)
+## Chain 4:                0.079 seconds (Sampling)
+## Chain 4:                0.157 seconds (Total)
+## Chain 4:
+ +
## Model type: Rubin model with aggregate data 
+## Pooling of effects: partial 
+## 
+## Aggregate treatment effect (on mean):
+## Hypermean (tau) =  6.5 with 95% interval -1.4 to 14.7 
+## Hyper-SD (sigma_tau) = 2.838 with 95% interval 0.094 to 9.908 
+## 
+## Treatment effects on mean:
+##          mean  sd pooling
+## School A  7.6 5.4    0.95
+## School B  6.6 4.9    0.90
+## School C  6.1 5.3    0.95
+## School D  6.6 4.8    0.91
+## School E  5.7 4.9    0.89
+## School F  6.0 5.0    0.91
+## School G  7.7 5.2    0.90
+## School H  6.8 5.3    0.96
+

If you still get a lot of divergent transitions after raising the adapt_delta really high, it may indicate an issue with your model! This is what is commonly referred to as the folk theorem of statistical computing.

+
+
+

\(\widehat{R}\) being too high

+

The other warning you get, that “r hat” statistics are too high, indicates that the effective sample size (ESS) of your draws is too low and your chains have not converged. Because MCMC is not a deterministic algorithm, we run multiple chains of samples in order to check that each has converged to a good enough answer.

+

We want to check that each chain is “mixing well,” meaning that we have enough effective draws after accounting for the correlation between subsequent samples. We also want to check that Markov chains initialized at different values get to similar places. Effective sample size gives us a measure of the first, and \(\widehat{R}\) gives a measure of the second.

+

If you have a low ESS or high \(\widehat{R}\), you are best off running each chain for more iterations until you get a higher ESS/lower \(\widehat{R}\). To run chains for longer, you change the iter parameter in the ... arguments to your baggr() call.

+ +
## 
+## SAMPLING FOR MODEL 'rubin' NOW (CHAIN 1).
+## Chain 1: 
+## Chain 1: Gradient evaluation took 0 seconds
+## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
+## Chain 1: Adjust your expectations accordingly!
+## Chain 1: 
+## Chain 1: 
+## Chain 1: Iteration:    1 / 10000 [  0%]  (Warmup)
+## Chain 1: Iteration: 1000 / 10000 [ 10%]  (Warmup)
+## Chain 1: Iteration: 2000 / 10000 [ 20%]  (Warmup)
+## Chain 1: Iteration: 3000 / 10000 [ 30%]  (Warmup)
+## Chain 1: Iteration: 4000 / 10000 [ 40%]  (Warmup)
+## Chain 1: Iteration: 5000 / 10000 [ 50%]  (Warmup)
+## Chain 1: Iteration: 5001 / 10000 [ 50%]  (Sampling)
+## Chain 1: Iteration: 6000 / 10000 [ 60%]  (Sampling)
+## Chain 1: Iteration: 7000 / 10000 [ 70%]  (Sampling)
+## Chain 1: Iteration: 8000 / 10000 [ 80%]  (Sampling)
+## Chain 1: Iteration: 9000 / 10000 [ 90%]  (Sampling)
+## Chain 1: Iteration: 10000 / 10000 [100%]  (Sampling)
+## Chain 1: 
+## Chain 1:  Elapsed Time: 0.437 seconds (Warm-up)
+## Chain 1:                0.4 seconds (Sampling)
+## Chain 1:                0.837 seconds (Total)
+## Chain 1: 
+## 
+## SAMPLING FOR MODEL 'rubin' NOW (CHAIN 2).
+## Chain 2: 
+## Chain 2: Gradient evaluation took 0 seconds
+## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
+## Chain 2: Adjust your expectations accordingly!
+## Chain 2: 
+## Chain 2: 
+## Chain 2: Iteration:    1 / 10000 [  0%]  (Warmup)
+## Chain 2: Iteration: 1000 / 10000 [ 10%]  (Warmup)
+## Chain 2: Iteration: 2000 / 10000 [ 20%]  (Warmup)
+## Chain 2: Iteration: 3000 / 10000 [ 30%]  (Warmup)
+## Chain 2: Iteration: 4000 / 10000 [ 40%]  (Warmup)
+## Chain 2: Iteration: 5000 / 10000 [ 50%]  (Warmup)
+## Chain 2: Iteration: 5001 / 10000 [ 50%]  (Sampling)
+## Chain 2: Iteration: 6000 / 10000 [ 60%]  (Sampling)
+## Chain 2: Iteration: 7000 / 10000 [ 70%]  (Sampling)
+## Chain 2: Iteration: 8000 / 10000 [ 80%]  (Sampling)
+## Chain 2: Iteration: 9000 / 10000 [ 90%]  (Sampling)
+## Chain 2: Iteration: 10000 / 10000 [100%]  (Sampling)
+## Chain 2: 
+## Chain 2:  Elapsed Time: 0.425 seconds (Warm-up)
+## Chain 2:                0.348 seconds (Sampling)
+## Chain 2:                0.773 seconds (Total)
+## Chain 2: 
+## 
+## SAMPLING FOR MODEL 'rubin' NOW (CHAIN 3).
+## Chain 3: 
+## Chain 3: Gradient evaluation took 0 seconds
+## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
+## Chain 3: Adjust your expectations accordingly!
+## Chain 3: 
+## Chain 3: 
+## Chain 3: Iteration:    1 / 10000 [  0%]  (Warmup)
+## Chain 3: Iteration: 1000 / 10000 [ 10%]  (Warmup)
+## Chain 3: Iteration: 2000 / 10000 [ 20%]  (Warmup)
+## Chain 3: Iteration: 3000 / 10000 [ 30%]  (Warmup)
+## Chain 3: Iteration: 4000 / 10000 [ 40%]  (Warmup)
+## Chain 3: Iteration: 5000 / 10000 [ 50%]  (Warmup)
+## Chain 3: Iteration: 5001 / 10000 [ 50%]  (Sampling)
+## Chain 3: Iteration: 6000 / 10000 [ 60%]  (Sampling)
+## Chain 3: Iteration: 7000 / 10000 [ 70%]  (Sampling)
+## Chain 3: Iteration: 8000 / 10000 [ 80%]  (Sampling)
+## Chain 3: Iteration: 9000 / 10000 [ 90%]  (Sampling)
+## Chain 3: Iteration: 10000 / 10000 [100%]  (Sampling)
+## Chain 3: 
+## Chain 3:  Elapsed Time: 0.383 seconds (Warm-up)
+## Chain 3:                0.355 seconds (Sampling)
+## Chain 3:                0.738 seconds (Total)
+## Chain 3: 
+## 
+## SAMPLING FOR MODEL 'rubin' NOW (CHAIN 4).
+## Chain 4: 
+## Chain 4: Gradient evaluation took 0 seconds
+## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
+## Chain 4: Adjust your expectations accordingly!
+## Chain 4: 
+## Chain 4: 
+## Chain 4: Iteration:    1 / 10000 [  0%]  (Warmup)
+## Chain 4: Iteration: 1000 / 10000 [ 10%]  (Warmup)
+## Chain 4: Iteration: 2000 / 10000 [ 20%]  (Warmup)
+## Chain 4: Iteration: 3000 / 10000 [ 30%]  (Warmup)
+## Chain 4: Iteration: 4000 / 10000 [ 40%]  (Warmup)
+## Chain 4: Iteration: 5000 / 10000 [ 50%]  (Warmup)
+## Chain 4: Iteration: 5001 / 10000 [ 50%]  (Sampling)
+## Chain 4: Iteration: 6000 / 10000 [ 60%]  (Sampling)
+## Chain 4: Iteration: 7000 / 10000 [ 70%]  (Sampling)
+## Chain 4: Iteration: 8000 / 10000 [ 80%]  (Sampling)
+## Chain 4: Iteration: 9000 / 10000 [ 90%]  (Sampling)
+## Chain 4: Iteration: 10000 / 10000 [100%]  (Sampling)
+## Chain 4: 
+## Chain 4:  Elapsed Time: 0.391 seconds (Warm-up)
+## Chain 4:                0.377 seconds (Sampling)
+## Chain 4:                0.768 seconds (Total)
+## Chain 4:
+ +
## Model type: Rubin model with aggregate data 
+## Pooling of effects: partial 
+## 
+## Aggregate treatment effect (on mean):
+## Hypermean (tau) =  0.4 with 95% interval -1.5 to 2.3 
+## Hyper-SD (sigma_tau) = 2.456 with 95% interval 0.063 to 9.824 
+## 
+## Treatment effects on mean:
+##          mean  sd pooling
+## School A 1.56 3.9    0.96
+## School B 1.02 3.1    0.92
+## School C 0.27 3.3    0.96
+## School D 0.83 3.1    0.93
+## School E 0.26 2.9    0.91
+## School F 0.47 3.1    0.93
+## School G 1.79 3.7    0.92
+## School H 0.78 3.5    0.97
+
+ + + + + + + + + + + From 6018a6276290d611424e55c100815dad057baa36 Mon Sep 17 00:00:00 2001 From: be-green Date: Sat, 25 Jan 2020 10:59:54 -0600 Subject: [PATCH 03/12] remove html that I accidentally included --- vignettes/baggr_model_convergence.html | 614 ------------------------- 1 file changed, 614 deletions(-) delete mode 100644 vignettes/baggr_model_convergence.html diff --git a/vignettes/baggr_model_convergence.html b/vignettes/baggr_model_convergence.html deleted file mode 100644 index 15c36c20..00000000 --- a/vignettes/baggr_model_convergence.html +++ /dev/null @@ -1,614 +0,0 @@ - - - - - - - - - - - - - - - - - -Understanding Model Convergence - - - - - - - - - - - - - - - - - - - - - -

Understanding Model Convergence

-

Brice Green

-

2020-01-25

- - - -

The baggr package makes running Bayesian meta-analysis models very easy, but that ease can also obscure some things that go on behind the scenes. Especially with low-data settings, say with only a few measurements, you may get cryptic error messages when you run your models, like

-
Warning: There were 110 divergent transitions after warmup. Increasing adapt_delta above 0.9 may help. See
-http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
-

This vignette will walk through the basic intuition for what those mean, whether to be concerned about them, and how to tune the parameters underneath the hood. There are already a number of good resources that go more in depth on the topic so first I’ll point you there:

-

Visual MCMC Diagnostics Using the Bayesplot Package

-

Scale Reduction in the Stan Manual

-

Divergent Transitions in the Stan Manual

-

These go more in depth than this discussion is going to, but if you find yourself needing a deeper discussion then these would be good starting points.

-
-

Behind the scenes: basics of MCMC

-

The baggr package uses fully Bayesian methods to estimate meta-analysis models, meaning that it takes a prior (which if not specified is assigned by default), and jointly with the data uses that prior to estimate a posterior according to Bayes’ rule

-

\[P(\theta | X) = \dfrac{P(X | \theta) P(\theta)}{P(X)}\]

-

In order to calculate \(P(X)\), we need some computationally expensive machinery that can calculate

-

\[P(X) = \int P(X | \theta) d\theta \]

-

which is (in general) really difficult except for a few specific situations. In order to make these types of models easy to run, we use a method that approximates this integral to a high degree of numerical accuracy.

-

MCMC methods leverage an accept/reject proposal scheme by generating a potential value for \(\theta\), which is some set of parameters, and using the relative likelihoods (\(P(\theta|X)\)) at that new value to determine the probability of whether to accept or reject those proposals. The simplest of these methods, Metropolis-Hastings, pretty much ends there. But for meta-analysis models (and other models of high dimension), the spaces that the random walk has to explore get really complicated.

-

So instead of plain-old Metropolis-Hastings, we use Hamiltonian Monte Carlo with No-U-Turn sampling, powered on the back end by Stan. This algorithm leverages the gradient, or the rate of change in the posterior density at a given parameter value, in order to more efficiently generate new parameter proposals.

-
-
-

Warnings like “divergent transitions”

-

The gradient is passed into a Hamiltonian equation, which simulates a path for where it thinks the posterior is headed. But this trajectory is based on the local behavior of the posterior, and depending on the geometry, if we move too far away from that point we might get more and more off base. Stan, on the back end, uses the adapt_delta parameter to determine how often it needs to check its simulated paths, re-evaluating the trajectory it is using.

-

If you make adapt_delta really big, you end up moving really slowly, but you also are less likely to fly off the true trajectory of the posterior distribution. But if you make it small, you go faster, etc. You get the idea. How high you should set the parameter depends on the extent to which the algorithm is properly modeling the paths. So if you encounter divergent transitions, it is often good to raise adapt_delta. This discussion is really oversimplifying things to build the intuition, but you can find an (accessible) deeper discussion here if it is of interest!

-

When you get a warning that there are a certain number of “divergent transitions,” it means that the samples have gone far away from the anticipated path, and (at worst) are reduced to a random walk. This means that they may not be fully exploring the posterior distribution, and so you may need more iterations to get a good approximation of the distribution.

-

If you get this warning and want to re-estimate your model with a higher adapt_delta setting, you can change the argument in the call to the baggr function like so:

- -
## 
-## SAMPLING FOR MODEL 'rubin' NOW (CHAIN 1).
-## Chain 1: 
-## Chain 1: Gradient evaluation took 0 seconds
-## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
-## Chain 1: Adjust your expectations accordingly!
-## Chain 1: 
-## Chain 1: 
-## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
-## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
-## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
-## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
-## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
-## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
-## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
-## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
-## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
-## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
-## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
-## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
-## Chain 1: 
-## Chain 1:  Elapsed Time: 0.098 seconds (Warm-up)
-## Chain 1:                0.063 seconds (Sampling)
-## Chain 1:                0.161 seconds (Total)
-## Chain 1: 
-## 
-## SAMPLING FOR MODEL 'rubin' NOW (CHAIN 2).
-## Chain 2: 
-## Chain 2: Gradient evaluation took 0 seconds
-## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
-## Chain 2: Adjust your expectations accordingly!
-## Chain 2: 
-## Chain 2: 
-## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
-## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
-## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
-## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
-## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
-## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
-## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
-## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
-## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
-## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
-## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
-## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
-## Chain 2: 
-## Chain 2:  Elapsed Time: 0.132 seconds (Warm-up)
-## Chain 2:                0.062 seconds (Sampling)
-## Chain 2:                0.194 seconds (Total)
-## Chain 2: 
-## 
-## SAMPLING FOR MODEL 'rubin' NOW (CHAIN 3).
-## Chain 3: 
-## Chain 3: Gradient evaluation took 0 seconds
-## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
-## Chain 3: Adjust your expectations accordingly!
-## Chain 3: 
-## Chain 3: 
-## Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
-## Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
-## Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
-## Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
-## Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
-## Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
-## Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
-## Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
-## Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
-## Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
-## Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
-## Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
-## Chain 3: 
-## Chain 3:  Elapsed Time: 0.064 seconds (Warm-up)
-## Chain 3:                0.075 seconds (Sampling)
-## Chain 3:                0.139 seconds (Total)
-## Chain 3: 
-## 
-## SAMPLING FOR MODEL 'rubin' NOW (CHAIN 4).
-## Chain 4: 
-## Chain 4: Gradient evaluation took 0 seconds
-## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
-## Chain 4: Adjust your expectations accordingly!
-## Chain 4: 
-## Chain 4: 
-## Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
-## Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
-## Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
-## Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
-## Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
-## Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
-## Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
-## Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
-## Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
-## Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
-## Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
-## Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
-## Chain 4: 
-## Chain 4:  Elapsed Time: 0.078 seconds (Warm-up)
-## Chain 4:                0.079 seconds (Sampling)
-## Chain 4:                0.157 seconds (Total)
-## Chain 4:
- -
## Model type: Rubin model with aggregate data 
-## Pooling of effects: partial 
-## 
-## Aggregate treatment effect (on mean):
-## Hypermean (tau) =  6.5 with 95% interval -1.4 to 14.7 
-## Hyper-SD (sigma_tau) = 2.838 with 95% interval 0.094 to 9.908 
-## 
-## Treatment effects on mean:
-##          mean  sd pooling
-## School A  7.6 5.4    0.95
-## School B  6.6 4.9    0.90
-## School C  6.1 5.3    0.95
-## School D  6.6 4.8    0.91
-## School E  5.7 4.9    0.89
-## School F  6.0 5.0    0.91
-## School G  7.7 5.2    0.90
-## School H  6.8 5.3    0.96
-

If you still get a lot of divergent transitions after raising the adapt_delta really high, it may indicate an issue with your model! This is what is commonly referred to as the folk theorem of statistical computing.

-
-
-

\(\widehat{R}\) being too high

-

The other warning you get, that “r hat” statistics are too high, indicates that the effective sample size (ESS) of your draws is too low and your chains have not converged. Because MCMC is not a deterministic algorithm, we run multiple chains of samples in order to check that each has converged to a good enough answer.

-

We want to check that each chain is “mixing well,” meaning that we have enough effective draws after accounting for the correlation between subsequent samples. We also want to check that Markov chains initialized at different values get to similar places. Effective sample size gives us a measure of the first, and \(\widehat{R}\) gives a measure of the second.

-

If you have a low ESS or high \(\widehat{R}\), you are best off running each chain for more iterations until you get a higher ESS/lower \(\widehat{R}\). To run chains for longer, you change the iter parameter in the ... arguments to your baggr() call.

- -
## 
-## SAMPLING FOR MODEL 'rubin' NOW (CHAIN 1).
-## Chain 1: 
-## Chain 1: Gradient evaluation took 0 seconds
-## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
-## Chain 1: Adjust your expectations accordingly!
-## Chain 1: 
-## Chain 1: 
-## Chain 1: Iteration:    1 / 10000 [  0%]  (Warmup)
-## Chain 1: Iteration: 1000 / 10000 [ 10%]  (Warmup)
-## Chain 1: Iteration: 2000 / 10000 [ 20%]  (Warmup)
-## Chain 1: Iteration: 3000 / 10000 [ 30%]  (Warmup)
-## Chain 1: Iteration: 4000 / 10000 [ 40%]  (Warmup)
-## Chain 1: Iteration: 5000 / 10000 [ 50%]  (Warmup)
-## Chain 1: Iteration: 5001 / 10000 [ 50%]  (Sampling)
-## Chain 1: Iteration: 6000 / 10000 [ 60%]  (Sampling)
-## Chain 1: Iteration: 7000 / 10000 [ 70%]  (Sampling)
-## Chain 1: Iteration: 8000 / 10000 [ 80%]  (Sampling)
-## Chain 1: Iteration: 9000 / 10000 [ 90%]  (Sampling)
-## Chain 1: Iteration: 10000 / 10000 [100%]  (Sampling)
-## Chain 1: 
-## Chain 1:  Elapsed Time: 0.437 seconds (Warm-up)
-## Chain 1:                0.4 seconds (Sampling)
-## Chain 1:                0.837 seconds (Total)
-## Chain 1: 
-## 
-## SAMPLING FOR MODEL 'rubin' NOW (CHAIN 2).
-## Chain 2: 
-## Chain 2: Gradient evaluation took 0 seconds
-## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
-## Chain 2: Adjust your expectations accordingly!
-## Chain 2: 
-## Chain 2: 
-## Chain 2: Iteration:    1 / 10000 [  0%]  (Warmup)
-## Chain 2: Iteration: 1000 / 10000 [ 10%]  (Warmup)
-## Chain 2: Iteration: 2000 / 10000 [ 20%]  (Warmup)
-## Chain 2: Iteration: 3000 / 10000 [ 30%]  (Warmup)
-## Chain 2: Iteration: 4000 / 10000 [ 40%]  (Warmup)
-## Chain 2: Iteration: 5000 / 10000 [ 50%]  (Warmup)
-## Chain 2: Iteration: 5001 / 10000 [ 50%]  (Sampling)
-## Chain 2: Iteration: 6000 / 10000 [ 60%]  (Sampling)
-## Chain 2: Iteration: 7000 / 10000 [ 70%]  (Sampling)
-## Chain 2: Iteration: 8000 / 10000 [ 80%]  (Sampling)
-## Chain 2: Iteration: 9000 / 10000 [ 90%]  (Sampling)
-## Chain 2: Iteration: 10000 / 10000 [100%]  (Sampling)
-## Chain 2: 
-## Chain 2:  Elapsed Time: 0.425 seconds (Warm-up)
-## Chain 2:                0.348 seconds (Sampling)
-## Chain 2:                0.773 seconds (Total)
-## Chain 2: 
-## 
-## SAMPLING FOR MODEL 'rubin' NOW (CHAIN 3).
-## Chain 3: 
-## Chain 3: Gradient evaluation took 0 seconds
-## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
-## Chain 3: Adjust your expectations accordingly!
-## Chain 3: 
-## Chain 3: 
-## Chain 3: Iteration:    1 / 10000 [  0%]  (Warmup)
-## Chain 3: Iteration: 1000 / 10000 [ 10%]  (Warmup)
-## Chain 3: Iteration: 2000 / 10000 [ 20%]  (Warmup)
-## Chain 3: Iteration: 3000 / 10000 [ 30%]  (Warmup)
-## Chain 3: Iteration: 4000 / 10000 [ 40%]  (Warmup)
-## Chain 3: Iteration: 5000 / 10000 [ 50%]  (Warmup)
-## Chain 3: Iteration: 5001 / 10000 [ 50%]  (Sampling)
-## Chain 3: Iteration: 6000 / 10000 [ 60%]  (Sampling)
-## Chain 3: Iteration: 7000 / 10000 [ 70%]  (Sampling)
-## Chain 3: Iteration: 8000 / 10000 [ 80%]  (Sampling)
-## Chain 3: Iteration: 9000 / 10000 [ 90%]  (Sampling)
-## Chain 3: Iteration: 10000 / 10000 [100%]  (Sampling)
-## Chain 3: 
-## Chain 3:  Elapsed Time: 0.383 seconds (Warm-up)
-## Chain 3:                0.355 seconds (Sampling)
-## Chain 3:                0.738 seconds (Total)
-## Chain 3: 
-## 
-## SAMPLING FOR MODEL 'rubin' NOW (CHAIN 4).
-## Chain 4: 
-## Chain 4: Gradient evaluation took 0 seconds
-## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
-## Chain 4: Adjust your expectations accordingly!
-## Chain 4: 
-## Chain 4: 
-## Chain 4: Iteration:    1 / 10000 [  0%]  (Warmup)
-## Chain 4: Iteration: 1000 / 10000 [ 10%]  (Warmup)
-## Chain 4: Iteration: 2000 / 10000 [ 20%]  (Warmup)
-## Chain 4: Iteration: 3000 / 10000 [ 30%]  (Warmup)
-## Chain 4: Iteration: 4000 / 10000 [ 40%]  (Warmup)
-## Chain 4: Iteration: 5000 / 10000 [ 50%]  (Warmup)
-## Chain 4: Iteration: 5001 / 10000 [ 50%]  (Sampling)
-## Chain 4: Iteration: 6000 / 10000 [ 60%]  (Sampling)
-## Chain 4: Iteration: 7000 / 10000 [ 70%]  (Sampling)
-## Chain 4: Iteration: 8000 / 10000 [ 80%]  (Sampling)
-## Chain 4: Iteration: 9000 / 10000 [ 90%]  (Sampling)
-## Chain 4: Iteration: 10000 / 10000 [100%]  (Sampling)
-## Chain 4: 
-## Chain 4:  Elapsed Time: 0.391 seconds (Warm-up)
-## Chain 4:                0.377 seconds (Sampling)
-## Chain 4:                0.768 seconds (Total)
-## Chain 4:
- -
## Model type: Rubin model with aggregate data 
-## Pooling of effects: partial 
-## 
-## Aggregate treatment effect (on mean):
-## Hypermean (tau) =  0.4 with 95% interval -1.5 to 2.3 
-## Hyper-SD (sigma_tau) = 2.456 with 95% interval 0.063 to 9.824 
-## 
-## Treatment effects on mean:
-##          mean  sd pooling
-## School A 1.56 3.9    0.96
-## School B 1.02 3.1    0.92
-## School C 0.27 3.3    0.96
-## School D 0.83 3.1    0.93
-## School E 0.26 2.9    0.91
-## School F 0.47 3.1    0.93
-## School G 1.79 3.7    0.92
-## School H 0.78 3.5    0.97
-
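The printed summary above can also be extracted programmatically, which is handy if you want to track how the estimates change as you adjust the sampler settings. A minimal sketch using the package's helper functions (`treatment_effect()`, `group_effects()` and `pooling()`); the exact shape of the returned summaries may differ slightly between versions.

```r
# Pull out the quantities shown in the printed model summary
library(baggr)

fit <- baggr(schools, model = "rubin", pooling = "partial")

treatment_effect(fit, summary = TRUE)  # hypermean (tau) and hyper-SD (sigma_tau)
group_effects(fit, summary = TRUE)     # per-group (school) effects
pooling(fit)                           # pooling metric for each group
```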
- - - - - - - - - - - From facd403f164e2a04d1807f1396c42f9ea3c44d82 Mon Sep 17 00:00:00 2001 From: be-green Date: Sat, 25 Jan 2020 11:00:33 -0600 Subject: [PATCH 04/12] code formatting typo --- vignettes/baggr_model_convergence.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/baggr_model_convergence.Rmd b/vignettes/baggr_model_convergence.Rmd index 88730969..47edac3f 100644 --- a/vignettes/baggr_model_convergence.Rmd +++ b/vignettes/baggr_model_convergence.Rmd @@ -81,7 +81,7 @@ The other warning you get, that “r hat” statistics are too high, indicates t We want to check that each chain is “mixing well,” meaning that we have enough effective draws after accounting for the correlation between samples. We also want to check that markov chains initialized at different values get to similar places. ESS gives us a measure of the first, and $\widehat{R}$ gives a measure of the second. -If you have a low ESS or high $\widehat{R}$, you are best off running each chain for more iterations until you get a higher ESS/lower $\widehat{R}$. To run chains for longer, you change the iter parameter in the `...` arguments to your `baggr()` call. +If you have a low ESS or high $\widehat{R}$, you are best off running each chain for more iterations until you get a higher ESS/lower $\widehat{R}$. To run chains for longer, you change the `iter` parameter in the `...` arguments to your `baggr()` call. ```{r} # runs for 10,000 iterations per chain instead of 2,000 From ca7de8dd82e41d779f509b1c0960075e8b7bcff6 Mon Sep 17 00:00:00 2001 From: be-green Date: Sat, 25 Jan 2020 11:15:55 -0600 Subject: [PATCH 05/12] max_treedepth warning errors --- vignettes/baggr_model_convergence.Rmd | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/vignettes/baggr_model_convergence.Rmd b/vignettes/baggr_model_convergence.Rmd index 47edac3f..74247a02 100644 --- a/vignettes/baggr_model_convergence.Rmd +++ b/vignettes/baggr_model_convergence.Rmd @@ -95,3 +95,25 @@ fit <- baggr(schools, model = "rubin", pooling = "partial", fit ``` + +# Hitting maximum treedepth + +Sometimes you get warnings about hitting "maximum treedepth." This is less of an issue than the previous two warnings, which fundamentally throw the resulting infrence into question. Stan uses a binary tree to determine how many "leapfrog steps" to take at each iteration of the sampler. This has to do with the way in which the sampler approximates movement through the Hamiltonian system used to model the posterior. Basically, you need to figure out how far away you want to jump from your current sample draw (this is how we end up getting more efficient exploration of the space). In order to do that, Stan evaluates a "tree," which is basically a proposal equally spaced on either side of the current draw. Then it does this again at each of the two points that are drawn, and so on, until it meets a certain criteria for accepting the new value. The `max_treedepth` parameter controls how deeply the algorithm should search (how many sub-trees to create) before terminating. + +Hitting the maximum treedepth may be indicative of a deeper issue with the model, but if you are confident in your model and your data, you should increase the `max_treedepth` parameter that is passed to the `control` list that `rstan::sampling` uses. 
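Before re-fitting with a larger `max_treedepth`, it can be worth checking how many post-warmup iterations actually saturated the tree depth (and how many diverged). One possible check, assuming a recent version of rstan that exports the HMC diagnostic helpers and that the stanfit is stored in the `$fit` element of the baggr object:

```r
# Count iterations that hit the tree depth limit and divergent transitions;
# a handful is usually harmless, a large share suggests re-tuning the
# sampler or revisiting the model.
library(baggr)
library(rstan)

fit <- baggr(schools, model = "rubin", pooling = "partial")

get_num_max_treedepth(fit$fit)  # iterations that saturated max_treedepth
get_num_divergent(fit$fit)      # divergent transitions after warmup
```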
+ +```{r} + +# runs for 10,000 iterations per chain instead of 2,000 +fit <- baggr(schools, model = "rubin", pooling = "partial", + prior_hypermean = normal(0,1), prior_hypersd = cauchy(0,2), + refresh = 0, # don't print sampling + iter = 10000, + control = list( + adapt_delta = 0.95, # like above, to address divergences + max_treedepth = 20 # manually setting maximum tree depth + )) + +fit +``` + From a2e605dcf876cd73007f44d763ff6832b5f8dd61 Mon Sep 17 00:00:00 2001 From: be-green Date: Sat, 25 Jan 2020 11:16:47 -0600 Subject: [PATCH 06/12] reference to rstan::stan because that's where control arguments are documented --- R/baggr.R | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/R/baggr.R b/R/baggr.R index 29513fd3..c2f03d0e 100644 --- a/R/baggr.R +++ b/R/baggr.R @@ -57,7 +57,7 @@ #' with the same columns as `data` argument #' @param warn print an additional warning if Rhat exceeds 1.05 #' @param ... extra options passed to Stan function, e.g. \code{control = list(adapt_delta = 0.99)}, -#' number of iterations etc. For more details see [rstan::sampling()] +#' number of iterations etc. For more details see [rstan::sampling()] and [rstan::stan()]. #' @return `baggr` class structure: a list including Stan model fit #' alongside input data, pooling metrics, various model properties. #' If test data is used, mean value of -2*lpd is reported as `mean_lpd` From 96230653f82ff743d4de6afea850a35fa7f423ba Mon Sep 17 00:00:00 2001 From: be-green Date: Sat, 25 Jan 2020 11:19:05 -0600 Subject: [PATCH 07/12] warning for high r_hats that recommends more iterations --- R/baggr.R | 1 + 1 file changed, 1 insertion(+) diff --git a/R/baggr.R b/R/baggr.R index c2f03d0e..0b5b31fa 100644 --- a/R/baggr.R +++ b/R/baggr.R @@ -300,6 +300,7 @@ baggr <- function(data, model = NULL, pooling = "partial", warning(paste0("Rhat statistic for ", sum(rhat > 1.05), " parameters exceeded 1.05, with maximum equal to ", round(max(rhat),2), ". This suggests lack of convergence.", + "\n Consider running the model chains for more iterations.", "\n No further warning will be issued.", "\n Stan model saved as $fit in the returned object. \n")) From 4d495abd066a9d9739bb9e6f7f60ae091d77719d Mon Sep 17 00:00:00 2001 From: be-green Date: Sat, 25 Jan 2020 11:21:44 -0600 Subject: [PATCH 08/12] remove file that wasn't properly cleaned up from vignette --- vignettes/baggr_model_convergence.R | 26 -------------------------- 1 file changed, 26 deletions(-) delete mode 100644 vignettes/baggr_model_convergence.R diff --git a/vignettes/baggr_model_convergence.R b/vignettes/baggr_model_convergence.R deleted file mode 100644 index 96d598a3..00000000 --- a/vignettes/baggr_model_convergence.R +++ /dev/null @@ -1,26 +0,0 @@ -## ------------------------------------------------------------------------ -# the ... 
parameters in the call to baggr() are -# passed to rstan::sampling -# for documentation see ?rstan::sampling -library(baggr) - -fit <- baggr(schools, model = "rubin", - pooling = "partial", - prior_hypermean = normal(0, 10), - prior_hypersd = cauchy(0, 3), - control = list( - adapt_delta = 0.9 # 0.8 by default - )) - -fit - -## ------------------------------------------------------------------------ -# runs for 10,000 iterations per chain instead of 2,000 -fit <- baggr(schools, model = "rubin", pooling = "partial", - prior_hypermean = normal(0,1), prior_hypersd = cauchy(0,2), - iter = 10000, control = list( - adapt_delta = 0.95 # like above, to address divergences - )) - -fit - From 41169e69618cc7473b955fc4f187587a6b44e324 Mon Sep 17 00:00:00 2001 From: be-green Date: Sat, 25 Jan 2020 11:25:30 -0600 Subject: [PATCH 09/12] silly Bayes' rule mistake on the integral --- vignettes/baggr_model_convergence.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/baggr_model_convergence.Rmd b/vignettes/baggr_model_convergence.Rmd index 74247a02..a0c56c54 100644 --- a/vignettes/baggr_model_convergence.Rmd +++ b/vignettes/baggr_model_convergence.Rmd @@ -38,7 +38,7 @@ $$P(\theta | X) = \dfrac{P(X | \theta) P(\theta)}{P(X)}$$ In order to calculate $P(X)$, we need some computationally expensive machinery that can calculate -$$P(X) = \int P(X | \theta) d\theta $$ +$$P(X) = \int P(X | \theta) P(\theta) d\theta$$ which is (in general) really difficult except for a few specific situations. In order to make these types of models easy to run, we use a method that approximates this integral to a high degree of numerical accuracy. From 18298e198ab55c99aefd6243b16b855d2577bb03 Mon Sep 17 00:00:00 2001 From: be-green Date: Sat, 25 Jan 2020 11:35:29 -0600 Subject: [PATCH 10/12] fix weird tab where there wasn't one before --- vignettes/baggr_model_convergence.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/baggr_model_convergence.Rmd b/vignettes/baggr_model_convergence.Rmd index a0c56c54..64e69cc3 100644 --- a/vignettes/baggr_model_convergence.Rmd +++ b/vignettes/baggr_model_convergence.Rmd @@ -1,5 +1,5 @@ --- - title: "Understanding Model Convergence" +title: "Understanding Model Convergence" author: "Brice Green" date: "`r Sys.Date()`" output: From 04d18b60c21115d0acf961f2371cf499f320f458 Mon Sep 17 00:00:00 2001 From: be-green Date: Fri, 14 Feb 2020 13:42:41 -0600 Subject: [PATCH 11/12] update convergence vignette --- man/baggr-package.Rd | 31 --------- man/baggr_compare.Rd | 78 ---------------------- man/baggr_plot.Rd | 62 ----------------- man/baggr_theme_set.Rd | 74 --------------------- man/effect_draw.Rd | 26 -------- man/effect_plot.Rd | 44 ------------- man/group_effects.Rd | 40 ----------- man/is.baggr_cv.Rd | 14 ---- man/loo_compare.Rd | 25 ------- man/loocv.Rd | 62 ----------------- man/microcredit.Rd | 36 ---------- man/microcredit_simplified.Rd | 35 ---------- man/mint.Rd | 26 -------- man/plot.baggr.Rd | 23 ------- man/pooling.Rd | 89 ------------------------- man/prepare_ma.Rd | 84 ----------------------- man/prepare_prior.Rd | 43 ------------ man/print.baggr.Rd | 28 -------- man/print.baggr_cv.Rd | 18 ----- man/print.compare_baggr_cv.Rd | 18 ----- man/priors.Rd | 95 --------------------------- man/schools.Rd | 23 ------- man/show_model.Rd | 19 ------ man/treatment_effect.Rd | 28 -------- vignettes/baggr_model_convergence.Rmd | 58 +++++++++++----- 25 files changed, 40 insertions(+), 1039 deletions(-) delete mode 100644 man/baggr-package.Rd 
delete mode 100644 man/baggr_compare.Rd delete mode 100644 man/baggr_plot.Rd delete mode 100644 man/baggr_theme_set.Rd delete mode 100644 man/effect_draw.Rd delete mode 100644 man/effect_plot.Rd delete mode 100644 man/group_effects.Rd delete mode 100644 man/is.baggr_cv.Rd delete mode 100644 man/loo_compare.Rd delete mode 100644 man/loocv.Rd delete mode 100644 man/microcredit.Rd delete mode 100644 man/microcredit_simplified.Rd delete mode 100644 man/mint.Rd delete mode 100644 man/plot.baggr.Rd delete mode 100644 man/pooling.Rd delete mode 100644 man/prepare_ma.Rd delete mode 100644 man/prepare_prior.Rd delete mode 100644 man/print.baggr.Rd delete mode 100644 man/print.baggr_cv.Rd delete mode 100644 man/print.compare_baggr_cv.Rd delete mode 100644 man/priors.Rd delete mode 100644 man/schools.Rd delete mode 100644 man/show_model.Rd delete mode 100644 man/treatment_effect.Rd diff --git a/man/baggr-package.Rd b/man/baggr-package.Rd deleted file mode 100644 index 1ae34e6e..00000000 --- a/man/baggr-package.Rd +++ /dev/null @@ -1,31 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/baggr-package.R -\docType{package} -\name{baggr-package} -\alias{baggr-package} -\title{baggr - a package for Bayesian meta-analysis} -\description{ -This is \emph{baggr} (pronounced as \emph{bagger} or \emph{badger}), a Bayesian meta-analysis -package for R using \href{https://mc-stan.org/}{Stan}. -\emph{Baggr} is intended to be user-friendly and transparent so that -it's easier to understand the models you are building and criticise them. -} -\details{ -\emph{Baggr} package provides a suite of models that work with both summary data and full data sets, -to synthesise evidence collected from different groups, contexts or time periods. -The \link{baggr} command automatically detects the data type and, by default, fits a partial -pooling model (which you may know as -\href{https://stats.stackexchange.com/questions/4700/what-is-the-difference-between-fixed-effect-random-effect-and-mixed-effect-mode}{random effects models}) -with weakly informative priors by calling \href{https://mc-stan.org/}{Stan} to carry -out Bayesian inference. Modelling of variances or quantiles, standardisation and -transformation of data is also possible. -} -\section{Getting help}{ - - -This is only a simple package help file. -For documentation of the main function for conducting analyses see \link{baggr}. -For description of models, data types and priors available in the package, -try the built-in vignette (\code{vignette("baggr")}). -} - diff --git a/man/baggr_compare.Rd b/man/baggr_compare.Rd deleted file mode 100644 index 278871b3..00000000 --- a/man/baggr_compare.Rd +++ /dev/null @@ -1,78 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/baggr_compare.R -\name{baggr_compare} -\alias{baggr_compare} -\title{(Run and) compare multiple baggr models} -\usage{ -baggr_compare( - ..., - what = "pooling", - compare = "groups", - style = "areas", - arrange = "single" -) -} -\arguments{ -\item{...}{Either a number (at least 1) of objects of class \code{baggr} -(you should name your objects, see the example below) -or the same arguments you'd pass to baggr(), -in which you must case specify \code{what} to compare.} - -\item{what}{One of \code{"pooling"} (comparison between no, partial and -full pooling) or \code{"prior"} (comparison between prior and -posterior predictive). 
If pre-existing baggr models are -passed to \code{...}, this argument is ignored.} - -\item{compare}{When plotting, choose between comparison of \code{"groups"} -(default) or (hyper-) \code{"effects"}. The former is not available -when \code{what = "prior"}.} - -\item{style}{What kind of plot to display (if \code{arrange = "grid"}), -passed to the \code{style} argument in \link{baggr_plot}.} - -\item{arrange}{If \code{"single"} (default), generate a single comparison plot; -if \code{"grid"}, display multiple plots side-by-side.} -} -\value{ -a \code{ggplot} is rendered and/or returned -} -\description{ -(Run and) compare multiple baggr models -} -\examples{ -\donttest{ -# Most basic comparison between no, partial and full pooling -# (This will run the models) -baggr_compare(schools) - -# Compare prior vs posterior -baggr_compare(schools, what = "prior") - - -# Compare existing models: -bg1 <- baggr(schools, pooling = "partial") -bg2 <- baggr(schools, pooling = "full") -baggr_compare("Partial pooling model" = bg1, "Full pooling" = bg2, - arrange = "grid") - -#' ...or simply draw prior predictive dist (note ppd=T) -bg1 <- baggr(schools, ppd=T) -bg2 <- baggr(schools, prior_hypermean = normal(0, 5), ppd=T) -baggr_compare("Prior A, p.p.d."=bg1, - "Prior B p.p.d."=bg2, - compare = "effects") - -# Compare posterior effects as a function of priors (note ppd=F) -bg1 <- baggr(schools, prior_hypersd = uniform(0, 20)) -bg2 <- baggr(schools, prior_hypersd = normal(0, 5)) -baggr_compare("Uniform prior on SD"=bg1, - "Normal prior on SD"=bg2, - compare = "effects") -# You can also compare different subsets of input data -bg1_small <- baggr(schools[1:6,], pooling = "partial") -baggr_compare("8 schools model" = bg1, "First 6 schools" = bg1_small) -} -} -\author{ -Witold Wiecek -} diff --git a/man/baggr_plot.Rd b/man/baggr_plot.Rd deleted file mode 100644 index f6b17908..00000000 --- a/man/baggr_plot.Rd +++ /dev/null @@ -1,62 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/baggr_plot.R -\name{baggr_plot} -\alias{baggr_plot} -\title{Plotting method in baggr package} -\usage{ -baggr_plot( - bg, - hyper = FALSE, - style = "intervals", - transform = NULL, - prob = 0.5, - prob_outer = 0.95, - vline = TRUE, - order = TRUE, - ... -) -} -\arguments{ -\item{bg}{object of class \code{baggr}} - -\item{hyper}{logical; show hypereffect as the last row of the plot?} - -\item{style}{one of \code{areas}, \code{intervals}} - -\item{transform}{a function (e.g. \code{exp()}, \code{log()}) to apply to the -values of group (and hyper, if \code{hyper=TRUE}) effects -before plotting; when working with effects that are on -log scale, exponent transform is used automatically, you can -plot on log scale by setting \code{transform = identity}} - -\item{prob}{Probability mass for the inner interval in visualisation} - -\item{prob_outer}{Probability mass for the outer interval in visualisation} - -\item{vline}{logical; show vertical line through 0 in the plot?} - -\item{order}{logical; sort groups by magnitude of treatment effect?} - -\item{...}{extra arguments to pass to the \code{bayesplot} functions} -} -\value{ -ggplot2 object -} -\description{ -Extracts study effects from the \code{baggr} model and sends them to -one of \code{bayesplot} package plotting functions. 
-} -\examples{ -fit <- baggr(schools, pooling = "none") -plot(fit) -plot(fit, style = "areas", order = FALSE) - -} -\seealso{ -\link[bayesplot:MCMC-intervals]{bayesplot::MCMC-intervals} for more information about \emph{bayesplot} functionality; -\link{forest_plot} for a typical meta-analysis alternative; \link{effect_plot} for plotting -treatment effects for a new group -} -\author{ -Witold Wiecek, Rachael Meager -} diff --git a/man/baggr_theme_set.Rd b/man/baggr_theme_set.Rd deleted file mode 100644 index 3d23e36b..00000000 --- a/man/baggr_theme_set.Rd +++ /dev/null @@ -1,74 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/plot.R -\name{baggr_theme_set} -\alias{baggr_theme_set} -\alias{baggr_theme_get} -\alias{baggr_theme_update} -\alias{baggr_theme_replace} -\title{Set, get, and replace themes for baggr plots} -\usage{ -baggr_theme_set(new = bayesplot::theme_default()) - -baggr_theme_get() - -baggr_theme_update(...) - -baggr_theme_replace(...) -} -\arguments{ -\item{new}{New theme to use for all baggr plots} - -\item{...}{A named list of theme settings} -} -\value{ -The get method returns the current theme, but all of the -others invisibly return the old theme. -} -\description{ -These functions get, set, and modify the ggplot2 themes -of the baggr plots. \code{baggr_theme_get()} returns a ggplot2 theme function for -adding themes to a plot. \code{baggr_theme_set()} assigns a new theme -for all plots of baggr objects. \code{baggr_theme_update()} edits a specific -theme element for the current theme while holding the theme's -other aspects constant. \code{baggr_theme_replace()} is used for -wholesale replacing aspects of a plot's theme (see \code{\link[ggplot2:theme_get]{ggplot2::theme_get()}}). -} -\details{ -Under the hood, many of the visualizations rely on the -bayesplot package, and thus these leverage the \code{\link[bayesplot:bayesplot_theme_get]{bayesplot::bayesplot_theme_get()}} -functions. By default, these match the bayesplot's package -theme to make it easier to form cohesive graphs across this package -and others. The trickiest of these to use is \code{baggr_theme_replace}; -9 times out of 10 you want baggr_theme_update. -} -\examples{ - -# make plot look like default ggplots - -library(ggplot2) - -fit <- baggr(schools) -baggr_theme_set(theme_grey()) -baggr_plot(fit) - -# use baggr_theme_get to return theme elements for current theme -qplot(mtcars$mpg) + baggr_theme_get() - -# update specific aspect of theme you are interested in -baggr_theme_update(text = element_text(family = "mono")) - -# undo that silliness -baggr_theme_update(text = element_text(family = "serif")) - -# update and replace are similar, but replace overwrites the -# whole element, update just edits the aspect of the element -# that you give it -# this will error: -# baggr_theme_replace(text = element_text(family = "Times")) -# baggr_plot(fit) -# because it deleted everything else to do with text elements - -} -\seealso{ -\link[bayesplot:bayesplot_theme_get]{bayesplot::bayesplot_theme_get} -} diff --git a/man/effect_draw.Rd b/man/effect_draw.Rd deleted file mode 100644 index ef8b308c..00000000 --- a/man/effect_draw.Rd +++ /dev/null @@ -1,26 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/trt_effects.R -\name{effect_draw} -\alias{effect_draw} -\title{Make posterior draws for treatment effect} -\usage{ -effect_draw(x, n, transform = NULL) -} -\arguments{ -\item{x}{A \code{baggr} class object.} - -\item{n}{How many values to draw? 
The default is the same -as number of samples in the model (default is 2,000).} - -\item{transform}{a transformation to apply to the result, should be an R function; -(this is commonly used when calling \code{group_effects} from other -plotting or printing functions)} -} -\value{ -A vector of possible values of the treatment effect. -} -\description{ -This function takes the samples of hyperparameters of a \code{baggr} model -(commonly hypermean tau and hyper-SD sigma_tau) and simulates values of -new realisations of tau (a mean effect in some unobserved group). -} diff --git a/man/effect_plot.Rd b/man/effect_plot.Rd deleted file mode 100644 index 513a8e5c..00000000 --- a/man/effect_plot.Rd +++ /dev/null @@ -1,44 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/trt_effects.R -\name{effect_plot} -\alias{effect_plot} -\title{Plot posterior distribution for treatment effect} -\usage{ -effect_plot(..., transform = NULL) -} -\arguments{ -\item{...}{Object(s) of class \code{baggr}. If there is more than one, -the names of objects will be used as a plot legend (see example).} - -\item{transform}{a transformation to apply to the result, should be an R function; -(this is commonly used when calling \code{group_effects} from other -plotting or printing functions)} -} -\value{ -A ggplot. -} -\description{ -This function plots the \link{effect_draw} for one or more baggr objects. -} -\examples{ - -# A single effects plot -bg1 <- baggr(schools, prior_hypersd = uniform(0, 20)) -effect_plot(bg1) - -# Compare how posterior depends on the prior choice -bg2 <- baggr(schools, prior_hypersd = normal(0, 5)) -effect_plot("Uniform prior on SD"=bg1, - "Normal prior on SD"=bg2) - -# Compare the priors themselves (ppd=T) -bg1_ppd <- baggr(schools, prior_hypersd = uniform(0, 20), ppd=TRUE) -bg2_ppd <- baggr(schools, prior_hypersd = normal(0, 5), ppd=TRUE) -effect_plot("Uniform prior on SD"=bg1_ppd, - "Normal prior on SD"=bg2_ppd) - -} -\seealso{ -\link{baggr_compare} can be used as a shortcut for \code{effect_plot} with argument -\code{compare = "effects"} -} diff --git a/man/group_effects.Rd b/man/group_effects.Rd deleted file mode 100644 index c27c8af8..00000000 --- a/man/group_effects.Rd +++ /dev/null @@ -1,40 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/group_effects.R -\name{group_effects} -\alias{group_effects} -\title{Extract baggr study effects} -\usage{ -group_effects(bg, summary = FALSE, transform = NULL, interval = 0.95) -} -\arguments{ -\item{bg}{baggr object} - -\item{summary}{logical; if TRUE returns summary statistics as explained below.} - -\item{transform}{a transformation to apply to the result, should be an R function; -(this is commonly used when calling \code{group_effects} from other -plotting or printing functions)} - -\item{interval}{uncertainty interval width (numeric between 0 and 1), if summarising} -} -\value{ -Either a matrix with MCMC samples (if summary = FALSE) -or a summary of these samples (if summary = TRUE). -} -\description{ -Given a baggr object, returns the raw MCMC draws of the posterior for -each group's effect, or a summary of these draws. This is an internal -function currently used as a helper for plotting and printing of results. 
-} -\details{ -If summary = TRUE, the returned object contains, for each study -or group, the following 5 values: -the posterior medians, the lower and upper bounds of the -uncertainty intervals using the central posterior credible interval -of width specified in the argument \code{interval}, the posterior mean, and -the posterior standard deviation. -} -\examples{ -fit1 <- baggr(schools) -group_effects(fit1, summary = TRUE, interval = 0.5) -} diff --git a/man/is.baggr_cv.Rd b/man/is.baggr_cv.Rd deleted file mode 100644 index 045262b7..00000000 --- a/man/is.baggr_cv.Rd +++ /dev/null @@ -1,14 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/loocv.R -\name{is.baggr_cv} -\alias{is.baggr_cv} -\title{Check if something is a baggr_cv object} -\usage{ -is.baggr_cv(x) -} -\arguments{ -\item{x}{object to check} -} -\description{ -Check if something is a baggr_cv object -} diff --git a/man/loo_compare.Rd b/man/loo_compare.Rd deleted file mode 100644 index 586684d5..00000000 --- a/man/loo_compare.Rd +++ /dev/null @@ -1,25 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/loocv.R -\name{loo_compare} -\alias{loo_compare} -\title{Compare fitted models on loo} -\usage{ -loo_compare(x, ...) -} -\arguments{ -\item{x}{An object of class \code{baggr_cv} or a list of such objects.} - -\item{...}{Additional objects of class "baggr_cv"} -} -\description{ -Compare fitted models on loo -} -\examples{ -\donttest{ -# 2 models with more/less informative priors -cv_1 <- loocv(schools, model = "rubin", pooling = "partial") -cv_2 <- loocv(schools, model = "rubin", pooling = "partial", - prior_hypermean = normal(0, 5), prior_hypersd = cauchy(0,4)) -loo_compare(cv_1, cv_2) -} -} diff --git a/man/loocv.Rd b/man/loocv.Rd deleted file mode 100644 index e3d7b01c..00000000 --- a/man/loocv.Rd +++ /dev/null @@ -1,62 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/loocv.R -\name{loocv} -\alias{loocv} -\title{Leave one group out cross-validation for \code{baggr} models} -\usage{ -loocv(data, return_models = FALSE, ...) -} -\arguments{ -\item{data}{Input data frame - same as for \link{baggr} function.} - -\item{return_models}{logical; if FALSE, summary statistics will be returned and the -models discarded; -if TRUE, a list of models will be returned alongside summaries} - -\item{...}{Additional arguments passed to \link{baggr}.} -} -\value{ -log predictive density value, an object of class \code{baggr_cv}; -full model, prior values and \emph{lpd} of each model are also returned. -These can be examined by using \code{attributes()} function. -} -\description{ -Performs exact leave-one-group-out cross-validation on a baggr model. -} -\details{ -The values returned by \code{loocv()} can be used to understand how any -one group affects the overall result, as well as how well the model -predicts the omitted group. - -This function automatically runs K baggr models, leaving out one group at a time, -and then calculates expected log predictive density (ELPD) for -that group (see Gelman et al 2013). The main output is the cross-validation -information criterion, or -2 times the ELPD averaged over 'K' models. -This is related to, and often approximated by, the Watanabe-Akaike -Information Criterion. A value closer to zero (i.e. a smaller number in magnitude) -means a better fit. 
For more information on cross-validation see -\href{http://www.stat.columbia.edu/~gelman/research/published/waic_understand3.pdf}{this overview article} - -For running more computation-intensive models, consider setting the -\code{mc.cores} option before running loocv, e.g. \code{options(mc.cores = 4)} -(by default baggr runs 4 MCMC chains in parallel). -As a default, rstan runs "silently" (\code{refresh=0}). -To see sampling progress, please set e.g. \code{loocv(data, refresh = 500)}. -} -\examples{ -\donttest{ -# even simple examples may take a while -cv <- loocv(schools, pooling = "partial") -print(cv) # returns the lpd value -attributes(cv) # more information is included in the object -} - -} -\references{ -Gelman, Andrew, Jessica Hwang, and Aki Vehtari. -“Understanding Predictive Information Criteria for Bayesian Models.” -Statistics and Computing 24, no. 6 (November 2014): 997–1016. https://doi.org/10.1007/s11222-013-9416-2. -} -\author{ -Witold Wiecek -} diff --git a/man/microcredit.Rd b/man/microcredit.Rd deleted file mode 100644 index 95c31c7a..00000000 --- a/man/microcredit.Rd +++ /dev/null @@ -1,36 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/microcredit.R -\docType{data} -\name{microcredit} -\alias{microcredit} -\title{7 studies on effect of microcredit supply} -\format{A data frame with 40267 rows, 7 study identifiers and 7 outcomes} -\usage{ -microcredit -} -\description{ -This dataframe contains the data used in Meager (2019) to estimate hierarchical -models on the data from 7 randomized controlled trials of expanding access to microcredit. -} -\details{ -The columns include the group indicator which gives the name of the lead author -on each of the respective studies, the value of the 6 outcome variables of most -interest (consumer durables spending, business expenditures, business profit, -business revenues, temptation goods spending and consumption spending) all of -which are standardised to USD PPP in 2009 dollars per two weeks (these are flow variables), -and finally a treatment assignment status indicator. - -The dataset has not otherwise been cleaned and therefore includes NAs and other -issues common to real-world datasets. - -For more information on how and why these variables were chosen and standardised, -see Meager (2019) or consult the associated code repository which includes the -standardisation scripts: -\href{https://bitbucket.org/rmeager/aggregate-average-impacts-microcredit/src/master/}{link} -} -\references{ -Meager, Rachael (2019) Understanding the average impact of microcredit expansions: -A Bayesian hierarchical analysis of seven randomized experiments. -American Economic Journal: Applied Economics, 11(1), 57-91. -} -\keyword{datasets} diff --git a/man/microcredit_simplified.Rd b/man/microcredit_simplified.Rd deleted file mode 100644 index c1d7b49e..00000000 --- a/man/microcredit_simplified.Rd +++ /dev/null @@ -1,35 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/microcredit.R -\docType{data} -\name{microcredit_simplified} -\alias{microcredit_simplified} -\title{Simplified version of the microcredit dataset.} -\format{A data frame with 14224 rows, 7 study identifiers and 1 outcome} -\usage{ -microcredit_simplified -} -\description{ -This dataframe contains the data used in Meager (2019) to estimate hierarchical -models on the data from 7 randomized controlled trials of expanding access to microcredit. 
-} -\details{ -The columns include the group indicator which gives the name of the lead author on -each of the respective studies, the value of the household consumer durables -spending standardised to USD PPP in 2009 dollars per two weeks (these are flow variables), -and finally a treatment assignment status indicator. - -The dataset has not otherwise been cleaned and therefore includes NAs and other -issues common to real data. - -For more information on how and why these variables were chosen and standardised, -see Meager (2019) or consult the associated code repository: -\href{https://bitbucket.org/rmeager/aggregate-average-impacts-microcredit/src/master/}{link} - -This dataset includes only complete cases and only the consumer durables outcome variable. -} -\references{ -Meager, Rachael (2019) Understanding the average impact of microcredit expansions: -A Bayesian hierarchical analysis of seven randomized experiments. American Economic Journal: -Applied Economics, 11(1), 57-91. -} -\keyword{datasets} diff --git a/man/mint.Rd b/man/mint.Rd deleted file mode 100644 index 7a719142..00000000 --- a/man/mint.Rd +++ /dev/null @@ -1,26 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/mint.R -\name{mint} -\alias{mint} -\title{Output mean, intervals and other summaries for matrix (by column) or vector} -\usage{ -mint(y, int = 0.95, digits = NULL, median = FALSE, sd = FALSE) -} -\arguments{ -\item{y}{matrix or a vector; for matrices, \code{mint} is done by-column} - -\item{int}{probability interval (default is 95 percent) to calculate} - -\item{digits}{number of significant digits to \link{round} values by.} - -\item{median}{return median value?} - -\item{sd}{return SD?} -} -\description{ -This function is just a convenient shorthand for getting typical summary statistics. -} -\examples{ -mint(rnorm(100, 12, 5)) - -} diff --git a/man/plot.baggr.Rd b/man/plot.baggr.Rd deleted file mode 100644 index e6e5f784..00000000 --- a/man/plot.baggr.Rd +++ /dev/null @@ -1,23 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/plot.R -\name{plot.baggr} -\alias{plot.baggr} -\title{Generic plot for baggr package} -\usage{ -\method{plot}{baggr}(x, ...) -} -\arguments{ -\item{x}{object of class \code{baggr}} - -\item{...}{optional arguments, see \code{baggr_plot}} -} -\value{ -ggplot2 object from \code{baggr_plot} -} -\description{ -Using generic \code{plot()} on \code{baggr} output invokes \code{\link{baggr_plot}} visual. -See therein for customisation options. Note that plot output is \code{ggplot2} object.` -} -\author{ -Witold Wiecek -} diff --git a/man/pooling.Rd b/man/pooling.Rd deleted file mode 100644 index b522b246..00000000 --- a/man/pooling.Rd +++ /dev/null @@ -1,89 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/pooling_metrics.R -\name{pooling} -\alias{pooling} -\alias{heterogeneity} -\title{Pooling metrics for baggr} -\usage{ -pooling(bg, type = c("groups", "total"), summary = TRUE) - -heterogeneity(bg, summary = TRUE) -} -\arguments{ -\item{bg}{output of a baggr() function} - -\item{type}{In \code{pooling} calculation is done for each of the \code{"groups"} -(default) or for \code{"total"} hypereffect(s). 
-See Details section for how calculation is done.} - -\item{summary}{logical; if \code{FALSE} a whole vector of pooling values is returned, -otherwise only the means and intervals} -} -\value{ -Matrix with mean and intervals for chosen pooling metric, -each row corresponding to one meta-analysis group. -} -\description{ -Compute statistics relating to \code{heterogeneity} (whole model) and -\code{pooling} (for each group) given a \link{baggr} meta-analysis model. -The statistics are the pooling metric by Gelman & Pardoe (2006) or its -complement, the \emph{I-squared} statistic. -} -\details{ -Pooling statistic describes the extent to which group-level estimates of treatment -effect are "pooled" (or pulled!) toward average treatment effect in the meta-analysis model. -If \code{pooling = "none"} or "full" in \link{baggr}, then the returned values are always 0 or 1, respectively. -If \code{pooling = "partial"}, the value is somewhere between 0 and 1. - -\strong{Formulae for the calculations below are provided in main package vignette.} See \verb{vignette("baggr").} - -#' \strong{Estimate of pooling in a group}: this is the calculation done by \code{pooling()} -if \code{type = "groups"} (default). - -In a partial pooling model (see \link{baggr}), group \emph{k} (e.g. study) has a treatment effect -estimate, with some SE around the real treatment effect (TE). -Each TE itself is distributed with mean and variance. - -The quantity of interest is ratio of variability in \eqn{\tau} to total variability. -By convention, we subtract it from 1, to obtain a \emph{pooling metric} \emph{p}. - -\deqn{p = 1 - (\sigma(\tau)^2 / (\sigma_(\tau)^2 + se_k^2))} -\itemize{ -\item If \eqn{p < 0.5}, that means the variation across studies is higher than variation within studies. -\item Values close to 1 indicate nearly full pooling. Variation across studies dominates. -\item Values close to 0 -- no pooling. Variation within studies dominates. -} - -Note that, since \eqn{\sigma_{\tau}^2} is a Bayesian parameter (rather than a single fixed value) -\emph{p} is also a parameter. It is typical for \emph{p} to have very high dispersion, as in many cases we -cannot precisely estimate \eqn{\sigma_{\tau}}. To obtain the whole distribution of_p_ -(rather than summarised values), set \code{summary=FALSE}. - -\strong{Overall pooling (in the model)} - -Typically it is a single measure of heterogeneity that is of interest to researchers. -This is calculated by setting \code{type = "total"} or simply writing \code{heterogeneity(mymodel)} - -In many contexts, i.e. medical statistics, it is typical to report \emph{1-P}, called \eqn{I^2} -(see Higgins \emph{et al}, 2003). Higher values of \emph{I-squared} indicate higher heterogeneity. -Von Hippel (2015) provides useful details for \emph{I-squared} calculations. - -Same as for group-specific estimates, \emph{P} is a Bayesian parameter and its dispersion can be high. - -\strong{Relationship to R-squared statistic} - -See Gelman & Pardoe (2006) Section 1.1 for a short explanation of how \eqn{R^2} -statistic relates to the pooling metric. -} -\references{ -Gelman, Andrew, and Iain Pardoe. -"Bayesian Measures of Explained Variance and Pooling in Multilevel (Hierarchical) Models." -\emph{Technometrics 48, no. 2 (May 2006): 241-51}. \url{https://doi.org/10.1198/004017005000000517}. - -Higgins, Julian P T, Simon G Thompson, Jonathan J Deeks, and Douglas G Altman. -"Measuring Inconsistency in Meta-Analyses." -\emph{British Medical Journal 327, no. 
7414 (September 6, 2003): 557-60.} - -Hippel, Paul T von. "The Heterogeneity Statistic I2 Can Be Biased in Small Meta-Analyses." -\emph{BMC Medical Research Methodology 15 (April 14, 2015).} \url{https://doi.org/10.1186/s12874-015-0024-z}. -} diff --git a/man/prepare_ma.Rd b/man/prepare_ma.Rd deleted file mode 100644 index f67cd4b2..00000000 --- a/man/prepare_ma.Rd +++ /dev/null @@ -1,84 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/prepare_ma.R -\name{prepare_ma} -\alias{prepare_ma} -\title{Convert from individual to summary data in meta-analyses} -\usage{ -prepare_ma( - data, - effect = c("mean", "logOR", "logRR"), - rare_event_correction = 0.25, - log = FALSE, - cfb = FALSE, - summarise = TRUE, - treatment = "treatment", - baseline = NULL, - group = "group", - outcome = "outcome" -) -} -\arguments{ -\item{data}{data.frame of individual-level observations -with columns for outcome (numeric), treatment (values 0 and 1) and -group (numeric, character or factor); -column names can be user-defined (see below)} - -\item{effect}{what effect to calculate? a \code{mean} (and SE) of outcome in groups or -(for binary data) \code{logOR} (odds ratio), \code{logRR} (risk ratio);} - -\item{rare_event_correction}{If effect is \code{logOR} or \code{logRR}, this correction -is used when working with -binary data only. The value of correction is added to all arms -in trials where some arms had 0 events. -Using corrections may bias results but is the only alternative to -avoid infinite values.} - -\item{log}{logical; log-transform the outcome variable?} - -\item{cfb}{logical; calculate change from baseline? If yes, the outcome -variable is taken as a difference between values in \code{outcome} and -\code{baseline} columns} - -\item{summarise}{logical; \code{TRUE} by default, but you can disable it to obtain -converted (e.g. logged) data with columns renamed} - -\item{treatment}{name of column with treatment variable} - -\item{baseline}{name of column with baseline variable} - -\item{group}{name of the column with grouping variable} - -\item{outcome}{name of column with outcome variable} -} -\value{ -\itemize{ -\item If you \code{summarise} data.frame with columns for \code{group} \code{tau} and \code{se.tau} -(for \code{effect = "mean"}, also baseline means, for \code{"logRR"} or \code{"logOR"} also -\code{a}, \code{b}, \code{c}, \code{d}, which correspond to typical contingency table notation). -\item If you do not summarise data, individual level data will be returned, but -some columns may be renamed or transformed (see above). -} -} -\description{ -Allows one-way conversion from full to summary data. -Input must be pre-formatted appropriately. -} -\details{ -The conversions done by this function are not typically needed and may happen automatically -when data is fed to \link{baggr}. However, this function can be used to explicitly -convert from full to reduced (summarised) data without analysing it in any model. -It can be useful for examining your data. 
- -If multiple operations are performed, they are taken in this order: -\enumerate{ -\item conversion to log scale, -\item calculating change from baseline, -\item summarising data (using appropriate \code{effect}) -} -} -\seealso{ -\link{convert_inputs} for how any type of data is (internally) converted into Stan inputs; -} -\author{ -Witold Wiecek -} diff --git a/man/prepare_prior.Rd b/man/prepare_prior.Rd deleted file mode 100644 index 372fc639..00000000 --- a/man/prepare_prior.Rd +++ /dev/null @@ -1,43 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/auto_prior.R -\name{prepare_prior} -\alias{prepare_prior} -\title{Prepare prior values for Stan models in baggr} -\usage{ -prepare_prior( - prior, - data, - stan_data, - model, - pooling, - covariates, - quantiles = c() -) -} -\arguments{ -\item{prior}{\code{prior} argument passed from \link{baggr} call} - -\item{data}{\code{data} another argument in \link{baggr}} - -\item{stan_data}{list of inputs that will be used by sampler -this is already pre-obtained through \link{convert_inputs}} - -\item{model}{same as in \link{baggr}} - -\item{pooling}{same as in \link{baggr}} - -\item{covariates}{same as in \link{baggr}} - -\item{quantiles}{same as in \link{baggr}} -} -\value{ -A named list with prior values that can be appended to \code{stan_data} -and passed to a Stan model. -} -\description{ -This is an internal function called by \link{baggr}. You can use it for debugging -or to run modified models. -It extracts and prepares priors passed by the user. -Then, if any necessary priors are missing, it sets them automatically -and notifies user about these automatic choices. -} diff --git a/man/print.baggr.Rd b/man/print.baggr.Rd deleted file mode 100644 index c0ad77c5..00000000 --- a/man/print.baggr.Rd +++ /dev/null @@ -1,28 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/print_baggr.R -\name{print.baggr} -\alias{print.baggr} -\title{S3 print method for objects of class \code{baggr} (model fits)} -\usage{ -\method{print}{baggr}(x, exponent = FALSE, digits = 2, group, fixed = TRUE, ...) -} -\arguments{ -\item{x}{object of class \code{baggr}} - -\item{exponent}{if \code{TRUE}, results (for means) are converted to exp scale} - -\item{digits}{Number of significant digits to print.} - -\item{group}{logical; print group effects? If unspecified, -they are printed only if -less than 20 groups are present} - -\item{fixed}{logical: print fixed effects?} - -\item{...}{currently unused by this package: further arguments passed -to or from other methods (\code{print} requirement)} -} -\description{ -This \code{print} method for a very concise summary of main model features. -More info is included in the summary of the model and its attributes. -} diff --git a/man/print.baggr_cv.Rd b/man/print.baggr_cv.Rd deleted file mode 100644 index c1ee8f61..00000000 --- a/man/print.baggr_cv.Rd +++ /dev/null @@ -1,18 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/loocv.R -\name{print.baggr_cv} -\alias{print.baggr_cv} -\title{Print baggr cv objects nicely} -\usage{ -\method{print}{baggr_cv}(x, digits = 3, ...) 
-} -\arguments{ -\item{x}{baggr_cv object to print} - -\item{digits}{number of digits to print} - -\item{...}{additional arguments for s3 consistency} -} -\description{ -Print baggr cv objects nicely -} diff --git a/man/print.compare_baggr_cv.Rd b/man/print.compare_baggr_cv.Rd deleted file mode 100644 index c1e31bc6..00000000 --- a/man/print.compare_baggr_cv.Rd +++ /dev/null @@ -1,18 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/loocv.R -\name{print.compare_baggr_cv} -\alias{print.compare_baggr_cv} -\title{Print baggr_cv comparisons} -\usage{ -\method{print}{compare_baggr_cv}(x, digits = 3, ...) -} -\arguments{ -\item{x}{baggr_cv comparison to print} - -\item{digits}{number of digits to print} - -\item{...}{additional arguments for s3 consistency} -} -\description{ -Print baggr_cv comparisons -} diff --git a/man/priors.Rd b/man/priors.Rd deleted file mode 100644 index 3fbca5fc..00000000 --- a/man/priors.Rd +++ /dev/null @@ -1,95 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/prior_dist.R -\name{priors} -\alias{priors} -\alias{multinormal} -\alias{lkj} -\alias{normal} -\alias{cauchy} -\alias{uniform} -\title{Prior distributions in baggr} -\usage{ -multinormal(location, Sigma) - -lkj(shape, order = NULL) - -normal(location, scale) - -cauchy(location, scale) - -uniform(lower, upper) -} -\arguments{ -\item{location}{Mean for normal and multivariate normal (in which case \code{location} is a vector), -and median for Cauchy distributions} - -\item{Sigma}{Variance-covariance matrix for multivariate normal.} - -\item{shape}{Shape parameter for LKJ} - -\item{order}{Order of LKJ matrix (typically it does not need to be specified, -as it is inferred directly in the model)} - -\item{scale}{SD for Normal, scale for Cauchy} - -\item{lower}{Lower bound for Uniform} - -\item{upper}{Upper bound for Uniform} -} -\description{ -This page provides a list of all available distributions -that can be used to specify priors in \code{\link[=baggr]{baggr()}}. These convenience functions -are designed to allow the user to write the priors in the most "natural" way when -implementing them in baggr. Apart from -passing on the arguments, their only other role is to perform a rudimentary check -if the distribution is specified correctly. -} -\details{ -The prior choice in \link{baggr} is always done via 3 distinct arguments: \code{prior_hypermean}, -\code{prior_hypersd}, and \code{prior_hypercor}. - -These respectively refer to the priors on the average of the effects across -the groups (hypermean), the standard deviation of the effects across the groups -(hypersd), and the correlation in the distribution of parameters across groups -when the model allows multivariate shrinkage (say on control group means and effects). - -Notation for priors is "plain-text", in that you can write the distributions as -\code{normal(5,10)}, \code{uniform(0,100)} etc. -As with any other argument one has the option to simply input the prior directly, -e.g. \code{prior_hypermean = normal(0,1)} , or by creating a named list of custom priors -and then inputting the list to the argument \code{priors}. -See the examples below for more. - -Different parameters admit different priors: -\itemize{ -\item \code{prior_hypermean} will take \code{"normal"}, \code{"uniform"} and \code{"cauchy"} input for a scalar mean. 
-For a vector mean, it will take any of these arguments and apply them independently to -each component of the vector, or it can also take a \code{"multinormal"} argument -(see the example below). -\item \code{prior_hypersd} will take \code{"normal"} and \code{"uniform"} -\item \code{prior_hypercor} allows \code{"lkj"} input -} -} -\examples{ -# change the priors for 8 schools: -baggr(schools, model = "rubin", pooling = "partial", - prior_hypermean = normal(5,5), - prior_hypersd = normal(0,20)) - -\donttest{ -# passing priors as a list -custom_priors <- list(hypercor = lkj(1), hypersd = normal(0,10), - hypermean = multinormal(c(0,0),matrix(c(10,3,3,10),2,2))) -baggr(microcredit_summary_data, model = "mutau", - pooling = "partial", prior = custom_priors) -} -} -\references{ -Lewandowski, Daniel, Dorota Kurowicka, and Harry Joe. -"Generating Random Correlation Matrices Based on Vines and Extended Onion Method." -\emph{Journal of Multivariate Analysis} 100, no. 9 (October 1, 2009): 1989-2001. -https://doi.org/10.1016/j.jmva.2009.04.008. -} -\author{ -Witold Wiecek, Rachael Meager -} diff --git a/man/schools.Rd b/man/schools.Rd deleted file mode 100644 index 244afffa..00000000 --- a/man/schools.Rd +++ /dev/null @@ -1,23 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/microcredit.R -\docType{data} -\name{schools} -\alias{schools} -\title{8 schools example} -\format{An object of class \code{data.frame} with 8 rows and 3 columns.} -\usage{ -schools -} -\description{ -A classic example of aggregate level continuous data in Bayesian hierarchical modelling. -This dataframe contains a column of estimated treatment effects of an SAT prep -program implemented in 8 different schools in the US, and a column of estimated standard errors. -} -\details{ -See Gelman et al (1995), Chapter 5, for context and applied example. -} -\references{ -Gelman, Andrew, John B. Carlin, Hal S. Stern, and Donald B. Rubin. -Bayesian Data Analysis. Taylor & Francis, 1995. -} -\keyword{datasets} diff --git a/man/show_model.Rd b/man/show_model.Rd deleted file mode 100644 index 5279b4dd..00000000 --- a/man/show_model.Rd +++ /dev/null @@ -1,19 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/baggr_code.R -\name{show_model} -\alias{show_model} -\title{Show Stan code for baggr models or objects} -\usage{ -show_model(model) -} -\arguments{ -\item{model}{either a \code{baggr} object (fitted model) or one of -\code{"rubin"}, \code{"mutau"}, \code{"individual"}} -} -\value{ -Nothing is returned in R. Stan code will be opened externally -(e.g. via notepad). 
-} -\description{ -Show Stan code for baggr models or objects -} diff --git a/man/treatment_effect.Rd b/man/treatment_effect.Rd deleted file mode 100644 index 5018710e..00000000 --- a/man/treatment_effect.Rd +++ /dev/null @@ -1,28 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/trt_effects.R -\name{treatment_effect} -\alias{treatment_effect} -\title{Average treatment effect in a baggr model} -\usage{ -treatment_effect(bg, summary = FALSE, transform = NULL, interval = 0.95) -} -\arguments{ -\item{bg}{a \link{baggr} model} - -\item{summary}{logical; if TRUE returns summary statistics as explained below.} - -\item{transform}{a transformation to apply to the result, should be an R function; -(this is commonly used when calling \code{treatment_effect} from other -plotting or printing functions)} - -\item{interval}{uncertainty interval width (numeric between 0 and 1), if summarising} -} -\value{ -A list with 2 vectors (corresponding to MCMC samples) -\code{tau} (mean effect) and \code{sigma_tau} (SD). If \code{summary=TRUE}, -both vectors are summarised as mean and lower/upper bounds according to -\code{interval} -} -\description{ -Average treatment effect in a baggr model -} diff --git a/vignettes/baggr_model_convergence.Rmd b/vignettes/baggr_model_convergence.Rmd index 64e69cc3..8df879e1 100644 --- a/vignettes/baggr_model_convergence.Rmd +++ b/vignettes/baggr_model_convergence.Rmd @@ -12,28 +12,26 @@ vignette: > bibliography: baggr.bib --- -The `baggr` package makes running Bayesian meta-analysis models very easy, but that ease can also obscure some things that go on behind the scenes. Especially with low-data settings, say with only a few measurements, you may get cryptic error messages when you run your models, like +```{r include = F} +library(baggr) +``` + + +This is a vignette intended for baggr users who have no or little experience with Markov Chain Monte Carlo (MCMC) programs used for Bayesian inference. It is intended to help you understand common problems that may occur when running models. One of the nice things about using `baggr` is that it makes the barrier to entry for running fully Bayesian meta-analysis models very low. But because of this, it is entirely possible that you, the end user, has no previous experience with MCMC methods at all. As a consequence, some of the error messages that come from the underlying algorithm can appear quite cryptic, like ``` Warning: There were 110 divergent transitions after warmup. Increasing adapt_delta above 0.9 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup ``` -This vignette will walk through the basic intuition for what those mean, whether to be concerned about them, and how to tune the parameters underneath the hood. There are already a number of good resources that go more in depth on the topic so first I'll point you there: - -[Visual MCMC Diagnostics Using the Bayesplot Package](https://mc-stan.org/bayesplot/articles/visual-mcmc-diagnostics.html) - -[Scale Reduction in the Stan Manual](https://mc-stan.org/docs/2_21/reference-manual/notation-for-samples-chains-and-draws.html) - -[Divergent Transitions in the Stan Manual](https://mc-stan.org/docs/2_21/reference-manual/divergent-transitions.html) +These errors arise from Stan, the underlying implementation of the MCMC algorithm that baggr uses to fit the models. 
[Stan](https://mc-stan.org) is a general purpose programming language for Bayesian modeling; most notably it implements an algorithm called Hamiltonian Monte Carlo with No-U-Turn-Sampling, a general purpose sampler that can explore unknown distributions without much tuning by the end user. -These go more in depth than this discussion is going to, but if you find yourself needing a stronger treatment of the topics then these would be good starting points. +When you run a model with `baggr`, it kicks off a series of Markov Chains which sample the underlying model parameters that you are interested in. This vignette will walk through the basics of the algorithm, the associated behaviors, and how to diagnose them. # Behind the scenes: basics of MCMC The baggr package uses fully Bayesian methods to estimate meta-analysis models, meaning that it takes a prior (which if not specified by the user is assigned by default), and jointly with the data uses that prior to estimate a posterior according to Bayes' rule - $$P(\theta | X) = \dfrac{P(X | \theta) P(\theta)}{P(X)}$$ In order to calculate $P(X)$, we need some computationally expensive machinery that can calculate @@ -42,13 +40,13 @@ $$P(X) = \int P(X | \theta) P(\theta) d\theta$$ which is (in general) really difficult except for a few specific situations. In order to make these types of models easy to run, we use a method that approximates this integral to a high degree of numerical accuracy. -MCMC methods leverage an accept/reject proposal scheme by generating a potential value for $\theta$, which is some set of parmeters, and using the relative likelihoods ($P(\theta|X)$) at that new value to determine the probability of whether to accept or reject those proposals. The simplest of these methods, Metropolis-Hastings, pretty much ends there. But for meta-analysis models (and other models with of high dimension), the spaces that the random walk has to explore get really complicated. +MCMC methods leverage an accept/reject proposal scheme by generating a potential value for $\theta$, which is some set of parameters, and using the relative likelihoods ($P(\theta|X)$) at that new value to determine the probability of whether to accept or reject those proposals. The simplest of these methods, Metropolis-Hastings, pretty much ends there. But for meta-analysis models (and other models with large numbers of parameters), the spaces that the random walk has to explore get really complicated. So instead of plain-old Metropolis-Hastings, we use Hamiltonian Monte Carlo with No-U-Turn sampling, powered on the back end by Stan. This algorithm leverages the gradient, or the rate of change in the posterior density at a given parameter value, in order to more efficiently generate new parameter proposals. # Warnings like "divergent transitions" -The gradient is passed into a Hamiltonian equation, which simulates a path for where it thinks the posterior is headed. But this trajectory is based on the local behavior of the posterior, and depending on the geometry, if we move too far away from that point we might get more and more off base. Stan, on the back end, uses the `adapt_delta` parameter to determine how often it needs to check its simulated paths, re-evaluating the trajectory it is using. +The gradient is passed into a Hamiltonian equation, which simulates a path for where it thinks the posterior is headed. 
But this trajectory is based on the local behavior of the posterior, and depending on the geometry of the space, if we move too far away from that point we might get more and more off base. Stan, on the back end, uses the `adapt_delta` parameter to determine how often it needs to check its simulated paths, re-evaluating the trajectory it is using.

If you make `adapt_delta` really big, you end up moving really slowly, but you also are less likely to fly off the true trajectory of the posterior distribution. But if you make it small, you go faster, etc. You get the idea. How high you should set the parameter depends on the extent to which the algorithm is properly modeling the paths. So if you encounter divergent transitions, it is often good to raise `adapt_delta`. This discussion is really oversimplifying things to build the intuition, but you can find an (accessible) deeper discussion [here](https://arxiv.org/abs/1701.02434) if it is of interest!
@@ -73,17 +71,27 @@ fit <- baggr(schools, model = "rubin",
fit
```
-If you still get a lot of divergent transitions after raising the `adapt_delta` really high, it may indicate an issue with your model! This is what is commonly referred to as the [folk theorem of statistical computing](https://statmodeling.stat.columbia.edu/2008/05/13/the_folk_theore/).
+If you still get a lot of divergent transitions after raising the `adapt_delta` really high, it may indicate an issue with the way you are modeling your data! This is what is commonly referred to as the [folk theorem of statistical computing](https://statmodeling.stat.columbia.edu/2008/05/13/the_folk_theore/).
+
+# Convergence and Sample Size Warnings
+
+## $\widehat{R}$ being too high
+
+The other warning you get, that “r hat” statistics are too high, indicates that your chains have not converged. Because MCMC is not a deterministic algorithm, we run multiple chains of samples at different starting values for our parameters in order to check that each has converged to a good enough answer. We want to check that Markov chains initialized at different values get to similar places.
+
+The $\widehat{R}$ statistic measures the ratio of the variance of all of the samples pooled across chains to the average variance of the samples within each chain. If all chains have converged to a common distribution, these will be the same and $\widehat{R}$ will be one. If the chains have not converged to a common answer, the $\widehat{R}$ statistic will be greater than one. If you get a high $\widehat{R}$, you can't trust the inference you get from the Markov chains because they have not converged.

-# $\widehat{R}$ being too low
+## Low ESS (Effective Sample Size)

-The other warning you get, that “r hat” statistics are too high, indicates that the effective sample size (ESS) of your draws are too low and your chains have not converged. Because MCMC is not a deterministic algorithm, we run multiple chains of samples in order to check that each has converged to a good enough answer.
+We also want to check that each chain has enough effective samples for every parameter in our model. Sample size (without the word effective) is just the number of samples you are drawing for each parameter. If these samples are independent, as you get a lot of them, various forms of the central limit theorem kick in, and your estimation error is low relative to the variation of the quantities you are trying to measure.
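To make the independent-draws case concrete, here is a tiny simulated illustration (a toy example with arbitrary numbers, not a meta-analysis model): with independent draws, the Monte Carlo error of an estimated mean shrinks like the standard deviation divided by the square root of the number of draws.

```{r}
# Toy illustration with independent draws standing in for posterior samples:
# the Monte Carlo error of the estimated mean shrinks like sd / sqrt(n)
draws <- rnorm(10000, mean = 5, sd = 10)
mean(draws)                      # estimate of the mean, close to 5
sd(draws) / sqrt(length(draws))  # approximate Monte Carlo standard error, about 0.1
```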
"Effective sample size" measures how efficient the sampler is relative to completely independent samples of those parameters. If you have a very efficient set of samples, you can actually get _higher_ effective sample size than the actual number of draws you took! -We want to check that each chain is “mixing well,” meaning that we have enough effective draws after accounting for the correlation between samples. We also want to check that markov chains initialized at different values get to similar places. ESS gives us a measure of the first, and $\widehat{R}$ gives a measure of the second. +This is because effective sample size is a function of the correlation between draws that the sampler takes. If you have positive correlation between subsequent samples, estimation error goes down slower than it would if the draws were independent. Sometimes this correlation is so high that, given the number of iterations you ran, you really don't have enough effective parameter draws to infer anything from the samples you have. In this case, your samples are likely to not accurately represent the distribution of the parameter you are interested in. + +## Remedying convergence and effective sample size If you have a low ESS or high $\widehat{R}$, you are best off running each chain for more iterations until you get a higher ESS/lower $\widehat{R}$. To run chains for longer, you change the `iter` parameter in the `...` arguments to your `baggr()` call. -```{r} +```{r message = F} # runs for 10,000 iterations per chain instead of 2,000 fit <- baggr(schools, model = "rubin", pooling = "partial", prior_hypermean = normal(0,1), prior_hypersd = cauchy(0,2), @@ -98,7 +106,9 @@ fit # Hitting maximum treedepth -Sometimes you get warnings about hitting "maximum treedepth." This is less of an issue than the previous two warnings, which fundamentally throw the resulting infrence into question. Stan uses a binary tree to determine how many "leapfrog steps" to take at each iteration of the sampler. This has to do with the way in which the sampler approximates movement through the Hamiltonian system used to model the posterior. Basically, you need to figure out how far away you want to jump from your current sample draw (this is how we end up getting more efficient exploration of the space). In order to do that, Stan evaluates a "tree," which is basically a proposal equally spaced on either side of the current draw. Then it does this again at each of the two points that are drawn, and so on, until it meets a certain criteria for accepting the new value. The `max_treedepth` parameter controls how deeply the algorithm should search (how many sub-trees to create) before terminating. +Sometimes you get warnings about hitting "maximum treedepth." This is less of an issue than the previous two warnings, which fundamentally throw the resulting inference into question. Hamiltonian Monte Carlo has a whole lot of knobs that need tuning--given the gradient, you need to figure out how far away you want to jump from your current sample draw (this is how we end up getting more efficient exploration of the space). The No-U-Turn sampler automates this so that users don't have to specify these values manually. This is important in practice because different values are optimal for different data and models (and potentially different parts of the posterior distribution). 
+
+In order to automatically pick how far away to jump, Stan evaluates a "tree," which is a set of proposed parameter values equally spaced on either side of the current value. Then it does this again at each of the two points that are drawn, and so on, until it meets a certain criterion for accepting the new value. The `max_treedepth` parameter controls how deeply the algorithm should search (how many sub-trees to create) before terminating.

Hitting the maximum treedepth may be indicative of a deeper issue with the model, but if you are confident in your model and your data, you should increase the `max_treedepth` parameter that is passed to the `control` list that `rstan::sampling` uses.

@@ -117,3 +127,15 @@ fit <- baggr(schools, model = "rubin", pooling = "partial",
fit
```
+# Additional Resources
+
+If you want a more in-depth treatment of the topics discussed in this vignette, here are a number of good resources:
+
+[Visual MCMC Diagnostics Using the Bayesplot Package](https://mc-stan.org/bayesplot/articles/visual-mcmc-diagnostics.html)
+
+[Scale Reduction in the Stan Manual](https://mc-stan.org/docs/2_21/reference-manual/notation-for-samples-chains-and-draws.html)
+
+[Divergent Transitions in the Stan Manual](https://mc-stan.org/docs/2_21/reference-manual/divergent-transitions.html)
+
+
+
From a388df2401f49860c158ef7d10a2e688e613ec96 Mon Sep 17 00:00:00 2001
From: Witold Wiecek <20645798+wwiecek@users.noreply.github.com>
Date: Thu, 20 Feb 2020 20:14:15 +0000
Subject: [PATCH 12/12] Update baggr_model_convergence.Rmd

---
 vignettes/baggr_model_convergence.Rmd | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/vignettes/baggr_model_convergence.Rmd b/vignettes/baggr_model_convergence.Rmd
index 8df879e1..fa0a6c06 100644
--- a/vignettes/baggr_model_convergence.Rmd
+++ b/vignettes/baggr_model_convergence.Rmd
@@ -17,14 +17,14 @@ library(baggr)
```

-This is a vignette intended for baggr users who have no or little experience with Markov Chain Monte Carlo (MCMC) programs used for Bayesian inference. It is intended to help you understand common problems that may occur when running models. One of the nice things about using `baggr` is that it makes the barrier to entry for running fully Bayesian meta-analysis models very low. But because of this, it is entirely possible that you, the end user, have no previous experience with MCMC methods at all. As a consequence, some of the error messages that come from the underlying algorithm can appear quite cryptic, like
+This is a vignette intended for _baggr_ users who have no or little experience with fitting Bayesian models. This vignette will help you understand common problems that may occur when running models. While `baggr` is designed to make running Bayesian meta-analysis models easy, there is no guarantee of correctness and you still have to ensure that the Bayesian inference is correct. The package will provide you with accurate error messages, but to a non-technical user these can appear quite cryptic, e.g.

```
Warning: There were 110 divergent transitions after warmup. Increasing adapt_delta above 0.9 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
```

-These errors arise from Stan, the underlying implementation of the MCMC algorithm that baggr uses to fit the models.
[Stan](https://mc-stan.org) is a general purpose programming language for Bayesian modeling; most notably it implements an algorithm called Hamiltonian Monte Carlo with No-U-Turn-Sampling, a general purpose sampler that can explore unknown distributions without much tuning by the end user. +These errors arise from Stan, the underlying implementation of the MCMC algorithm that baggr uses to fit the models. [Stan](https://mc-stan.org) is a general purpose programming language for Bayesian modeling; most notably it implements an algorithm called Hamiltonian Monte Carlo with No-U-Turn-Sampling, an algorithm to explore unknown distributions without much tuning by the end user. When you run a model with `baggr`, it kicks off a series of Markov Chains which sample the underlying model parameters that you are interested in. This vignette will walk through the basics of the algorithm, the associated behaviors, and how to diagnose them.