summary with one line per _named_ parameter #43

jgabry · 2019-11-08T00:34:57Z

The discussion in #32 reminded me that one thing @andrewgelman and others have been wanting for a while (and that I think would also be useful as an alternative to the standard summary output) is a summary where each vector/matrix/array parameter only occupies a single line just like scalars. That is, each line corresponds to a named parameter rather than a parameter element.

(This could be its own thing or just an option to a custom print method for our existing summary objects.)

The quantities to display are debatable (e.g. maybe min of all the individual elements’ ESS, max of the Rhats, etc), so it would be good to get input from Andrew and anyone else interested.

jgabry · 2019-11-12T18:59:33Z

One possibility is something like what I put below (I think @andrewgelman and @avehtari and I discussed something like this one time when Aki was visiting). Not sure which columns we include, but a concise table like this would be a really nice option for models with many parameters:

variable   min_mean max_mean max_rhat min_ess_bulk min_ess_tail
alpha[1]       4.5     4.5    1.0         881         300
tau[1]         3.8     3.8    1.1         386         311
omega[9]       1.5     7.4    1.0         552         272
Sigma[7,7]    -1.7     5.1    1.0         421         202

Here the numbers inside brackets indicate the dimensions (we could also omit if a scalar instead of putting [1]). At first people could confuse this with just showing the 9th element of omega, for example, but I think we could make it clear in the doc (and this also wouldn't be the default output so they would be intentionally selecting this). The columns min_mean and max_mean here are the min and max of the "mean" columns if the summary was computed for the individual elements (to give a sense of how much the means can differ) but I'm not sure if we want to include that or not.

@andrewgelman Did we end up coming up with a better idea than putting the dimensions in brackets?

Anyone else have any thoughts on this?

avehtari · 2019-11-13T07:02:04Z

Since brackets are used for indexing, this can cause confusion for R users. Would it be too much to have a column dim as the second column?

variable dim min_mean max_mean max_rhat min_ess_bulk min_ess_tail
alpha    1       4.5     4.5    1.0         881         300
tau      1       3.8     3.8    1.1         386         311
omega    9       1.5     7.4    1.0         552         272
Sigma    7,7    -1.7     5.1    1.0         421         202

paul-buerkner · 2019-11-13T07:18:42Z

dim separately may be a little ugly and not not too helpful for filtering I would say as the structure would vary across variables, that is, sometimes 1 number sometimes 2 or more separated with commas.

jgabry · 2019-11-13T16:34:25Z

What if we just replace the brackets to a different symbol to refer to dimension? There are lots of possibilities. Here are just a few:

variable   min_mean max_mean max_rhat min_ess_bulk min_ess_tail
alpha (1)       4.5     4.5    1.0         881         300
tau   (1)       3.8     3.8    1.1         386         311
omega (9)       1.5     7.4    1.0         552         272
Sigma (7,7)    -1.7     5.1    1.0         421         202

variable   min_mean max_mean max_rhat min_ess_bulk min_ess_tail
alpha {1}       4.5     4.5    1.0         881         300
tau   {1}       3.8     3.8    1.1         386         311
omega {9}       1.5     7.4    1.0         552         272
Sigma {7,7}    -1.7     5.1    1.0         421         202

variable   min_mean max_mean max_rhat min_ess_bulk min_ess_tail
alpha <1>       4.5     4.5    1.0         881         300
tau   <1>       3.8     3.8    1.1         386         311
omega <9>       1.5     7.4    1.0         552         272
Sigma <7,7>    -1.7     5.1    1.0         421         202

mjskay · 2019-11-13T16:54:03Z

@paul-buerkner I'm not sure I understand the filtering issue with dim in a separate column --- it seems to me it would be easier to filter if dims were not a part of the variable column? E.g. filter(., variable == "Sigma")?

paul-buerkner · 2019-11-13T21:16:46Z

@mjskay you have a good point there. Not sure how to best represent the dimension, but you are right that filtering after variable should be easily possible.

andrewgelman · 2019-11-13T21:19:35Z

If we're gonna do one of these, I'd prefer the parentheses, as the braces look weird to me and the angle brackets make me thing of upper and lower. But if we're going to do it this way, what about indicating this in the header, thus: variable (dims) min_mean max_mean max_rhat min_ess_bulk min_ess_tail A

…

On Nov 13, 2019, at 11:34 AM, Jonah Gabry ***@***.***> wrote: What if we just replace the brackets to a different symbol to refer to dimension? There are lots of possibilities. Here are just a few: variable min_mean max_mean max_rhat min_ess_bulk min_ess_tail alpha (1) 4.5 4.5 1.0 881 300 tau (1) 3.8 3.8 1.1 386 311 omega (9) 1.5 7.4 1.0 552 272 Sigma (7,7) -1.7 5.1 1.0 421 202 variable min_mean max_mean max_rhat min_ess_bulk min_ess_tail alpha {1} 4.5 4.5 1.0 881 300 tau {1} 3.8 3.8 1.1 386 311 omega {9} 1.5 7.4 1.0 552 272 Sigma {7,7} -1.7 5.1 1.0 421 202 variable min_mean max_mean max_rhat min_ess_bulk min_ess_tail alpha <1> 4.5 4.5 1.0 881 300 tau <1> 3.8 3.8 1.1 386 311 omega <9> 1.5 7.4 1.0 552 272 Sigma <7,7> -1.7 5.1 1.0 421 202 — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#43>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAZYCUPJIO2MOH7YPDFUPHLQTQUBFANCNFSM4JKQOFFQ>.

jsocolar · 2021-05-21T04:10:09Z

Hey guys, I'm still playing with ways to summarize the huge draws array that I'm working with, and realized that this issue is pretty relevant as well. I've approached the problem as below. If the approach looks not-crazy then I could try to clean it up and turn it into a PR. This would involve better error handling, inferring the dimensions of the nonscalar parameters, and wrangling the output into whatever format you want.

The default (which is tailored to Stan) is to "roll up" the summaries for any parameter with a [ in its name. There's also an option to pass the parameter names for rolling up, again with the assumption that they'll show up in the draws array as ^parameter_name\\[.*. The logic for passing parameter names is that nonscalar parameters come in (at least) two different flavors. One is big random effects and covariance matrices that ought to be rolled up. The other is vectors of regression coefficients probably should not be rolled up.

Functions:

summary_rollup <- function (draws_summary, rollup_vars = NULL,
                            min_only = c("ess_bulk", "ess_tail"),
                            max_only = "rhat") {
  vars <- draws_summary$variable
  
  if (is.null(rollup_vars)) {
    vars_nonscalar <- grepl("\\[", vars)
  } else {
    vars_nonscalar <- as.logical(colSums(do.call(rbind, lapply(paste0("^", rollup_vars, "\\["), function(x){grepl(x, vars)}))))
  }
  ds_scalar <- draws_summary[!vars_nonscalar, ]
  ds_nonscalar <- draws_summary[vars_nonscalar, ]
  varnames_nonscalar <- gsub("\\[(.*)", "", ds_nonscalar$variable)
  summary_names <- names(draws_summary)[-1]
  names_minmax <- summary_names[!(summary_names %in% c(min_only, max_only))]
  split_nonscalar <- split(ds_nonscalar, varnames_nonscalar)[unique(varnames_nonscalar)] # [unique(varnames_nonscalar)] preserves the order of the names
  
  min_max <- do.call(rbind, lapply(split_nonscalar, rollup_helper_minmax, names = names_minmax))
  min_only <- do.call(rbind, lapply(split_nonscalar, rollup_helper_min, names = min_only))
  max_only <- do.call(rbind, lapply(split_nonscalar, rollup_helper_max, names = max_only))
  
  nonscalar_out <- cbind(min_max, max_only, min_only)
  
  out <- list(unrolled = ds_scalar, rolled = nonscalar_out)
  out
}

rollup_helper_minmax <- function(x, names){
  x <- x[, names]
  mm <- c(apply(x, 2, function(x) {c(min(x), max(x))}))
  names(mm) <- paste0(rep(names(x), each = 2), c("_min", "_max"))
  mm
}

rollup_helper_min <- function(x, names){
  x <- x[, names]
  min_only <- apply(x, 2, min)
  names(min_only) <- paste0(names(x), "_min")
  min_only
}

rollup_helper_max <- function(x, names){
  x <- x[, names]
  max_only <- apply(x, 2, max)
  names(max_only) <- paste0(names(x), "_max")
  max_only
}

Example:

ds <- summarise_draws(example_draws())
ds2 <- summarise_draws(2 * example_draws())
ds2$variable <- paste0("new_", ds2$variable)
draws_summary <- rbind(ds, ds2)

summary_rollup(draws_summary)
summary_rollup(draws_summary, rollup_vars = c("theta", "new_theta"))
summary_rollup(draws_summary, rollup_vars = "theta")

mjskay · 2021-05-21T06:12:24Z

This looks like it could be very useful to me! Two quick thoughts if we go down this road:

To be consistent with other functions I'd consider a verb-ish name, maybe rollup_summary() instead of summary_rollup()
For dimension parsing, we probably want to have a consistent scheme for doing this wherever we have to do it, suggesting we at the very least should have an internal helper function and maybe even expose this externally if we think it would be useful. This will allow us to have consistent rules for doing it, and maybe add options in the future for people that need to parse non-standard dimension naming schemes. The beginnings of such a function could be made by factoring out the dimension-parsing code from as_draws_rvars.draws_matrix(), which already handles a number of weird cases (like missing elements, dimensions that don't start at 1, and non-numeric dimensions). Discussion of this aspect specifically should probably happen on Coding matrix/vector variables in tidy form #61

jgabry added the feature New feature or request label Nov 8, 2019

jgabry changed the title ~~Concise summary for non-scalar parameters~~ summary with one line per _named_ parameter Nov 8, 2019

jsocolar mentioned this issue May 21, 2021

Roll up summaries of nonscalar variables #152

Open

4 tasks

mjskay linked a pull request Feb 2, 2024 that will close this issue

Rollups of summaries of variables with indices, version 2 #343

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

summary with one line per _named_ parameter #43

summary with one line per _named_ parameter #43

jgabry commented Nov 8, 2019 •

edited

Loading

jgabry commented Nov 12, 2019 •

edited

Loading

avehtari commented Nov 13, 2019

paul-buerkner commented Nov 13, 2019

jgabry commented Nov 13, 2019

mjskay commented Nov 13, 2019

paul-buerkner commented Nov 13, 2019

andrewgelman commented Nov 13, 2019 via email

jsocolar commented May 21, 2021 •

edited

Loading

mjskay commented May 21, 2021 •

edited

Loading

summary with one line per _named_ parameter #43

summary with one line per _named_ parameter #43

Comments

jgabry commented Nov 8, 2019 • edited Loading

jgabry commented Nov 12, 2019 • edited Loading

avehtari commented Nov 13, 2019

paul-buerkner commented Nov 13, 2019

jgabry commented Nov 13, 2019

mjskay commented Nov 13, 2019

paul-buerkner commented Nov 13, 2019

andrewgelman commented Nov 13, 2019 via email

jsocolar commented May 21, 2021 • edited Loading

mjskay commented May 21, 2021 • edited Loading

jgabry commented Nov 8, 2019 •

edited

Loading

jgabry commented Nov 12, 2019 •

edited

Loading

jsocolar commented May 21, 2021 •

edited

Loading

mjskay commented May 21, 2021 •

edited

Loading