FR/Bug: Calculate Sigma (`get_sigma`) appropiately for `vglm` (and `vgam`) models #578

iago-pssjd · 2022-06-07T10:13:25Z

First, it is a bug, since, as stated in ?get_sigma

By default, get_sigma() tries to extract sigma by calling stats::sigma(). If the model-object has no sigma() method, the next step is calculating sigma as square-root of the model-deviance divided by the residual degrees of freedom. Finally, if even this approach fails, and x is a mixed model, the residual standard deviation is accessed using the square-root from get_variance_residual().

Therefore, when the model has no sigma() method get_sigma tries a calculation that not always is suitable, at least for many of vglm models. For example (from help("tobit", package = "VGAM")):

tdata <- data.frame(x2 = seq(-1, 1, length = (nn <- 100)))
set.seed(1)
Lower <- 1; Upper <- 4  # For the nonstandard Tobit model
tdata <- transform(tdata,
                   Lower.vec = rnorm(nn, Lower, 0.5),
                   Upper.vec = rnorm(nn, Upper, 0.5))
meanfun1 <- function(x) 0 + 2*x
meanfun2 <- function(x) 2 + 2*x
meanfun3 <- function(x) 2 + 2*x
meanfun4 <- function(x) 3 + 2*x
tdata <- transform(tdata,
  y1 = rtobit(nn, mean = meanfun1(x2)),  # Standard Tobit model
  y2 = rtobit(nn, mean = meanfun2(x2), Lower = Lower, Upper = Upper),
  y3 = rtobit(nn, mean = meanfun3(x2), Lower = Lower.vec, Upper = Upper.vec),
  y4 = rtobit(nn, mean = meanfun3(x2), Lower = Lower.vec, Upper = Upper.vec))
fit1 <- vglm(y1 ~ x2, tobit, data = tdata)
get_sigma(fit1)
[1] 2.694853
attr(,"class")
[1] "insight_aux" "numeric"

while the right sigma value is

exp(coef(fit1)["(Intercept):2"]) # look also at coef(fit1, matrix = TRUE)
(Intercept):2 
    0.9395564

(see https://stats.oarc.ucla.edu/r/dae/tobit-models/ and the book "Vector Generalized Linear and Additive Models" by Thomas Yee). In fact, in this package there are 2 points to consider:

on one hand (section 1.2.1 of the book), sigma is not treated as a scale parameter, like in GLM's, but it is generally (not always) modelled (as also happens with gamlss package), therefore it is found as an intercept;
on the other hand (sections 1.4 and 3.5.1 of the book), it allows for multiple and multivariate responses. In this case, should be sigma maybe a list, with each element being a matrix corresponding to the multivariate responses (using the intercepts (Intercept):2, (Intercept):4, etc.)? (I refer to the distinction between multiple and multivariate responses made in section 3.5.1 of the book, in the second half of page 115). This second point would be related to this issue I recently added to performance: FR: improve formatting of multivariate models by model_performance performance#430

Thanks!

The text was updated successfully, but these errors were encountered:

strengejacke · 2022-06-24T06:32:22Z

Any example for multivariate models?

iago-pssjd · 2022-06-27T10:15:13Z

Hi @strengejacke.

I assume you ask for the multivariate response models. I cite a paragraph in the same page of the book I mentioned above

Some care is needed to distinguish between multiple responses and multivariate responses. Multiple responses are treated independently of each other, and are to be inputted side-by-side on the LHS of the formula using cbind().Eachresponse may be a vector, of dimension Q1. Multivariate responses correspond to Q1 > 1, and are handled by some full-likelihood model that takes into account their joint distribution.

The example given by the author is the family binormal, so

set.seed(123); nn <- 1000
bdata <- data.frame(x2 = runif(nn), x3 = runif(nn))
bdata <- transform(bdata, y1 = rnorm(nn, 1 + 2 * x2),
                          y2 = rnorm(nn, 3 + 4 * x2))
fit1 <- vglm(cbind(y1, y2) ~ x2,
             binormal(), data = bdata, trace = FALSE)
coef(fit1, matrix = TRUE)
               mean1    mean2 loglink(sd1) loglink(sd2) rhobitlink(rho)
(Intercept) 1.021178 2.929393  0.009053151  -0.02282574      0.05231844
x2          2.042807 4.101541  0.000000000   0.00000000      0.00000000

In this case sigma would be the exp of the diagonal matrix 0.009053151, -0.02282574 . (Further, we would have an intercept-vector 1.021178, 2.929393 and an x2-slope vector 2.042807, 4.101541). Other examples would be with the families trinormal or bistudentt (as shown in the help pages of these functions). More implamented bivariate distributions are given in table 13.1 (page 382) of the book.

Then, first to get the parameters and other coefficients of VGAM models, and second, to distinguish between multiple responses and multivariate responses, one should look at the vglm-object@family. Therefore, the way to get them, and in particular sigma, is not so immediate as

insight/R/get_sigma.R

Lines 96 to 101 in a81c92c

    
           .get_sigma.VGAM <- function(x, verbose = TRUE, ...) { 
        
             s <- tryCatch(exp(stats::coef(x)[["(Intercept):2"]]), 
        
                           function(e) NULL) 
        
             class(s) <- c("insight_aux", class(s)) 
        
             s 
        
           }

Let's look some other examples.

Multiple responses and multivariate responses simultaneously (got from ?bistudentt)

nn <- 1000
mydof <- logloglink(1, inverse = TRUE)
ymat <- cbind(rt(nn, df = mydof), rt(nn, df = mydof))
bdata <- data.frame(y1 = ymat[, 1], y2 = ymat[, 2],
                    y3 = ymat[, 1], y4 = ymat[, 2], x2 = runif(nn))
## Not run:  plot(ymat, col = "blue") 
fit1 <- vglm(cbind(y1, y2, y3, y4) ~ 1,  # 2 responses, e.g., (y1,y2) is the 1st
             fam = bistudentt,  # crit = "coef",  # Sometimes a good idea
             data = bdata, trace = FALSE)
coef(fit1, matrix = TRUE)
            logloglink(df1) rhobitlink(rho1) logloglink(df2) rhobitlink(rho2)
(Intercept)        1.072883      -0.01766264        1.072882      -0.01765367

In this case, as we had two responses, we would have two sigma matrices, both having 1 in all the elements of the main diagonal, and having as secondary diagonal the values rho1 (maybe = rhobitlink(-0.01766264, inverse = TRUE)?) and rho2 (maybe = rhobitlink(-0.01765367, inverse = TRUE)?) respectively.

Multiple responses (got from ?tobit)

tdata <- data.frame(x2 = seq(-1, 1, length = (nn <- 100)))
set.seed(1)
Lower <- 1; Upper <- 4  # For the nonstandard Tobit model
tdata <- transform(tdata,
                   Lower.vec = rnorm(nn, Lower, 0.5),
                   Upper.vec = rnorm(nn, Upper, 0.5))
meanfun1 <- function(x) 0 + 2*x
meanfun2 <- function(x) 2 + 2*x
meanfun3 <- function(x) 2 + 2*x
meanfun4 <- function(x) 3 + 2*x
tdata <- transform(tdata,
  y1 = rtobit(nn, mean = meanfun1(x2)),  # Standard Tobit model
  y2 = rtobit(nn, mean = meanfun2(x2), Lower = Lower, Upper = Upper),
  y3 = rtobit(nn, mean = meanfun3(x2), Lower = Lower.vec, Upper = Upper.vec),
  y4 = rtobit(nn, mean = meanfun3(x2), Lower = Lower.vec, Upper = Upper.vec))
fit4 <- vglm(cbind(y3, y4) ~ x2,
             tobit(Lower = rep(with(tdata, Lower.vec), each = 2),
                   Upper = rep(with(tdata, Upper.vec), each = 2),
                   byrow.arg = TRUE),
             data = tdata, crit = "coeff", trace = FALSE)
coef(fit4, matrix = TRUE)
                 mu1 loglink(sd1)      mu2 loglink(sd2)
(Intercept) 1.950757   0.04030946 1.944842  0.008811054
x2          1.899930   0.00000000 2.122347  0.000000000

In this case, we would have two sigma values, one for each response, being coef(fit4, matrix = TRUE)["(Intercept)","loglink(sd1)"] (or also coef(fit4)["(Intercept):2"]) and coef(fit4, matrix = TRUE)["(Intercept)","loglink(sd2)"] (or also coef(fit4)["(Intercept):4"])

A more trivial example (from ?vglm): same as for glm

d.AD <- data.frame(treatment = gl(3, 3),
                         outcome = gl(3, 1, 9),
                         counts = c(18,17,15,20,10,20,25,13,12))
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson,
                 data = d.AD)
vglm.D93 <- vglm(counts ~ outcome + treatment, family = poissonff,
                 data = d.AD, trace = FALSE)
sigma(glm.D93)
[1] 1.13238
sigma(vglm.D93)
[1] 1.13238

Finally, other arguments given to family may be relevant, like zero, nointercept and other (table 18.6, page 518 in the book), even this may appear summarized in some way by the vglm-object@family object.

iago-pssjd · 2022-06-28T07:20:24Z

Another thought related to this issue is if sigma should be reported among the "performance" values of a model, when it is being estimated by the model. Probably, reporting it as until now, among the "parameters", is better, but would look even better if we could detect it by means of reading loglink(sd) or sigma (or some other alternative) instead of (Intercept):2.

strengejacke · 2022-06-28T08:04:39Z

I think this will take some time until I (or someone else) can look into this, seems not to be very straightforward. If you are able to, we of course also appreciate PRs on this issue :-)

iago-pssjd changed the title ~~FR/Bug: Calculate Sigma (get_sigma) appropiately for vglm models~~ FR/Bug: Calculate Sigma (get_sigma) appropiately for vglm (and vgam) models Jun 7, 2022

strengejacke added Bug 🐛 Something isn't working 3 investigators ❔❓ Need to look further into this issue labels Jun 7, 2022

iago-pssjd mentioned this issue Jun 8, 2022

FR: improve formatting of multivariate models by model_performance easystats/performance#430

Open

strengejacke added a commit that referenced this issue Jun 24, 2022

#578

fac4d09

strengejacke added the Low priority 😴 This issue can be easily workaround or happens only in edge cases label Jun 28, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FR/Bug: Calculate Sigma (`get_sigma`) appropiately for `vglm` (and `vgam`) models #578

FR/Bug: Calculate Sigma (`get_sigma`) appropiately for `vglm` (and `vgam`) models #578

iago-pssjd commented Jun 7, 2022 •

edited

Loading

strengejacke commented Jun 24, 2022

iago-pssjd commented Jun 27, 2022

iago-pssjd commented Jun 28, 2022

strengejacke commented Jun 28, 2022

FR/Bug: Calculate Sigma (get_sigma) appropiately for vglm (and vgam) models #578

FR/Bug: Calculate Sigma (get_sigma) appropiately for vglm (and vgam) models #578

Comments

iago-pssjd commented Jun 7, 2022 • edited Loading

strengejacke commented Jun 24, 2022

iago-pssjd commented Jun 27, 2022

iago-pssjd commented Jun 28, 2022

strengejacke commented Jun 28, 2022

FR/Bug: Calculate Sigma (`get_sigma`) appropiately for `vglm` (and `vgam`) models #578

FR/Bug: Calculate Sigma (`get_sigma`) appropiately for `vglm` (and `vgam`) models #578

iago-pssjd commented Jun 7, 2022 •

edited

Loading