-
-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FR/Bug: Calculate Sigma (get_sigma
) appropiately for vglm
(and vgam
) models
#578
Comments
get_sigma
) appropiately for vglm
modelsget_sigma
) appropiately for vglm
(and vgam
) models
Any example for multivariate models? |
Hi @strengejacke. I assume you ask for the multivariate response models. I cite a paragraph in the same page of the book I mentioned above
The example given by the author is the family set.seed(123); nn <- 1000
bdata <- data.frame(x2 = runif(nn), x3 = runif(nn))
bdata <- transform(bdata, y1 = rnorm(nn, 1 + 2 * x2),
y2 = rnorm(nn, 3 + 4 * x2))
fit1 <- vglm(cbind(y1, y2) ~ x2,
binormal(), data = bdata, trace = FALSE)
coef(fit1, matrix = TRUE)
mean1 mean2 loglink(sd1) loglink(sd2) rhobitlink(rho)
(Intercept) 1.021178 2.929393 0.009053151 -0.02282574 0.05231844
x2 2.042807 4.101541 0.000000000 0.00000000 0.00000000 In this case Then, first to get the parameters and other coefficients of Lines 96 to 101 in a81c92c
Let's look some other examples.
nn <- 1000
mydof <- logloglink(1, inverse = TRUE)
ymat <- cbind(rt(nn, df = mydof), rt(nn, df = mydof))
bdata <- data.frame(y1 = ymat[, 1], y2 = ymat[, 2],
y3 = ymat[, 1], y4 = ymat[, 2], x2 = runif(nn))
## Not run: plot(ymat, col = "blue")
fit1 <- vglm(cbind(y1, y2, y3, y4) ~ 1, # 2 responses, e.g., (y1,y2) is the 1st
fam = bistudentt, # crit = "coef", # Sometimes a good idea
data = bdata, trace = FALSE)
coef(fit1, matrix = TRUE)
logloglink(df1) rhobitlink(rho1) logloglink(df2) rhobitlink(rho2)
(Intercept) 1.072883 -0.01766264 1.072882 -0.01765367 In this case, as we had two responses, we would have two
tdata <- data.frame(x2 = seq(-1, 1, length = (nn <- 100)))
set.seed(1)
Lower <- 1; Upper <- 4 # For the nonstandard Tobit model
tdata <- transform(tdata,
Lower.vec = rnorm(nn, Lower, 0.5),
Upper.vec = rnorm(nn, Upper, 0.5))
meanfun1 <- function(x) 0 + 2*x
meanfun2 <- function(x) 2 + 2*x
meanfun3 <- function(x) 2 + 2*x
meanfun4 <- function(x) 3 + 2*x
tdata <- transform(tdata,
y1 = rtobit(nn, mean = meanfun1(x2)), # Standard Tobit model
y2 = rtobit(nn, mean = meanfun2(x2), Lower = Lower, Upper = Upper),
y3 = rtobit(nn, mean = meanfun3(x2), Lower = Lower.vec, Upper = Upper.vec),
y4 = rtobit(nn, mean = meanfun3(x2), Lower = Lower.vec, Upper = Upper.vec))
fit4 <- vglm(cbind(y3, y4) ~ x2,
tobit(Lower = rep(with(tdata, Lower.vec), each = 2),
Upper = rep(with(tdata, Upper.vec), each = 2),
byrow.arg = TRUE),
data = tdata, crit = "coeff", trace = FALSE)
coef(fit4, matrix = TRUE)
mu1 loglink(sd1) mu2 loglink(sd2)
(Intercept) 1.950757 0.04030946 1.944842 0.008811054
x2 1.899930 0.00000000 2.122347 0.000000000 In this case, we would have two sigma values, one for each response, being
d.AD <- data.frame(treatment = gl(3, 3),
outcome = gl(3, 1, 9),
counts = c(18,17,15,20,10,20,25,13,12))
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson,
data = d.AD)
vglm.D93 <- vglm(counts ~ outcome + treatment, family = poissonff,
data = d.AD, trace = FALSE)
sigma(glm.D93)
[1] 1.13238
sigma(vglm.D93)
[1] 1.13238 Finally, other arguments given to |
Another thought related to this issue is if |
I think this will take some time until I (or someone else) can look into this, seems not to be very straightforward. If you are able to, we of course also appreciate PRs on this issue :-) |
First, it is a bug, since, as stated in
?get_sigma
Therefore, when the model has no
sigma()
methodget_sigma
tries a calculation that not always is suitable, at least for many ofvglm
models. For example (fromhelp("tobit", package = "VGAM")
):while the right
sigma
value is(see https://stats.oarc.ucla.edu/r/dae/tobit-models/ and the book "Vector Generalized Linear and Additive Models" by Thomas Yee). In fact, in this package there are 2 points to consider:
sigma
is not treated as a scale parameter, like in GLM's, but it is generally (not always) modelled (as also happens withgamlss
package), therefore it is found as an intercept;sigma
maybe a list, with each element being a matrix corresponding to the multivariate responses (using the intercepts(Intercept):2
,(Intercept):4
, etc.)? (I refer to the distinction between multiple and multivariate responses made in section 3.5.1 of the book, in the second half of page 115). This second point would be related to this issue I recently added toperformance
: FR: improve formatting of multivariate models bymodel_performance
performance#430Thanks!
The text was updated successfully, but these errors were encountered: