Issue with link_inverse() and nnet::multinom() #603

Open
strengejacke opened this issue Jul 31, 2022 · 11 comments
Labels
3 investigators ❔❓ Need to look further into this issue Bug 🐛 Something isn't working

Comments

@strengejacke
Member

vincentarelbundock/marginaleffects#404

@strengejacke strengejacke added Bug 🐛 Something isn't working 3 investigators ❔❓ Need to look further into this issue labels Jul 31, 2022
@tomwenseleers

tomwenseleers commented Aug 24, 2022

The correct link & inverse link functions for multinomial models (both for nnet::multinom and mclogit::mblogit models - the latter doesn't seem to be supported currently), assuming mu and eta are predictions on the response and link scale respectively, with the outcome levels in different columns, would be

# this is the link function for multinomial
# = generalized logit
inverse_softMax <- function(mu) {
  log_mu <- log(mu)
  return(sweep(log_mu, 1, STATS=rowMeans(log_mu), FUN="-")) # we let the log(odds) sum to zero - these predictions are referred to as type="latent" in the emmeans package
}

# this is the inverse link function for multinomial (here assuming eta is a matrix with different outcome levels as columns)
# = inverse generalized logit
softMax <- function(eta){
  exp_eta <- exp(eta)
  return(sweep(exp_eta, 1, STATS=rowSums(exp_eta), FUN="/"))
}

Now it seems that insight treats multinomial as logistic, but that's not quite correct - it should treat it as multinomial, which is rather a sort of multivariate logistic...
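
To make the difference concrete, here is a minimal sketch that reuses the two functions posted above on a made-up 2x3 probability matrix (the numbers are illustrative, not output from any fitted model). It checks the response -> link -> response round trip, and shows that applying the element-wise binomial inverse logit (plogis) to the same linear predictors does not give rows that sum to one.

```r
# Functions as posted above: matrix in, matrix out, one column per outcome level
inverse_softMax <- function(mu) {
  log_mu <- log(mu)
  sweep(log_mu, 1, STATS = rowMeans(log_mu), FUN = "-")
}
softMax <- function(eta) {
  exp_eta <- exp(eta)
  sweep(exp_eta, 1, STATS = rowSums(exp_eta), FUN = "/")
}

# Made-up probabilities for 2 observations and 3 outcome levels
mu <- matrix(c(0.2, 0.3, 0.5,
               0.6, 0.3, 0.1), nrow = 2, byrow = TRUE)

eta <- inverse_softMax(mu)     # centered log-odds
rowSums(eta)                   # zero for every row, up to floating point error
all.equal(softMax(eta), mu)    # the round trip recovers the probabilities

# Treating the model as binomial: element-wise inverse logit of the same eta
rowSums(plogis(eta))           # rows no longer sum to one
```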

@bwiernik
Contributor

Please feel free to submit a PR fixing this!

@tomwenseleers

I can have a try - though I should say I'm not an expert in the inner workings of easystats. E.g. I haven't checked whether easystats assumes that predictions of multinomial models, on either the link or the response scale, come out in wide format (with a separate column for each of the response levels, as I assumed above) or in long format...

@strengejacke
Member Author

I don't think you need to go that deep into detail. If I understand right, this issue is about adding/revising the link_inverse() and link_function() methods for class multinom.

@tomwenseleers

Well, there seem to be some other issues related to support for nnet::multinom multinomial models:

  • insight::get_predicted_ci() doesn't work yet.
  • insight::get_predicted() works, but the predictions come out in long format, which is a little nonstandard for multinomial models, where I would expect the different outcome levels to be in different columns (as multinomial really is a multivariate version of logistic).
  • insight::get_predicted(multinom_model, predict="link") doesn't work. If predictions came out in wide format, inverse_softMax(insight::get_predicted(multinom_model, predict="expectation")) would work, but now it doesn't because predictions are not returned in wide format.
  • insight::model_info(multinom_model) returns $is_binomial as TRUE, which is not correct; $is_multivariate should, I think, evaluate to TRUE (if predictions are returned in multiple columns, one per outcome level); $link_function gives "logit", which is wrong (it should be "generalized logit", i.e. inverse softMax); and $family gives "binomial", which should be "multinomial".

Another question is which other packages depend on insight::link_inverse() and insight::link_function() and what they expect the model predictions to look like (long or wide format). I saw marginaleffects call it for both nnet::multinom and mclogit::mblogit multinomial models (the latter are in fact currently not supported by insight), and I saw that it was assuming a logit link whereas in fact it's an inverse softMax / generalized logit link.

The nnet::multinom predict function currently doesn't support type="link", but one could get that from an inverse softMax transform of the expected values. The mblogit predict function does support type="link" but drops the first outcome level (the reference level), which implicitly is zero; adding a zero column and row centering would be required there to get the predictions on the link scale with all outcome levels included.
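
The "add a zero column and row-center" step for link-scale predictions with a dropped reference level could look roughly like the sketch below. It uses a hand-made matrix rather than real mclogit output, and the level names A, B, C are assumptions for illustration only.

```r
softMax <- function(eta) {
  exp_eta <- exp(eta)
  sweep(exp_eta, 1, STATS = rowSums(exp_eta), FUN = "/")
}

# Hypothetical link-scale predictions with the reference level "A" dropped
# (2 observations, non-reference levels "B" and "C"); made-up numbers
eta_dropped <- matrix(c( 0.4,  0.9,
                        -0.7, -1.8), nrow = 2, byrow = TRUE,
                      dimnames = list(NULL, c("B", "C")))

# Re-attach the reference level, whose linear predictor is implicitly zero
eta_full <- cbind(A = 0, eta_dropped)

# Row-center so that the values sum to zero across outcome levels
eta_centered <- sweep(eta_full, 1, STATS = rowMeans(eta_full), FUN = "-")

# Both parameterisations map to the same probabilities
all.equal(softMax(eta_full), softMax(eta_centered))
```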
All this came up in the context of making the marginaleffects package correctly support nnet::multinom and mclogit::mblogit multinomial models, see vincentarelbundock/marginaleffects#469 and vincentarelbundock/marginaleffects#404. If you would like to support mclogit::mblogit multinomial models, also take into account that the variance-covariance matrix there comes out in a different order than in nnet::multinom (in column-major order rather than row-major order, which is fixed in emmeans, https://github.com/rvlenth/emmeans/blob/master/R/multinom-support.R). To get standard errors and confidence intervals on the predictions, maybe https://github.com/melff/mclogit/blob/master/pkg/R/mblogit.R could provide some inspiration (though the vcov matrix is arranged differently than that of nnet::multinom). So all in all I think the problem above is only part of the problem and points at a more fundamental problem in the treatment of/support for multinomial models... Which is why I am a little hesitant to start pushing PRs without further guidance on what we would like to achieve & the expected behaviour... Maybe a better title for the issue would be "Provide correct support for multinomial nnet::multinom and mclogit::mblogit multinomial models" then...
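
As a rough illustration of the ordering difference, the sketch below reorders a toy variance-covariance matrix from column-major to row-major coefficient order. The dimensions, labels and matrix are made up; this is not the emmeans or mclogit code, only the index permutation the description above implies.

```r
K <- 2  # number of non-reference outcome levels (hypothetical)
p <- 3  # number of coefficients per outcome level (hypothetical)

# Coefficient labels laid out as a K x p matrix (levels in rows)
labels <- outer(c("B", "C"), c("(Intercept)", "x1", "x2"), paste, sep = ":")

col_major <- as.vector(labels)     # outcome levels vary fastest
row_major <- as.vector(t(labels))  # coefficients vary fastest

# Permutation that picks column-major positions in row-major order
perm <- as.vector(t(matrix(seq_len(K * p), nrow = K, ncol = p)))

# Toy symmetric matrix standing in for a vcov in column-major order
A <- matrix(seq_len((K * p)^2), nrow = K * p)
V_cm <- A %*% t(A)
dimnames(V_cm) <- list(col_major, col_major)

# Same matrix with rows and columns rearranged into row-major order
V_rm <- V_cm[perm, perm]
rownames(V_rm)
```

Applying the same permutation to both rows and columns keeps every variance and covariance attached to the right pair of coefficients.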

@strengejacke
Member Author

where I would expect the different outcome levels to be in different columns

But you have different predictions for each level, right? Thus, predictions would need to be in separate columns, too. Therefore, we decided to use the long format, which is probably easier to deal with, both for printing and for subsequent processing of the returned predictions.

@tomwenseleers

tomwenseleers commented Aug 25, 2022

Yes that's the question if that's what you want - for a multivariate type model like multinomial it's standard that predictions for each outcome level would come out in different columns... And to convert between predictions on the response and on the link / latent scale it's also easiest if the predictions for the different outcome levels would be in different columns...

@vincentarelbundock
Contributor

If you look at the default get_predicted.default method, you'll see that it works in 4 steps: https://github.com/easystats/insight/blob/main/R/get_predicted.R#L195

In the last step, we do "final preparations" by calling .get_predicted_out(). In that function, we check if the predictions are a matrix, and we reshape them into long format: https://github.com/easystats/insight/blob/main/R/get_predicted.R#L599

So predictions stay in matrix shape, with different levels in different columns for almost the whole time, until the very end when we return them in a convenient format for users.
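
For readers unfamiliar with the two layouts, here is a minimal base-R sketch of turning a wide prediction matrix into long format and back. It is not the code in .get_predicted_out(); the column names Row, Response and Predicted and the numbers are illustrative assumptions.

```r
# Made-up wide predictions: one row per observation, one column per level
pred_wide <- matrix(c(0.2, 0.3, 0.5,
                      0.6, 0.3, 0.1), nrow = 2, byrow = TRUE,
                    dimnames = list(NULL, c("A", "B", "C")))

# Wide -> long: one row per observation/level combination
pred_long <- data.frame(
  Row       = rep(seq_len(nrow(pred_wide)), times = ncol(pred_wide)),
  Response  = rep(colnames(pred_wide), each = nrow(pred_wide)),
  Predicted = as.vector(pred_wide)
)

# Long -> wide again, relying on the column-major order used above
pred_back <- matrix(pred_long$Predicted, nrow = nrow(pred_wide),
                    dimnames = dimnames(pred_wide))
all.equal(pred_back, pred_wide)
```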

@vincentarelbundock
Contributor

Probably more relevant is this method: https://github.com/easystats/insight/blob/main/R/get_predicted_ordinal.R#L133

And yeah, it's probably best if you describe the different things you want to do in a concrete, code-line-specific way before starting a PR. Also, I'm interested in this issue, so I would be happy to provide guidance if needed.

@vincentarelbundock
Contributor

Notes transferred from the other thread.

  • Softmax and its inverse are unit-specific transformations: they normalize based on the sum or mean across response levels within each unit. If we have an NxP matrix of predictions with P response levels, then we apply softmax separately to each row.
  • The functions copied below are not one-to-one. This means we cannot easily go from link to response and back again. In particular, note that predict.multinom() does not support type="link", and we probably can't just go back and forth.
  • Recommendations? Not sure, but possibly:
    • link_inverse() should return an appropriate softmax function which accepts a matrix, and returns an informative warning if users try to feed it a vector.
    • Do not use this link_inverse() in get_predicted() to get confidence intervals. Unless someone can do a deeper dive into these issues, it seems more prudent to stick with the type values that are supported by the original package.
inverse_softMax <- function(mu) {
  log_mu <- log(mu)
  return(sweep(log_mu, 1, STATS=rowMeans(log_mu), FUN="-")) 
}
softMax <- function(eta){
  exp_eta <- exp(eta)
  return(sweep(exp_eta, 1, STATS=rowSums(exp_eta), FUN="/"))
}
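
A minimal sketch of the first recommendation, plus a check of the "not one-to-one" point. The name softmax_checked and the choice to treat a vector as a single row are assumptions for illustration, not insight's API.

```r
# Hypothetical softmax that warns when it is given a vector instead of a matrix
softmax_checked <- function(eta) {
  if (!is.matrix(eta)) {
    warning("softmax expects a matrix with one column per response level; ",
            "treating the vector as a single row.", call. = FALSE)
    eta <- matrix(eta, nrow = 1)
  }
  exp_eta <- exp(eta)
  sweep(exp_eta, 1, STATS = rowSums(exp_eta), FUN = "/")
}

eta <- matrix(c(0, 1, 2,
                1, 1, 1), nrow = 2, byrow = TRUE)

# Shifting every value by a constant leaves the probabilities unchanged,
# so different inputs map to the same output unless a constraint is imposed
all.equal(softmax_checked(eta), softmax_checked(eta + 5))
```

Fixing a constraint on the link scale (reference level at zero, or rows summing to zero) is what makes the mapping invertible.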

@tomwenseleers

Just to chime in here: predict.multinom indeed only has type="probs" (="response") implemented (annoyingly enough). However, going from the response scale to the link scale, as returned by mclogit::predict.mblogit with type="link", is easy using the following transformation:

# fit_multinom is assumed to be a fitted nnet::multinom model
mu <- predict(fit_multinom, type = "probs")

inverse_softMax_tolink <- function(mu) {
  log_mu <- log(mu)
  # we normalize log(odds) so that first outcome level comes out as zero, but in contrast to the behaviour of predict.mblogit with type="link" we do not drop that zero column; instead, we leave it so that one can easily transform between the response and link scale & keep all outcome levels
  return(sweep(log_mu, 1, STATS=log_mu[,1], FUN="-")) 
}

In this case, by keeping the zero column for the reference level, softMax(inverse_softMax_tolink(mu)) would return mu, as should be the case.

For multinomial models and predictions on the link scale, the emmeans package uses predictions on a centered logit scale, where instead of normalizing the reference level to zero, the logits over all outcome levels sum to zero (usually this is the prediction scale one is actually interested in) - emmeans refers to that as type="latent".

This can be gotten from the predictions on the response scale using the function I mentioned above

inverse_softMax_tolatent <- function(mu) {
  log_mu <- log(mu)
  return(sweep(log_mu, 1, STATS=rowMeans(log_mu), FUN="-")) 
}

Here too, softMax(inverse_softMax_tolatent(mu)) would return mu.
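
A small self-contained check of both transformations, with mu taken to be a made-up matrix of probabilities (not output from a fitted model). It also shows that the two link-scale versions differ only by a per-row constant.

```r
softMax <- function(eta) {
  exp_eta <- exp(eta)
  sweep(exp_eta, 1, STATS = rowSums(exp_eta), FUN = "/")
}
inverse_softMax_tolink <- function(mu) {
  log_mu <- log(mu)
  sweep(log_mu, 1, STATS = log_mu[, 1], FUN = "-")   # reference level at zero
}
inverse_softMax_tolatent <- function(mu) {
  log_mu <- log(mu)
  sweep(log_mu, 1, STATS = rowMeans(log_mu), FUN = "-")  # rows sum to zero
}

# Made-up probabilities for 2 observations and 3 outcome levels
mu <- matrix(c(0.2, 0.3, 0.5,
               0.6, 0.3, 0.1), nrow = 2, byrow = TRUE)

eta_link   <- inverse_softMax_tolink(mu)
eta_latent <- inverse_softMax_tolatent(mu)

eta_link[, 1]                       # reference column is exactly zero
all.equal(softMax(eta_link), mu)    # back to the probabilities
all.equal(softMax(eta_latent), mu)  # same probabilities from the latent scale

# Row-centering the link-scale values gives the latent-scale values
all.equal(sweep(eta_link, 1, STATS = rowMeans(eta_link), FUN = "-"), eta_latent)
```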

I think for multinomial models one might as well drop support for predictions on the link scale where the reference level would come out as zero (this is never used as far as I know) and only support the centered logit type="latent" scale, as in emmeans (in addition to type="probs"/"response" of course). I contacted Brian Ripley, the package maintainer of nnet, a while ago to ask if he could add type="link" and type="latent" to the nnet::predict.multinom function, but got no reply unfortunately, so I fear that is not going to happen. If you have the predictions with type="probs", it is easy enough though to get type="link" or type="latent" as shown above, just by transforming to the link scale & centering the logits either on the first reference level (type="link") or over all outcome levels so that they sum to zero (type="latent")...

4 participants