Cross validation function does not seem to consider offset in model #372
Comments
Thanks for pointing this out, Marc. I had started thinking about that in this issue: #274 But I hadn't thought about the cross validation issue.
That is the behaviour now:
library(sdmTMB)
dat <- subset(dogfish, catch_weight > 0)
dat <- dat[1:5, ]
m3 <- sdmTMB(catch_weight ~ 1, data = dat, family = Gamma("log"), offset = log(dat$area_swept), spatial = "off")
predict(m3)$est
#> [1] 4.985392 5.157243 4.920854 5.157243 5.046017
predict(m3, offset = rep(0, 5))$est
#> [1] 7.259584 7.259584 7.259584 7.259584 7.259584
Created on 2024-09-20 with reprex v2.1.1
The problem is the cross validation function supplies
Leaving the cross validation aside, my thinking was that once
library(sdmTMB)
dat <- subset(dogfish, catch_weight > 0)
dat <- dat[1:5, ]
m3 <- sdmTMB(catch_weight ~ 1, data = dat, family = Gamma("log"), offset = log(dat$area_swept), spatial = "off")
predict(m3, newdata = dat)$est
#> [1] 7.259584 7.259584 7.259584 7.259584 7.259584
predict(m3, newdata = dat, offset = rep(0, 5))$est
#> [1] 7.259584 7.259584 7.259584 7.259584 7.259584
Created on 2024-09-20 with reprex v2.1.1
It appears glm() always applies the original offset, regardless of newdata and regardless of what you put in the offset argument. glmmTMB always includes the offset and warns about an unknown argument if the offset argument is supplied. These approaches seem crazy to me and wouldn't work with the need to predict at a given offset (usually 0) for the purpose of standardizing for area swept.
library(sdmTMB)
dat <- subset(dogfish, catch_weight > 0)
dat <- dat[1:5, ]
m <- glm(catch_weight ~ 1, data = dat, family = Gamma("log"), offset = log(dat$area_swept))
predict(m)
#> 1 4 5 6 7
#> 4.985392 5.157243 4.920854 5.157243 5.046017
predict(m, newdata = dat[1:2,])
#> Warning in predictor + offset: longer object length is not a multiple of
#> shorter object length
#> [1] 4.985392 5.157243 4.920854 5.157243 5.046017
predict(m, newdata = dat[1:2,], offset = rep(0, 2))
#> Warning in predictor + offset: longer object length is not a multiple of
#> shorter object length
#> [1] 4.985392 5.157243 4.920854 5.157243 5.046017
predict(m, newdata = dat[1:2,], offset = log(dat$area_swept))
#> Warning in predictor + offset: longer object length is not a multiple of
#> shorter object length
#> [1] 4.985392 5.157243 4.920854 5.157243 5.046017
predict(m, newdata = dat[1:2,], offset = rep(0, 5))
#> Warning in predictor + offset: longer object length is not a multiple of
#> shorter object length
#> [1] 4.985392 5.157243 4.920854 5.157243 5.046017
m2 <- glmmTMB::glmmTMB(catch_weight ~ 1, data = dat, family = Gamma("log"), offset = log(dat$area_swept))
predict(m2)
#> [1] 4.985392 5.157243 4.920854 5.157243 5.046017
predict(m2, newdata = dat[1:2,])
#> [1] 4.985392 5.157243 4.920854 5.157243 5.046017
predict(m2, newdata = dat[1:2,], offset = rep(0, 2))
#> Warning in check_dots(..., .action = "warning"): unknown arguments: offset
#> [1] 4.985392 5.157243 4.920854 5.157243 5.046017
predict(m2, newdata = dat[1:2,], offset = log(dat$area_swept))
#> Warning in check_dots(..., .action = "warning"): unknown arguments: offset
#> [1] 4.985392 5.157243 4.920854 5.157243 5.046017
predict(m2, newdata = dat[1:2,], offset = rep(0, 5))
#> Warning in check_dots(..., .action = "warning"): unknown arguments: offset
#> [1] 4.985392 5.157243 4.920854 5.157243 5.046017
m3 <- sdmTMB(catch_weight ~ 1, data = dat, family = Gamma("log"), offset = log(dat$area_swept), spatial = "off")
predict(m3)$est
#> [1] 4.985392 5.157243 4.920854 5.157243 5.046017
predict(m3, newdata = dat[1:2,])$est
#> [1] 7.259584 7.259584
predict(m3, newdata = dat[1:2,], offset = rep(0, 2))$est
#> [1] 7.259584 7.259584
predict(m3, newdata = dat[1:2,], offset = log(dat$area_swept))$est
#> Error in `predict()`:
#> ! Prediction offset vector does not equal number of rows in prediction
#> dataset.
predict(m3, newdata = dat[1:2,], offset = rep(0, 5))$est
#> Error in `predict()`:
#> ! Prediction offset vector does not equal number of rows in prediction
#> dataset.
Created on 2024-09-20 with reprex v2.1.1
Leaving aside the above mess, I'll get the cross validation part working to close this issue...
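As an aside on the glm() output above: as far as I can tell, the repeated five-value result comes from writing the offset as `log(dat$area_swept)`, which predict() re-evaluates against the full `dat` rather than `newdata`. Below is a minimal base-R sketch, assuming simulated data (the data frame and its values are made up for illustration and are not the dogfish data). It shows that an offset written inside the formula with `offset()` is evaluated against `newdata`, so predicting at a unit area swept (log offset of 0) only needs a `newdata` with `area_swept = 1`.

```r
# Simulated data for illustration only (not the dogfish dataset).
set.seed(1)
dat <- data.frame(area_swept = runif(50, 0.5, 2))
dat$catch_weight <- rgamma(50, shape = 5, rate = 5 / (exp(2) * dat$area_swept))

# Offset given in the formula, referring to a column name rather than dat$...
m <- glm(
  catch_weight ~ 1 + offset(log(area_swept)),
  data = dat, family = Gamma("log")
)

# Link-scale prediction for two new rows: intercept + log(area_swept) of those rows.
nd <- dat[1:2, ]
p_nd <- predict(m, newdata = nd)

# Prediction standardized to area_swept = 1, i.e. an offset of log(1) = 0.
nd_unit <- data.frame(area_swept = c(1, 1))
p_unit <- predict(m, newdata = nd_unit)
```

This only concerns base glm(); it says nothing about how sdmTMB or glmmTMB handle offsets internally.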
It's now fixed. Use a character offset in
library(sdmTMB)
dat <- subset(dogfish, catch_weight > 0)
set.seed(1)
x <- sdmTMB_cv(catch_weight ~ 1,
data = dat, family = Gamma("log"),
offset = "area_swept", spatial = "off",
mesh = make_mesh(dat, c("X", "Y"), cutoff = 10), k_folds = 2
)
#> Running fits with `future.apply()`.
#> Set a parallel `future::plan()` to use parallel processing.
y <- x$data[, c("catch_weight", "cv_predicted")]
plot(y$catch_weight, y$cv_predicted)
Created on 2024-09-20 with reprex v2.1.1
As proof, the predictions from this intercept-only model vary, which indicates the offset is being included in the prediction.
Great - thanks Sean. I agree that it is worrying to always apply the offset from the original data if newdata is specified. I guess for non-TMB models, using the
Great idea. I've added a message:
library(sdmTMB)
dat <- subset(dogfish, catch_weight > 0)
fit <- sdmTMB(
catch_weight ~ 1,
data = dat,
family = Gamma("log"),
offset = "area_swept",
spatial = "off"
)
pred <- predict(fit)
pred <- predict(fit, offset = rep(0, nrow(dat)))
pred <- predict(fit, newdata = qcs_grid, offset = rep(0, nrow(qcs_grid)))
pred <- predict(fit, newdata = qcs_grid)
#> Fitted object contains an offset but the offset is `NULL` in
#> `predict.sdmTMB()`.
#> Prediction will proceed assuming the offset vector is 0 in the prediction.
#> Specify an offset vector in `predict.sdmTMB()` to override this.
Created on 2024-09-23 with reprex v2.1.1
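For readers comparing the two kinds of prediction, here is a small sketch using only the numbers printed in the first reprex of this thread. It assumes the link-scale estimate is the intercept plus the offset, which is what the intercept-only Gamma("log") model implies; the variable names are made up for illustration.

```r
# Values copied from the predict(m3) output earlier in the thread.
est_with_offset <- c(4.985392, 5.157243, 4.920854, 5.157243, 5.046017)

# Value returned when the offset is set to 0 (the intercept of the model).
est_zero_offset <- 7.259584

# On the log link the offset is additive, so the difference is the log offset.
implied_log_area <- est_with_offset - est_zero_offset

# Back-transform to get the area swept implied by each row.
implied_area <- exp(implied_log_area)
```

The zero-offset value is therefore the prediction per unit of area swept, which is the quantity wanted when standardizing.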
Hello,
I have been enjoying using this package very much - thank you for the great tool.
I have just started moving to a model that considers swept area in an offset term. When conducting a cross validation fitting using sdmTMB_cv, one defines the offset as a character string naming the data variable (e.g. offset = "logSweptArea"). However, when using predict.sdmTMB, one must provide a vector of values with length equal to the number of rows in the data (e.g. offset = dat$logSweptArea).
The issue is that the offset information is not being passed to the prediction within sdmTMB_cv: sdmTMB/R/cross-val.R, line 372 in cb83a62.
It looks like I can fix this manually afterwards, but it would be worth fixing in the function to avoid confusion.
Also, when not predicting to a new dataset, it might be more logical to have predict.sdmTMB(fit) automatically use the offset in fit$offset.
Cheers,
Marc
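To illustrate what passing the offset to the held-out prediction means in a cross validation loop, here is a rough base-R sketch. It assumes simulated data and uses glm() with the offset in the formula. It is not how sdmTMB_cv is implemented, and the column names `fold` and `cv_predicted` are chosen here for illustration.

```r
# Simulated data for illustration only.
set.seed(42)
n <- 60
dat <- data.frame(area_swept = runif(n, 0.5, 2))
dat$catch_weight <- rgamma(n, shape = 5, rate = 5 / (exp(2) * dat$area_swept))

# Assign each row to one of two folds at random.
dat$fold <- sample(rep(1:2, length.out = n))
dat$cv_predicted <- NA_real_

for (k in sort(unique(dat$fold))) {
  train <- dat[dat$fold != k, ]
  test <- dat[dat$fold == k, ]
  # The offset is part of the formula, so predict() evaluates it on `test`.
  fit <- glm(
    catch_weight ~ 1 + offset(log(area_swept)),
    data = train, family = Gamma("log")
  )
  # Response-scale predictions for the held-out rows, offset included.
  dat$cv_predicted[dat$fold == k] <- predict(fit, newdata = test, type = "response")
}
```

Because the model is intercept-only, any variation in `cv_predicted` within a fold comes from the offset, which is a simple check that the offset reached the prediction step.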