Dear developers,

First, thanks for this great package. I am using it for my PhD thesis. I found some counterintuitive behavior and would like to suggest a change. The "bug" occurred while I was fitting a logistic regression on data with missing values. My McFadden R^2 fell, and it took me a while to figure out why. I have attached a reproducible example. It shows that the R^2 differs depending on whether the model is fitted to a data set that contains missing values or to the same data set with the incomplete rows removed.
This is counterintuitive because glm() does not distinguish between these two data sets: it already drops all observations with missing data. So the McFadden R^2 should not change either. Mathematically, the difference arises because observations with missing data are handled differently in the full model and in the empty (intercept-only) model. I therefore suggest applying complete.cases before calculating the log-likelihood of the two models, so that the result is more intuitive.
Let me know what you think about this suggestion. Thank you in advance!
My reproducible example is below. I hope it is done correctly, as I am new to reprex.
# Delete environment
rm(list = ls())

# Package names
packages <- c("ISLR", "blorr")

# Install packages not yet installed
installed_packages <- packages %in% rownames(installed.packages())
if (any(installed_packages == FALSE)) {
  install.packages(packages[!installed_packages])
}

# Packages loading
invisible(lapply(packages, library, character.only = TRUE))

# set seed for reproducibility
set.seed(176)

# remove columns not needed for regression
dataset <- subset(Smarket, select = -c(Year, Today))

# define function that creates NAs and execute it
createNAs <- function(x, pctNA = 0.1) {
  n <- nrow(x)
  p <- ncol(x)
  NAloc <- rep(FALSE, n * p)
  NAloc[sample.int(n * p, floor(n * p * pctNA))] <- TRUE
  x[matrix(NAloc, nrow = n, ncol = p)] <- NA
  return(x)
}
dataset <- createNAs(dataset, 0.1)

# do first regression without complete cases
glm.fit1 <- glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume, data = dataset, family = binomial)
blr_rsq_mcfadden(glm.fit1)
#> [1] 0.4616006

# do second regression with complete cases
dataset <- dataset[complete.cases(dataset), ]
glm.fit2 <- glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume, data = dataset, family = binomial)
blr_rsq_mcfadden(glm.fit2)
#> [1] 0.003519264

# NOTE THAT THERE IS A DIFFERENCE BETWEEN THE TWO MC FADDEN R^2!
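For illustration, the suggestion above could be sketched roughly as follows in base R. This is only a minimal sketch on simulated data, not blorr's actual implementation. The helper name `mcfadden_same_rows` and the data frame `d_na` are hypothetical. It also assumes that the mismatch arises when the intercept-only model is fitted to more rows than the full model, which is how the issue describes it but has not been checked against the package source.

```r
# Hypothetical helper (not part of blorr): McFadden's R^2 where the
# intercept-only model is fitted on exactly the rows the full model used.
mcfadden_same_rows <- function(model) {
  # model.frame() holds only the rows glm() kept after dropping NAs
  y <- model.response(model.frame(model))
  null_model <- glm(y ~ 1, family = family(model))
  1 - as.numeric(logLik(model)) / as.numeric(logLik(null_model))
}

# Simulated data (an assumption for illustration, not the Smarket data)
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- rbinom(n, 1, plogis(0.5 * d$x1 - 0.3 * d$x2))

# Introduce missing values in one predictor only
d_na <- d
d_na$x1[sample.int(n, 20)] <- NA

# glm() drops incomplete rows by default, so both fits use the same rows
fit_na <- glm(y ~ x1 + x2, data = d_na, family = binomial)
fit_cc <- glm(y ~ x1 + x2, data = d_na[complete.cases(d_na), ], family = binomial)

# With the null model restricted to the same rows, both R^2 values agree
all.equal(mcfadden_same_rows(fit_na), mcfadden_same_rows(fit_cc))

# An intercept-only model refit on the raw data keeps rows that the full
# model dropped, because y itself is never missing here
naive_null <- glm(y ~ 1, data = d_na, family = binomial)
c(full = nobs(fit_na), naive_null = nobs(naive_null))
```

Because the two log-likelihoods in the ratio are then computed over the same observations, the result no longer depends on whether incomplete rows were removed before or during the call to glm().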
Thank you very much for bringing this to our attention. Based on your suggestion, we have decided to review the blorr API using data sets with missing data and to fix any bugs that turn up.