Hello DoubleML team,
I have a question regarding the distribution of the coefficient residuals when running multiple repetitions. According to your basics documentation, the residuals are expected to follow a normal distribution. However, the examples there generate a new sample at each repetition. I’m wondering whether this also applies to real-life (fixed-size) datasets.
To test this, I applied the same approach to the 401(k) dataset, except that I sample from the same original dataset at each repetition. I observe that the distribution of the residuals narrows as the sample size increases (whether I apply PLR or IRM).
Similarly, if I run a model with `n_rep > 1` and then compare the reported confidence intervals with the observed quantiles of the per-repetition estimates (i.e. using `.all_coef`), the observed quantiles are narrower than the calculated CIs.
If building confidence intervals requires independent data samples, what is the purpose of using `n_rep > 1`?
Thanks in advance
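
To make the comparison concrete, here is a minimal NumPy-only sketch of the two setups. It is not DoubleML's implementation: the helpers `crossfit_plr` and `simulate`, the OLS nuisance models, and the synthetic data are all assumptions made for illustration, standing in for the 401(k) data and the PLR model. It re-splits one fixed dataset many times (the analogue of `n_rep > 1`) and, separately, draws a fresh dataset at each repetition, then compares the spread of the estimates with a typical 95% CI width.

```python
import numpy as np

def crossfit_plr(y, d, X, n_folds, rng):
    """One cross-fitted partialling-out estimate with OLS nuisance models (illustrative only)."""
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_folds)  # one random sample split
    Xc = np.column_stack([np.ones(n), X])
    y_res, d_res = np.empty(n), np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Fit nuisance regressions on the training folds, predict on the held-out fold.
        beta_y, *_ = np.linalg.lstsq(Xc[train], y[train], rcond=None)
        beta_d, *_ = np.linalg.lstsq(Xc[train], d[train], rcond=None)
        y_res[test_idx] = y[test_idx] - Xc[test_idx] @ beta_y
        d_res[test_idx] = d[test_idx] - Xc[test_idx] @ beta_d
    theta = (d_res @ y_res) / (d_res @ d_res)
    psi = (y_res - theta * d_res) * d_res  # score evaluated at the estimate
    se = np.sqrt(np.mean(psi**2) / n) / np.mean(d_res**2)
    return theta, se

def simulate(n, rng, theta=0.5):
    """Hypothetical partially linear data-generating process."""
    X = rng.normal(size=(n, 5))
    d = X @ np.full(5, 0.3) + rng.normal(size=n)
    y = theta * d + X @ np.full(5, 0.2) + rng.normal(size=n)
    return y, d, X

rng = np.random.default_rng(0)
n, n_rep = 500, 200

# (a) Fixed dataset, new sample split at each repetition.
y, d, X = simulate(n, rng)
fixed = np.array([crossfit_plr(y, d, X, 5, rng) for _ in range(n_rep)])

# (b) Fresh dataset at each repetition.
fresh = np.array([crossfit_plr(*simulate(n, rng), 5, rng) for _ in range(n_rep)])

def spread(estimates):
    """Width of the empirical 2.5%-97.5% quantile range."""
    lo, hi = np.quantile(estimates, [0.025, 0.975])
    return hi - lo

split_spread = spread(fixed[:, 0])
fresh_spread = spread(fresh[:, 0])
ci_width = 2 * 1.96 * np.median(fixed[:, 1])  # typical 95% CI width from the fixed dataset

print(f"spread over splits of one dataset: {split_spread:.4f}")
print(f"spread over fresh datasets:        {fresh_spread:.4f}")
print(f"typical 95% CI width:              {ci_width:.4f}")
```

In this sketch the estimates in (a) differ only through the random fold assignment, while those in (b) also differ through the data themselves, so the two spreads measure different sources of variation.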