Coefficient Distribution and Confidence Intervals #280

AmauryVVK · 2024-12-19T11:12:45Z


Hello DoubleML team,

I have a question about the distribution of the coefficient residuals when running multiple repetitions. As explained in your basics documentation, the residuals are expected to follow a normal distribution. However, in the examples, you generate a new sample at each repetition. I’m wondering whether this also applies to real-life (fixed-size) datasets.

To test this, I applied the same approach to the 401(k) dataset, except that I sample from the same original dataset at each repetition instead of generating new data. I observe that the distribution of the residuals narrows as the sample size increases (whether I apply PLR or IRM).
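A toy sketch of this kind of experiment, for concreteness. It uses simulated data and a plain least-squares slope in place of the 401(k) data and the PLR/IRM models, and it assumes subsamples are drawn without replacement; all names, sizes, and the seed are illustrative assumptions, not the actual setup.

```python
import random
import statistics


def ols_slope(d, y):
    """Slope of a least-squares regression of y on d (with intercept)."""
    d_bar = statistics.fmean(d)
    y_bar = statistics.fmean(y)
    num = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    den = sum((di - d_bar) ** 2 for di in d)
    return num / den


def subsample_spread(d, y, size, n_draws, rng):
    """Std. dev. of slope estimates over subsamples of one fixed dataset."""
    estimates = []
    for _ in range(n_draws):
        # Assumption: subsamples are drawn without replacement.
        idx = rng.sample(range(len(d)), size)
        estimates.append(ols_slope([d[i] for i in idx], [y[i] for i in idx]))
    return statistics.stdev(estimates)


rng = random.Random(0)

# One fixed "original" dataset (simulated stand-in, true slope 0.5).
n_total = 2000
d_all = [rng.gauss(0.0, 1.0) for _ in range(n_total)]
y_all = [0.5 * di + rng.gauss(0.0, 1.0) for di in d_all]

# Spread of the estimates for increasing subsample sizes.
spread_by_size = {
    size: subsample_spread(d_all, y_all, size, n_draws=200, rng=rng)
    for size in (100, 400, 1600)
}
for size, spread in spread_by_size.items():
    print(f"subsample size {size:>4}: std of estimates = {spread:.4f}")
```

In this toy setting the subsamples overlap more and more as their size approaches the size of the original dataset, so the estimates cannot vary much between repetitions.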

Similarly, if I fit a model with n_rep > 1 and then compare the reported confidence intervals with the empirical quantiles of the per-repetition estimates (i.e. using .all_coef), the empirical quantiles are narrower than the computed CIs.
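The comparison can be sketched on a toy example as follows. This is a standard-library-only illustration, not DoubleML itself: the nuisance "learners" are just out-of-fold means, the data are simulated, and the per-repetition standard errors are aggregated with a plain median, which is a simplification and may differ from how DoubleML aggregates them. The names `cross_fit_estimate`, `all_coef`, and `all_se` are hypothetical and only mirror the idea of `.all_coef`.

```python
import math
import random
import statistics


def cross_fit_estimate(d, y, n_folds, rng):
    """One repetition: random fold split, out-of-fold means as toy nuisances."""
    n = len(d)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    u, v = [], []  # out-of-fold residuals of y and d
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        y_hat = statistics.fmean(y[i] for i in train)
        d_hat = statistics.fmean(d[i] for i in train)
        u.extend(y[i] - y_hat for i in fold)
        v.extend(d[i] - d_hat for i in fold)
    theta = sum(ui * vi for ui, vi in zip(u, v)) / sum(vi * vi for vi in v)
    # Score-based standard error for the partialling-out estimator.
    psi_sq = statistics.fmean(((ui - theta * vi) * vi) ** 2 for ui, vi in zip(u, v))
    j = statistics.fmean(vi * vi for vi in v)
    se = math.sqrt(psi_sq / j**2 / n)
    return theta, se


rng = random.Random(1)

# One fixed dataset (simulated stand-in, true coefficient 0.5).
n_obs = 500
d = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
y = [0.5 * di + rng.gauss(0.0, 1.0) for di in d]

# Repeat the sample splitting on the SAME data, as with n_rep > 1.
n_rep = 100
results = [cross_fit_estimate(d, y, n_folds=2, rng=rng) for _ in range(n_rep)]
all_coef = [theta for theta, _ in results]
all_se = [se for _, se in results]

# Width of a 95% CI built from the (median) standard error.
ci_width = 2 * 1.96 * statistics.median(all_se)

# Width between the 2.5% and 97.5% quantiles of the per-repetition estimates.
cuts = statistics.quantiles(all_coef, n=40)
quantile_width = cuts[-1] - cuts[0]

print(f"CI width:       {ci_width:.4f}")
print(f"quantile width: {quantile_width:.4f}")
```

In this sketch the only source of variation between repetitions is the random fold assignment, since the data never change, whereas the standard error is meant to reflect sampling uncertainty.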

If building confidence intervals requires independent data samples, what is the purpose of using n_rep > 1?

Thanks in advance


Replies: 0 comments
