# Confidence interval and p-value refresher
So you think you know what a confidence interval and a p-value represent?
I've created the following small simulation example to hopefully make it clear.
```{r}
library(dplyr)
library(ggplot2)
set.seed(12345) # try different values
```
Imagine you are measuring the temperature of a cup of water 10 times, but you have an old, really bad thermometer. The true temperature is 3 degrees Celsius and the standard deviation of the measurement error is 5.
```{r}
# Simulate data: n noisy measurements of the true temperature
mu <- 3     # true mean temperature (degrees Celsius)
sigma <- 5  # standard deviation of the measurement error
n <- 10     # number of measurements
y <- rnorm(n = n, mean = mu, sd = sigma)
```
We will estimate the mean temperature by fitting an intercept-only linear regression model:
```{r}
mean(y)
m <- lm(y ~ 1)  # intercept-only model: the intercept estimates the mean of y
mu_hat <- coef(m)[[1]]
ci <- confint(m, level = 0.95)
ci
summary(m)
```
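As a sanity check, the same interval can be computed by hand; for an `lm` fit, `confint()` uses quantiles of the t distribution:
```{r}
# 95% CI by hand: estimate +/- t quantile * standard error
se <- summary(m)$coefficients[1, "Std. Error"]
mu_hat + c(-1, 1) * qt(0.975, df = n - 1) * se
```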
Let's illustrate what those confidence intervals really represent.
Imagine you went outside 20 times, and each time you measured the cup of water 10 times. Each time, you fitted an intercept-only linear regression model and plotted the resulting confidence interval.
```{r}
# Simulate one experiment: 10 measurements, fit the model, extract the CI
calc_mean <- function(i) {
  y <- rnorm(n = n, mean = mu, sd = sigma)
  m <- lm(y ~ 1)
  ci <- confint(m, level = 0.95)
  tibble(l = ci[1], u = ci[2], y = y, i = i)
}
N <- 20
sim_ci <- lapply(seq_len(N), calc_mean) %>% bind_rows()
sim_ci
ggplot(sim_ci, aes(1, y)) + facet_wrap(~i) +
  geom_point(alpha = 0.2) +
  geom_hline(aes(yintercept = l)) +
  geom_hline(aes(yintercept = u)) +
  geom_hline(yintercept = mu, colour = "red")
```
19 times out of 20 (95%), the 95% confidence intervals should contain the true value. Check the plot above. Does that look approximately correct?
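As a quick check, we can also count the coverage directly. Each experiment's limits are repeated for every measurement in `sim_ci`, so we first reduce to one row per experiment:
```{r}
# Fraction of the 20 intervals that contain the true mean:
sim_ci %>%
  distinct(i, l, u) %>%
  summarise(coverage = mean(l < mu & mu < u))
```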
So our calculated confidence interval is just one realization of a theoretically repeated experiment.
# P-values
P-values tell us the long-term frequency with which we would expect to calculate a value as extreme as, or more extreme than, the one we observed if the null hypothesis were actually true (assuming a two-sided p-value). Usually, our null hypothesis is that a parameter is equal to zero. (Of course we know that no parameter is precisely equal to zero, and this is one reason why hypothesis testing can seem silly if you think about it carefully. Gather enough data and you can reject any null hypothesis.)
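For reference, the p-value reported by `summary(m)` is computed analytically: the estimate divided by its standard error is compared against a t distribution. Here is the same two-sided p-value by hand:
```{r}
# Two-sided p-value from the t statistic, as reported by summary(m)
se <- summary(m)$coefficients[1, "Std. Error"]
2 * pt(-abs(mu_hat / se), df = n - 1)
```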
Now we imagine thousands of parallel universes where we went out and performed the exact same experiment but the true temperature was 0. And we want to know in what fraction of those universes we would observe a mean temperature at least as extreme as `r mean(y)` (i.e. with an absolute value greater than the absolute value of `r mean(y)`).
Sound convoluted? It kind of is in many cases.
```{r}
# Simulate one experiment under the null hypothesis (true mean = 0):
calc_mean_zero <- function(i) {
  y <- rnorm(n = n, mean = 0, sd = sigma)
  m <- lm(y ~ 1)
  tibble(i = i, mu_hat_sim = coef(m)[[1]])
}
N <- 3000
sim_pvals <- lapply(seq_len(N), calc_mean_zero) %>% bind_rows()
sim_pvals
# Flag simulated means at least as extreme as our observed estimate:
sim_pvals <- mutate(sim_pvals,
  tail = abs(mu_hat_sim) > abs(mu_hat))
ggplot(sim_pvals, aes(mu_hat_sim, fill = tail)) +
  geom_histogram(bins = 50) +
  geom_vline(xintercept = c(-mu_hat, mu_hat), lty = "solid") +
  geom_vline(xintercept = 0, lty = "dashed")
ggplot(sim_pvals, aes(abs(mu_hat_sim), fill = tail)) +
  geom_histogram(bins = 50) +
  geom_vline(xintercept = mu_hat, lty = "solid") +
  geom_vline(xintercept = 0, lty = "dashed")
summary(m)                # analytic p-value, for comparison
sum(sim_pvals$tail) / N   # simulated p-value
```
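The last two lines compare the analytic p-value from `summary(m)` with the simulated one: the fraction of the 3000 null-universe means that are more extreme than our observed estimate. The two should be similar, though not identical, since the simulation uses the known `sigma` while `summary(m)` estimates the standard error from the data.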