---
title: "Introduction to JAGS"
author: "Michael Dietze"
output: html_document
---
```{r,echo=FALSE}
library(rjags)
library(coda)
```
# Introduction
The aim of this activity is to provide a tutorial on JAGS (Just Another Gibbs Sampler), a statistical software package designed specifically to do Bayesian analyses of simple to intermediate complexity using Markov Chain Monte Carlo (MCMC) numerical simulation methods. JAGS is one of a set of generalized software tools for doing Bayesian computation, with BUGS, NIMBLE, and STAN being other popular options. These tools use very similar and simple scripting languages that focus on specifying data and process models. The great thing about these tools is that they keep a lot of the mathematical and computational details "under the hood" so that you can focus on the structure of your model at a high level rather than being bogged down in the details. Also, since JAGS is designed just to do Bayesian MCMC computation, it is very efficient at this, which is nice since Bayesian MCMC computations can be time consuming. In this tutorial we will work through the basics of using JAGS: writing and compiling a model, loading data, executing the model, and evaluating the numerical output of the model.
While tools like JAGS can be run by themselves, they are not really designed for managing or manipulating data, and thus it is common to use R packages to interface with these tools. In the example below we will use the `rjags` package to call JAGS. To be able to install `rjags` you will need to first install JAGS itself: <http://mcmc-jags.sourceforge.net/>
The reason for choosing JAGS in this course is that it has much better cross-platform support than the earlier BUGS language, which really only operates well in Windows. The NIMBLE and STAN languages are newer, also work across platforms, and are becoming increasingly popular.
### Resources
JAGS itself provides a reference manual that is useful for looking up the syntax for distributions and functions, but isn't particularly useful for learning how to code. However, the JAGS syntax is virtually identical to the BUGS syntax and there are a much wider range of resources available for learning BUGS (books, examples, email list, etc.). The OpenBUGS website, <http://www.openbugs.net/>, is a great first stop for information---in particular the Documentation section contains examples, tutorials, and other useful information.
There's a large amount of info in both the JAGS and BUGS manuals but I wanted to point out a few sections that you'll find yourself coming back to repeatedly. One of these topics is the BUGS "Model Specification" which provides a lot of detail on how to write models in BUGS/JAGS, how data must be formatted, and the list of functions and distributions that BUGS/JAGS knows about. The naming convention for distributions in BUGS/JAGS is very similar to the convention in R, but there are a few cases where the parameterization is different (e.g. Weibull) so it is always good to check that the values you are passing to a BUGS/JAGS distribution are what you think they are. One very important example of this is that the Normal distribution in BUGS/JAGS, `dnorm`, is parameterized in terms of a mean and precision (1/variance) rather than a mean and standard deviation, which is the parameterization in R. To make life even more complicated, there are a small number of places where the JAGS syntax is slightly different from the BUGS version, so it's best to check the JAGS manual specifically for the definitions of any distributions and functions.
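For example (a small illustrative check; the variable names here are arbitrary), converting a standard deviation into the precision that BUGS/JAGS expects is just:
```{r}
## a Normal with mean 0 and standard deviation 2 is dnorm(x, mean = 0, sd = 2) in R,
## but would be written dnorm(0, 0.25) in BUGS/JAGS, because
sigma <- 2
tau   <- 1/sigma^2   ## precision = 1/variance = 0.25
tau
```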
Within the BUGS Examples page you'll find three volumes of examples that provide written explanations of analyses with the BUGS code. Working through these is a great way to learn more about how BUGS works. When you first start analyzing your own data it is often easiest to start from an existing example and modify it to meet your needs rather than starting from scratch.
# Bayesian Regression using Gibbs Sampling
Linear regression is a good place to start our exploration of Bayesian numerical methods because it is the foundation for the large majority of the data analysis that occurs using classical methods (recall that ANOVA models are just a special case of regression) and because it gives us a foundation to build off of for exploring more complex models.
Regardless of whether you are a Bayesian or a frequentist, the standard regression model begins with a Normal likelihood:
$$L = P(y \vert b,\sigma^2, X) \propto N_n(y \vert Xb,\sigma^2 I)$$
Which is here written in matrix notation: $y$ is the vector of observations of the response variable, $b$ is the parameter vector (intercept and slopes), $\sigma^2$ is the residual variance, and $X$ is the design matrix. The subscript $n$ on the Normal indicates the size of the multivariate normal distribution (i.e. the number of rows of data). The matrix notation $Xb$ is just short-hand for the linear model $b_1 X_1 + b_2 X_2 + b_3 X_3 + \ldots$. The first column $X_1$ is typically all 1's, which when multiplied by $b_1$ gives us the intercept. Continuous covariates are each entered in a single column (e.g. $X_2$), while categorical covariates are typically entered using indicator variables (multiple columns of 0's and 1's used to indicate whether a particular sample belongs to a particular class or not).
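As a concrete illustration (a small made-up example, not the exercise data), R's `model.matrix` builds exactly this kind of design matrix: an intercept column of 1's, continuous covariates as-is, and 0/1 indicator columns for categorical covariates.
```{r}
## toy data frame with one continuous and one categorical covariate
toy <- data.frame(x1   = c(1.2, 3.4, 5.6, 2.1),
                  site = factor(c("A", "B", "B", "A")))
X <- model.matrix(~ x1 + site, data = toy)
X   ## columns: (Intercept), x1, and a 0/1 indicator for site "B"
```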
Since this is a Bayesian regression, we also need to specify priors on all of our parameters, which are often set to a multivariate Normal prior on the vector of regression parameters (length $p$), and an Inverse Gamma prior on the variance:
$$P(b) = N_p(b \vert b_0, V_b)$$
$$P(\sigma^2) = IG(\sigma^2 \vert s_1,s_2)$$
As a reminder, a Normal prior is not used on the variance because variances must be non-negative. The inverse gamma prior on the variance (equivalent to a gamma prior on the precision) has the advantage of being conjugate and having relatively interpretable parameters (`s_1` and `s_2` are proportional to the prior sample size and sum of squares, respectively; see Box 5.1). Putting things together (Likelihood * prior), our full model is:
$$P(b,\sigma^2 \vert X, y) \propto N_n(y \vert Xb,\sigma^2 I) N_p(b \vert b_0, V_b) IG(\sigma^2 \vert s_1,s_2)$$
Within a Gibbs sampler, the model will then be fit by iteratively sampling from each of the conditional posterior distributions:
The regression parameters given the variance
$$P(b \vert \sigma^2, X, y) \propto N_n(y \vert Xb,\sigma^2 I) N_p(b \vert b_0, V_b)$$
The variance given the regression parameters
$$P(\sigma^2 \vert b, X, y) \propto N_n(y \vert Xb,\sigma^2 I) IG(\sigma^2 \vert s_1,s_2)$$
# A Simple JAGS model
As noted earlier, JAGS parameterizes the Normal distribution in terms of a mean and precision (1/variance), rather than a mean and standard deviation. Therefore, let's redefine the prior precision matrix as $P_b = V_b^{-1}$ and the residual precision as $S = 1/\sigma^2$, so that the precision matrix of the data is $S I$. Let's also switch from the conditional notation for PDFs to the tilde notation (~ is read as "is distributed as"). Given this we can rewrite the above model as:
$$ y \sim N_n( Xb, S I) $$
$$b \sim N_p(b_0, P_b)$$
$$S \sim Gamma(s_1,s_2)$$
In JAGS the specification of any model begins with the word "model" and then encapsulates the rest of the model code in curly brackets:
```
model {
  ## model goes here ##
}
```
When writing models in JAGS we have to specify the data model, process model, and parameter model using the tilde notation. However, we don't need to specify explicitly the connections between them---JAGS figures this out based on the conditional probabilities involved. Deterministic calculations (e.g. process model) always make use of an arrow (`<-`) for assignment (similar to R syntax). Assignment of random variables is always done with a tilde (`~`). **The equal sign (`=`) cannot be used for either process or data models. Furthermore, deterministic calculations and distributions cannot be combined in the same line of code.** For example, a regression model where we're trying to predict data `y` based on observations of `x` might include
```
mu <- b1 + b2* x # process model
y ~ dnorm(mu,tau) # data model
```
but in JAGS the same model can **NOT** be expressed as
```
y ~ dnorm(b0 + b1* x, tau)
```
Putting this all together, to specify our first model---a Normal likelihood, a multivariate Normal prior on the regression parameters, and a gamma prior on the precision---the JAGS code is:
```
model{
  b ~ dmnorm(b0,Pb)   ## multivariate Normal prior on vector of regression params
  S ~ dgamma(s1,s2)   ## prior precision
  for(i in 1:n){
    mu[i] <- b[1] + b[2]*x[i]   ## process model
    y[i] ~ dnorm(mu[i],S)       ## data model
  }
}
```
The first line of this model specifies the prior on the regression coefficients and says that the vector `b` is a random variable (`~`) that follows a multivariate Normal distribution (`dmnorm`) with an expected value of `b0` and a precision of `Pb`. The second line specifies the prior on the precision `S` as following a gamma distribution. Next, to specify the Likelihood we loop over all `n` observations, and for each observation we calculate `mu[i]`, the expected value of the regression (process model), and then plug that expected value into the Normal data model. JAGS also doesn't allow temporary variables that are overwritten, so it's important that each row of data have its own `mu[i]` rather than reusing a single `mu` within the loop.
Hopefully the above JAGS code seems pretty clear. However there are a few quirks to how JAGS evaluates a model, some of which are just idiosyncrasies (like no `=`) while others are deeply embedded in the logic but often misleading to newcomers. One of the more disorienting features is that, because JAGS isn't evaluating the code sequentially, the order of the code doesn't matter---the above model is equivalent to
```
model {
  for(i in 1:n){
    y[i] ~ dnorm(mu[i],S)       ## data model
    mu[i] <- b[1] + b[2]*x[i]   ## process model
  }
  b ~ dmnorm(b0,Pb)   ## multivariate Normal prior on vector of regression params
  S ~ dgamma(s1,s2)   ## prior precision
}
```
If you're accustomed to R the above seems 'wrong' because you're using `mu` in the data model before you've 'defined' it. But JAGS models are not a set of sequential instructions; rather, they are the definition of a problem, which JAGS will parse into a graph and then analyze to determine how to best sample from each posterior. Furthermore, even if the code were evaluated sequentially, MCMC is a cyclic algorithm and thus, after initial conditions are provided, it doesn't really matter what order things are in.
The other major "quirk" of JAGS is that what JAGS does with a model can depend a LOT on what's passed in as knowns! Understanding how this works will help you considerably when writing and debug models. Key to this is to realize that *no variables are 'unknowns' that JAGS is trying to solve for in the algebraic sense*. Rather, some variables are constants or derived (calculated) quantities while others are **random variables**. Random variables *HAVE to have a distribution*. If we look at our simple model, x, b0, Pb, s1, s2, and n don't have distributions so they have to be provided as inputs. If they are not, JAGS will throw an error. Next, b and S are clearly random variables as they shows up on the left-hand side of the prior and on the right-hand side of the likelihood. Similarly, mu is on the left-hand side of a deterministic calculation. S, b, and mu *cannot* be specified as an input---this likewise will throw an error. But the interesting part is that, depending on context, $y$ could either be known or random and thus what JAGS does with this model depends a lot of whether $y$ is in the list of inputs. If $y$ is NOT specified, then the JAGS reads the above code as:
1. Draw a random b and S from the prior distribution
2. Conditional on those values and x, simulate a vector of random y's
However, if $y$ IS specified then JAGS will generate samples of $b, S \vert x, y$---in other words it will estimate the posteriors of `b` and `S`. This polymorphic behavior leads to another quirk---all distributions are written in their "d" form even if you're using them to generate random numbers (i.e. there is no `rnorm` in JAGS).
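To see this polymorphic behavior in action, here is a minimal sketch (using made-up `x` values and arbitrary prior parameters; the object names `sim_model`, `sim_data`, etc. are just for illustration) of running the regression model *without* supplying `y`, so that JAGS simulates `y` from the priors rather than estimating a posterior:
```{r}
sim_model <- "
model{
  b ~ dmnorm(b0,Pb)             ## prior on regression parameters
  S ~ dgamma(s1,s2)             ## prior on precision
  for(i in 1:n){
    mu[i] <- b[1] + b[2]*x[i]
    y[i] ~ dnorm(mu[i],S)       ## y is NOT in the data list, so JAGS treats it as random
  }
}
"
sim_data <- list(x = 1:10, n = 10,
                 b0 = c(0,0), Pb = solve(diag(100,2)),
                 s1 = 0.1, s2 = 0.1)
sim.mod <- jags.model(file = textConnection(sim_model), data = sim_data, n.chains = 1)
sim.out <- coda.samples(sim.mod, variable.names = "y", n.iter = 100)
```
Each row of `sim.out` is one random draw of `b` and `S` from the priors followed by a simulated vector of `y` values, which is exactly the two-step reading described above.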
Because of how JAGS views random variables, **no variable can show up on the left-hand side more than once**. In other words, a variable can't be both calculated and random, it can't be calculated more than once, and it can't have multiple priors or data models. Most of the time this is fine, but what catches many people is that it means you **can't reuse temporary variables**, because JAGS will view this as an attempt to redefine them.
# Fitting a JAGS model
OK, so how do we now use JAGS to fit our model to data? We can divide the R code required to perform this analysis into three parts:
- Code used to set up the analysis
    - specify model
    - load data
    - specify parameters for the priors
    - specify initial conditions
- MCMC loop (i.e. running JAGS)
- Code used to evaluate the analysis
    - Convergence diagnostics
    - Summary statistics
    - Credible & predictive intervals
## Specify model
To start with, we need to realize that JAGS code is not identical to R code, so if we write it in R as is, R will throw an error. Therefore, we either need to write this code in a separate file or treat it as text within R. In general I prefer the second option because I feel it makes coding and debugging simpler. Here I define the above JAGS model as a text string in R:
```{r}
univariate_regression <- "
model{
  b ~ dmnorm(b0,Vb)   ## multivariate Normal prior on vector of regression params
  S ~ dgamma(s1,s2)   ## prior precision
  for(i in 1:n){
    mu[i] <- b[1] + b[2]*x[i]   ## process model
    y[i] ~ dnorm(mu[i],S)       ## data model
  }
}
"
```
To pass this model to JAGS we'll use the `rjags` function `jags.model`, which has required arguments of the model and the data, and optional arguments for specifying initial conditions and the number of chains (I encourage you to read the man page for this function!). So, before running this function, let's first set up these other inputs.
## Load data
Any data analysis begins with loading and organizing the data. In this case we have two variables, `x1` and `y`, stored in a file:
```{r}
load("data/Ex05.Part1.RData")
plot(x1,y)
```
To be able to pass the data into JAGS we need to organize it into R's *list* data type. The order in which the variables appear in the list doesn't matter, but **the NAMES of the variables in the list and in the model code have to match EXACTLY (this is a common bug)**.
```{r}
data <- list(x = x1, y = y, n = length(y))
```
## Specify priors
Now that we have "data" and a model, the next task in setting up the analysis is to specify the parameters for the prior distributions.
Since `b` is a vector, the prior mean, `b0`, is also a vector, and the prior variance is a matrix. Since we have no prior conceptions about our data we'll select relatively weak priors, assuming a prior mean of 0 and a prior variance matrix that is diagonal (i.e. no covariances) with a moderately large variance of 10000 (s.d. of 100). We'll use the `diag` function to set up a diagonal matrix that has a size of 2 and values of 10000 along the diagonal. In practice, JAGS requires that we specify the multivariate normal in terms of a precision matrix, so we will compute the inverse of the prior variance matrix using the `solve` function. For the residual error, we will specify an uninformative gamma prior on the precision with parameters s1 = 0.1 and s2 = 0.1. Finally, we'll add all of these prior parameters to the `data` object so we can pass them to the JAGS model.
```{r}
## specify priors
data$b0 <- as.vector(c(0,0)) ## regression b means
data$Vb <- solve(diag(10000,2)) ## regression b precisions
data$s1 <- 0.1 ## error prior n/2
data$s2 <- 0.1 ## error prior SS/2
```
**Note: If you want to fit a regression with more covariates (and thus more bs), you will need to increase the *size* of `b0` and `Vb` to match the number of bs.**
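For instance (not run here, since the example in this section is univariate and only has two b's), a hypothetical model with four regression parameters might use:
```
data$b0 <- rep(0, 4)             ## prior means for four b's
data$Vb <- solve(diag(10000, 4)) ## 4 x 4 prior precision matrix
```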
In this example I demonstrated converting from standard deviation to precision because, at least initially, most people are not accustomed to thinking in terms of precisions. The exact numeric values used for the data and priors are not important here and are just for illustration---how priors are chosen is discussed elsewhere.
## Initial conditions: lists, lists of lists, functions
The next step is to specify the initial conditions. In JAGS this is optional because if initial conditions are not specified then the code will draw them randomly from the priors. If you use informative priors this is not usually a problem. If you use uninformative priors, however, the initial parameter value can start far from the center of the distribution and take a long time to converge, just like in maximum likelihood optimization. Initial conditions are also specified using a list, but in this case we use the names of the parameters rather than the names of the data variables. A nice feature in JAGS is that we only have to specify some of the variables; any that we don't specify will be initialized from their priors, so we can often get away with only initializing a subset of variables.
As an example, if we wanted to specify an initial condition for `S` we would do this with a list, same as we did with the data.
```{r}
inits = list(S=1/var(y))
```
Unlike priors, which strictly cannot be estimated from data, it is perfectly fair (and in most cases encouraged) to use the available data to construct the initial guess at model parameters. Good initial conditions improve convergence by reducing the time spent exploring low probability parameter combinations. In more complex models, bad choices of initial conditions can cause the model to blow up immediately. Indeed, when people start getting really funky results out of an MCMC, their first reaction is often to change the priors, which is generally the wrong thing to do (unless the priors really did have a bug in them). A better place to start is to check the initial conditions, especially if they are not being specified explicitly.
In JAGS we will commonly want to run multiple independent MCMC chains, as this better allows us to assess whether all chains have converged to the same part of parameter space. As with numerical optimization, it is best practice to start each chain from different initial conditions, but the above example would give all the chains the same initial value of `S`. To specify multiple initial conditions we need to construct a list of lists, with each sub-list providing the initial conditions for one chain. In this example we'll run three chains starting at three different values for `S`. We specify this in R as:
```{r}
inits <- list()
inits[[1]] <- list(S = 1/200)
inits[[2]] <- list(S = 1/70)
inits[[3]] <- list(S = 1/20)
```
We can also specify initial conditions for multiple variables by creating a list of lists, so that each chain gets its own set of initial conditions.
```{r}
## initial conditions
nchain = 3
inits <- list()
for(i in 1:nchain){
  inits[[i]] <- list(b = rnorm(2,0,5), S = runif(1,1/200,1/20))
}
```
Finally, it is also possible to pass a function that returns a list of initial values.
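A minimal sketch of such a function (the name `init_fn` is just illustrative; when passed as the `inits` argument, `jags.model` should call it separately for each chain so that every chain gets its own random starting point):
```{r}
## returns one randomly generated set of initial values
init_fn <- function(){
  list(b = rnorm(2, 0, 5), S = runif(1, 1/200, 1/20))
}
```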
# Running JAGS
Now that we've got the data and initial conditions set up, we can call JAGS. Running the model in JAGS is broken into two steps. First, `jags.model` is called to establish the connection to JAGS, compile the model, and run through an initial number of adaptation steps (`n.adapt` is an optional argument that defaults to 1000). Second, once the model is compiled, `coda.samples` is called to sample from the posterior. Separating this into two steps is actually advantageous because it makes it easier to monitor the progress of the MCMC and to add/remove variables from what's being sampled.
The call to `jags.model` is fairly straightforward
```{r}
j.model <- jags.model(file = textConnection(univariate_regression),
                      data = data,
                      inits = inits,
                      n.chains = 3)
```
The `textConnection` function is part of the normal R base and allows a text string (in this case our model) to be passed to a function that's expecting an external file. `n.chains` sets the number of chains. We'll use three for this example; 3-5 is typical.
If you have any bugs in your model, this is the step that will most likely throw an error.
Once the model is correctly compiled and initialized you can call `coda.samples` to sample from the model. Besides the initialized model object, the other arguments to the function are `variable.names`, a vector of the variables you want stored, and `n.iter`, the number of iterations.
```{r}
jags.out <- coda.samples(model = j.model,
                         variable.names = c("b","S"),
                         n.iter = 5000)
```
# Evaluating MCMC outputs
The output from JAGS is returned in a format called an `mcmc.list`. The R package `coda` provides a whole set of functions for assessing and visualizing outputs in this format. The `coda` package can also be used to evaluate saved outputs from other Bayesian software, for example if you decide to use the BUGS stand-alone graphical interface.
### MCMC convergence
When running an MCMC analysis the first question to ask with any output is "has it converged yet?" This usually starts with a visual assessment of the output:
```{r}
plot(jags.out)
```
For every variable you are tracking this will spit out two plots, a trace plot and a density plot. The trace plot looks like a time-series with the parameter value on the Y-axis and the iterations on the X-axis. Each chain is plotted in a different color. In a model that has converged these chains will be overlapping and will bounce around at random, preferably looking like white noise but sometimes showing longer term trends.
To help make the assessment of convergence more objective, coda offers a number of diagnostics, the most common being the Brooks-Gelman-Rubin (BGR) statistic. BGR requires that you have run multiple chains because it compares the among-chain variance to the within-chain variance. If a model has converged then these two should be identical and the BGR should be 1. There's no hard-and-fast rule for the threshold that this statistic needs to hit, but in general values less than 1.01 are excellent, 1.05 is good, and \>1.1 is generally held to be not yet converged.
```{r}
gelman.diag(jags.out)
```
In addition to having checked for overall convergence, it is important to remove any samples from before convergence occurred. Determining when this happened is done both visually and by plotting the BGR statistic versus sample number:
```{r}
BGR <- gelman.plot(jags.out)
```
The samples up to the point where the BGR drops below 1.05 are termed the "burn-in" period and should be discarded. For example, if I determine the burn-in to be 500 steps I could remove the first 500 as:
```{r}
burnin = 500 ## determine convergence
jags.burn <- window(jags.out, start = burnin) ## remove burn-in
plot(jags.burn) ## check diagnostics post burn-in
```
After discarding burn-in, you should work with the trimmed object for all subsequent analyses.
### Updating the MCMC
If the MCMC hasn't converged by the end, if you need additional samples, or if you want to add or remove variables from your output, you can take additional samples by simply calling `coda.samples` again.
```
jags.out2 <- coda.samples(j.model, variable.names = c("b","S"), 10000)
```
Note: the new samples are written to a new variable, not automatically added to the old. If you're working with a slow model where samples are precious you'll want to manually splice the old and new samples together.
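One possible way to do that splicing, chain by chain (a sketch only; it assumes `jags.out2` from the call above, and note that `mcmc()` restarts the iteration numbering, which is usually fine for computing summaries):
```
jags.all <- do.call(mcmc.list,
                    lapply(seq_along(jags.out), function(i){
                      mcmc(rbind(as.matrix(jags.out[[i]]),   ## old samples, chain i
                                 as.matrix(jags.out2[[i]]))) ## new samples, chain i
                    }))
```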
### Sample size
The previous section raises an obvious question, "How many samples is enough?" Unfortunately, there's no single right answer to this question because it depends on how well your model is mixing (i.e. how independent the samples are from one step to the next) and what output statistics you are interested in.
To address the first question, the independence of samples is largely a question of autocorrelation. `coda` provides a simple autocorrelation plot:
```{r}
acfplot(jags.burn)
```
Autocorrelation is always 1 by definition at lag 0 and then typically decays toward 0 roughly exponentially. The faster the decay the greater the independence. One way of approximating the number of independent samples you have is to divide the actual number of samples by the lag distance at which samples are approximately independent. A slightly more analytically rigorous way of achieving the same thing is to perform an **effective sample size calculation**
```{r}
effectiveSize(jags.burn)
```
For the second question, the number of samples required increases as you move further into the tails of the posterior distribution. For example, if you only care about the posterior mean or median, then you can get a stable estimate with an effective sample size of only a few hundred points. The standard deviation and interquartile range are more extreme and thus require more samples to estimate, while the 95% CI requires even more---**a common rule of thumb would be an effective size of 5000 samples**. Recall that the CI is determined by the most extreme values, so at n=5000 it is only the 125 largest and 125 smallest values that determine the value. If you need to work with even wider CIs or even lower-probability events you'll require more samples. This phenomenon can be shown graphically (and diagnosed) by looking at plots of how different statistics change as a function of sample size:
```{r}
cumuplot(jags.burn, probs = c(0.025, 0.25, 0.5, 0.75, 0.975))
```
### Thinning
Older texts will advise that you thin your posterior samples to account for autocorrelation. Current thinking is that this is unnecessary unless you need to reduce the size of the output you save. For example, setting `thin = 10` retains only every 10th sample for further analysis.
```{r}
jags.thin <- window(jags.burn, thin = 10)
plot(jags.thin)
```
# MCMC Statistics
Once your model has converged, and has enough samples after burn-in, you can calculate any summary statistics you want to describe your posterior output:
```{r}
summary(jags.burn)
```
The first part of this table is the mean, standard deviation, and two different standard errors. Knowing how to interpret the last three of these is important. The first, SD, describes the spread of the posterior distribution---in this case it is the **uncertainty about each parameter** (which, in other contexts, we usually refer to as the standard error). The SD doesn't decrease as you run the MCMC longer, it just converges on the value determined by the number and distribution of data points. By contrast, the SE declines asymptotically with MCMC length. In this context, this makes it an **indicator of the numerical precision of your results**---the longer the MCMC, the higher the numerical precision of the posterior statistics. Between the two alternative SE values you should always use the Time-series SE, which has been corrected for autocorrelation, rather than the Naive SE, which has not.
The second part of the summary table is the sample quantiles for the posterior---the default is the 95% CI, interquartile range, and median.
Finally, if you need to work with the MCMC output itself, either to perform additional analyses (e.g. uncertainty analysis, prediction) or generate additional statistics/diagnostics, the `coda` `mcmc.list` format is a bit of a pain to work with. I find the most convenient format is a matrix:
```{r}
out <- as.matrix(jags.burn)
```
For example, there are a number of stats/diagnostics that you'll want to perform for multivariate models to assess the correlations among parameters, such as pairwise scatter plots and a parameter correlation matrix. These correlations tell you how well you were able to identify parameters, or whether you're seeing any trade-offs among model parameters (i.e. they tell you how the *parameters* are related to one another, *not* how the x's are related to each other or to y).
```{r}
## Pairwise scatter plots & correlation
pairs(out) ## pairs plot to evaluate parameter correlation
cor(out) ## correlation matrix among model parameters
```
Be aware that for a univariate regression it is completely normal for the slope and intercept to be correlated (this occurs in a frequentist regression too). This also emphasizes why it's important to remember that MCMC gives you the full JOINT distribution of all your parameters, not just their individual marginal distributions, and that you can make some pretty significant errors if you just apply the marginal summary statistics when making predictions.
It's also worth noting that once you have the raw MCMC samples you can safely perform transformations on them, for example converting precision back to standard deviation:
```{r}
SD <- 1/sqrt(out[,"S"])
hist(SD)
summary(SD)
```
### Task 1
- Evaluate the MCMC chain for convergence. Include relevant diagnostics and plots. Determine and remove burn-in [A]
- Report parameter summary table and plot marginal distributions [A]
- Describe and explain the parameter covariances that you observe in the pairs plot and parameter correlation matrix.[B]
- Compare the summary statistics for the Bayesian regression model to those from the classical regression: `summary(lm(y ~ x1))`. This should include a comparison of the means and uncertainties of **all 3 model parameters** [A]
# Multiple Regression
Using the dataset "data/Ex05.Part2.RData", extend your univariate regression to a multiple regression model with two covariates (x1, x2) and an interaction term (x1\*x2). In doing so, not only do you need to update your process model, but you also need to make sure to update the dimension of your prior and initial conditions on $b$ from 2 to 4.
### Task 2
- Show the JAGS and R code used. [A]
- Include relevant convergence diagnostics and plots. [A]
- Report parameter summary table. [A]
- Plot marginal parameter distributions and pairwise parameter correlations (stats and scatter plots). [B]