-
Notifications
You must be signed in to change notification settings - Fork 175
/
ch04.Rmd
501 lines (328 loc) · 23.3 KB
/
ch04.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
---
output:
bookdown::html_document2:
fig_caption: yes
editor_options:
chunk_output_type: console
---
```{r echo = FALSE, cache = FALSE}
# This block needs cache=FALSE to set fig.width and fig.height, and have those
# persist across cached builds.
source("utils.R", local = TRUE)
knitr::opts_chunk$set(fig.width=5, fig.height=3.5, out.width="50%")
```
Line Graphs {#CHAPTER-LINE-GRAPH}
===========
Line graphs are typically used for visualizing how one continuous variable, on the y-axis, changes in relation to another continuous variable, on the x-axis. Often the *x* variable represents time, but it may also represent some other continuous quantity, for example, the amount of a drug administered to experimental subjects.
As with bar graphs, there are exceptions. Line graphs can also be used with a discrete variable on the x-axis. This is appropriate when the variable is ordered (e.g., "small", "medium", "large"), but not when the variable is unordered (e.g., "cow", "goose", "pig"). Most of the examples in this chapter use a continuous *x* variable, but we'll see one example where the variable is converted to a factor and thus treated as a discrete variable.
Making a Basic Line Graph {#RECIPE-LINE-GRAPH-BASIC-LINE}
-------------------------
### Problem
You want to make a basic line graph.
### Solution
Use `ggplot()` with `geom_line()`, and specify which variables you mapped to x and y (Figure \@ref(fig:FIG-LINE-GRAPH-BASIC-LINE)):
```{r FIG-LINE-GRAPH-BASIC-LINE, fig.cap="Basic line graph"}
ggplot(BOD, aes(x = Time, y = demand)) +
geom_line()
```
### Discussion
In this sample data set, the *x* variable, Time, is in one column and the *y* variable, demand, is in another:
```{r}
BOD
```
Line graphs can be made with discrete (categorical) or continuous (numeric) variables on the x-axis. In the example here, the variable demand is numeric, but it could be treated as a categorical variable by converting it to a factor with `factor()` (Figure \@ref(fig:FIG-LINE-GRAPH-BASIC-LINE-FACTOR)). When the *x* variable is a factor, you must also use `aes(group=1)` to ensure that ggplot knows that the data points belong together and should be connected with a line (see Recipe \@ref(RECIPE-LINE-GRAPH-MULTIPLE-LINE) for an explanation of why group is needed with factors):
```{r FIG-LINE-GRAPH-BASIC-LINE-FACTOR, fig.cap="Basic line graph with a factor on the x-axis (notice that no space is allocated on the x-axis for 6)"}
BOD1 <- BOD # Make a copy of the data
BOD1$Time <- factor(BOD1$Time)
ggplot(BOD1, aes(x = Time, y = demand, group = 1)) +
geom_line()
```
In the `BOD` data set there is no entry for `Time = 6`, so there is no level 6 when `Time` is converted to a factor. Factors hold categorical values, and in that context, 6 is just another value. It happens to not be in the data set, so there's no space for it on the x-axis.
With ggplot2, the default *y* range of a line graph is just enough to include the *y* values in the data. For some kinds of data, it's better to have the *y* range start from zero. You can use `ylim()` to set the range, or you can use `expand_limits()` to expand the range to include a value. This will set the range from zero to the maximum value of the demand column in `BOD` (Figure \@ref(fig:FIG-LINE-GRAPH-BASIC-LINE-YLIM)):
```{r eval=F}
# These have the same result
ggplot(BOD, aes(x = Time, y = demand)) +
geom_line() +
ylim(0, max(BOD$demand))
ggplot(BOD, aes(x = Time, y = demand)) +
geom_line() +
expand_limits(y = 0)
```
```{r FIG-LINE-GRAPH-BASIC-LINE-YLIM, echo=FALSE, fig.cap="Line graph with manually set y range"}
# This code block is so that it generates just one of the plots from above
ggplot(BOD, aes(x = Time, y = demand)) +
geom_line() +
expand_limits(y = 0)
```
### See Also
See Recipe \@ref(RECIPE-AXES-RANGE) for more on controlling the range of the axes.
Adding Points to a Line Graph {#RECIPE-LINE-GRAPH-POINTS}
-----------------------------
### Problem
You want to add points to a line graph.
### Solution
Add `geom_point()` (Figure \@ref(fig:FIG-LINE-GRAPH-POINT)):
```{r FIG-LINE-GRAPH-POINT, fig.cap="Line graph with points"}
ggplot(BOD, aes(x = Time, y = demand)) +
geom_line() +
geom_point()
```
### Discussion
Sometimes it is useful to indicate each data point on a line graph. This is helpful when the density of observations is low, or when the observations do not happen at regular intervals. For example, in the `BOD` data set there is no entry for `Time=6`, but this is not apparent from just a bare line graph (compare Figure \@ref(fig:FIG-LINE-GRAPH-BASIC-LINE-YLIM) with Figure \@ref(fig:FIG-LINE-GRAPH-POINT)).
In the `worldpop` data set, the intervals between each data point are not consistent. In the far past, the estimates were not as frequent as they are in the more recent past. Displaying points on the graph illustrates when each estimate was made (Figure \@ref(fig:FIG-LINE-GRAPH-POINTS-INTERVAL)):
```{r FIG-LINE-GRAPH-POINTS-INTERVAL, fig.show="hold", fig.cap='Top: points indicate where each data point is; bottom: the same data with a log y-axis', fig.height=2}
library(gcookbook) # Load gcookbook for the worldpop data set
ggplot(worldpop, aes(x = Year, y = Population)) +
geom_line() +
geom_point()
# Same with a log y-axis
ggplot(worldpop, aes(x = Year, y = Population)) +
geom_line() +
geom_point() +
scale_y_log10()
```
With the log y-axis, you can see that the rate of proportional change has increased in the last thousand years. The estimates for the years before 0 have a roughly constant rate of change of 10 times per 5,000 years. In the most recent 1,000 years, the population has increased at a much faster rate. We can also see that the population estimates are much more frequent in recent times--and probably more accurate!
### See Also
To change the appearance of the points, see Recipe \@ref(RECIPE-LINE-GRAPH-POINT-APPEARANCE).
Making a Line Graph with Multiple Lines {#RECIPE-LINE-GRAPH-MULTIPLE-LINE}
---------------------------------------
### Problem
You want to make a line graph with more than one line.
### Solution
In addition to the variables mapped to the x- and y-axes, map another (discrete) variable to colour or linetype, as shown in Figure \@ref(fig:FIG-LINE-GRAPH-MULTI-LINE-COLOR-TYPE):
```{r FIG-LINE-GRAPH-MULTI-LINE-COLOR-TYPE, fig.show="hold", fig.cap="A variable mapped to colour (left); A variable mapped to linetype (right)"}
library(gcookbook) # Load gcookbook for the tg data set
# Map supp to colour
ggplot(tg, aes(x = dose, y = length, colour = supp)) +
geom_line()
# Map supp to linetype
ggplot(tg, aes(x = dose, y = length, linetype = supp)) +
geom_line()
```
### Discussion
The `tg` data has three columns, including the factor `supp`, which we mapped to `colour` and `linetype`:
```{r}
tg
```
> **Note**
>
> If the *x* variable is a factor, you must also tell ggplot to group by that same variable, as described below.
Line graphs can be used with a continuous or categorical variable on the x-axis. Sometimes the variable mapped to the x-axis is *conceived* of as being categorical, even when it's stored as a number. In the example here, there are three values of dose: 0.5, 1.0, and 2.0. You may want to treat these as categories rather than values on a continuous scale. To do this, convert `dose` to a factor (Figure \@ref(fig:FIG-LINE-GRAPH-MULTI-LINE-FACTOR)):
```{r FIG-LINE-GRAPH-MULTI-LINE-FACTOR, fig.cap="Line graph with continuous x variable converted to a factor"}
ggplot(tg, aes(x = factor(dose), y = length, colour = supp, group = supp)) +
geom_line()
```
Notice the use of `group = supp`. Without this statement, ggplot won't know how to group the data together to draw the lines, and it will give an error:
```{r eval=FALSE}
ggplot(tg, aes(x = factor(dose), y = length, colour = supp)) + geom_line()
#> geom_path: Each group consists of only one observation. Do you need to
#> adjust the group aesthetic?
```
Another common problem when the incorrect grouping is used is that you will see a jagged sawtooth pattern, as in Figure \@ref(fig:FIG-LINE-GRAPH-MULTI-LINE-SAWTOOTH):
```{r FIG-LINE-GRAPH-MULTI-LINE-SAWTOOTH, fig.cap="A sawtooth pattern indicates improper grouping"}
ggplot(tg, aes(x = dose, y = length)) +
geom_line()
```
This happens because there are multiple data points at each *y* location, and ggplot thinks they're all in one group. The data points for each group are connected with a single line, leading to the sawtooth pattern. If any *discrete* variables are mapped to aesthetics like colour or linetype, they are automatically used as grouping variables. But if you want to use other variables for grouping (that aren't mapped to an aesthetic), they should be used with group.
> **Note**
>
> When in doubt, if your line graph looks wrong, try explicitly specifying the grouping variable with group. It's common for problems to occur with line graphs because ggplot is unsure of how the variables should be grouped.
If your plot has points along with the lines, you can also map variables to properties of the points, such as shape and fill (Figure \@ref(fig:FIG-LINE-GRAPH-MULTI-LINE-SHAPE-FILL)):
```{r FIG-LINE-GRAPH-MULTI-LINE-SHAPE-FILL, fig.show="hold", fig.cap="Line graph with different shapes (left); With different colors (right)"}
ggplot(tg, aes(x = dose, y = length, shape = supp)) +
geom_line() +
geom_point(size = 4) # Make the points a little larger
ggplot(tg, aes(x = dose, y = length, fill = supp)) +
geom_line() +
geom_point(size = 4, shape = 21) # Also use a point with a color fill
```
Sometimes points will overlap. In these cases, you may want to *dodge* them, which means their positions will be adjusted left and right (Figure \@ref(fig:FIG-LINE-GRAPH-MULTI-LINE-DODGE)). When doing so, you must also dodge the lines, or else only the points will move and they will be misaligned. You must also specify how far they should move when dodged:
```{r FIG-LINE-GRAPH-MULTI-LINE-DODGE, fig.cap="Dodging to avoid overlapping points"}
ggplot(tg, aes(x = dose, y = length, shape = supp)) +
geom_line(position = position_dodge(0.2)) + # Dodge lines by 0.2
geom_point(position = position_dodge(0.2), size = 4) # Dodge points by 0.2
```
Changing the Appearance of Lines {#RECIPE-LINE-GRAPH-LINE-APPEARANCE}
--------------------------------
### Problem
You want to change the appearance of the lines in a line graph.
### Solution
The type of line (solid, dashed, dotted, etc.) is set with `linetype`, the thickness (in mm) with `size`, and the color of the line with `colour` (or `color`).
These properties can be set (as shown in Figure \@ref(fig:FIG-BAR-GRAPH-BASIC-LINE-CUSTOMIZED)) by passing them values in the call to `geom_line()`:
```{r FIG-BAR-GRAPH-BASIC-LINE-CUSTOMIZED, fig.cap="Line graph with custom linetype, size, and colour"}
ggplot(BOD, aes(x = Time, y = demand)) +
geom_line(linetype = "dashed", size = 1, colour = "blue")
```
If there is more than one line, setting the aesthetic properties will affect all of the lines. On the other hand, *mapping* variables to the properties, as we saw in Recipe \@ref(RECIPE-LINE-GRAPH-MULTIPLE-LINE), will result in each line looking different. The default colors aren't the most appealing, so you may want to use a different palette, as shown in Figure \@ref(fig:FIG-LINE-GRAPH-MULTI-LINE-PALETTE), by using `scale_colour_brewer()` or `scale_colour_manual()`:
```{r FIG-LINE-GRAPH-MULTI-LINE-PALETTE, fig.cap="Using a palette from RColorBrewer"}
library(gcookbook) # Load gcookbook for the tg data set
ggplot(tg, aes(x = dose, y = length, colour = supp)) +
geom_line() +
scale_colour_brewer(palette = "Set1")
```
### Discussion
To set a single constant color for all the lines, specify colour outside of `aes()`. The same works for size, linetype, and point shape (Figure \@ref(fig:FIG-LINE-GRAPH-MULTI-LINE-CONSTANT-AES)). You may also have to specify the grouping variable:
(ref:cap-FIG-LINE-GRAPH-MULTI-LINE-CONSTANT-AES) Line graph with constant size and color (left); With `supp` mapped to `colour`, and with points added (right)
```{r FIG-LINE-GRAPH-MULTI-LINE-CONSTANT-AES, fig.show="hold", fig.cap="(ref:cap-FIG-LINE-GRAPH-MULTI-LINE-CONSTANT-AES)"}
# If both lines have the same properties, you need to specify a variable to
# use for grouping
ggplot(tg, aes(x = dose, y = length, group = supp)) +
geom_line(colour = "darkgreen", size = 1.5)
# Since supp is mapped to colour, it will automatically be used for grouping
ggplot(tg, aes(x = dose, y = length, colour = supp)) +
geom_line(linetype = "dashed") +
geom_point(shape = 22, size = 3, fill = "white")
```
### See Also
For more information about using colors, see Chapter \@ref(CHAPTER-COLORS).
Changing the Appearance of Points {#RECIPE-LINE-GRAPH-POINT-APPEARANCE}
---------------------------------
### Problem
You want to change the appearance of the points in a line graph.
### Solution
In `geom_point()`, set the `size`, `shape`, `colour`, and/or `fill` outside of `aes()` (the result is shown in Figure \@ref(fig:FIG-LINE-GRAPH-POINT-PROPERTIES)):
```{r FIG-LINE-GRAPH-POINT-PROPERTIES, fig.cap="Points with custom size, shape, color, and fill"}
ggplot(BOD, aes(x = Time, y = demand)) +
geom_line() +
geom_point(size = 4, shape = 22, colour = "darkred", fill = "pink")
```
### Discussion
The default shape for points is a solid circle, the default size is 2, and the default colour is black. The fill color is relevant only for some point shapes (numbered 21–25), which have separate outline and fill colors (see Recipe \@ref(RECIPE-SCATTER-SHAPES) for a chart of shapes). The fill color is typically `NA`, or empty; you can fill it with white to get hollow-looking circles, as shown in Figure \@ref(fig:FIG-LINE-GRAPH-POINT-WHITEFILL):
```{r FIG-LINE-GRAPH-POINT-WHITEFILL, fig.cap="Points with a white fill"}
ggplot(BOD, aes(x = Time, y = demand)) +
geom_line() +
geom_point(size = 4, shape = 21, fill = "white")
```
If the points and lines have different colors, you should specify the points after the lines, so that they are drawn on top. Otherwise, the lines will be drawn on top of the points.
For multiple lines, we saw in Recipe \@ref(RECIPE-LINE-GRAPH-MULTIPLE-LINE) how to draw differently colored points for each group by mapping variables to aesthetic properties of points, inside of `aes()`. The default colors are not very appealing, so you may want to use a different palette, using `scale_colour_brewer()` or `scale_colour_manual()`. To set a single constant shape or size for all the points, as in Figure \@ref(fig:FIG-LINE-GRAPH-MULTI-LINE-POINT-PALETTE), specify shape or size outside of `aes()`:
```{r FIG-LINE-GRAPH-MULTI-LINE-POINT-PALETTE, fig.cap="Line graph with manually specified fills of black and white, and a slight dodge"}
library(gcookbook) # Load gcookbook for the tg data set
# Save the position_dodge specification because we'll use it multiple times
pd <- position_dodge(0.2)
ggplot(tg, aes(x = dose, y = length, fill = supp)) +
geom_line(position = pd) +
geom_point(shape = 21, size = 3, position = pd) +
scale_fill_manual(values = c("black","white"))
```
### See Also
See Recipe \@ref(RECIPE-SCATTER-SHAPES) for more on using different shapes, and Chapter \@ref(CHAPTER-COLORS) for more about colors.
Making a Graph with a Shaded Area {#RECIPE-LINE-GRAPH-AREA}
---------------------------------
### Problem
You want to make a graph with a shaded area.
### Solution
Use `geom_area()` to get a shaded area, as in Figure \@ref(fig:FIG-LINE-GRAPH-AREA):
```{r FIG-LINE-GRAPH-AREA, fig.cap="Graph with a shaded area", fig.width=8, fig.height=1.5, out.width="100%"}
# Convert the sunspot.year data set into a data frame for this example
sunspotyear <- data.frame(
Year = as.numeric(time(sunspot.year)),
Sunspots = as.numeric(sunspot.year)
)
ggplot(sunspotyear, aes(x = Year, y = Sunspots)) +
geom_area()
```
### Discussion
By default, the area will be filled with a very dark grey and will have no outline. The color can be changed by setting `fill`. In the following example, we'll set it to `"blue"`, and we'll also make it 80% transparent by setting `alpha` to 0.2. This makes it possible to see the grid lines through the area, as shown in Figure \@ref(fig:FIG-LINE-GRAPH-AREA-OUTLINE). We'll also add an outline, by setting colour:
```{r FIG-LINE-GRAPH-AREA-OUTLINE, fig.cap="Graph with a semitransparent shaded area and an outline", fig.width=8, fig.height=1.5, out.width="100%"}
ggplot(sunspotyear, aes(x = Year, y = Sunspots)) +
geom_area(colour = "black", fill = "blue", alpha = .2)
```
Having an outline around the entire area might not be desirable, because it puts a vertical line at the beginning and end of the shaded area, as well as one along the bottom. To avoid this issue, we can draw the area without an outline (by not specifying colour), and then layer a `geom_line()` on top, as shown in Figure \@ref(fig:FIG-LINE-GRAPH-AREA-TOPLINE):
(ref:cap-FIG-LINE-GRAPH-AREA-TOPLINE) Line graph with a line just on top, using `geom_line()`
```{r FIG-LINE-GRAPH-AREA-TOPLINE, fig.cap="(ref:cap-FIG-LINE-GRAPH-AREA-TOPLINE)", fig.width=8, fig.height=1.5, out.width="100%"}
ggplot(sunspotyear, aes(x = Year, y = Sunspots)) +
geom_area(fill = "blue", alpha = .2) +
geom_line()
```
### See Also
See Chapter \@ref(CHAPTER-COLORS) for more on choosing colors.
Making a Stacked Area Graph {#RECIPE-LINE-GRAPH-STACKED-AREA}
---------------------------
### Problem
You want to make a stacked area graph.
### Solution
Use `geom_area()` andmap a factor to fill (Figure \@ref(fig:FIG-LINE-GRAPH-STACKED-AREA)):
```{r FIG-LINE-GRAPH-STACKED-AREA, fig.cap="Stacked area graph", fig.width=6, out.width="75%"}
library(gcookbook) # Load gcookbook for the uspopage data set
ggplot(uspopage, aes(x = Year, y = Thousands, fill = AgeGroup)) +
geom_area()
```
### Discussion
The sort of data that is plotted with a stacked area chart is often provided in a wide format, but ggplot requires data to be in long format. To convert it, see Recipe \@ref(RECIPE-DATAPREP-WIDE-TO-LONG).
In the example here, we used the `uspopage` data set:
```{r}
uspopage
```
This version of the chart (Figure \@ref(fig:FIG-LINE-GRAPH-STACKED-AREA-CUSTOMIZED)) changes the palette to a range of blues and adds thin (`size = .2`) lines between each area. It also makes the filled areas semitransparent (`alpha = .4`), so that it is possible to see the grid lines through them:
```{r FIG-LINE-GRAPH-STACKED-AREA-CUSTOMIZED, fig.cap="Reversed legend order, lines, and a different palette", fig.width=6, out.width="75%"}
ggplot(uspopage, aes(x = Year, y = Thousands, fill = AgeGroup)) +
geom_area(colour = "black", size = .2, alpha = .4) +
scale_fill_brewer(palette = "Blues")
```
Since each filled area is drawn with a polygon, the outline includes the left and right sides. This might be distracting or misleading. To get rid of it (Figure \@ref(fig:FIG-LINE-GRAPH-STACKED-AREA-NOSIDES)), first draw the stacked areas *without* an outline (by leaving `colour` as the default `NA` value), and then add a `geom_line()` on top:
```{r FIG-LINE-GRAPH-STACKED-AREA-NOSIDES, fig.cap="No lines on the left and right of the graph", fig.width=6, out.width="75%"}
ggplot(uspopage, aes(x = Year, y = Thousands, fill = AgeGroup, order = dplyr::desc(AgeGroup))) +
geom_area(colour = NA, alpha = .4) +
scale_fill_brewer(palette = "Blues") +
geom_line(position = "stack", size = .2)
```
### See Also
See Recipe \@ref(RECIPE-DATAPREP-WIDE-TO-LONG) for more on converting data from wide to long format.
See Chapter \@ref(CHAPTER-COLORS) for more on choosing colors.
Making a Proportional Stacked Area Graph {#RECIPE-LINE-GRAPH-PROPORTIONAL-STACKED-AREA}
----------------------------------------
### Problem
You want to make a stacked area graph with the overall height scaled to a constant value.
### Solution
Use `geom_area(position = "fill")`, as in Figure \@ref(fig:FIG-LINE-GRAPH-PROPORTIONAL-STACKED-AREA), left:
```{r FIG-LINE-GRAPH-PROPORTIONAL-STACKED-AREA-1, eval=FALSE, fig.show="hold"}
ggplot(uspopage, aes(x = Year, y = Thousands, fill = AgeGroup)) +
geom_area(position = "fill", colour = "black", size = .2, alpha = .4) +
scale_fill_brewer(palette = "Blues")
```
### Discussion
With `position="fill"`, the y values will be scaled to go from 0 to 1. To print the labels as percentages, use `scale_y_continuous(labels = scales::percent)`, as in Figure \@ref(fig:FIG-LINE-GRAPH-PROPORTIONAL-STACKED-AREA), right:
```{r FIG-LINE-GRAPH-PROPORTIONAL-STACKED-AREA-2, eval=FALSE, fig.show="hold"}
ggplot(uspopage, aes(x = Year, y = Thousands, fill = AgeGroup)) +
geom_area(position = "fill", colour = "black", size = .2, alpha = .4) +
scale_fill_brewer(palette = "Blues") +
scale_y_continuous(labels = scales::percent)
```
```{r FIG-LINE-GRAPH-PROPORTIONAL-STACKED-AREA, ref.label=c("FIG-LINE-GRAPH-PROPORTIONAL-STACKED-AREA-1", "FIG-LINE-GRAPH-PROPORTIONAL-STACKED-AREA-2"), echo=FALSE, fig.cap="A proportional stacked area graph", fig.width=6}
```
### See Also
Creating stacked bar graph is done in a similar way. See Recipe \@ref(RECIPE-BAR-GRAPH-PROPORTIONAL-STACKED-BAR) for information about computing the percentages separately.
For more on summarizing data by groups, see Recipe \@ref(RECIPE-DATAPREP-SUMMARIZE).
Adding a Confidence Region {#RECIPE-LINE-GRAPH-REGION}
--------------------------
### Problem
You want to add a confidence region to a graph.
### Solution
Use `geom_ribbon()` and map values to `ymin` and `ymax`.
In the `climate` data set, `Anomaly10y` is a 10-year running average of the deviation (in Celsius) from the average 1950–1980 temperature, and `Unc10y` is the 95% confidence interval. We'll set `ymax` and `ymin` to `Anomaly10y` plus or minus `Unc10y` (Figure \@ref(fig:FIG-LINE-GRAPH-REGION)):
```{r FIG-LINE-GRAPH-REGION, fig.cap="A line graph with a shaded confidence region", fig.width=8, out.width="100%"}
library(gcookbook) # Load gcookbook for the climate data set
library(dplyr)
# Grab a subset of the climate data
climate_mod <- climate %>%
filter(Source == "Berkeley") %>%
select(Year, Anomaly10y, Unc10y)
climate_mod
# Shaded region
ggplot(climate_mod, aes(x = Year, y = Anomaly10y)) +
geom_ribbon(aes(ymin = Anomaly10y - Unc10y, ymax = Anomaly10y + Unc10y), alpha = 0.2) +
geom_line()
```
The shaded region is actually a very dark grey, but it is mostly transparent. The transparency is set with `alpha = 0.2`, which makes it 80% transparent.
### Discussion
Notice that the `geom_ribbon()` comes before `geom_line()`, so that the line is drawn on top of the shaded region. If the reverse order were used, the shaded region could obscure the line. In this particular case that wouldn't be a problem since the shaded region is mostly transparent, but it would be a problem if the shaded region were opaque.
Instead of a shaded region, you can also use dotted lines to represent the upper and lower bounds (Figure \@ref(fig:FIG-LINE-GRAPH-REGION-LINES)):
```{r FIG-LINE-GRAPH-REGION-LINES, fig.cap="A line graph with dotted lines representing a confidence region", fig.width=8, out.width="100%"}
# With a dotted line for upper and lower bounds
ggplot(climate_mod, aes(x = Year, y = Anomaly10y)) +
geom_line(aes(y = Anomaly10y - Unc10y), colour = "grey50", linetype = "dotted") +
geom_line(aes(y = Anomaly10y + Unc10y), colour = "grey50", linetype = "dotted") +
geom_line()
```
Shaded regions can represent things other than confidence regions, such as the difference between two values, for example.
In the area graphs in Recipe \@ref(RECIPE-LINE-GRAPH-STACKED-AREA), the *y* range of the shaded area goes from 0 to y. Here, it goes from `ymin` to `ymax`.