-
Notifications
You must be signed in to change notification settings - Fork 0
/
datavisualization.qmd
139 lines (89 loc) · 4.58 KB
/
datavisualization.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
# Data visualization
\
## Creating graphs with `ggplot()`
`gg`stands for [grammar of graphics](https://link.springer.com/book/10.1007/0-387-28695-0), a framework which aims to describe all components of a graph. The `ggplot2`-package relies on this framework hence the name. This package is already included in the `tidyverse` therefore you do not have to install it again. If you load the `tidyverse`-library, the `ggplot2`-library is loaded automatically.
<aside>Even art can be created with the `ggplot`-library as shown [hier](https://www.data-imaginist.com/art).</aside>
```{r}
#| warnings: FALSE
#| errors: FALSE
#| output: FALSE
library(tidyverse)
d <- read.csv("data/DatasaurusDozen.csv") |>
filter(condition %in% c("away", "bullseye", "circle", "dino", "dots, star")) |>
mutate(id = as.factor(id))
d_summary <- d |> group_by(condition) |>
summarise(mean_x = mean(x),
mean_y = mean(y))
```
<aside>The data we used is from Matejka and Fitzmaurice (2017) and can be found [here](https://dl.acm.org/doi/10.1145/3025453.3025912)</aside>
A graph contains always:
- ***data***
- ***geoms***, visible forms (*aesthetics*) such as points, lines or boxes.
- ***a coordinate systems / mapping*** describes how data and geoms are linked, also colors or grouping variables are specified here
Further components could be:
- statistical parameters
- positions
- coordinate functions
- ***facets***
- scales
- ***themes***
(we will only cover the contents in italics)
::: {.callout-tip appearance="simple"}
## Good to know
- For plotting with `ggplot()` it is easiest when your data is in *long* format.
- What variables do you want to plot (categorical? continuous? ...) affects which `geoms`can be used. You can try out what is suited with the `esquisse`-package below or find ideas [here](https://www.data-to-viz.com).
:::
## Data, geoms and mapping
We start with entering the current data frame and add `geoms` and mappings (specified with `aes()`) with arguments such as
```{r}
ggplot(d_summary, # data
aes(x = mean_x, y = mean_y)) + # mapping
geom_point() # geom
```
Depending on your variables and what you want to show with your data different geoms are well suited.
Examples of available geoms:
- data points, scatterplots: `geom_point()`
- lines, tendencies: `geom_line()`
- histograms: `geom_histogram()`
- means and standard deviations: `geom_pointrange()`
- densities: `geom_density()`
- boxplots: `geom_boxplot()`
- violin plots: `geom_violin()`
<aside>[Here](https://github.com/rstudio/cheatsheets/blob/main/data-visualization.pdf) you can download `ggplot`-Cheatsheet</aside>
## Facets
With facets you can show subsets of your data in different panels
```{r}
ggplot(d, # data
aes(x = x, y = y)) + # mapping
geom_point() + # geom
facet_grid(~ condition) # facet
```
## Themes and labels
```{r}
ggplot(data = d,
mapping = aes(x = x,
y = y)) +
geom_point() +
ggtitle ("Title") +
labs(title = "Title",
x = "Variable A [a.u.]",
y = "Variable B [a.u.]") +
theme_minimal() # also theme_classic and theme_minimal are nice
```
::: {.callout-tip appearance="simple"}
## `esquisse`-package
With this package you can use the data frames in your current environment or load a new one to try out which geoms might be useful
```{r}
#| eval: FALSE
install.packages("esquisse")
esquisse::esquisser()
```
:::
<aside>If we do not want to load the whole library we can call the function by writing the library name then `::` and the function afterwards such as `esquisse::esquisser()`. This is helpful when libraries have overlapping function names which can create conflicts or to easily call a function without typing `library()` first.</aside>
Further helpful ressources:
- [`ggplot2`documentation](https://ggplot2.tidyverse.org/)
- [Website PsyTeachR: Data Skills for reproducible research](https://psyteachr.github.io/reprores-v3)
- [Start of the PsyTeachR videos by Lisa DeBruine](https://youtu.be/90IdULVGmYY), e.g. [Basic Plots](https://youtu.be/tOFQFPRgZ3M), [Common Plots](https://youtu.be/kKlQupjD__g) and [Plot Themes and Customization](https://youtu.be/6pHuCbOh86s)
- [Einführung in R](https://methodenlehre.github.io/einfuehrung-in-R/chapters/05-plotting.html) by Andrew Ellis and Boris Mayer
- [Here](https://www.data-to-viz.com) you find ideas for plots.
- [Demo of different geoms](https://rstudio-connect.psy.gla.ac.uk/plotdemo)