-
Notifications
You must be signed in to change notification settings - Fork 5
/
functionswithr.Rpres
160 lines (124 loc) · 4.63 KB
/
functionswithr.Rpres
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
functions with R
========================================================
author: Hezi Buba & Irene Steves
date: 18/12/2018
autosize: true
Presentation materials: https://github.com/ecodatasci-tlv/functions
Why write functions?
========================================================
Sometimes, we tend to repeat ourselves when coding: Repeating similar analyses, getting data ready for plots, etc.
Functions have multiple advantages over copy and pasting chuncks of code:
- Naming a function improve readability of code ("write codes for humans")
- Changing paramters and conditions in one place.
- Eliminate chance of mistakes due to copy-pasting.
When to write functions?
=======================================================
A good rule of thumb is to not copy and paste code more than twice.
```{r eval=F}
df$a <- (df$a - min(df$a, na.rm = TRUE)) /
(max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))
df$b <- (df$b - min(df$b, na.rm = TRUE)) /
(max(df$b, na.rm = TRUE) - min(df$a, na.rm = TRUE))
df$c <- (df$c - min(df$c, na.rm = TRUE)) /
(max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE))
df$d <- (df$d - min(df$d, na.rm = TRUE)) /
(max(df$d, na.rm = TRUE) - min(df$d, na.rm = TRUE))
```
Writing functions
========================================================
![](spotify-howtobuildmvp.gif)
>"It is faster to make a four-inch mirror then a six-inch mirror than to make a six-inch mirror."
****
1. Start with a limited but working chuck of code.
2. Rewrite it as a function. (psst.. FUN snippet..)
```{r eval =F}
name <- function(variables) {
}
```
3. Test it. OPTIONAL: conditional stopping of functions.
4. Name it.
5. Use it in your code or within a more complicated function.
How much code do we want to encapsule in a funcion?
========================================================
Suprisingly, not so much. Focus on your function doing just one thing.
Most of R's functions are less than 12 lines long!
Naming functions
========================================================
Function names should be kept short yet informative. Remember: functions are meant to help codes be reusable and readable.
What are good names for these two functions?
```{r eval = F}
f1 <- function(string, prefix) {
substr(string, 1, nchar(prefix)) == prefix
}
f2 <- function(x) {
if (length(x) <= 1) return(NULL)
x[-length(x)]
}
```
Anonymous functions
========================================================
Sometimes, you won't save the function like seen below, but rather - use it directly in a code:
```{r}
matrix_of_numbers <- matrix(1:100,10,10)
apply(matrix_of_numbers,2,function(x) x^2)
```
Iterations
========================================================
To further streamline your code, use iterations to repeat chuncks of code.
Most basic are `for` and `while` loops. However, there are more ways to iterate code.
Iterations have three main components: an output, a sequence to iterate over, and the body of code.
Vector allocation will save you a lot of time!
========================================================
```{r eval=F}
library(tictoc)
n_times <- 50000
tic()
a <- NULL
for(i in seq_len(n_times)){
a <- c(a, i^2)
a
}
toc()
tic()
a <- vector("double", n_times)
for(i in seq_len(n_times)){
a[i] <- i^2
}
toc()
```
Sequence
========================================================
`for (i in (1:10))` is a sequence. So is `while TRUE`.
`for (i in seq_along(vector))` is a better way of sequencing if you might get a vector of length 0 like so:
```{r}
y <- vector("double", 0)
seq_along(y)
#> integer(0)
1:length(y)
#> [1] 1 0
```
for i in unique(df$column)
========================================================
That's a common phrase when working with data. So common that there is a tidyverse package that does it for you.
![](purrr.png) We will discuss it shortly...
R is a functional language
========================================================
That means you can wrap loops within a function and just call that function when neccessary.
Remember - limit your copy and paste as much as possible!
```{r}
library(tidyverse)
data <- tibble(a=rnorm(10),
b=rnorm(10))
col_means <- function(dataframe){
output <- vector("double",ncol(dataframe))
for (i in seq_along(dataframe)){
output[[i]] <- mean(dataframe[[i]])
}
return(output)
}
col_means(data)
```
99 bottles of beer on the wall
========================================================
Create a function that returns the full song for any number of any vessel (bottles,cans, even boxes... ) of any drink (But no Jägermeister please):
>99 bottles of beer on the wall, 99 bottles of beer. Take one down, pass it around - 98 bottles of beer on the wall