-
Notifications
You must be signed in to change notification settings - Fork 0
/
multispp_loop_tutorial.Rmd
149 lines (110 loc) · 4.84 KB
/
multispp_loop_tutorial.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
---
title: "Multispecies Loops"
author: "Kelsey King"
date: '2022-05-27'
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(dplyr)
library(ggplot2)
library(ggpubr)
```
# Data
This data comes from GBIF.org (27 May 2022) GBIF Occurrence Download
<https://doi.org/10.15468/dl.vf6fgm>. We collected all lupines
(*Lupinus* spp.) in the United States, with coordinate uncertainty less
than \~5000 meters. If you utilize this data elsewhere, please use the
above citation, see [GBIF citation
guidelines](https://www.gbif.org/citation-guidelines).
Now let's get started. You'll first want to import the data. I cleaned
the data as well to give a smaller set to work with, the original data
is included in the zip folder, but the file we use below is the cleaned
data, just what we want to use in the tutorial.
Data was filtered to records identified to species or finer and records
that have county information. Then we removed records with missing date
information.
```{r}
data=read.csv("tutorial_data.csv")
#forming a , which combo name
#could be used for various sorting
#I use this as a unique for the type of sort I want
#could be used to make it only one loop instead of nested/
data$sppState=as.character(paste(data$specificEpithet,
data$stateProvince))
#for my figures I only want to display
#one point for each date in each county,
#for each species
#so I can use this code.
data$day.loc=as.character(paste(data$county,
data$specificEpithet,
data$year,data$startDayOfYear))
#removing duplicates
data1=distinct(data, day.loc, .keep_all = TRUE)
#sorting to more than 50 records for the species-state combo name
summary.dat=data1 %>% group_by(sppState) %>%
summarize(sample.size=n())
keep.dat=summary.dat[summary.dat$sample.size>50,]
#finally altering the data
data2=data1[which(data1$sppState %in% keep.dat$sppState),]
```
# Quick Guide to Loops
I am going to make test data to create a loop plotting the count of each species by year. This loop translate in plain words to: for each species do this task. The task is within the curly brackets. First we need to know what species is the current one, and then we can sort our data, and then make our plot.
What is the variable `each.spp` ? This is a counter that runs from 1 to 4. Each time the loop finishes the task it automatically adds 1 to the number of `each.spp`. You can name this anything you want, many people use a single letter. We could have used 's' for species.
However, sometimes when starting to learn loops it is better to write a longer variable as you are taking what you want to do and putting it into R code. The counter is used in all for loops, but may or may not be used in the task itself.
```{r}
#make a simple data frame
name=rep(c("orange","leopard","sparrow","asparagus"),10)
count=round(rnorm(length(name),mean=30,sd=10),0)
year=sort(rep(2000:2009, 4))
test.data=data.frame(name,count,year)
#I want to sort by species, so I need a list of that
species=unique(name)
#now I make a loop to do a simple task
for(each.spp in 1:length(species)){
#what is my current species?
current.species=species[each.spp]
#what is my current data
current.data=test.data[test.data$name==current.species,]
#plot it!
plot(current.data$year,current.data$count,
main=paste(current.species))
}
```
# Using the Loop
```{r}
species=unique(data2$specificEpithet)
#loop for each species
for(each.spp in 1:length(species)){
#set the current species
cur.spp=species[each.spp]
#sort to the current species data
spp.dat=data2[data2$specificEpithet==cur.spp,]
#make a list of sites for that species
spp.states=unique(spp.dat$stateProvince)
#loop for each site
for(each.loc in 1:length(spp.states)){
#specifying the current site
cur.state=spp.states[each.loc]
#soting the data to the current site
cur.dat=spp.dat[spp.dat$stateProvince==cur.state,]
######################################################
#here are examples of things you can do inside this loop
#use this code to guide you on making your own loop
#plotting the day of year for sightings
ggplot(data=cur.dat,aes(x=year,y=startDayOfYear,
color = county))+
#this line of code is what lets add in the
#name of the species and the site to the figure
geom_jitter()+labs(title=paste("Lupinus",cur.spp,"in",
cur.state))+xlab("Year")+ylab("Day of Year")+
#pick a theme you like
#I want to hide the legend
theme_pubr()+
theme(legend.position = "none")
#save all the files automatically, with labelling
ggsave(paste0("./Figures/Lupinus_",cur.spp,"_",
cur.state,".jpg"), dpi=300,height=4,width=4)
}
}
```