120_Calculate_trends.Rmd

---
title: "120 Calculate time trends"
output: html_document
---

Note: Ca 35 minutes run time for running all trends    

This procedure uses data from script 110 and creates creates two data.frames: 120_result_10yr_[last_year] and 120_result_long_[last_year]

Note 1: This can be run twice, using different values for last_year (e.g. 2018 and 2019 if we make the report in 2020) 
to get both "newest" and "last year's" trends  
  
Note 2: If you want to add or replace a few time series, and don't want to run this for 35 minutes:    
- set add_extra_series to TRUE in the first chunk (Section 0)    
- run everything down to the "SPECIAL" section below (4c)  
- Adjust section 4c (modify the filtering) until 'df_series' is as you want it to be  
- Run the rest ('Run all chunks below' command in RStudio)
- The "save data" section (section 6) will then automatically add the new series to the existing time trends  

## 0. Will you add/change series in an existing file, or make a new file?   
```{r}

# ============ NOTE ============
#
# If 'add_extra_series' below is set to FALSE
#   - all time trends will be calculated (i.e., the 'normal' procedure)
#
# If 'add_extra_series' below is set to TRUE
#   - only the series defined in 'add_extra_series' will be calculated ("SPECIAL" part below, 4c)
#   - the "save data" section will then automatically add the new series to the existing time trends  
#     If those time trends already exists, they will be replaced by the new time trends

add_extra_series <- FALSE
# add_extra_series <- TRUE

last_year <- 2021

series_lastyear <- 2018   # The series must last at least until this year    
                          # Series without data in any of the three last years will be excluded


```
  
## 1. Libraries and functions  
```{r, results='hide', message=FALSE, warning=FALSE}

# install.packages("lubridate")

library(dplyr)
library(purrr)
library(lubridate)
library(readxl)
library(mgcv)
library(AICcmodavg)   # AICc()
library(ggplot2)
library(safejoin)     # installed from https://github.com/moodymudskipper/safejoin   
source("001_Add_trends_functions.R")  # Copied from '16_Trend_functions.R' in Milkys_2018

#
# Define a couple of extra functions for shortening code
#
get_stats <- function(df){
  calc_models_one_station2(df, gam = TRUE, log = TRUE)$statistics_for_file
}

model_from_medians_stat <- function(...)
  model_from_medians(...)$statistics

```

## 2. Read data  

### a. Medians per year   
Made in script 110  
```{r, collapse=TRUE}

filepattern <- "110_mediandata_updated_"         # entire file name except date and extension
filenumber <- 1                           # filenumber = 1 means "read the newest file"

files <- dir("Data", pattern = filepattern) %>% rev()

data_list <- read_rds_file("Data",
                     files, 
                     filenumber = filenumber,   # "1" gets the newest file   
                     get_date = TRUE, time_since_modified = TRUE)

data_med <- data_list$data%>%
  rename(Proref_median = Median,
         Median = Value) 
  
cat("File date text:", data_list$file_date, "\n")

```


### b. Select substances to perform trend analysis on  

```{r}

pars <- get_trend_parameters()

```


## 3. Testing 'model_from_medians'   
  
### a. Test - single series     
Just one example of the output of `model_from_medians()`   
The output includes  
- N = number of years with data  
- Nplus = number of years with median > LOQ   
- p_linear and p_nonlinear = p-values for linear and non-linear (GAM) regression   
- AICc - "model goodness". AIC decreases when the model fits the data better, but is "punished" by model complexity (non-linear is more complex than linear).
The AICc decides whether we use the linear or the non-linear model.   
- Lin_slope = slope of theh linear model (the non-linear has no fixed slope)    
- Lin_yr1 and Lin_yr2 - the value of the linear model at the start and end of the time series  
- Nonlin_yr1 and Nonlin_yr2 - the value of the non-linear model at the start and end of the time series   
- Over_LOQ_yr2 - whether Lin_yr2/Nonlin_yr2 is above LOQ   
- Model_used:  
    - Mean - if Nplus = 2,3 or 4 and N >= 3   
    - Nonlinear - if Nplus >= 7 and AICc_nonlin < AICc_lin  
    - Linear - otherwise  
- P_change: p_linear and p_nonlinear, depending on Model_used   
```{r}

years_long <- 1980:last_year
years_short <- (last_year-9):last_year

model_from_medians("HG", "Gadus morhua", "Muskel", 
                   "30B", "WW", years_long, data_med, plotname = "window", ggplot = TRUE)$statistics %>%
  select(Nplus, p_linear:Status)

# Some other examples:  
if (FALSE){
  # A short (5 year) series with downwards trend  
  model_from_medians("SCCP", "Gadus morhua", "Lever", 
                     "24B", "WW", years_long, data_med,
                     plotname = "window", ggplot = TRUE)$statistics %>%
    select(Nplus, p_linear:Status)
  # A short (5 year) series with no trend  
  model_from_medians("SCCP", "Gadus morhua", "Lever", 
                     "23B", "WW", years_long, data_med)$statistics %>%
    select(Nplus, p_linear:Status)
  # TBT  
  model_from_medians("TBT", "Nucella lapillus", "Whole soft body", 
                     "227G1", "DW", years_long, data_med)$statistics
  # Mercury, Oslo, last year and this year
  model_from_medians("HG", "Gadus morhua", "Muskel", 
                     "30B", "DW", years_long, data_med)$statistics
  model_from_medians("HG", "Gadus morhua", "Muskel", 
                     "30B", "DW", years_long, data_med)$statistics
  model_from_medians("HG", "Gadus morhua", "Muskel", 
                     "30B", "WW", years_long, data_med)$statistics
  model_from_medians("CB_S7", "Gadus morhua", "Lever", 
                     "30B", "WW", years_short, data_med)$statistics
  model_from_medians("CB_S7", "Limanda limanda", "Lever", 
                     "36F", "WW", years_long, data_med)$statistics
  model_from_medians("TBT", "N. lapillus / L. littorea", "Whole soft body", 
                     "71G", "WW", years_short, data_med)$statistics
  model_from_medians("VDSI/Intersex", "N. lapillus / L. littorea", "Whole soft body", 
                     "71G", "WW", years_long, data_med)$statistics
  model_from_medians("VDSI", "Nucella lapillus", "Whole soft body", 
                   "131G", "WW", years_long, data_med)$statistics %>%
  select(Nplus, p_linear:Status)
}

```


## 4. Prepare analysis  
  
### a. Prepare **df_series_all**  
df_series contains one line for each series we want to analyse with time series analysis  
```{r}

df_series_all <- get_trend_series(data_med, 
                                  parameters = pars,
                                  series_lasting_to = series_lastyear)

cat("Number of series:", nrow(df_series_all), "\n")


```

### b. Test the entire procedure  
For 20 random series  
*  Takes ca 30 seconds  
* Note: probably creates quite a few error messages - donæt worry about this  
```{r}
# We turn the function 'model_from_medians_stat' into a 'safe' function
#   in the sense that if the function fails (an error occurs), the procedure will not stop, but 
# continue looping through all the stations/parameters.
# The new function, called 'model_from_medians_stat_s', is created using the 'safely' function. (More info below.)
model_from_medians_stat_s <- safely(model_from_medians_stat)

# Pick some random series - 15 random and some non-random
df_series_random <- bind_rows(
  df_series_all %>% sample_n(15),        # 15 random series
  df_series_all %>% filter(PARAM == "HG" & STATION_CODE == "30B" & TISSUE_NAME == "Muskel" & Basis %in% c("WW","WWa")),
  df_series_all %>% filter(PARAM == "CB_S7" & STATION_CODE == "30B" & TISSUE_NAME == "Lever" & Basis %in% c("WW","WWa")),
  df_series_all %>% filter(PARAM == "CB_S7" & STATION_CODE == "I133" & Basis %in% c("WW"))
)

# df_series_random <- df_series_all %>%
#   filter(STATION_CODE == "36F")
  
# Turn into list
args <- df_series_random %>% 
  as.list()
# names(args) 

# Change names of the list objects, so they fit with input names of 'model_from_medians'
names(args) <- sub("STATION_CODE", "station", names(args))
names(args) <- sub("PARAM", "param", names(args))
names(args) <- sub("LATIN_NAME", "species", names(args))
names(args) <- sub("TISSUE_NAME", "tissue", names(args))
names(args) <- sub("Basis", "basis", names(args))

# Pick the list objects we need
args <- args[c("station", "tissue", "species", "param", "basis")]

# FOR DEBUGGING ONLY:
# debugonce(model_from_medians)  
# debugonce(calc_models_one_station2)
# debugonce(calc_models_gam)
# debugonce(GAM_trend_analysis)
# debugonce(statistics_for_excel)

# Turn off error messages if you want
# old_options <- options(show.error.messages = FALSE, warn = -1)

# Run models
result_list_10yr <- args %>% pmap(model_from_medians_stat_s, 
                                  yrs = seq(last_year-9, last_year), data_medians = data_med)
result_list_long <- args %>% pmap(model_from_medians_stat_s, 
                                  yrs = seq(1980, last_year), data_medians = data_med)

# Turn error messages on again
# options(old_options)

# 'model_from_medians_stat' returns a data frame of one line   
# 'model_from_medians_stat_s' (the 'safe' version) returns a list with two elements,
#     "result" and "error"
#   If the function works, "result" contains the normal output (the one-line data frame) and "error" is NULL
#   If the function fails, "result" is NULL and "error" contains the error message
# pmap() returns a list with one element per parameter/station, so the result is a 20-element list of lists:
#   result_list_10yr[[1]] = list with elements "result" and "error" 
#   result_list_10yr[[2]] = list with elements "result" and "error" 
#   ...
#   result_list_10yr[[20]] = list with elements "result" and "error" 

# We transpose these lists, resulting in the following lists:
#   result_list_10yr_t$result = list of 20 elements (each is a one line data frame)
#   result_list_10yr_t$error = list of 20 elements (each is either NULL or an error message)
result_list_10yr_t <- purrr::transpose(result_list_10yr)
result_list_long_t <- purrr::transpose(result_list_long)

# For 10-year trends, use the "error" element to find out which elements that have no errors:
no_error <- result_list_10yr_t$error %>% map_lgl(is.null)

# Combine the "result" elements with no errors to a single data frame
result_10yr <- bind_rows(result_list_10yr_t$result[no_error])

# For long-term trends, do the same:
no_error <- result_list_long_t$error %>% map_lgl(is.null)
result_long <- bind_rows(result_list_long_t$result[no_error])

```

### c. SPECIAL: add or update trends for one/a few parameters/stations   
Calculate trends for just a few selected series, and add them to the existing trend data sets (result_10yr and result_long)   
In order to do this:  
- set 'add_extra_series' in this chunk to TRUE    
- then modify the code so df_series contains only the series you want to add   -
- the rest of the code won't have to be changed - the latest trend data sets will be updated with the new data  
- remember to change add_extra_series back to `FALSE` afterwards    
For the ordinary case (where you want to analyse all time series), i.e., when add_extra_series = FALSE, just run the chunk below   

```{r}

# If 'add_extra_series' (set in the top of the script) is set to FALSE:
#   - all time trends will be calculated (i.e., the 'normal' procedure)
#
# If 'add_extra_series' is set to TRUE:
#   - only the series defined in 'add_extra_series' (see below) will be calculated 
#   - the "save data" section will then automatically add the new series to the existing time trends  
#     If those time trends already exists, they will be replaced by the new time trends

#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o
#
# Note: if 'add_extra_series' is (correctly) set to TRUE, the 'save results' part
#   will automatically read the existing (last saved) trends, add new ones, 
#   and re-save the trends. If the new series made her overlap with old ones, 
#   the old ones will be deleted before the new ones are added, avoiding duplication
#
#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o#o


#
# Alternative 1: add_extra_series = FALSE (all series will be analysed)  
#
if (!add_extra_series){
  
  df_series <- df_series_all
  
}

#
# Alternative 2: Automatically define 'df_series' based on running script 118
#
if (add_extra_series & exists("data_series_reanalyse")){
  
  df_series <- df_series_all %>%
    semi_join(data_series_reanalyse,
              by = c("STATION_CODE", "LATIN_NAME", "TISSUE_NAME", "PARAM"))
  
}

#
# Manually define 'df_series'
#
# Change the filter below in order to calculate trends for just selected series
# FILTERING EXAMPLES - insert where it says "Filter df_series_all" 
#    (replacee everything between "start" and "end")
#
if (add_extra_series & !exists("data_series_reanalyse")){
  
  cat("Original number of time series:\n")
  cat("  ", nrow(df_series_all), "\n")
  
  #
  # Filter df_series_all START
  # 
  
  # Example 1: we want to add 'TPTIN' to existing trends (for all stations)  
  # We filter df_series_all to get 'TPTIN' only:
  if (FALSE)
    df_series <- df_series_all %>% filter(STATION_CODE %in% "C/N") 

  # Example 2: VDSI/intersex
  if (FALSE)
    df_series <- df_series_all %>% filter(STATION_CODE %in% "71G" & PARAM == "VDSI/Intersex")
  
  # Example 3: HBCDs in eider duck blood 
  if (FALSE)
    df_series <- df_series_all %>% filter(STATION_CODE %in% "19N" & TISSUE_NAME %in% "Blod" & grepl("HBCD", PARAM))

  # Example 4: Unadjusted ('OH') PAH metabolites 
  if (FALSE)
    df_series <- df_series_all %>% filter(PARAM %in% c("PYR1OH", "BAP3OH", "PA1OH") & Basis %in% "WW")

  if (FALSE){
    df_series <- df_retest_2017 # %>% filter(Basis %in% c("WW", "WWa"))
  }
  #
  # Filter df_series_all END
  # 
  
  if (FALSE){
    df_series <- df_comb4   # defined in "120_Calculate_trend_error"  
  }

  cat("Number of time series after filtering df_series:\n")
  cat("  ", nrow(df_series), "\n")
} 


```

## 5. Analyse time series    
  
### a. Prepare 'args' list    
```{r}

args <- df_series %>% 
  as.list()
# names(args) 

# Change names of the list objects, so they fit with input names of 'model_from_medians'
names(args) <- sub("STATION_CODE", "station", names(args))
names(args) <- sub("PARAM", "param", names(args))
names(args) <- sub("LATIN_NAME", "species", names(args))
names(args) <- sub("TISSUE_NAME", "tissue", names(args))
names(args) <- sub("Basis", "basis", names(args))
args <- args[c("station", "tissue", "species", "param", "basis")]

```


### b. Calculation procedure   
  
**NOTE: For all time series, this takes ca. 35 minutes**    
  
```{r, result = 'hide', message=FALSE, warning=FALSE, error=FALSE}

# Make results a little reproducible
# However, 
set.seed(1814)

# Turn 'model_from_medians_stat' into a 'safe' function
#   The new function 'model_from_medians_stat_s' returns a list with two elements,
#     "result" and "error"
#   If the function works, "result" contains the normal output and "error" is NULL
#   If the function fails, "result" is NULL and "error" contains the error message
model_from_medians_stat_s <- safely(model_from_medians_stat)

# Example
# X <- model_from_medians_stat_s("TBT", "Nucella lapillus", "Whole soft body", "227G1", "DW", 1980:2018, data_med)
# X$result
# X$error

#
#o#o#o#o#o#o Calculation procedure starts #o#o#o#o#o#o
#

t0 <- Sys.time()                                                            # Start time for procedure
zz <- file("Messages from script 120 (can be deleted).txt", open = "wt")    # File for error messages

sink(zz, type = "message")             # Redirect error/warning messages 
                                       # - error messages  will be sent to the temp.txt file 

result_list_10yr <- args %>% pmap(model_from_medians_stat_s, 
                                  yrs = seq(last_year-9, last_year), data_medians = data_med)  # ca 15-20 minutes
result_list_long <- args %>% pmap(model_from_medians_stat_s, 
                                  yrs = seq(1980, last_year), data_medians = data_med)  # ca 15-20 minutes

sink()                                 # Turn off redirection of messages   
t1 <- Sys.time()                                                            # Start time for procedure


```

### c. Time used  
```{r}
cat("Procedure finished. \n")
print(t1 - t0)
```

### d. Change output into data frames   

```{r}

# Transpose lists
# 'result_list_10yr' is a list of some 1000 elements, where each element
#    is a list with $result and $error
# 'Transposing' means to turn this to a list of 2 elements: $result and $error, each with some 1000 elements  
result_list_10yr_t <- purrr::transpose(result_list_10yr)
result_list_long_t <- purrr::transpose(result_list_long)

# Combine list 1 to data frame
no_error <- result_list_10yr_t$error %>% map_lgl(is.null)
result_10yr <- bind_rows(result_list_10yr_t$result[no_error])

# Combine list 2 to data frame
no_error <- result_list_long_t$error %>% map_lgl(is.null)
result_long <- bind_rows(result_list_long_t$result[no_error])

if (FALSE){
  result_long %>% filter(STATION_CODE %in% "30B" & Basis == "WW" & PARAM == "HG") %>% View()
}

```


### e. Only if needed: save just these results  
Only used in special cases  
```{r}

if (FALSE){
  
  saveRDS(result_10yr, paste0("Data/120_10yr_", last_year, "_rerun.rds"))
  saveRDS(result_long, paste0("Data/120_long_", last_year, "_rerun.rds"))

}

```


## 6. Save or update the result file    

### a. Pattern for finding files 
```{r}

date_pattern <- "[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]"
version_pattern <- "run[0-9][0-9]"
pattern_10yr <- paste0("^120_result_10yr_", last_year, "_", 
                       date_pattern, "_", version_pattern, ".rds$")
pattern_long <- paste0("^120_result_long_", last_year, "_", 
                       date_pattern, "_", version_pattern, ".rds$")

# Character numbers - used to extract date/version (see test below)
charnumber_date <- nchar("120_result_10yr_20xx_") + c(1,10)
charnumber_vers <- nchar("120_result_10yr_20xx_yyyy-mm-dd_run") + c(1,2)

# test
if (FALSE){
  dir("Data", pattern_10yr)
  dir("Data", pattern_long)
  dir("Data", pattern_10yr) %>% substr(charnumber_date[1], charnumber_date[2])
  dir("Data", pattern_10yr) %>% substr(charnumber_vers[1], charnumber_vers[2])
}

```

### b. Find newest existing file
```{r, results = 'hold'}

cat("Latest files for trends until", last_year, ": \n")
fn_existing_1 <- dir("Data", pattern = pattern_10yr) %>% tail(1)
fn_existing_2 <- dir("Data", pattern = pattern_long) %>% tail(1)

cat("  ", fn_existing_1, "\n")
cat("  ", fn_existing_2, "\n")

date_existing_1 <- fn_existing_1 %>% substr(charnumber_date[1], charnumber_date[2])
date_existing_2 <- fn_existing_2 %>% substr(charnumber_date[1], charnumber_date[2])

vers_existing_1 <- fn_existing_1 %>% substr(charnumber_vers[1], charnumber_vers[2])
vers_existing_2 <- fn_existing_2 %>% substr(charnumber_vers[1], charnumber_vers[2])

```

### c. Make file names to use  
```{r}

# Version number to use
# - if file with same date exists in the folder, we increase the version by 1
# - if file with same date does not exist, we set version to 1
fn_version_1 <- case_when(
  data_list$file_date == date_existing_1 ~ as.numeric(vers_existing_1) + 1,
  data_list$file_date != date_existing_1 ~ 1
) %>% 
  sprintf("%02i", .)  # put on text format 

# If first file this year
if (length(date_existing_1) == 0)
  fn_version_1 <- "01"

fn_version_2 <- case_when(
  data_list$file_date == date_existing_2 ~ as.numeric(vers_existing_2) + 1,
  data_list$file_date != date_existing_2 ~ 1
) %>% 
  sprintf("%02i", .)  # put on text format 

# If first file this year
if (length(date_existing_2) == 0)
  fn_version_2 <- "01"

# File names to use
fn1 <- paste0("120_result_10yr_", last_year, "_", 
              data_list$file_date, "_run", fn_version_1, ".rds")
fn2 <- paste0("120_result_long_", last_year, "_", 
              data_list$file_date, "_run", fn_version_2, ".rds")

cat("File names that will be used: \n")
cat(" ", fn1, "\n")
cat(" ", fn2, "\n")
cat("in folder 'Data'")

```


### d. add_extra_series = FALSE
```{r, results = 'hold'}

cat("'add_extra_series' set to", add_extra_series, "\n")

if (!add_extra_series){   # this is the normal case - we have calculated trends for all series

  saveRDS(result_10yr, paste0("Data/", fn1))
  saveRDS(result_long, paste0("Data/", fn2))
  
  cat("Results saved as \n")
  cat(" ", fn1, "\n")
  cat(" ", fn2, "\n")
  cat("in folder 'Data'")

} else {
  
  cat("This part is therefore not run \n")
  
}

```

### e. add_extra_series = TRUE (1)   

* Reads old trend results ('_original')    
* From the old trends, removes all series that have been reanalysed  
* Add reanalysed data to the old trends  
* See Results in 'result_10yr_new' and 'result_long_new'  

```{r}
# the case where we will update case - we have calculated trends for all series
if (add_extra_series){  

  #
  # Will be run only if we are adding series to existing (saved) series 
  #   ('original' below), or replacing existing series 
  #

  # Read saved 10 year series...
  result_10yr_original <- readRDS(paste0("Data/", fn_existing_1))
  cat("result_10yr: ", nrow(result_10yr_original), "rows originally \n")

  result_10yr_kept <- result_10yr_original %>%
    # Remove all series already present in result_10yr:
    safe_anti_join(result_10yr, 
                   by = c("STATION_CODE", "PARAM", "LATIN_NAME", "TISSUE_NAME", "Basis"),
                   na_matches = "never",
                   check = "BUV")
  cat("result_10yr: ", nrow(result_10yr_kept), "after removing overlapping time series")

  cat("\n")

  # Read saved 'long' series...
  result_long_original <- readRDS(paste0("Data/", fn_existing_2))
  cat("result_long: ", nrow(result_long_original), "rows originally \n")
  
  # Remove all series already present in result_long:
  result_long_kept <- result_long_original %>%
    safe_anti_join(result_long, 
                 by = c("STATION_CODE", "PARAM", "LATIN_NAME", "TISSUE_NAME", "Basis"),
                 na_matches = "never",
                 check = "BUV")
  cat("result_long: ", nrow(result_long_kept), "after removing overlapping time series")
  
  # Add the newly made series
  result_10yr_new <- bind_rows(result_10yr_kept, result_10yr)
  result_long_new <- bind_rows(result_long_kept, result_long)
  
  cat("\n\n")
  cat("Final data sets (after adding/replacing series): \n")
  cat("result_10yr_new: ", nrow(result_10yr_new), "rows \n")
  cat("result_long_new: ", nrow(result_long_new), "rows \n")
}

```
### f. add_extra_series = TRUE (2)   

Check that series are uniquely defined  

```{r}

# the case where we will update case - we have calculated trends for all series
if (add_extra_series){  
  # Check that series are uniquely defined
  check1 <- result_long_new %>%
    count(STATION_CODE, PARAM, LATIN_NAME, TISSUE_NAME, Basis) %>%
    filter(n > 1)

  check2 <- result_10yr_new %>%
    count(STATION_CODE, PARAM, LATIN_NAME, TISSUE_NAME, Basis) %>%
    filter(n > 1)

  if (nrow(check1) == 0 & nrow(check2) == 0){
    cat("Series are unique\n")
  } else {
    stop("ERROR: Series are not unique! This must be fixed. Check duplicates in check1 and/or check2. \n")
  }
}

``` 
### g. add_extra_series = TRUE (3)   

If script 118 has been used, remove series     

```{r}

# the case where we will update case - we have calculated trends for all series
if (add_extra_series & exists("data_series_remove")){  
  
  if (nrow(data_series_remove) == 0){
    cat("No series to remove\n")
  } else {
    cat(nrow(data_series_remove), "series to remove\n")
  }
  
  if (nrow(data_series_remove)>0){
    
    #
    # Long
    #
    cat("Original length of 'result_long_new':", nrow(result_long_new), "\n")
    
    result_long_new <- result_long_new %>%
      anti_join(data_series_remove, by = c("STATION_CODE", "PARAM", "LATIN_NAME", "TISSUE_NAME"))
    
    cat("New length of 'result_long_new':", nrow(result_long_new), "\n")

    #
    # 10 year
    #
    cat("Original length of 'result_10yr_new':", nrow(result_10yr_new), "\n")

    result_10yr_new <- result_10yr_new %>%
      anti_join(data_series_remove, by = c("STATION_CODE", "PARAM", "LATIN_NAME", "TISSUE_NAME"))
    
    cat("New length of 'result_10yr_new':", nrow(result_10yr_new), "\n")
    
  }
  
}

``` 

### h. add_extra_series = TRUE (4)
```{r}

if (add_extra_series){  

  saveRDS(result_10yr_new, paste0("Data/", fn1))
  saveRDS(result_long_new, paste0("Data/", fn2))
  
  cat("Data saved in 'Data' as \n")
  cat("  ", fn1, "\n")
  cat("  ", fn2, "\n")
  
  cat("\n")
  cat("=========================================================================\n")
  cat(" REMEMBER TO SET 'add_extra_series <- FALSE' NOW THAT YOU ARE FINISHED. \n")
  cat("=========================================================================\n")

}

```


## 7. Appendix  

Use the code below (run the part inside the `{}`) for reading the produced data    
```{r}

if (FALSE){
  
  date_pattern <- "[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]"
  version_pattern <- "run[0-9][0-9]"
  pattern_10yr <- paste0("^120_result_10yr_", last_year, "_", 
                         date_pattern, "_", version_pattern, ".rds$")
  pattern_long <- paste0("^120_result_long_", last_year, "_", 
                         date_pattern, "_", version_pattern, ".rds$")

  # Trends - most recent estimate 
  # Read and reformat the most recent data (by default)  

  pattern_10yr <- paste0("120_result_10yr_", last_year, "_")
  pattern_long <- paste0("120_result_long_", last_year, "_")
  
  # Find files and sort with the newest first
  files1 <- dir("Data", pattern = pattern_10yr) %>% rev()
  files2 <- dir("Data", pattern = pattern_long) %>% rev()
  
  cat("Latest files: \n")
  cat(files1[1], "\n")
  cat(files2[1], "\n")
  
  # Print for user
  cat("Trend estimates - reading the last files produced:")
  cat("\n ", files1[1])
  cat("\n ", files2[1])

  # Read files
  result_10yr <- readRDS(paste0("Data/", files1[1]))
  result_long <- readRDS(paste0("Data/", files2[1]))
  
  result_10yr %>% 
    filter(PARAM == "SCCP" & STATION_CODE %in% "24B")
  
}


```