resolving merge conflict in livelihood_dataprep.Rmd
Merge branch 'gh-pages' of https://github.com/OHI-Science/ohiprep_v2024 into gh-pages

# Conflicts:
#	globalprep/le/v2024/livelihood_dataprep.Rmd
annaramji committed Jul 2, 2024
2 parents 0a83b96 + 10a27e3 commit 853181b
Showing 2 changed files with 43 additions and 4 deletions.
4 changes: 2 additions & 2 deletions globalprep/le/v2024/economies_dataprep.Rmd
@@ -122,7 +122,7 @@ prelim_plot <- plot_ly(data = gdp_prop_clean, x = ~year, y = ~gdp_prop_percent,
prelim_plot
htmlwidgets::saveWidget(prelim_plot, file = "gdp_prelim_plot_all_years.html")
# htmlwidgets::saveWidget(prelim_plot, file = "gdp_prelim_plot_all_years.html")
```

It is really interesting to see how the proportions drop in 2020. As an intuition check, this makes sense because of Covid. It is especially interesting to note how severely Macau was impacted!
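To back the intuition check with numbers, one option is to rank regions by how much their tourism GDP share changed between 2019 and 2020. This is a minimal base-R sketch, not code from this prep: the helper `rank_drops()` is hypothetical, the region column is called `region` here because the real column name is cut off in the hunk header above, and the toy values are made up for illustration.

```r
# Hypothetical helper: change in tourism GDP share (percentage points)
# from `from_year` to `to_year`, sorted so the largest drops come first.
# Assumes columns `region`, `year`, and `gdp_prop_percent` (illustrative names).
rank_drops <- function(df, from_year = 2019, to_year = 2020) {
  before <- df[df$year == from_year, c("region", "gdp_prop_percent")]
  after  <- df[df$year == to_year,   c("region", "gdp_prop_percent")]
  # Inner join on region: only regions observed in both years are compared
  both <- merge(before, after, by = "region", suffixes = c("_before", "_after"))
  both$change_pp <- both$gdp_prop_percent_after - both$gdp_prop_percent_before
  both[order(both$change_pp), c("region", "change_pp")]
}

# Toy input (made-up values, not real data)
toy <- data.frame(
  region = rep(c("A", "B", "C"), each = 2),
  year = rep(c(2019, 2020), times = 3),
  gdp_prop_percent = c(50, 20, 5, 4, 10, 8)
)
drops <- rank_drops(toy)
print(drops)
```

On the real data, the first rows of the result would be the regions hit hardest in 2020.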
@@ -254,7 +254,7 @@ top_diff_gdp_plot <- plot_ly(data = gdp_filled_diff, x = ~year, y = ~appx_gdp_fi
top_diff_gdp_plot
htmlwidgets::saveWidget(top_diff_gdp_plot, file = "tour_gdp_top_diff_gf_plot.html")
# htmlwidgets::saveWidget(top_diff_gdp_plot, file = "tour_gdp_top_diff_gf_plot.html")
```


43 changes: 41 additions & 2 deletions globalprep/le/v2024/livelihood_dataprep.Rmd
@@ -252,7 +252,7 @@ wage_years_filled <- wage_data_years %>%
# =================
# # test gapfilling
# test gapfilling
# gap_fill_test <- wage_years_filled %>%
# mutate(ref_area_label = as.factor(ref_area_label)) %>%
# mutate(lm_est = list(lm(monthly_wage ~ year + ref_area_label)))
@@ -307,13 +307,21 @@ paste0("proportion of countries/regions with only 1 data point: ", round(((num_n

```{r}
# preliminary plot
line_plot <- plotly::plot_ly(wage_gf, x = ~year, y = ~appx_wage_fill, color = ~ref_area_label, type = "scatter", mode = "lines") %>%
layout(title = "All Regions: Monthly Service Wages (USD)",
xaxis = list(title = "Year"),
yaxis = list(title = "Monthly Wages (USD)"))
line_plot
# htmlwidgets::saveWidget(line_plot, file = "prop_tourism_laborforce.html")
```




In this plot, Sao Tome and Principe stands out as a clear high outlier with no variation: it is one of the countries we gapfilled by copying the value from its single observation. Next, we join the data with the OHI regions, then plot again and reinvestigate.
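The flat line follows from how single-observation countries are gapfilled: with only one data point there is nothing to interpolate between, so that value is carried across every year. Below is a minimal sketch of that kind of fill. It assumes linear interpolation between observed years, held constant beyond them via base R `approx(rule = 2)`; the helper `fill_series()` is hypothetical and is not necessarily how `appx_wage_fill` was built.

```r
# Sketch of per-country gapfilling (assumed approach, for illustration):
# - 2+ observations: linear interpolation, held constant beyond the ends
# - exactly 1 observation: that value is copied to every year
fill_series <- function(years, values) {
  observed <- !is.na(values)
  if (sum(observed) == 0) return(values)  # nothing to fill from
  # approx() needs at least two non-NA points, so copy a lone value instead
  if (sum(observed) == 1) return(rep(values[observed], length(years)))
  # rule = 2 extends the first/last observed value outside the observed range
  approx(x = years[observed], y = values[observed], xout = years, rule = 2)$y
}

years <- 2010:2015
# Toy inputs (made-up values)
multi  <- fill_series(years, c(NA, 100, NA, NA, 160, NA))
single <- fill_series(years, c(NA, NA, 500, NA, NA, NA))
```

Applied per country (for example inside `dplyr::group_by()` plus `mutate()`), a country with a single survey value becomes a horizontal line, which matches what the plot shows for STP.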


Clean up ILO data, join with OHI regions
@@ -329,7 +337,38 @@ wage_region_join <- left_join(region_clean, wage_gf, by = c("eez_iso3" = "iso3")
wage_regions <- wage_region_join %>%
mutate(unit = "Currency: 2017 PPP $") %>%
select(-classif2_label) %>%
mutate(sector = "tour")
mutate(sector = "tour",
data_source = "ILO")
```
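Because `left_join(region_clean, wage_gf, ...)` keeps every row of `region_clean`, ILO countries whose `iso3` code has no match in `eez_iso3` drop out silently, and OHI regions without wage data come through with `NA`s. A quick way to list both sets is base R `setdiff()`, e.g. `setdiff(unique(wage_gf$iso3), region_clean$eez_iso3)`. The sketch below uses made-up placeholder codes in place of the real tables.

```r
# Toy key vectors (placeholder codes, not the real tables)
ohi_iso3 <- c("AAA", "BBB", "CCC", "DDD")  # stands in for region_clean$eez_iso3
ilo_iso3 <- c("AAA", "BBB", "ZZZ")         # stands in for wage_gf$iso3

# ILO countries that a left_join() onto the OHI regions would drop
dropped_ilo <- setdiff(ilo_iso3, ohi_iso3)
# OHI regions that would carry NA wages after the join
missing_wage <- setdiff(ohi_iso3, ilo_iso3)
```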


Plot

```{r}
# interactive plot after joining with OHI Regions
plotly::plot_ly(wage_regions, x = ~year, y = ~appx_wage_fill, color = ~admin_country_name, type = "scatter", mode = "lines") %>%
layout(title = "All Regions: Monthly Service Wages (USD)",
xaxis = list(title = "Year"),
yaxis = list(title = "Monthly Wages (USD)"))
```

Sao Tome and Principe is still included, and it is still a high, unvarying outlier.
This lack of data, combined with how extreme an outlier it is, led us to reconsider our methods for this country and to consider dropping it from this intermediate data product.
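The two traits that single out STP here, a flat series and a level far above other countries, can also be checked programmatically rather than by eye. This is a minimal base-R sketch, assuming a long table with one country label and one wage value per row. The function name and the 10x cut-off relative to the cross-country median are hypothetical choices for illustration, not part of this data prep.

```r
# Flag countries whose wage series is constant AND far above the typical level.
# `threshold` is an illustrative multiplier, not a value from the original analysis.
flag_flat_outliers <- function(country, wage, threshold = 10) {
  ok <- !is.na(wage)
  country <- country[ok]
  wage <- wage[ok]
  # Per-country spread (max - min) and typical level (median)
  spread <- tapply(wage, country, function(w) diff(range(w)))
  level  <- tapply(wage, country, median)
  typical <- median(level)
  names(level)[spread == 0 & level > threshold * typical]
}

# Toy data (made-up values)
country <- rep(c("A", "B", "C", "D"), each = 3)
wage <- c(800, 850, 900, 600, 650, 700, 20000, 20000, 20000, 1000, 1000, 1000)
flagged <- flag_flat_outliers(country, wage)
```

Country "D" is also flat but sits near the typical level, so only "C" is flagged. Requiring both conditions keeps ordinary single-observation countries from being flagged.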

Let's drop it and plot what that looks like.

```{r}
wage_regions_no_sp <- wage_regions %>%
filter(!ref_area_label %in% c("Sao Tome and Principe"))
# plot to see what it looks like after dropping Sao Tome and Principe
plotly::plot_ly(wage_regions_no_sp, x = ~year, y = ~appx_wage_fill, color = ~admin_country_name, type = "scatter", mode = "lines") %>%
layout(title = "All Regions: Monthly Service Wages (USD)",
xaxis = list(title = "Year"),
yaxis = list(title = "Monthly Wages (USD)"))
```
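One detail in the filter above: after the `left_join()`, any OHI regions without matching ILO data have `NA` in `ref_area_label`. `!ref_area_label %in% c(...)` keeps those rows, whereas `ref_area_label != "Sao Tome and Principe"` would drop them, because `dplyr::filter()` discards rows where the condition evaluates to `NA`. This is a small base-R demonstration on a toy vector.

```r
labels <- c("Sao Tome and Principe", "Fiji", NA)

# `!` binds more loosely than %in%, so this is !(labels %in% ...);
# NA %in% x is FALSE, so the NA row is kept
keep_in <- !labels %in% c("Sao Tome and Principe")
# Comparing NA with != yields NA, which filter() treats as "drop"
keep_neq <- labels != "Sao Tome and Principe"
```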

Dropping Sao Tome and Principe (STP) rescaled the y-axis considerably, making the monthly service wage data (USD) for the other geographic areas much easier to see. On closer inspection, the STP observation from 2017 (133,814.9 USD) may have been entered incorrectly, recorded as annual rather than monthly income, or mishandled during the PPP adjustment. Dustin noticed that the data source for this observation was "HIES - Household Budget Survey", and it was the only observation from this data source. Upon further investigation, we found that the average annual income in STP is ~2,400 USD ([WorldData](https://www.worlddata.info/africa/sao-tome-and-principe/index.php#:~:text=With%20an%20average%20annual%20income,the%20lower%20middle%2Dincome%20countries)) and its GDP per capita is ~2,817 USD ([World Bank](https://www.worldbank.org/en/country/saotome/overview)). We have therefore decided to drop it from the intermediate data set.
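Putting the figures from the paragraph above side by side shows how far off the value is. Even read as an annual figure, 133,814.9 is roughly 56 times the ~2,400 USD average annual income and roughly 47.5 times GDP per capita. A monthly/annual mix-up alone (a factor of 12) therefore cannot account for a gap this large. This quick check uses only the numbers cited above; the variable names are illustrative.

```r
stp_obs_2017   <- 133814.9  # reported 2017 monthly service wage (unit: 2017 PPP $)
avg_annual_inc <- 2400      # approximate average annual income (WorldData)
gdp_per_capita <- 2817      # approximate GDP per capita (World Bank)

# If the value really is monthly: implied annual wage vs. average annual income
ratio_if_monthly <- (stp_obs_2017 * 12) / avg_annual_inc
# If the value was actually an annual figure entered as monthly
ratio_if_annual     <- stp_obs_2017 / avg_annual_inc
ratio_gdp_if_annual <- stp_obs_2017 / gdp_per_capita

round(c(ratio_if_monthly, ratio_if_annual, ratio_gdp_if_annual), 1)
```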
