Investigating Sao Tome and Principe, visualizing gapfilled data, evaluating methods
sophialecuona committed Jul 2, 2024
1 parent 544d924 commit 10a27e3
Showing 1 changed file with 50 additions and 10 deletions.
60 changes: 50 additions & 10 deletions globalprep/le/v2024/livelihood_dataprep.Rmd
@@ -253,15 +253,15 @@ wage_years_filled <- wage_data_years %>%
# =================
# test gapfilling
- gap_fill_test <- wage_years_filled %>%
- mutate(ref_area_label = as.factor(ref_area_label)) %>%
- mutate(lm_est = list(lm(monthly_wage ~ year + ref_area_label)))
- lm_test <- lm(monthly_wage ~ year + ref_area_label, data = wage_years_filled)
- summary(lm_test)
- lm_test$coefficients
+ # gap_fill_test <- wage_years_filled %>%
+ # mutate(ref_area_label = as.factor(ref_area_label)) %>%
+ # mutate(lm_est = list(lm(monthly_wage ~ year + ref_area_label)))
+ #
+ #
+ # lm_test <- lm(monthly_wage ~ year + ref_area_label, data = wage_years_filled)
+ # summary(lm_test)
+ #
+ # lm_test$coefficients
# wage_years_filled$lm_values <- lm_test$fitted.values
@@ -307,13 +307,21 @@ paste0("proportion of countries/regions with only 1 data point: ", round(((num_n

```{r}
# preliminary plot
line_plot <- plotly::plot_ly(wage_gf, x = ~year, y = ~appx_wage_fill, color = ~ref_area_label, type = "scatter", mode = "lines") %>%
layout(title = "All Regions: Monthly Service Wages (USD)",
xaxis = list(title = "Year"),
yaxis = list(title = "Monthly Wages (USD)"))
line_plot
# htmlwidgets::saveWidget(line_plot, file = "prop_tourism_laborforce.html")
```




In this plot we can see that Sao Tome and Principe is a clear high outlier with no variation -- it is one of the countries for which we gapfilled with copied values from a single observation. We will continue to join the data with the OHI regions, then plot again and reinvestigate.
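
A flat, unvarying line like this can also be detected without a plot. The following is a minimal sketch in base R, assuming that `monthly_wage` holds the raw observations and `appx_wage_fill` holds the gapfilled values (as the column names in this file suggest). The data frame `wage_toy` and its values are made up purely for illustration and are not ILO data.

```r
# Hypothetical toy data standing in for wage_gf; all values are invented.
wage_toy <- data.frame(
  ref_area_label = rep(c("Country A", "Country B"), each = 3),
  year           = rep(2015:2017, times = 2),
  monthly_wage   = c(1000, NA, 1200, NA, 5000, NA),      # raw observations
  appx_wage_fill = c(1000, 1100, 1200, 5000, 5000, 5000) # gapfilled series
)

# Count the non-missing raw observations for each region.
n_obs <- tapply(!is.na(wage_toy$monthly_wage), wage_toy$ref_area_label, sum)

# Count the distinct gapfilled values for each region.
n_distinct_fill <- tapply(
  wage_toy$appx_wage_fill,
  wage_toy$ref_area_label,
  function(x) length(unique(x))
)

# Regions whose whole series was copied from a single observation.
flat_regions <- names(n_obs)[n_obs == 1 & n_distinct_fill == 1]
flat_regions
```

Regions flagged this way are the ones where the gapfilled series carries no information beyond one data point, so any error in that point propagates to every year.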


Clean up ILO data, join with OHI regions
@@ -328,7 +336,39 @@ wage_region_join <- left_join(region_clean, wage_gf, by = c("eez_iso3" = "iso3")
wage_regions <- wage_region_join %>%
mutate(unit = "Currency: 2017 PPP $") %>%
- select(-classif2_label)
+ select(-classif2_label) %>%
+ mutate(sector = "tour",
+ data_source = "ILO")
```


Plot

```{r}
# interactive plot after joining with OHI Regions
plotly::plot_ly(wage_regions, x = ~year, y = ~appx_wage_fill, color = ~admin_country_name, type = "scatter", mode = "lines") %>%
layout(title = "All Regions: Monthly Service Wages (USD)",
xaxis = list(title = "Year"),
yaxis = list(title = "Monthly Wages (USD)"))
```

We can see that Sao Tome and Principe is still included and is still a high, unvarying outlier.
This lack of data, combined with how extreme the outlier is, led us to reconsider our methods for this country and to consider dropping it from this intermediate data product.

Let's drop it and plot the result.

```{r}
wage_regions_no_sp <- wage_regions %>%
filter(!ref_area_label %in% c("Sao Tome and Principe"))
# plot to see what it looks like after dropping Sao Tome and Principe
plotly::plot_ly(wage_regions_no_sp, x = ~year, y = ~monthly_wage, color = ~admin_country_name, type = "scatter", mode = "lines") %>%
layout(title = "All Regions: Monthly Service Wages (USD)",
xaxis = list(title = "Year"),
yaxis = list(title = "Monthly Wages (USD)"))
```

Dropping Sao Tome and Principe (STP) rescaled the y-axis considerably, which makes the monthly service wage data (USD) for the other geographic areas much easier to read. Investigating further, it looks as though the STP observation from 2017 (133814.9 USD) may have been entered incorrectly, recorded as annual rather than monthly income, or distorted by an error in the PPP adjustment. Dustin noticed that the data source for this observation was "HIES - Household Budget Survey". This was the only observation from this data source. Upon further investigation, we found that the average annual income in STP is ~2,400 USD ([WorldData](https://www.worlddata.info/africa/sao-tome-and-principe/index.php#:~:text=With%20an%20average%20annual%20income,the%20lower%20middle%2Dincome%20countries)) and that GDP per capita is ~2,817 USD ([World Bank](https://www.worldbank.org/en/country/saotome/overview)). Because the reported wage is far out of line with both figures, we have decided to drop STP from the intermediate data set.
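
The scale of the mismatch can be made explicit with a quick back-of-the-envelope check. This is only a rough sketch: it uses the two figures quoted above, the variable names are hypothetical, and a PPP-adjusted wage is not strictly comparable to a nominal income figure, so the ratios indicate order of magnitude only.

```r
# Figures quoted in the text above; variable names are illustrative only.
stp_reported_wage <- 133814.9  # 2017 STP observation, reported as monthly
avg_annual_income <- 2400      # approximate average annual income in STP

# If the observation really is a monthly wage, annualize it and compare.
implied_annual   <- stp_reported_wage * 12
ratio_if_monthly <- implied_annual / avg_annual_income

# If the observation was actually an annual figure, compare it directly.
ratio_if_annual <- stp_reported_wage / avg_annual_income

round(c(if_monthly = ratio_if_monthly, if_annual = ratio_if_annual), 1)
```

Under either reading the observation sits far above the reference income, which is consistent with treating it as unreliable rather than attempting to rescale it.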
