Student income background and mobility. (#743)
* Student income background and mobility.

* Deal with colon in meta.

* Accept submission

---------

Co-authored-by: jonthegeek <jonthegeek@users.noreply.github.com>
jonthegeek and jonthegeek authored Sep 9, 2024
1 parent 3f966e3 commit 0940b2b
Showing 7 changed files with 2,112 additions and 0 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -68,6 +68,8 @@ If you are using TidyTuesday to teach data-related skills, [please let us know](
| 34 | `2024-08-20` | [English Monarchs and Marriages](data/2024/2024-08-20/readme.md) | [A list of Monarchs by marriage](https://www.ianvisits.co.uk/articles/a-list-of-monarchs-by-marriage-6857/) | [monarchs and marriages](github.com/frankiethull/english_monarch_marriages) |
| 35 | `2024-08-27` | [The Power Rangers Franchise](data/2024/2024-08-27/readme.md) | [Power Rangers: Seasons and episodes data](https://www.kaggle.com/datasets/karetnikovn/power-rangers-dataset/data) | [National Power Rangers Day (August 28)](https://www.nationaldaycalendar.com/national-day/national-power-rangers-day-august-28) |
| 36 | `2024-09-03` | [Stack Overflow Annual Developer Survey 2024](data/2024/2024-09-03/readme.md) | [Stack Overflow Annual Developer Survey 2024](https://survey.stackoverflow.co/) | [Stack Overflow Annual Developer Survey Results](https://survey.stackoverflow.co/2024/) |
| 37 | `2024-09-10` | [Economic Diversity and Student Outcomes](data/2024/2024-09-10/readme.md) | [Opportunity Insights: College-Level Data for 139 Selective American Colleges](https://opportunityinsights.org/data/) | [Economic diversity and student outcomes at the University of Texas at Dallas](https://www.nytimes.com/interactive/projects/college-mobility/university-of-texas-at-dallas) |

***

1,947 changes: 1,947 additions & 0 deletions data/2024/2024-09-10/college_admissions.csv

Large diffs are not rendered by default.

17 changes: 17 additions & 0 deletions data/2024/2024-09-10/meta.yaml
@@ -0,0 +1,17 @@
title: Economic Diversity and Student Outcomes
article:
  title: Economic diversity and student outcomes at the University of Texas at Dallas
  url: https://www.nytimes.com/interactive/projects/college-mobility/university-of-texas-at-dallas
data_source:
  title: >
    Opportunity Insights: College-Level Data for 139 Selective American Colleges
  url: https://opportunityinsights.org/data/
images:
- file: utd-access.png
  alt: >
    Parent income for students attending The University of Texas at Dallas (UTD).
    The median family income is $89,800, which is among the highest in Texas but
    among the lowest among highly selective public colleges. The average income
    percentile is 66th. Less than 1% of students come from the top 0.1% income level,
    1% from the top 1%, 9.2% from the top 5%, 23% from the top 10%, 40% from the
    top 20%, and 7.6% from the bottom 20%.
143 changes: 143 additions & 0 deletions data/2024/2024-09-10/readme.md
@@ -0,0 +1,143 @@
# Economic Diversity and Student Outcomes

College students are back on campus in the US, so we're exploring economic diversity and student outcomes! The dataset this week comes from [Opportunity Insights](https://opportunityinsights.org/data/) via an [article](https://www.nytimes.com/interactive/2017/01/18/upshot/some-colleges-have-more-students-from-the-top-1-percent-than-the-bottom-60.html) and associated [interactive visualization](https://www.nytimes.com/interactive/projects/college-mobility/university-of-texas-at-dallas) from the Upshot at the New York Times. Thank you to [Havisha Khurana](https://github.com/havishak) for suggesting this dataset!

> A new study, based on millions of anonymous tax records, shows that some colleges are even more economically segregated than previously understood, while others are associated with income mobility.

This dataset offers an opportunity to explore the [three rules that make a dataset "tidy"](https://r4ds.hadley.nz/data-tidy#sec-tidy-data):

1. Each variable is a column; each column is a variable.
2. Each observation is a row; each row is an observation.
3. Each value is a cell; each cell is a single value.

How might you pivot this data to make it longer? When might you want to do that? When might you pivot this data to make it wider?
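
As one possible starting point for the pivoting questions above, here is a minimal sketch on a toy tibble. The column names mirror a few columns of `college_admissions`, but the rows, the values, and the names `toy`, `toy_long`, and `toy_wide` are invented for illustration and are not part of the dataset.

```r
library(dplyr)
library(tidyr)

# Toy rows that mimic a few columns of college_admissions.
# The values are invented for illustration, not taken from the real data.
toy <- tibble::tibble(
  name = c("College A", "College A", "College B", "College B"),
  par_income_bin = c(10, 99, 10, 99),
  rel_apply = c(0.8, 1.4, 0.9, 1.1),
  rel_attend = c(0.6, 1.9, 1.0, 1.2)
)

# Longer: one row per college, income bin, and metric.
toy_long <- toy |>
  pivot_longer(
    cols = c(rel_apply, rel_attend),
    names_to = "metric",
    values_to = "estimate"
  )

# Wider: one row per college, one column per income bin.
toy_wide <- toy |>
  select(name, par_income_bin, rel_attend) |>
  pivot_wider(
    names_from = par_income_bin,
    values_from = rel_attend,
    names_prefix = "rel_attend_bin_"
  )
```

The longer form tends to suit plotting several metrics with one aesthetic mapping, while the wider form can make it easier to compare income bins side by side within a college.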

## The Data

```r
# Option 1: tidytuesdayR package
## install.packages("tidytuesdayR")

tuesdata <- tidytuesdayR::tt_load('2024-09-10')
## OR
tuesdata <- tidytuesdayR::tt_load(2024, week = 37)

college_admissions <- tuesdata$college_admissions

# Option 2: Read directly from GitHub

college_admissions <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-09-10/college_admissions.csv')
```

## How to Participate

- [Explore the data](https://r4ds.hadley.nz/), watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about **causation** in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
- Create a visualization, a model, a [shiny app](https://shiny.posit.co/), or some other piece of data-science-related output, using R or another programming language.
- [Share your output and the code used to generate it](../../../sharing.md) on social media with the #TidyTuesday hashtag.
- [Submit your own dataset!](../../../.github/pr_instructions.md)

### Data Dictionary

# `college_admissions.csv`

|variable |class |description |
|:-------------------------------|:---------|:-------------------------------------|
|super_opeid |double |Institution OPEID / Cluster ID when combining multiple OPEIDs. |
|name |character |Name of college (or college group). |
|par_income_bin |double |Parent household income group based on percentile in the income distribution. |
|par_income_lab |character |Parent household income label. |
|attend |double |Test-score-reweighted absolute attendance rate: Calculated as the fraction of students attending that college among all test-takers within a parent income bin in the Pipeline Analysis Sample. |
|stderr_attend |double |Standard error on the attend variable. |
|attend_level |double |The school average estimates reweighting on test score. Divide the test-score-reweighted absolute variables by this average to calculate the test-score-reweighted relative variables. |
|attend_sat |double |Absolute attendance rate for specific test score band based on school tier/category. |
|stderr_attend_sat |double |Standard error on the attend_sat variable. |
|attend_level_sat |double |The school average estimates reweighting on test score. Divide the test-score-reweighted absolute variables by this average to calculate the test-score-reweighted relative variables. |
|rel_apply |double |Test-score-reweighted relative application rate: Calculated using adjusted score-sending rates, the relative fraction of all standardized test takers who send test scores to a given college. |
|stderr_rel_apply |double |Standard error on the rel_apply variable. |
|rel_attend |double |Test-score-reweighted relative attendance rate: Calculated as the fraction of students attending that college among all test-takers within a parent income bin in the Pipeline Analysis Sample. Relative attendance rates are reported as a proportion of the mean attendance rate across all parent income bins for each college. |
|stderr_rel_attend |double |Standard error on the rel_attend variable. |
|rel_att_cond_app |double |Calculated as the ratio of rel_attend to rel_apply. |
|rel_apply_sat |double |Relative application rate for specific test score band based on school tier/category. Selected test score band is the 50-point band that had the most attendees in each school tier/category. The selected range: Ivy Plus: SAT 1460-1510; Elite Public: SAT 1180-1230; Top Private: SAT 1410-1460; NESCAC: SAT 1370-1420; Tier 2 Private: SAT 1290-1340; Top 100 Private: SAT 1170-1220; Top 100 Public: SAT 1110-1160; Other Flagship: SAT 1070-1120. |
|stderr_rel_apply_sat |double |Standard error on the rel_apply_sat variable. |
|rel_attend_sat |double |Relative attendance rate for specific test score band based on school tier/category. |
|stderr_rel_attend_sat |double |Standard error on the rel_attend_sat variable. |
|rel_att_cond_app_sat |double |Relative attendance rate, conditional on application, for specific test score band based on school tier/category. |
|attend_instate |double |Test-score-reweighted absolute attendance rate for in-state students. Only available for public schools. |
|stderr_attend_instate |double |Standard error on the attend_instate variable. |
|attend_level_instate |double |The school average estimates reweighting on test score. Divide the test-score-reweighted absolute variables by this average to calculate the test-score-reweighted relative variables. |
|attend_instate_sat |double |Absolute estimates on a specific test score for in-state students. Only available for public schools. |
|stderr_attend_instate_sat |double |Standard error on the attend_instate_sat variable. |
|attend_level_instate_sat |double |Absolute estimates on a specific test score for in-state students. Only available for public schools. |
|attend_oostate |double |Test-score-reweighted absolute attendance rate for out-of-state students. Only available for public schools. |
|stderr_attend_oostate |double |Standard error on the attend_oostate variable. |
|attend_level_oostate |double |The school average estimates reweighting on test score. Divide the test-score-reweighted absolute variables by this average to calculate the test-score-reweighted relative variables. |
|attend_oostate_sat |double |Absolute estimates on a specific test score for out-of-state students. Only available for public schools. |
|stderr_attend_oostate_sat |double |Standard error on the attend_oostate_sat variable. |
|attend_level_oostate_sat |double |Absolute estimates on a specific test score for out-of-state students. Only available for public schools. |
|rel_apply_instate |double |Test-score-reweighted relative application rate for in-state students. In-state status is measured using the students’ address when they take a standardized test. Only available for public schools. |
|stderr_rel_apply_instate |double |Standard error on the rel_apply_instate variable. |
|rel_attend_instate |double |Test-score-reweighted relative attendance rate for in-state students. Only available for public schools. |
|stderr_rel_attend_instate |double |Standard error on the rel_attend_instate variable. |
|rel_att_cond_app_instate |double |Test-score-reweighted relative attendance rate, conditional on application, for in-state students. Only available for public schools. |
|rel_apply_oostate |double |Test-score-reweighted relative application rate for out-of-state students. In-state status is measured using the students’ address when they take a standardized test. Only available for public schools. |
|stderr_rel_apply_oostate |double |Standard error on the rel_apply_oostate variable. |
|rel_attend_oostate |double |Test-score-reweighted relative attendance rate for out-of-state students. Only available for public schools. |
|stderr_rel_attend_oostate |double |Standard error on the rel_attend_oostate variable. |
|rel_att_cond_app_oostate |double |Test-score-reweighted relative attendance rate, conditional on application, for out-of-state students. Only available for public schools. |
|rel_apply_instate_sat |double |Relative estimates on a specific test score for in-state students. Only available for public schools. |
|stderr_rel_apply_instate_sat |double |Standard error on the rel_apply_instate_sat variable. |
|rel_attend_instate_sat |double |Relative estimates on a specific test score for in-state students. Only available for public schools. |
|stderr_rel_attend_instate_sat |double |Standard error on the rel_attend_instate_sat variable. |
|rel_att_cond_app_instate_sat |double |Estimates on a specific test score for in-state students. Only available for public schools. |
|rel_apply_oostate_sat |double |Relative estimates on a specific test score for out-of-state students. Only available for public schools. |
|stderr_rel_apply_oostate_sat |double |Standard error on the rel_apply_oostate_sat variable. |
|rel_attend_oostate_sat |double |Relative estimates on a specific test score for out-of-state students. Only available for public schools. |
|stderr_rel_attend_oostate_sat |double |Standard error on the rel_attend_oostate_sat variable. |
|rel_att_cond_app_oostate_sat |double |Estimates on a specific test score for out-of-state students. Only available for public schools. |
|attend_unwgt |double |Unweighted absolute attendance rate: Calculated as the fraction of students attending that college among all test-takers within a parent income bin in the Pipeline Analysis Sample. |
|stderr_attend_unwgt |double |Standard error on the attend_unwgt variable. |
|attend_unwgt_level |double |The unweighted school average estimates. Divide the unweighted absolute variables by this average to calculate the unweighted relative variables. |
|attend_unwgt_instate |double |Unweighted absolute estimates for in-state students. Only available for public schools. |
|stderr_attend_unwgt_instate |double |Standard error on the attend_unwgt_instate variable. |
|attend_unwgt_oostate |double |Unweighted absolute estimates for out-of-state students. Only available for public schools. |
|stderr_attend_unwgt_oostate |double |Standard error on the attend_unwgt_oostate variable. |
|attend_unwgt_level_instate |double |The unweighted school average estimates. Divide the unweighted absolute variables by this average to calculate the unweighted relative variables. |
|attend_unwgt_level_oostate |double |The unweighted school average estimates. Divide the unweighted absolute variables by this average to calculate the unweighted relative variables. |
|rel_attend_unwgt |double |Unweighted relative attendance rate: Calculated as the fraction of students attending that college among all test-takers within a parent income bin in the Pipeline Analysis Sample. Relative attendance rates are reported as a proportion of the mean attendance rate across all parent income bins for each college. |
|rel_apply_unwgt |double |Unweighted relative application rate: Calculated using adjusted score-sending rates, the relative fraction of all standardized test takers who send test scores to a given college. |
|stderr_rel_attend_unwgt |double |Standard error on the rel_attend_unwgt variable. |
|stderr_rel_apply_unwgt |double |Standard error on the rel_apply_unwgt variable. |
|rel_att_cond_app_unwgt |double |Calculated as the ratio of rel_attend_unwgt to rel_apply_unwgt. |
|rel_attend_unwgt_instate |double |Unweighted relative estimates for in-state students. Only available for public schools. |
|rel_attend_unwgt_oostate |double |Unweighted relative estimates for out-of-state students. Only available for public schools. |
|stderr_rel_attend_unwgt_instate |double |Standard error on the rel_attend_unwgt_instate variable. |
|stderr_rel_attend_unwgt_oostate |double |Standard error on the rel_attend_unwgt_oostate variable. |
|rel_apply_unwgt_instate |double |Unweighted relative estimates for in-state students. Only available for public schools. |
|rel_apply_unwgt_oostate |double |Unweighted relative estimates for out-of-state students. Only available for public schools. |
|stderr_rel_apply_unwgt_instate |double |Standard error on the rel_apply_unwgt_instate variable. |
|stderr_rel_apply_unwgt_oostate |double |Standard error on the rel_apply_unwgt_oostate variable. |
|rel_att_cond_app_unwgt_instate |double |Unweighted estimates for in-state students. Only available for public schools. |
|rel_att_cond_app_unwgt_oostate |double |Unweighted estimates for out-of-state students. Only available for public schools. |
|public |logical |Indicator for public universities. |
|flagship |logical |Indicator for public flagship universities (defined using the College Board Annual Survey of Colleges, 2016). |
|tier |character |Selectivity and type combination: Ivy-Plus (Ivy League colleges plus Stanford, Chicago, Duke, and MIT); Other elite college (Barron’s top selectivity category, other than the Ivy-Plus, both public and private combined); Highly selective public college (Barron’s 2nd selectivity group); Highly selective private college (Barron’s 2nd selectivity group); Selective public college (Barron’s 3rd, 4th, and 5th selectivity groups); Selective private college (Barron’s 3rd, 4th, and 5th selectivity groups). See Chetty, Friedman, Saez, Turner, and Yagan (2020) for more information on how the tier is defined. |
|test_band_tier |character |School group for the test-score band statistics. |
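
Two relationships described in the data dictionary can be sketched on invented numbers: dividing an absolute rate by the school average (`attend / attend_level`) to get a relative rate, and taking the ratio of `rel_attend` to `rel_apply`. The values below are made up, and the `_check` column names are hypothetical; they are not columns in the dataset.

```r
library(dplyr)

# Invented values for one hypothetical college; not real estimates.
toy <- tibble::tibble(
  name = "College A",
  par_income_bin = c(10, 50, 99),
  attend = c(0.002, 0.004, 0.010),
  attend_level = 0.005,
  rel_apply = c(0.8, 1.0, 1.6)
)

toy_derived <- toy |>
  mutate(
    # Relative attendance: absolute rate divided by the school average.
    rel_attend_check = attend / attend_level,
    # Attendance conditional on application: ratio of the two relative rates.
    rel_att_cond_app_check = rel_attend_check / rel_apply
  )
```

On the real data, comparing such recomputed columns against the provided `rel_attend` and `rel_att_cond_app` columns (for example with `all.equal()`) may be a useful sanity check, though small differences from rounding would not be surprising.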

### Cleaning Script

```r
# Mostly clean data provided by https://opportunityinsights.org.
library(tidyverse)

data_url <- "https://opportunityinsights.org/wp-content/uploads/2023/07/CollegeAdmissions_Data.csv"
college_admissions <- readr::read_csv(data_url) |>
  # Drop redundant variables.
  dplyr::select(
    -"tier_name"
  ) |>
  # Recode variables.
  dplyr::mutate(
    public = public == "Public",
    flagship = as.logical(flagship)
  )
```
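
To see what the recoding step does without downloading the file, here is a small sketch on hypothetical raw rows. It assumes the raw `public` column holds strings such as "Public" and "Private" and that `flagship` is read as numeric 0/1; only the "Public" value is confirmed by the script above, so treat the rest as assumptions.

```r
library(dplyr)

# Hypothetical raw rows; the real file's values may differ.
raw <- tibble::tibble(
  name = c("College A", "College B"),
  public = c("Public", "Private"),
  flagship = c(1, 0),
  tier_name = c("placeholder tier", "placeholder tier")
)

cleaned <- raw |>
  # Drop the column the script treats as redundant.
  select(-"tier_name") |>
  mutate(
    # Comparison yields TRUE for "Public" and FALSE for anything else.
    public = public == "Public",
    # Numeric 1/0 becomes TRUE/FALSE.
    flagship = as.logical(flagship)
  )
```

Note that `as.logical()` returns `NA` for character strings like `"1"`, so this recode relies on `flagship` being read as a number.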
Binary file added data/2024/2024-09-10/utd-access.png
2 changes: 2 additions & 0 deletions data/2024/readme.md
@@ -40,3 +40,5 @@ Archive of datasets and articles from the 2024 series of `#TidyTuesday` events.
| 34 | `2024-08-20` | [English Monarchs and Marriages](2024-08-20/readme.md) | [A list of Monarchs by marriage](https://www.ianvisits.co.uk/articles/a-list-of-monarchs-by-marriage-6857/) | [monarchs and marriages](github.com/frankiethull/english_monarch_marriages) |
| 35 | `2024-08-27` | [The Power Rangers Franchise](2024-08-27/readme.md) | [Power Rangers: Seasons and episodes data](https://www.kaggle.com/datasets/karetnikovn/power-rangers-dataset/data) | [National Power Rangers Day (August 28)](https://www.nationaldaycalendar.com/national-day/national-power-rangers-day-august-28) |
| 36 | `2024-09-03` | [Stack Overflow Annual Developer Survey 2024](2024-09-03/readme.md) | [Stack Overflow Annual Developer Survey 2024](https://survey.stackoverflow.co/) | [Stack Overflow Annual Developer Survey Results](https://survey.stackoverflow.co/2024/) |
| 37 | `2024-09-10` | [Economic Diversity and Student Outcomes](2024-09-10/readme.md) | [Opportunity Insights: College-Level Data for 139 Selective American Colleges](https://opportunityinsights.org/data/) | [Economic diversity and student outcomes at the University of Texas at Dallas](https://www.nytimes.com/interactive/projects/college-mobility/university-of-texas-at-dallas) |
1 change: 1 addition & 0 deletions static/tt_data_type.csv
@@ -1,4 +1,5 @@
Week,Date,year,data_files,data_type,delim
37,2024-09-10,2024,college_admissions.csv,csv,","
36,2024-09-03,2024,qname_levels_single_response_crosswalk.csv,csv,","
36,2024-09-03,2024,stackoverflow_survey_questions.csv,csv,","
36,2024-09-03,2024,stackoverflow_survey_single_response.csv,csv,","
