WhoPaysWriters

UPDATE: WhoPaysWriters.com asked that their data not be posted on a third-party site, so the datasets have been removed. Please email me with any questions.

A data scrape and analysis of WhoPaysWriters.com. A summary of the results can be found here. Collected for an article in the Columbia Journalism Review. Questions and suggestions for improvement are welcome: kevinrmcelwee@gmail.com.

WhoPaysWriters.com is an anonymous platform where freelance journalists post details about their compensation. There were approximately 3,000 submissions to the site from 2012 to 2018, making it the largest publicly available dataset of its kind. Journalists submit not only their pay but also information about their rights, their relationship with the editor, and other contextual data.

`scrapeWPW.py`

This script creates three kinds of CSVs:

publications.csv, which lists all publications scraped from the opening webpage.
A CSV created for each publication's page under the data folder.
allData_raw.csv, which is one CSV of everything in data.

The script requires that the user download ChromeDriver in addition to its Python packages.
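As a rough illustration of the last step, here is a minimal sketch of how the per-publication CSVs in the data folder could be combined into allData_raw.csv. It is not taken from scrapeWPW.py: the function name `combine_csvs` and the `publication` column used in the usage note are assumptions, and the sketch uses only the standard library.

```python
import csv
from pathlib import Path


def combine_csvs(data_dir, out_path):
    """Concatenate every CSV in data_dir into one CSV at out_path.

    Hypothetical helper, not the author's code. Returns the number of
    data rows written.
    """
    files = sorted(Path(data_dir).glob("*.csv"))

    # Build the union of column names, keeping first-seen order, so that
    # files with slightly different headers can still be combined.
    fieldnames = []
    for path in files:
        with path.open(newline="", encoding="utf-8") as fh:
            for name in csv.DictReader(fh).fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)

    rows_written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        # restval="" leaves a blank cell where a file lacks a column.
        writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
        writer.writeheader()
        for path in files:
            with path.open(newline="", encoding="utf-8") as fh:
                for row in csv.DictReader(fh):
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Calling `combine_csvs("data", "allData_raw.csv")` would write the combined file next to the data folder and return the total row count.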

`Clean_Data.ipynb`

Cleans data for analysis. Other than normal cleaning, here are some decisions made:

I replaced most other entries with NaNs.
I dropped everything with fewer than 100 words.
I dropped all fiction and poetry entries.
I removed entries for 2019.
Potential spam and unreasonable outliers are cut on a case-by-case basis.

This notebook creates allData_clean.csv, which is ultimately used for analysis.
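The rule-based filters above (word count, genre, and year) can be sketched in a few lines. This is an illustration in plain Python, not the notebook's code: the column names `wordCount`, `storyType`, and `year` are hypothetical, and keeping rows with a missing word count is an assumption. The case-by-case removal of spam and outliers is not covered.

```python
def clean_rows(rows):
    """Apply the word-count, genre, and year filters to a list of dicts.

    Column names are hypothetical; adjust them to the real CSV headers.
    """
    excluded_types = {"fiction", "poetry"}
    cleaned = []
    for row in rows:
        # Drop entries with fewer than 100 words.
        # Assumption: rows with no word count are kept.
        words = row.get("wordCount")
        if words is not None and words < 100:
            continue
        # Drop fiction and poetry entries (case-insensitive match).
        if str(row.get("storyType", "")).strip().lower() in excluded_types:
            continue
        # Remove entries for 2019.
        if row.get("year") == 2019:
            continue
        cleaned.append(row)
    return cleaned
```

For example, a 1,200-word 2017 feature would pass through unchanged, while a 50-word entry, a poem, or any 2019 submission would be dropped.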

`Explore_Data.ipynb`

Explores most 2-variable relationships and creates appropriate graphs for study. Also creates publications_rank.csv, which uses rankings from totalPaid, wordRate, daysToBePaid, and paymentDifficulty to rank publications with more than 7 submissions.
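One way to build a combined ranking like publications_rank.csv is to rank publications on each metric and then order them by the sum of those ranks. The sketch below is an assumption-laden illustration, not the notebook's method: averaging each metric per publication, summing ordinal ranks, the direction of each metric (higher pay is better, fewer days and less difficulty are better), and the `publication` column name are all guesses.

```python
from collections import defaultdict
from statistics import mean

# Metric name -> True if a higher value is better.
# The direction of each metric is an assumption.
METRICS = {
    "totalPaid": True,
    "wordRate": True,
    "daysToBePaid": False,
    "paymentDifficulty": False,
}


def rank_publications(rows, min_submissions=8):
    """Return publication names ordered from best to worst combined rank."""
    by_pub = defaultdict(list)
    for row in rows:
        by_pub[row["publication"]].append(row)

    # Keep publications with more than 7 submissions and average each metric.
    averages = {
        pub: {m: mean(r[m] for r in subs) for m in METRICS}
        for pub, subs in by_pub.items()
        if len(subs) >= min_submissions
    }

    # Rank publications on each metric (1 = best) and sum the ranks.
    # Ties get consecutive ordinal ranks, which is a simplification.
    rank_sums = defaultdict(int)
    for metric, higher_is_better in METRICS.items():
        ordered = sorted(
            averages, key=lambda p: averages[p][metric], reverse=higher_is_better
        )
        for position, pub in enumerate(ordered, start=1):
            rank_sums[pub] += position

    return sorted(averages, key=lambda p: rank_sums[p])
```

Summing ranks rather than raw values keeps metrics with very different scales, such as dollars and days, from dominating each other.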
