kmcelwee/WhoPaysWriters

WhoPaysWriters

UPDATE: WhoPaysWriters.com asked that their data not be posted on a third-party site, so the datasets have been removed. Please email me with any questions.


A data scrape and analysis of WhoPaysWriters.com. A summary of the results can be found here. Collected for an article in the Columbia Journalism Review. Questions and suggestions for improvement are welcome: kevinrmcelwee@gmail.com.


WhoPaysWriters.com is an anonymous platform where freelance journalists post details about their compensation. There were approximately 3000 submissions to the site from 2012 to 2018, making it the largest publicly available dataset of its kind. Journalists not only submit their pay, but also include information about their rights, their relationship with the editor, and other contextual data.

scrapeWPW.py

This script creates three kinds of CSVs:

  • publications.csv, which lists all publications scraped from the opening webpage.
  • A CSV created for each publication's page under the data folder.
  • allData_raw.csv, which combines everything in the data folder into one CSV.

The script requires that the user download ChromeDriver in addition to its Python packages.
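The step that merges the per-publication files into allData_raw.csv can be sketched roughly as follows. This is a minimal illustration and not the code in scrapeWPW.py: the function name `combine_csvs` is hypothetical, and it uses only the standard library so it runs without extra packages.

```python
import csv
from pathlib import Path


def combine_csvs(data_dir, out_path):
    """Concatenate every CSV in data_dir into one CSV at out_path.

    Hypothetical helper for illustration. Returns the number of data
    rows written.
    """
    rows = []
    fieldnames = []
    for csv_file in sorted(Path(data_dir).glob("*.csv")):
        with csv_file.open(newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            # Keep the union of all columns, in first-seen order.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            rows.extend(reader)

    with open(out_path, "w", newline="", encoding="utf-8") as f:
        # Columns missing from a given file are written as empty strings.
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Writing the combined file outside the data folder keeps it from being picked up as an input on a later run.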

Clean_Data.ipynb

Cleans data for analysis. Other than normal cleaning, here are some decisions made:

  • I replaced most "other" entries with NaNs.
  • I dropped everything with fewer than 100 words.
  • I dropped all fiction and poetry entries.
  • I removed entries for 2019.
  • Potential spam and unreasonable outliers are cut on a case-by-case basis.

This notebook creates allData_clean.csv, which is ultimately used for analysis.
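The rule-based part of the cleaning decisions above might look something like the sketch below. The column names (`wordCount`, `genre`, `year`) are assumptions, not the dataset's real headers. The sketch also assumes that "other" refers to a literal placeholder value, and it replaces every such value, whereas the notebook replaces only most of them. The case-by-case spam and outlier removal is not modeled.

```python
import math

# Hypothetical genre labels; the real dataset's values may differ.
EXCLUDED_GENRES = {"fiction", "poetry"}


def is_other(value):
    """True if a cell holds the literal placeholder "other" (assumed)."""
    return isinstance(value, str) and value.strip().lower() == "other"


def clean_rows(rows):
    """Apply the listed filtering rules to an iterable of dict rows."""
    cleaned = []
    for row in rows:
        if int(row["wordCount"]) < 100:
            continue  # drop pieces with fewer than 100 words
        if row["genre"].strip().lower() in EXCLUDED_GENRES:
            continue  # drop fiction and poetry
        if int(row["year"]) == 2019:
            continue  # drop entries for 2019
        # Replace "other" placeholders with NaN so they count as missing.
        cleaned.append(
            {key: math.nan if is_other(value) else value
             for key, value in row.items()}
        )
    return cleaned
```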

Explore_Data.ipynb

Explores most two-variable relationships and creates graphs for them. Also creates publications_rank.csv, which uses rankings on totalPaid, wordRate, daysToBePaid, and paymentDifficulty to rank publications with more than 7 submissions.
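One plausible way to build such a ranking is sketched below. The four metric names come from the description above, but everything else is an assumption: the `publication` column name, averaging each metric per publication, treating higher pay and lower delay or difficulty as better, and combining the metrics by taking the mean of the per-metric ranks. The notebook may do this differently.

```python
from collections import defaultdict
from statistics import mean

# Metric name -> True if a higher value is better (assumed directions).
METRICS = {
    "totalPaid": True,
    "wordRate": True,
    "daysToBePaid": False,
    "paymentDifficulty": False,
}


def rank_publications(rows, min_submissions=8):
    """Return (publication, mean rank) pairs, best first (1 = best)."""
    by_pub = defaultdict(list)
    for row in rows:
        by_pub[row["publication"]].append(row)

    # Keep publications with more than 7 submissions; average each metric.
    averages = {
        pub: {m: mean(float(r[m]) for r in subs) for m in METRICS}
        for pub, subs in by_pub.items()
        if len(subs) >= min_submissions
    }

    rank_sums = defaultdict(float)
    for metric, higher_is_better in METRICS.items():
        ordered = sorted(
            averages,
            key=lambda pub: averages[pub][metric],
            reverse=higher_is_better,
        )
        # Ties keep their existing order; no tie-aware ranking is applied.
        for position, pub in enumerate(ordered, start=1):
            rank_sums[pub] += position

    mean_ranks = [(pub, total / len(METRICS)) for pub, total in rank_sums.items()]
    return sorted(mean_ranks, key=lambda item: item[1])
```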