shakespeare-and-company-online-readership

This repository provides data and code to accompany "The Afterlives of Shakespeare and Company in Online Social Readership."

Matching Shakespeare and Company with Goodreads

This project contributes a matching between works in the Shakespeare and Company Project and works in Goodreads. We matched and manually verified 4460 of the Shakespeare and Company book URIs against Goodreads book IDs, and we consolidated the Goodreads metadata for these matched works.

data/goodreads-book-id-to-sc-uri_full-matching.json: a JSON dictionary mapping Goodreads book ID to SC book URI
data/matched-goodreads-metadata.json: a JSON list containing a dictionary for each matched Goodreads book. Example metadata keys are the year of publication (yearFirstPublished) and number of reviews (numReviews).
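As a rough illustration of working with the metadata file, the sketch below summarizes a list of metadata dictionaries. Only the keys yearFirstPublished and numReviews come from the description above; the helper names load_json and summarize_metadata, the sample records, and the assumption that both values are stored as numbers (or null) are ours and may not match the real file exactly.

```python
import json


def load_json(path):
    """Read a JSON file such as data/matched-goodreads-metadata.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_metadata(records):
    """Return (number of records, earliest year, total reviews).

    Records with a missing or null value for a key are skipped for that key.
    Assumes the values are numeric, which is an assumption about the file.
    """
    years = [r["yearFirstPublished"] for r in records
             if r.get("yearFirstPublished") is not None]
    reviews = [r["numReviews"] for r in records
               if r.get("numReviews") is not None]
    earliest = min(years) if years else None
    return len(records), earliest, sum(reviews)


# Made-up records for illustration only; not taken from the dataset.
sample = [
    {"yearFirstPublished": 1922, "numReviews": 10},
    {"yearFirstPublished": 1925, "numReviews": 5},
    {"numReviews": None},
]
summary = summarize_metadata(sample)
```

With the real data, something like summarize_metadata(load_json("data/matched-goodreads-metadata.json")) should work, provided the value types match the assumption above.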

Code requirements

This code needs Python 3. You can install the other dependencies with:

pip3 install -r requirements.txt

You will also need to download the code for Bayesian Core-Periphery Stochastic Block Models and place its core_periphery_sbm directory in the current directory.
We additionally require version 1.1 of the data from the Shakespeare and Company Project. Please download the following files and place them in the data directory:

SCoData_books_v1.1_2021_01.json: the Shakespeare and Company books dataset
SCoData_members_v1.1_2021_01.json: the Shakespeare and Company members dataset
SCoData_events_v1.1_2021_01.json: the Shakespeare and Company events dataset
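Before running the analysis, it can help to confirm that the downloads listed above are in place. The following is a minimal sketch of such a check; the function name missing_files is hypothetical and is not part of this repository.

```python
from pathlib import Path

# File names as listed above; they are expected inside the data directory.
REQUIRED_FILES = [
    "SCoData_books_v1.1_2021_01.json",
    "SCoData_members_v1.1_2021_01.json",
    "SCoData_events_v1.1_2021_01.json",
]


def missing_files(data_dir="data", core_dir="core_periphery_sbm"):
    """Return the required paths that are not present on disk."""
    data_dir = Path(data_dir)
    missing = [str(data_dir / name) for name in REQUIRED_FILES
               if not (data_dir / name).is_file()]
    # The core-periphery code is expected as a directory, not a file.
    if not Path(core_dir).is_dir():
        missing.append(str(core_dir))
    return missing


if __name__ == "__main__":
    for path in missing_files():
        print("missing:", path)
```

An empty list means all three dataset files and the core_periphery_sbm directory were found relative to the current directory.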

We scraped the Goodreads metadata using the Goodreads Scraper.

Other included data

In addition to data from the Shakespeare and Company Project, this project uses a preprocessed subset of the Goodreads data in the UCSD Book Graph.

We further restrict our analysis to the 1511 titles that 1) appear in both the Shakespeare and Company dataset and the UCSD Book Graph and 2) have at least one neighboring vertex in the graphs we construct. All remaining files contain only data for these 1511 titles.

Preprocessed data from the UCSD Book Graph:

data/book-uris-in-both-goodreads-and-sc.json: the URIs of books in both SC and Goodreads
data/goodreads-book-id-to-text.json: dict mapping Goodreads book ID to summary string
data/goodreads-user-to-books.json: dict mapping Goodreads user ID to a list of books the user interacted with
data/goodreads-book-id-to-num-ratings.json: dict mapping Goodreads book ID to number of user ratings on Goodreads

There are also files listing the descriptive text for each book:

data/sc-book-names.json: descriptive text for books in Shakespeare and Company
data/goodreads-book-names.json: descriptive text for books in Goodreads

And finally dictionaries linking books across SC and Goodreads:

data/goodreads-book-id-to-sc-uri.json: dict mapping Goodreads book ID to SC book URI
data/goodreads-text-to-sc-text.json: dict mapping Goodreads book summary string to SC book summary string
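The linking dictionaries make it possible to carry Goodreads statistics over to SC books. The sketch below re-keys a rating-count dictionary by SC URI. It assumes both dictionaries use the same Goodreads book ID strings as keys (JSON object keys are always strings); the function name ratings_by_sc_uri and the sample IDs and URIs are invented for illustration.

```python
def ratings_by_sc_uri(id_to_uri, id_to_num_ratings):
    """Map SC book URI -> number of Goodreads ratings.

    id_to_uri: contents of data/goodreads-book-id-to-sc-uri.json
    id_to_num_ratings: contents of data/goodreads-book-id-to-num-ratings.json
    Books without a known SC URI are dropped.
    """
    return {
        id_to_uri[book_id]: count
        for book_id, count in id_to_num_ratings.items()
        if book_id in id_to_uri
    }


# Made-up IDs and URIs, not values from the dataset.
id_to_uri = {"101": "https://example.org/books/a/",
             "102": "https://example.org/books/b/"}
id_to_num_ratings = {"101": 250, "102": 40, "999": 7}

linked = ratings_by_sc_uri(id_to_uri, id_to_num_ratings)
```

If several Goodreads IDs mapped to the same SC URI, later entries would overwrite earlier ones in this sketch; summing the counts instead would be a reasonable alternative.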

Reproducing the results in the article

All figures are saved in the figures subdirectory.

Scripts in the connect-to-goodreads directory perform the initial matching between SC and Goodreads books. These rely on the Goodreads API, which is now deprecated.
popularity_plots.ipynb: implements the article section "Comparing Popularity in SC and Goodreads".
plot-relative-popularity-by-year.py: plots the relative popularity by year across Goodreads and SC.
compare-neighbor-distributions.py: implements the article section "Comparing reading patterns of popular books".
core-periphery-books.ipynb: implements the network centrality analysis in the article section "Comparing network roles of popular books".

Further example of how to use the graphs

Graphs are constructed for datasets from Shakespeare and Company and Goodreads. Vertices correspond to books. Edges correspond to people: two books have an edge between them if the same user interacted with both books.
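The construction described above can be sketched in a few lines. This is an illustrative, unweighted version built from a dictionary shaped like data/goodreads-user-to-books.json; the function name build_book_graph is hypothetical, and the repository's own graph.py may differ (for example, in how it weights or filters edges).

```python
from collections import defaultdict
from itertools import combinations


def build_book_graph(user_to_books):
    """Return {book: set of neighboring books}.

    Two books are neighbors if at least one user interacted with both.
    Books that share no user with any other book do not appear at all.
    """
    neighbors = defaultdict(set)
    for books in user_to_books.values():
        # Deduplicate each user's list, then connect every pair of books.
        for a, b in combinations(sorted(set(books)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return dict(neighbors)


# Made-up users and books for illustration only.
sample_users = {
    "u1": ["A", "B", "C"],
    "u2": ["B", "D"],
    "u3": ["E"],
}
graph = build_book_graph(sample_users)
degree_of_b = len(graph["B"])  # number of books connected to B
```

In this sample, book E has no edges and so is absent from the graph, which mirrors the restriction to titles with at least one neighboring vertex.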

Check out example.py for some sample code that shows how to:

Print summary statistics for the graphs
Find out information about a specific book in the graph

As an example, it shows that 'Hippolytus' by Euripides has an edge to only five other books in the Shakespeare and Company graph but is connected to 68 books (many of which are 'classics') in the Goodreads graph.
