This repository provides data and code to accompany "The Afterlives of Shakespeare and Company in Online Social Readership."
This project contributes a matching between works in the Shakespeare and Company Project and works in Goodreads. We were able to match and manually verify 4460 of the Shakespeare and Company book URIs to Goodreads book IDs. We additionally consolidated Goodreads metadata for these matched works.
- data/goodreads-book-id-to-sc-uri_full-matching.json: a JSON dictionary mapping Goodreads book ID to SC book URI
- data/matched-goodreads-metadata.json: a JSON list containing a dictionary for each matched Goodreads book. Example metadata keys are the year of publication (yearFirstPublished) and number of reviews (numReviews).
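A minimal sketch of reading these two files with the standard library. The helper names `load_json` and `describe_matching` are hypothetical and not part of this repository; the sketch only assumes the file names above and that `yearFirstPublished` may be absent or null for some records.

```python
import json
from pathlib import Path


def load_json(path):
    """Read one JSON file and return the decoded Python object."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def describe_matching(data_dir="data"):
    """Count entries in the full matching and its metadata (hypothetical helper)."""
    data_dir = Path(data_dir)
    # Dictionary: Goodreads book ID -> SC book URI
    matching = load_json(data_dir / "goodreads-book-id-to-sc-uri_full-matching.json")
    # List with one dictionary per matched Goodreads book
    metadata = load_json(data_dir / "matched-goodreads-metadata.json")
    # Assumption: some records may lack a publication year
    with_year = sum(
        1 for book in metadata if book.get("yearFirstPublished") is not None
    )
    return {
        "matched_ids": len(matching),
        "metadata_records": len(metadata),
        "records_with_year": with_year,
    }
```

Calling `describe_matching()` from the repository root returns the three counts as a dictionary.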
- This code needs Python 3. You can install the other dependencies with:

      pip3 install -r requirements.txt

- You will also need to download the Bayesian Core-Periphery Stochastic Block Models and place the directory core_periphery_sbm into the current directory.
- We additionally require version 1.1 of the data from the Shakespeare and Company Project. Please download the following files and place them in the data directory:
  - SCoData_books_v1.1_2021_01.json: the Shakespeare and Company books dataset
  - SCoData_members_v1.1_2021_01.json: the Shakespeare and Company members dataset
  - SCoData_events_v1.1_2021_01.json: the Shakespeare and Company events dataset
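Before running the analysis, it can help to confirm that the downloads above are in place. The following is a hedged sketch, not a script shipped with this repository; `missing_inputs` is a hypothetical helper that checks only for the file and directory names listed above.

```python
from pathlib import Path

# File names as listed above; they are expected inside the data directory.
REQUIRED_FILES = [
    "SCoData_books_v1.1_2021_01.json",
    "SCoData_members_v1.1_2021_01.json",
    "SCoData_events_v1.1_2021_01.json",
]


def missing_inputs(root="."):
    """Return the required paths that are absent under `root` (hypothetical helper)."""
    root = Path(root)
    missing = [
        f"data/{name}"
        for name in REQUIRED_FILES
        if not (root / "data" / name).is_file()
    ]
    # The core-periphery model code is expected as a directory next to the scripts.
    if not (root / "core_periphery_sbm").is_dir():
        missing.append("core_periphery_sbm/")
    return missing
```

An empty list from `missing_inputs()` means all four inputs were found relative to the current directory.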
We scraped the Goodreads metadata using the Goodreads Scraper.
In addition to data from the Shakespeare and Company Project, this project uses a preprocessed subset of the Goodreads data in the UCSD Book Graph.
We further restrict our analysis to 1511 titles that are 1) in both the Shakespeare and Company dataset and the UCSD Book Graph and 2) have at least one neighboring vertex in the graphs we construct. All remaining files contain only data for these 1511 titles.
Preprocessed data from the UCSD Book Graph:
- data/book-uris-in-both-goodreads-and-sc.json: the URIs of books in both SC and Goodreads
- data/goodreads-book-id-to-text.json: dict mapping Goodreads book ID to summary string
- data/goodreads-user-to-books.json: dict mapping Goodreads user ID to a list of books the user interacted with
- data/goodreads-book-id-to-num-ratings.json: dict mapping Goodreads book ID to number of user ratings on Goodreads
There are also files listing the descriptive text for each book:
- data/sc-book-names.json: descriptive text for books in Shakespeare and Company
- data/goodreads-book-names.json: descriptive text for books in Goodreads
And finally dictionaries linking books across SC and Goodreads:
- data/goodreads-book-id-to-sc-uri.json: dict mapping Goodreads book ID to SC book URI
- data/goodreads-text-to-sc-text.json: dict mapping Goodreads book summary string to SC book summary string
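The linking dictionaries make it possible to express Goodreads interactions in terms of SC book URIs. The sketch below illustrates one way to do that. It assumes that the lists in goodreads-user-to-books.json hold Goodreads book IDs, which the descriptions above do not state explicitly, and `to_sc_uris` is a hypothetical helper rather than a function from this repository.

```python
def to_sc_uris(user_to_books, goodreads_id_to_sc_uri):
    """Translate each user's Goodreads book IDs into SC book URIs.

    Assumption: `user_to_books` maps user ID -> list of Goodreads book IDs.
    Books without a match in `goodreads_id_to_sc_uri` are dropped.
    """
    translated = {}
    for user, books in user_to_books.items():
        # JSON object keys are always strings, so normalise IDs before lookup.
        uris = [
            goodreads_id_to_sc_uri[str(book)]
            for book in books
            if str(book) in goodreads_id_to_sc_uri
        ]
        if uris:
            translated[user] = uris
    return translated
```

Users whose books all fall outside the matching are omitted from the result.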
All figures are saved in the figures subdirectory.
- Scripts in the connect-to-goodreads directory perform the initial matching between SC and Goodreads books. These rely on the Goodreads API, which is now deprecated.
- popularity_plots.ipynb: implements the article section "Comparing Popularity in SC and Goodreads".
- plot-relative-popularity-by-year.py: plots the relative popularity by year across Goodreads and SC.
- compare-neighbor-distributions.py: implements the article section "Comparing reading patterns of popular books".
- core-periphery-books.ipynb: implements the network centrality analysis in the article section "Comparing network roles of popular books".
Graphs are constructed for datasets from Shakespeare and Company and Goodreads. Vertices correspond to books. Edges correspond to people: two books have an edge between them if the same user interacted with both books.
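The construction described above can be sketched in a few lines. This is an illustration under stated assumptions, not the code used for the article: `build_book_graph` is a hypothetical name, the input is a dictionary from person to the books they interacted with, and edges are treated as unweighted.

```python
from collections import defaultdict
from itertools import combinations


def build_book_graph(user_to_books):
    """Return {book: set of neighboring books} for the co-interaction graph."""
    neighbors = defaultdict(set)
    for books in user_to_books.values():
        unique = set(books)  # ignore repeated interactions with the same book
        for book in unique:
            neighbors[book]  # register the vertex even if it stays isolated
        # Every pair of books sharing this person gets an (undirected) edge.
        for a, b in combinations(unique, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return dict(neighbors)


toy = {"u1": ["A", "B", "C"], "u2": ["B", "D"], "u3": ["E"]}
graph = build_book_graph(toy)
print(sorted(graph["B"]))  # ['A', 'C', 'D']
```

Note that the number of pairs grows quadratically with the length of a person's book list, so very active users dominate the running time.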
Check out example.py for some sample code that shows how to:
- Print summary statistics for the graphs
- Find out information about a specific book in the graph
As an example, it shows that 'Hippolytus' by Euripides has an edge to only five other books in the Shakespeare and Company graph but is connected to 68 books (many of which are 'classics') in the Goodreads graph.
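The kind of comparison made in this example can be illustrated on toy data. The sketch below is not the interface of example.py; `graph_summary` and `compare_book` are hypothetical helpers, and it assumes each graph is a symmetric dictionary from book identifier to a set of neighbors, with both graphs keyed by the same identifiers.

```python
def graph_summary(neighbors):
    """Vertex count, edge count, and mean degree of an undirected graph."""
    n_vertices = len(neighbors)
    # Each undirected edge appears in two adjacency sets.
    n_edges = sum(len(adj) for adj in neighbors.values()) // 2
    mean_degree = 2 * n_edges / n_vertices if n_vertices else 0.0
    return {"vertices": n_vertices, "edges": n_edges, "mean_degree": mean_degree}


def compare_book(book, sc_graph, goodreads_graph):
    """Compare one book's neighborhood across the two graphs."""
    sc = sc_graph.get(book, set())
    gr = goodreads_graph.get(book, set())
    return {
        "sc_degree": len(sc),
        "goodreads_degree": len(gr),
        "shared_neighbors": sorted(sc & gr),
    }


# Toy graphs with made-up identifiers, for illustration only.
sc_toy = {"h": {"a", "b"}, "a": {"h"}, "b": {"h"}}
gr_toy = {"h": {"a", "c", "d"}, "a": {"h"}, "c": {"h"}, "d": {"h"}}
print(graph_summary(sc_toy))
print(compare_book("h", sc_toy, gr_toy))
```

A book with a small SC degree and a large Goodreads degree, as reported for 'Hippolytus', would show up here as a large gap between `sc_degree` and `goodreads_degree`.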