This repository contains data, a codebook, and other resources for the detection of storytelling in online communities.
If you use our data, codebook, or models, please cite the following preprint.
Where do people tell stories online? Story Detection Across Online Communities
Maria Antoniak, Joel Mire, Maarten Sap, Elliott Ash, Andrew Piper
You can view a demonstration of how to load our annotations, fetch the texts, load our fine-tuned model from Hugging Face, and run predictions. If you use the Colab link, you don't need to download or install anything on your local machine; everything runs in your web browser.
Colab: link
GitHub: link
This dataset includes 502 texts annotated with story and event spans.
You can view the data annotations here.
We sampled Reddit posts and comments from the Webis-TLDR-17 dataset. Our CSV does not include the texts themselves, so you must "rehydrate" the data by matching the id column in our CSV against the original dataset.
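Rehydration is a join on the id column. The sketch below is a minimal, hypothetical helper (not part of this repository) that assumes each source record is a dict with an "id" key and a full-text field whose name you pass in (defaulting to "content", an assumption about your copy of Webis-TLDR-17). Because our CSV is small and the source dataset is large, it indexes the annotations and streams the source once.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def rehydrate(
    annotation_lines: Iterable[str],
    source_records: Iterable[Dict[str, str]],
    text_field: str = "content",  # assumption: full-text field name in your source copy
) -> Tuple[List[Dict[str, str]], List[str]]:
    """Attach the original text to each annotation row by matching on the id column.

    Returns (rehydrated rows in CSV order, ids that were not found in the source).
    """
    # Our annotation CSV is small enough to hold in memory, keyed by id.
    wanted = {row["id"]: row for row in csv.DictReader(annotation_lines)}
    found: Dict[str, Dict[str, str]] = {}

    # Stream the (much larger) source dataset and keep only the ids we need.
    for rec in source_records:
        row = wanted.get(rec["id"])
        if row is not None:
            found[rec["id"]] = {**row, "text": rec[text_field]}
            if len(found) == len(wanted):
                break  # every annotated id has been matched

    rows = [found[i] for i in wanted if i in found]
    missing = [i for i in wanted if i not in found]
    return rows, missing


# Usage (file name is hypothetical):
# with open("annotations.csv", newline="") as f:
#     rows, missing = rehydrate(f, my_webis_records)
```

Returning the unmatched ids makes it easy to check whether your copy of the source dataset is incomplete.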
We assign each of the top 500 subreddits in the dataset to a thematic category. These 33 categories can be found here.
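With a subreddit-to-category mapping, per-text story labels can be rolled up by category. The sketch below is a hypothetical illustration, not our analysis code: it assumes you already have (subreddit, is_story) pairs and a dict built from the category file, and it skips subreddits outside the mapped top 500.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple


def story_rate_by_category(
    predictions: Iterable[Tuple[str, bool]],
    subreddit_to_category: Mapping[str, str],
) -> Dict[str, float]:
    """Return the fraction of texts labeled as stories within each category."""
    totals: Dict[str, int] = defaultdict(int)
    stories: Dict[str, int] = defaultdict(int)
    for subreddit, is_story in predictions:
        category = subreddit_to_category.get(subreddit)
        if category is None:
            continue  # subreddit has no assigned category
        totals[category] += 1
        stories[category] += int(is_story)
    return {cat: stories[cat] / totals[cat] for cat in totals}
```

Subreddit names must match the mapping exactly, so normalize casing first if your two sources differ.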
Our definition of storytelling and our full codebook can be found here.
Our fine-tuned document classification model is available here on Hugging Face.
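A Hugging Face text-classification pipeline returns one {"label": ..., "score": ...} dict per input text. The sketch below turns those results into story/non-story flags. It accepts any callable with that output shape, so it runs without downloading the model; the story label name "LABEL_1" is a hypothetical default, so check the model card for the actual label.

```python
from typing import Callable, Dict, List, Sequence

# One text-classification result, e.g. {"label": "LABEL_1", "score": 0.93}.
Prediction = Dict[str, object]


def label_stories(
    texts: Sequence[str],
    classifier: Callable[[List[str]], List[Prediction]],
    story_label: str = "LABEL_1",  # hypothetical: confirm against the model card
) -> List[bool]:
    """Return True for each text whose top predicted label is the story label."""
    results = classifier(list(texts))
    return [r["label"] == story_label for r in results]


# Usage with transformers (MODEL_ID is a placeholder for the model linked above):
# from transformers import pipeline
# clf = pipeline("text-classification", model=MODEL_ID)
# flags = label_stories(texts, lambda batch: clf(batch, truncation=True))
```

Passing the classifier in as a callable keeps the labeling logic separate from model loading and lets you swap in a stub for testing.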
Please open an issue or contact Maria Antoniak with any questions.