Data, codebook, and models to automatically detect storytelling.

maria-antoniak/storyseeker

🔭StorySeeker

This repository contains data, a codebook, and other resources for the detection of storytelling in online communities.


Preprint

If you use our data, codebook, or models, please cite the following preprint.

Where do people tell stories online? Story Detection Across Online Communities
Maria Antoniak, Joel Mire, Maarten Sap, Elliott Ash, Andrew Piper


Quick Start with Colab

We provide a demonstration of how to load our annotations, fetch the texts, load our fine-tuned model from Hugging Face, and run predictions. If you use the Colab link, you don't need to download or set up anything on your local machine; everything runs in your browser.

Colab: link

GitHub: link


🔭StorySeeker Dataset

This dataset includes 502 texts annotated with story- and event-spans.

You can view the data annotations here.

We sampled Reddit posts and comments from the Webis-TLDR-17 dataset. To recover the texts, you must "rehydrate" the data by matching the id column in our CSV against the original dataset.
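
The rehydration step can be sketched roughly as follows. This is a minimal illustration, not the code from our Colab demonstration. It assumes the original corpus is read as JSON lines in which each record has an `id` field and stores its text in a field called `content`; the `label` column and the sample rows are hypothetical stand-ins, so check the field and column names against the actual files.

```python
import csv
import io
import json


def rehydrate(annotation_rows, corpus_lines, id_field="id", text_field="content"):
    """Attach the original text to each annotation row by matching on the id.

    `text_field` is an assumption about the corpus layout; adjust as needed.
    """
    wanted = {row[id_field] for row in annotation_rows}
    texts = {}
    for line in corpus_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        record_id = record.get(id_field)
        # Keep only the records that our annotations refer to.
        if record_id in wanted:
            texts[record_id] = record.get(text_field)
    # Rows whose id is not found in the corpus get text=None.
    return [dict(row, text=texts.get(row[id_field])) for row in annotation_rows]


# Tiny in-memory stand-ins for the annotation CSV and the corpus file
# (hypothetical ids, column names, and texts).
annotations_csv = io.StringIO("id,label\nabc123,1\nxyz789,0\n")
corpus_jsonl = io.StringIO(
    '{"id": "abc123", "content": "So this one time I missed my train."}\n'
    '{"id": "zzz000", "content": "Unrelated post."}\n'
)

rows = list(csv.DictReader(annotations_csv))
rehydrated = rehydrate(rows, corpus_jsonl)
```

With real files, you would pass open file handles in place of the `io.StringIO` objects. Streaming the corpus line by line avoids loading the full dataset into memory.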

We assign each of the top 500 subreddits in the dataset to a thematic category. These 33 categories can be found here.


🔭StorySeeker Codebook

Our definition of storytelling and our full codebook can be found here.


🔭StorySeeker Models

The document classification model is available here and can be accessed via Hugging Face.


Questions

Please open an issue or contact Maria Antoniak with any questions.
