Code and data for the "Sonnet or Not, Bot?" paper published at EMNLP Findings 2024

maria-antoniak/poetry-eval

Sonnet or Not, Bot?

This repository contains code and data for the following research study.

Sonnet or Not, Bot? Poetry Evaluation for Large Models and Datasets
Melanie Walsh, Anna Preus, Maria Antoniak
EMNLP Findings 2024

Please cite this paper when using resources found in this repository.



Data

The data in this repository includes:

  • 1.4k+ public domain poems tagged by poetic form by the Poetry Foundation, the Academy of American Poets, or both — with accompanying metadata such as subject tags and author birth and death dates where available
  • retrieval metadata from Dolma, collected with the WIMBD platform, including the source domains for each detected poem
  • memorization predictions based on n-gram overlap between the original poems and the poem continuations generated by GPT-4
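
As a rough illustration of the n-gram overlap idea mentioned in the last item, the sketch below scores a generated continuation against the true continuation of a poem. It is not the code used in the paper: the function names, whitespace tokenization, lowercasing, and the n-gram size of 5 are all assumptions chosen for illustration.

```python
def word_ngrams(tokens, n):
    """Return the set of word n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(true_text, generated_text, n=5):
    """Fraction of the generated text's n-grams that also occur in the true text.

    Hypothetical helper: whitespace tokenization and lowercasing are
    simplifying assumptions, and n=5 is an arbitrary default.
    """
    true_ngrams = word_ngrams(true_text.lower().split(), n)
    generated_ngrams = word_ngrams(generated_text.lower().split(), n)
    if not generated_ngrams:
        # The generated text is shorter than n tokens, so there is nothing to compare.
        return 0.0
    return len(true_ngrams & generated_ngrams) / len(generated_ngrams)


if __name__ == "__main__":
    true_continuation = "thou art more lovely and more temperate"
    model_continuation = "thou art more lovely and more gentle"
    print(ngram_overlap(true_continuation, model_continuation, n=3))
```

A high score suggests the model reproduced long stretches of the original poem. Any cutoff for labeling a poem as memorized would have to be chosen separately and is not specified here.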



Code

The code in this repository includes:

  • a Python notebook demonstrating how to query for data from Dolma using the WIMBD platform
  • a Python notebook analyzing the query data from Dolma
  • a Python notebook demonstrating the memorization experiments
  • Python scripts demonstrating how to prompt models for the poetry form classification task
  • a Python notebook demonstrating analysis of classification results
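
To give a sense of what prompting for the form classification task can look like, here is a minimal sketch that builds a classification prompt as a string. The prompt wording, the function name, and the list of candidate forms are hypothetical and are not taken from the scripts in this repository. Sending the prompt to a model is left out on purpose.

```python
def build_form_prompt(poem_text, candidate_forms):
    """Build a plain-text prompt asking a model to name a poem's form.

    Hypothetical helper: the wording below is an illustrative assumption,
    not the prompt used in the paper.
    """
    options = ", ".join(candidate_forms)
    return (
        "Read the following poem and identify its poetic form.\n"
        f"Choose one of: {options}.\n\n"
        f"Poem:\n{poem_text.strip()}\n\n"
        "Answer with the form name only."
    )


if __name__ == "__main__":
    # Example forms chosen for illustration only.
    forms = ["sonnet", "villanelle", "haiku"]
    poem = "Shall I compare thee to a summer's day?\nThou art more lovely and more temperate:"
    print(build_form_prompt(poem, forms))
```

Keeping prompt construction in its own function means the same prompt text can be reused across different models.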
