Sonnet or Not, Bot?

This repository contains code and data for the following research study.

Please cite this paper when using resources found in this repository.

Data

The data in this repository includes:

1.4k+ public domain poems tagged by poetic form by the Poetry Foundation, the Academy of American Poets, or both — with accompanying metadata such as subject tags and author birth and death dates where available
retrieval metadata from Dolma, obtained using the WIMBD platform, including source domains for each detected poem
memorization predictions based on n-gram overlap between the true poems and poem continuations generated by GPT-4
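
As a rough illustration of the n-gram overlap idea behind the memorization predictions, here is a minimal sketch. It is not the code used in the study: the function names, the word-level tokenization, the n-gram size of 5, and the 0.5 threshold are all assumptions chosen for the example.

```python
import re


def word_ngrams(text, n):
    """Return the set of lowercase word n-grams in `text` (assumed tokenization)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(true_text, generated_text, n=5):
    """Fraction of the true text's n-grams that also appear in the generated text."""
    true_grams = word_ngrams(true_text, n)
    if not true_grams:
        # Text shorter than n words has no n-grams to compare.
        return 0.0
    generated_grams = word_ngrams(generated_text, n)
    return len(true_grams & generated_grams) / len(true_grams)


def predict_memorized(true_text, generated_text, n=5, threshold=0.5):
    """Hypothetical decision rule: flag as memorized when overlap meets the threshold."""
    return ngram_overlap(true_text, generated_text, n) >= threshold


true_continuation = (
    "Shall I compare thee to a summer's day? "
    "Thou art more lovely and more temperate"
)
exact_copy = true_continuation
unrelated = "Rough winds do shake the darling buds of May"

print(ngram_overlap(true_continuation, exact_copy))
print(ngram_overlap(true_continuation, unrelated))
```

In this sketch, a generated continuation that reproduces the true text verbatim scores 1.0, while one that shares no five-word sequence with it scores 0.0. The notebooks in this repository are the reference for the actual settings.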

Code

The code in this repository includes:

a Python notebook demonstrating how to query for data from Dolma using the WIMBD platform
a Python notebook analyzing the query data from Dolma
a Python notebook demonstrating the memorization experiments
Python scripts demonstrating how to prompt models for the poetic form classification task
a Python notebook demonstrating analysis of classification results
