EAD-corpus

A collection of Encoded Archival Description (EAD) XML documents for text and content analysis.

These materials were collected between January and April 2024 to form an unannotated text corpus for content analysis. Text was identified and extracted with the eadretrieve wrapper script, which invokes xmllint, and then compiled with Unix text-processing utilities.

Elements selected for the project include abstract, scopecontent, bioghist, custodhist, and head. Content was evaluated using the style command, and the results are listed in the readability-data table.
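
As a rough illustration of the extraction step, the sketch below pulls the text of the selected elements out of an EAD document using Python's standard library. It is not the eadretrieve script and does not call xmllint; the function names, the sample record, and the choice to match elements by local name (so that both namespaced and un-namespaced EAD files are handled) are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Element names taken from the list above.
ELEMENTS = {"abstract", "scopecontent", "bioghist", "custodhist", "head"}

# Hypothetical sample record, invented for this sketch.
SAMPLE = """<ead xmlns="urn:isbn:1-931666-22-9">
  <archdesc level="collection">
    <did><abstract>Papers of a   local historian.</abstract></did>
    <scopecontent><head>Scope and Content</head><p>Letters and diaries.</p></scopecontent>
  </archdesc>
</ead>"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def extract_text(xml_source, wanted=ELEMENTS):
    """Return (element name, whitespace-normalized text) pairs in document order."""
    root = ET.fromstring(xml_source)
    results = []
    for elem in root.iter():
        name = local_name(elem.tag)
        if name in wanted:
            # Join all descendant text, then collapse runs of whitespace.
            text = " ".join(" ".join(elem.itertext()).split())
            if text:
                results.append((name, text))
    return results


if __name__ == "__main__":
    for name, text in extract_text(SAMPLE):
        print(f"{name}\t{text}")
```

Note that a head nested inside scopecontent is reported twice in this sketch: once as part of the parent's text and once on its own. Whether the original workflow de-duplicated such cases is not stated here.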