Bring Me The Horizon Bibliography - Documentation

Corpus

This GitHub repository contains a corpus of song lyrics by the British metalcore band Bring Me The Horizon (BMTH). The texts folder holds six subfolders, one for each of the band's six commercially successful albums: ‘Suicide Season’, ‘There Is a Hell Believe Me I’ve Seen It. There Is a Heaven Let’s Keep It a Secret.’, ‘Sempiternal’, ‘That’s the Spirit’, ‘Amo’ and ‘Post Human: Survival Horror’.

Audience and use

The target audience for this project is people interested in lyrics analysis or programming, as well as fans of BMTH. The corpus provides a handy CSV file and folder of lyrics for the six most commercially successful BMTH albums, which can be used to analyze the band's music, style, word use, and so on. It also serves to preserve the lyrics, since I could not find any CSV file containing a lyrics corpus for this band online.

Text selection

This corpus contains only the albums released by Bring Me The Horizon from 2008 onward. BMTH released an album before 2008, but it was less popular and is not included. The band has also released many EPs, singles and features, which are likewise excluded. Only commercially successful albums were chosen, so that they can be compared with one another.

Data Collection

The lyrics were copied manually into TXT files from the website www.azlyrics.com. The structure of the lyrics was left unaltered, except that vocal attributions were deleted, since they were inconsistent and are not needed for the uses of this dataset. While copying, I checked that the lyrics made sense, that they contained no mistakes, and that there were no redundancies.

Pre-processing

The bare lyrics are stored in TXT files, one file per song. The TXT files are spread over six folders, each corresponding to an album. These files were converted to a single CSV file using Python. The CSV file was then cleaned, also in Python, by deleting newlines and converting all text to lowercase.
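
The conversion described above could look roughly like the following. This is a minimal sketch, not the code from the notebook: the function names are hypothetical, the column names are taken from the column description in this README, and it assumes that "deleting newlines" means replacing them with spaces so that words on adjacent lines do not run together.

```python
import csv
from pathlib import Path


def clean_lyrics(text):
    """Replace newlines (and other whitespace runs) with single spaces, then lowercase."""
    # Assumption: newlines become spaces rather than being removed outright.
    return " ".join(text.split()).lower()


def collect_songs(root):
    """Read every <album>/<song>.txt file under root into one row per song."""
    rows = []
    for album_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for song_file in sorted(album_dir.glob("*.txt")):
            lyrics = song_file.read_text(encoding="utf-8")
            rows.append({
                "Album": album_dir.name,        # folder name is used as the album name
                "Song Name": song_file.stem,    # file name without the .txt extension
                "Lyrics": lyrics,
                "Cleaned Lyrics": clean_lyrics(lyrics),
            })
    return rows


def write_csv(rows, out_path):
    """Write the collected rows to a CSV file with a header line."""
    fieldnames = ["Album", "Song Name", "Lyrics", "Cleaned Lyrics"]
    with Path(out_path).open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Under these assumptions, `write_csv(collect_songs("texts"), "BMTH_Discography.csv")` would rebuild a CSV from the texts folder, although the result may differ in detail from the file in the repository.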

Annotations

Using the spaCy library, I added Doc, Tokens, Lemmas, POS (part of speech), Named_Entities and their corresponding NE_Words to each song in the original CSV. To make this easier, I first converted the CSV into a dataframe.
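
As an illustration of the annotation step, the sketch below derives the annotation columns from a spaCy Doc. It is not the notebook code: the helper names are hypothetical, the model name `en_core_web_sm` is an assumption (this README does not say which spaCy model was used), and it assumes the annotations are computed from the Cleaned Lyrics column. It works on plain dictionaries rather than a dataframe to stay dependency-light.

```python
def load_pipeline(model_name="en_core_web_sm"):
    """Load a spaCy pipeline; the default model name is an assumption."""
    import spacy  # imported lazily so the helpers below can be used without spaCy installed
    return spacy.load(model_name)


def annotate(doc):
    """Extract the annotation columns from a spaCy Doc."""
    return {
        "Tokens": [token.text for token in doc],
        "Lemmas": [token.lemma_ for token in doc],
        "POS": [token.pos_ for token in doc],
        "Named_Entities": [ent.label_ for ent in doc.ents],  # entity categories
        "NE_Words": [ent.text for ent in doc.ents],          # the entity text itself
    }


def annotate_rows(rows, nlp):
    """Return copies of the rows with the Doc and the annotation columns added."""
    annotated = []
    for row in rows:
        doc = nlp(row["Cleaned Lyrics"])  # assumption: the cleaned text is annotated
        annotated.append({**row, "Doc": doc, **annotate(doc)})
    return annotated
```

With spaCy and the chosen model installed, `annotate_rows(rows, load_pipeline())` adds the columns to every song. Note that named-entity recognition on lowercased text can differ from results on the original casing.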

Description of columns in annotated CSV

Variable	Description
Album	The album this particular song belongs to
Song Name	The name of the BMTH song
Lyrics	The lyrics of the song as formatted in the TXT file
Cleaned Lyrics	The lyrics in lowercase with newlines removed
Doc	spaCy processed text document
Tokens	The tokenized text of the song lyrics
Lemmas	The lemmatized text of the song lyrics
POS	Part-of-speech tags of the tokens in the song lyrics
Named_Entities	The categories of Named Entities that are in the song lyrics
NE_Words	The actual words of the Named Entities in the song lyrics
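
To show how the columns above might be used, here is a small sketch that reads a CSV with these column names and computes simple statistics. The function names are hypothetical and only the standard library is used. The list-valued columns (Tokens, Lemmas, and so on) are left alone, because how they are serialized in the CSV is not documented here.

```python
import csv
from collections import Counter


def load_rows(csv_path):
    """Read the CSV into a list of dictionaries keyed by column name."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def songs_per_album(rows):
    """Count how many songs each album contributes to the corpus."""
    return Counter(row["Album"] for row in rows)


def top_words(rows, album, n=10):
    """Return the n most frequent whitespace-separated words for one album."""
    counts = Counter()
    for row in rows:
        if row["Album"] == album:
            counts.update(row["Cleaned Lyrics"].split())
    return counts.most_common(n)
```

For example, `top_words(load_rows("BMTH_Discography_annotation.csv"), "Sempiternal")` would list frequent words for one album, assuming the album is named that way in the Album column.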

Format

The individual song lyrics are in TXT, and the complete set is in CSV.

Folders and files

texts
BMTH lyrics annotation.ipynb
BMTH_Discography.csv
BMTH_Discography_annotation.csv
README.md

hbredewold/BMTH_Bibliography