Welcome to our repository! Our project is an exploration of recent trends in song popularity based on global phenomena.
Please see RUNNING.md.
All the raw data about economic factors and happiness scores can be found in the pre-etl_data directory, which contains:
- gdp_per_capita directory: Contains one Excel file per country with its annual GDP per capita in USD (source: Statista).
- inflation directory: Contains one Excel file per country with its annual inflation rate (source: Statista).
- unemployment directory: Contains one Excel file per country with its annual unemployment rate (source: Statista).
- happiness.xls: An Excel file that contains the annual happiness scores out of 10 for each country (source: World Happiness Report).
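As a rough illustration of how one of the per-country directories could be read, here is a minimal sketch. It assumes each file is named after its country (for example Canada.xlsx), which the README does not confirm, and the helper names country_files and load_indicator are hypothetical. Reading the files requires pandas plus an Excel engine, which are imported lazily.

```python
from pathlib import Path


def country_files(indicator_dir):
    """Map a country name (taken from the file stem) to its Excel file path."""
    # Assumption: one file per country, named after the country, e.g. Canada.xlsx.
    return {p.stem: p for p in sorted(Path(indicator_dir).glob("*.xls*"))}


def load_indicator(indicator_dir):
    """Read every country file in one indicator directory into one DataFrame."""
    # Imported lazily; .xls files also need xlrd and .xlsx files need openpyxl.
    import pandas as pd

    frames = []
    for country, path in country_files(indicator_dir).items():
        frame = pd.read_excel(path)
        frame["country"] = country  # tag rows so the files can be stacked
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)
```

For example, load_indicator("pre-etl_data/inflation") would stack all inflation files into one table with an added country column, provided the naming assumption holds.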
Note: There are two datasets for Spotify tracks which are too big for GitHub and must be downloaded from Kaggle:
- Song Dataset: Database with details about the top songs in each country per week in each year.
- Lyrics Dataset: Database with some lyrics from the Genius website.
We use the Genius API to grab songs with lyrics.
Then, using language_etl, we detect the language of the lyrics with two Python libraries: langdetect and pycld2.
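The sketch below shows one way the two detectors could be combined. It is not the code in language_etl: the rule of keeping a language only when both libraries agree is an assumption made for illustration, and the function names are hypothetical. The library calls themselves (langdetect's detect and pycld2's detect) are the real entry points.

```python
def detect_with_langdetect(text):
    """Return langdetect's language code, or None if detection fails."""
    from langdetect import DetectorFactory, detect
    from langdetect.lang_detect_exception import LangDetectException

    DetectorFactory.seed = 0  # langdetect is non-deterministic unless seeded
    try:
        return detect(text)
    except LangDetectException:
        return None


def detect_with_pycld2(text):
    """Return pycld2's top language code, or None if the result is unreliable."""
    import pycld2 as cld2

    try:
        is_reliable, _, details = cld2.detect(text)
    except cld2.error:
        return None
    # Each entry in details is (language name, language code, percent, score).
    return details[0][1] if is_reliable else None


def agreed_language(text, detectors=(detect_with_langdetect, detect_with_pycld2)):
    """Return the language code only when every detector gives the same answer."""
    # Assumed rule: disagreement between detectors is treated as "unknown".
    codes = {detector(text) for detector in detectors}
    return codes.pop() if len(codes) == 1 else None
```

Requiring agreement trades coverage for precision: lyrics that mix languages, or that are very short, are more likely to come back as None than to be mislabeled.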
We use Hugging Face models to predict the mood of our lyrics, separating them into three categories.
All the code in this directory runs the sentiment analysis portion of our project and stores the results in moods data.
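A minimal sketch of three-way mood prediction with the transformers pipeline API follows. The README names neither the models nor the three categories, so the category names in MOODS, the helper names, and whichever model id is passed to load_classifier are all assumptions rather than the project's actual choices.

```python
MOODS = ("negative", "neutral", "positive")  # assumed category names


def load_classifier(model_name):
    """Build a Hugging Face text-classification pipeline for the given model."""
    from transformers import pipeline

    return pipeline("text-classification", model=model_name)


def predict_mood(lyrics, classifier):
    """Return (mood, score) for one lyric text; mood is None for unknown labels."""
    # truncation=True clips lyrics longer than the model's maximum input length.
    result = classifier(lyrics, truncation=True)[0]
    label = result["label"].lower()
    return (label if label in MOODS else None), result["score"]
```

Passing the classifier in as an argument means the model is loaded once and reused across all songs, and it allows a stub to stand in for the model when testing the surrounding code.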
We also have a Power BI dashboard that presents our findings from the cleaned data. You can download the file and upload it to your SFU Power BI workspace to view and interact with it.