This is a simple YouTube music/video recommender. You can check out the deployed solution on Heroku:
https://thawing-shore-99052.herokuapp.com/
The problem is solved by:
- Scraping video data from YouTube pages;
- Extracting the video information from each page;
- Preprocessing the data from each video into a single dataset;
- Manually labeling some of the samples and using active learning for the rest;
- Extracting features from the dataset;
- Training a Random Forest and a LightGBM model and ensembling them;
- Building a simple app to serve the model through Heroku.
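
The active learning step in the list above can be illustrated with uncertainty sampling, one common strategy. This is a minimal sketch and not necessarily the approach used in this repository: the function name `select_for_labeling`, the `batch_size` parameter, and the example scores are all hypothetical. It assumes a model trained on the manually labeled samples has already scored the unlabeled videos, and it picks the videos whose predicted probability is closest to 0.5 as the next ones to label by hand.

```python
def select_for_labeling(probabilities, batch_size=10):
    """Return indices of the samples the model is least sure about.

    `probabilities` holds the predicted probability of the positive
    class (y = 1) for each unlabeled video. Hypothetical helper, not
    part of this repository.
    """
    # Distance from 0.5 measures confidence: 0 means a coin flip.
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:batch_size]


# Made-up scores for five unlabeled videos.
scores = [0.95, 0.52, 0.10, 0.45, 0.70]
to_label = select_for_labeling(scores, batch_size=2)
print(to_label)
```

After labeling the selected videos, the model would be retrained and the selection repeated until the labeling budget runs out.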
Clone the repository, then go to the /src/data_science folder.
- Scrape data from YouTube pages using:
python search_data_collection.py
- Extract information from the pages by:
python search_data_parsing.py
- Process the video data using:
python video_data_processing.py
- Manually label the data by creating a new column named "y".
- Fit the final model using:
python final_model.py
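
One simple way to ensemble two classifiers such as a Random Forest and a LightGBM model is to take a weighted average of their predicted probabilities. The sketch below shows that idea in plain Python. It is an assumption about how blending could work, not a description of what final_model.py does: `blend_probabilities`, `rf_weight`, and the example scores are hypothetical.

```python
def blend_probabilities(rf_probs, lgbm_probs, rf_weight=0.5):
    """Weighted average of two models' positive-class probabilities.

    Hypothetical helper: the weight given to each model is an
    assumption and would normally be tuned on validation data.
    """
    if len(rf_probs) != len(lgbm_probs):
        raise ValueError("both models must score the same samples")
    if not 0.0 <= rf_weight <= 1.0:
        raise ValueError("rf_weight must be between 0 and 1")
    lgbm_weight = 1.0 - rf_weight
    return [
        rf_weight * r + lgbm_weight * g
        for r, g in zip(rf_probs, lgbm_probs)
    ]


# Made-up scores from each model for three videos.
rf_scores = [0.8, 0.2, 0.6]
lgbm_scores = [0.6, 0.4, 0.9]
blended = blend_probabilities(rf_scores, lgbm_scores, rf_weight=0.5)
print(blended)
```

Averaging tends to help when the two models make different kinds of errors, which is often the case for a bagged and a boosted tree model.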
Now that the model is trained, we can deploy it. Go to the /src/deploy folder.
Start the database with:
python db_starter.py
Then run the app with:
python app.py
Build the Docker image:
docker build . -t deploy_ytr
Run the Docker image:
docker run -e PORT=8000 -p 80:80 deploy_ytr