
The goal of this project was to get familiar with MLOps best practices and packages such as MLflow. For this purpose, we relied on the abalone age prediction Kaggle contest. This is a fork of a group project from my DSB Master's Degree at HEC Paris, done for our MLOps course in cooperation with Artefact (https://www.artefact.com/).


DataThomas/xhec-mlops-project-student

 
 


Abalone Kaggle Contest Industrialization

CI status Python Version

Code style: black Imports: isort Linting: ruff Pre-commit

Authors: Mykyta Alekseiev, Elizaveta Barysheva, Joao Melo, Thomas Schneider, Harshit Shangari and Maria Stoelben

Description

The purpose of this repository is to industrialize the Abalone age prediction Kaggle contest.

The age of abalone is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope -- a boring and time-consuming task. Other measurements, which are easier to obtain, are used to predict the age.

Goal: predict the age of abalone (column "Rings") from physical measurements ("Shell weight", "Diameter", etc.).

You can download the dataset on the Kaggle page.

Note that we add an "age" column to the dataset, equal to the number of rings plus 1.5, and predict this age, as described on the Kaggle page.
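As a rough illustration of that target definition, the sketch below derives "age" from "Rings" using only the standard library. The function name `add_age_column` and the two sample rows are assumptions made for this example; the project's own preprocessing code may look different.

```python
import csv
import io

AGE_OFFSET = 1.5  # added to the ring count, as stated above


def add_age_column(rows):
    """Return copies of the rows with an "age" key equal to Rings + 1.5."""
    return [{**row, "age": float(row["Rings"]) + AGE_OFFSET} for row in rows]


# Tiny in-memory stand-in for the Kaggle CSV (values are illustrative only).
sample_csv = io.StringIO(
    "Diameter,Shell weight,Rings\n"
    "0.365,0.150,15\n"
    "0.265,0.070,7\n"
)
rows = add_age_column(csv.DictReader(sample_csv))
print([row["age"] for row in rows])  # [16.5, 8.5]
```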

Setup

Use a virtual environment to install the dependencies of the project:

conda env create --file environment.yml
conda activate <envname>

Install the following requirements (unless you are skipping directly to Running the FastAPI application):

pip install -r requirements-dev.txt
pre-commit install

Training and saving model

Set the API URL for Prefect:

prefect config set PREFECT_API_URL=http://0.0.0.0:4200/api

Check that you have SQLite installed (Prefect backend database system):

sqlite3 --version

Start a local Prefect server:

prefect server start --host 0.0.0.0

In order to build the model, run:

python src/modelling/main.py

You can visit the UI at http://0.0.0.0:4200/dashboard and check out the flow runs.

If you want to reset the database, run:

prefect server database reset

⚠️ We assumed that the Prefect flows were not supposed to be deployed. If they should be, replace the call to the main function in main.py with the following code:

from prefect import serve

main_deploy = main.to_deployment(
    name="train",
    cron="0 0 1 * *",  # run once a month on the first day at midnight
    parameters={
        "trainset_path": args.trainset_path,
        "model_path": args.model_path,
    },
)
serve(main_deploy)
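To make the schedule concrete: the cron expression "0 0 1 * *" fires at 00:00 on day 1 of every month. The sketch below computes the next such time after a given moment. It is independent of Prefect, uses naive datetimes, and ignores time zones; the helper name `next_monthly_run` is made up for this illustration.

```python
from datetime import datetime


def next_monthly_run(now):
    """Next firing time of the cron expression "0 0 1 * *" strictly after `now`."""
    # The first day of the current month at midnight is never after `now`,
    # so the next run always falls on the first day of the following month.
    if now.month == 12:
        return datetime(now.year + 1, 1, 1)
    return datetime(now.year, now.month + 1, 1)


print(next_monthly_run(datetime(2024, 12, 15, 9, 30)))  # 2025-01-01 00:00:00
```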

To check out the MLflow experiments, run:

mlflow ui --host 0.0.0.0 --port 5002

Running the FastAPI application

Build the Docker image from the Dockerfile:

docker build -t abalone-age-prediction -f Dockerfile .

Run the Docker container from the created image:

docker run -p 8000:8000 abalone-age-prediction

To access the interactive FastAPI documentation (Swagger UI), open this URL: http://0.0.0.0:8000/docs
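The same information is available programmatically, because FastAPI serves its OpenAPI schema at /openapi.json by default. The sketch below lists the routes the running container exposes. The helper names `list_routes` and `fetch_schema` are made up for this example, and the base URL assumes the container is reachable on localhost with the port mapping shown above.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "delete", "patch", "options", "head", "trace"}


def list_routes(schema):
    """Return sorted "METHOD path" strings from an OpenAPI schema dict."""
    routes = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Path items may also hold non-method keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                routes.append(f"{method.upper()} {path}")
    return sorted(routes)


def fetch_schema(base_url="http://localhost:8000"):
    """Download the OpenAPI schema (base_url is an assumption, see above)."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)
```

With the container running, `print(list_routes(fetch_schema()))` should show the available endpoints and their HTTP methods, which is a quick way to find the prediction route without opening a browser.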
