Abalone Kaggle Contest Industrialization

CI status Python Version

Code style: black Imports: isort Linting: ruff Pre-commit

Authors: Mykyta Alekseiev, Elizaveta Barysheva, Joao Melo, Thomas Schneider, Harshit Shangari and Maria Stoelben

Description

The purpose of this repository is to industrialize the Abalone age prediction Kaggle contest.

The age of abalone is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope -- a boring and time-consuming task. Other measurements, which are easier to obtain, are used to predict the age.

Goal: predict the age of abalone (column "Rings") from physical measurements ("Shell weight", "Diameter", etc.).

You can download the dataset on the Kaggle page.

Note that we add a column "age" to the dataset, equal to the number of rings plus 1.5, and predict this age, as detailed on the Kaggle page mentioned above.
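As a rough illustration of the "age" column described above, here is a minimal sketch using only the Python standard library. The helper name `add_age` and the sample values are hypothetical and are not taken from the repository or the dataset; with pandas, the equivalent would simply be `df["age"] = df["Rings"] + 1.5`.

```python
import csv
import io

AGE_OFFSET = 1.5  # age = number of rings + 1.5, as stated in the description


def add_age(rows):
    """Return new row dicts with an "age" key derived from the "Rings" column."""
    return [{**row, "age": float(row["Rings"]) + AGE_OFFSET} for row in rows]


# Tiny in-memory sample; the values are illustrative, not real dataset rows.
sample_csv = "Diameter,Shell weight,Rings\n0.365,0.15,15\n0.265,0.07,7\n"
rows = list(csv.DictReader(io.StringIO(sample_csv)))
rows_with_age = add_age(rows)
print([row["age"] for row in rows_with_age])  # [16.5, 8.5]
```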

Setup

Use a virtual environment to install the dependencies of the project:

conda env create --file environment.yml
conda activate <envname>

Install the development requirements (you can skip this step if you go straight to Running the FastAPI application):

pip install -r requirements-dev.txt
pre-commit install

Training and saving model

Set the API URL for Prefect:

prefect config set PREFECT_API_URL=http://0.0.0.0:4200/api

Check that you have SQLite installed (Prefect backend database system):

sqlite3 --version

Start a local Prefect server:

prefect server start --host 0.0.0.0

In order to build the model, run:

python src/modelling/main.py

You can visit the UI at http://0.0.0.0:4200/dashboard and check out the flow runs.

If you want to reset the database, run:

prefect server database reset

⚠️ We assumed that the Prefect flows were not supposed to be deployed. To deploy them, replace the call to the main function in main.py with the following code:

from prefect import serve

main_deploy = main.to_deployment(
    name="train",
    cron="0 0 1 * *",  # run once a month on the first day at midnight
    parameters={
        "trainset_path": args.trainset_path,
        "model_path": args.model_path,
    },
)
serve(main_deploy)
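The deployment snippet reads `args.trainset_path` and `args.model_path`. How main.py actually builds `args` is not shown here, so the following is only a sketch of one way it could be done with argparse. The flag names, help texts and example paths are assumptions; only the two attribute names come from the snippet.

```python
import argparse


def parse_args(argv=None):
    """Parse the two paths used as flow parameters (hypothetical CLI)."""
    parser = argparse.ArgumentParser(description="Train the abalone age model.")
    # Flag names are assumed; argparse exposes them as args.trainset_path
    # and args.model_path.
    parser.add_argument("--trainset_path", required=True, help="CSV file with the training data")
    parser.add_argument("--model_path", required=True, help="Where to save the trained model")
    return parser.parse_args(argv)


# Example call with made-up paths; pass argv=None to read the real command line.
args = parse_args(["--trainset_path", "data/abalone.csv", "--model_path", "models/model.pkl"])
parameters = {"trainset_path": args.trainset_path, "model_path": args.model_path}
print(parameters)
```

The resulting `parameters` dictionary has the same shape as the one passed to `main.to_deployment` in the snippet.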

To check out the MLflow experiments, run:

mlflow ui --host 0.0.0.0 --port 5002

Running the FastAPI application

Build the Docker image from the Dockerfile:

docker build -t abalone-age-prediction -f Dockerfile .

Run the Docker container from the created image:

docker run -p 8000:8000 abalone-age-prediction

To access the FastAPI dashboard, use this URL: http://0.0.0.0:8000/docs
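The route names of the application are not listed in this README. One way to discover them from a script, rather than from the dashboard, is to read the OpenAPI schema that FastAPI serves by default at /openapi.json. The sketch below assumes the container is reachable at http://localhost:8000 (through the `-p 8000:8000` mapping) and that the application has not changed the default schema URL. The function names are hypothetical.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumed host; adjust to your setup
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options", "trace"}


def list_routes(openapi_schema):
    """Return "METHOD /path" strings for every operation in an OpenAPI schema."""
    routes = []
    for path, operations in sorted(openapi_schema.get("paths", {}).items()):
        for method in sorted(operations):
            # Path items may also hold non-method keys such as "parameters".
            if method in HTTP_METHODS:
                routes.append(f"{method.upper()} {path}")
    return routes


def fetch_routes(base_url=BASE_URL):
    """Download the schema from a running application and list its routes."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return list_routes(json.load(response))
```

Once the container is running, `print(fetch_routes())` should list the available endpoints, which can then be tried out from the /docs page.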