MLOps pipeline with DVC and CML using GitHub Actions and IBM Cloud
Documentation and Implementation Guide
- Data Versioning: DVC
- Machine Learning Pipeline: DVC Pipeline (preprocess, train, evaluate)
- CI/CD: Unit testing with Pytest, pre-commit, and GitHub Actions
- CML: Continuous Machine Learning and GitHub Actions
- Deploy on release: GitHub Actions and IBM Watson
- Monitoring: OpenScale
- Infrastructure as Code: Terraform script
- DVC
- Python 3 and pip
- Access to IBM Cloud Object Storage
Set up your credentials in ~/.aws/credentials
and ~/.aws/config.
Although DVC uses the S3 protocol, it works seamlessly with IBM Cloud Object Storage, as you can see in other portions of the repository.
~/.aws/credentials
[default]
aws_access_key_id = {{Key ID}}
aws_secret_access_key = {{Access Key}}
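A minimal ~/.aws/config sketch is shown below; the region value is an assumption, so adjust it to your bucket's location. For IBM COS, the S3 endpoint itself is typically set on the DVC remote rather than in this file (the endpoint URL here is also an assumption — check your bucket's region):

```ini
; ~/.aws/config (sketch; region is an assumption)
[default]
region = us-south
```

To point DVC at IBM COS, you can use something like `dvc remote modify <remote> endpointurl https://s3.us-south.cloud-object-storage.appdomain.cloud`, replacing the remote name and endpoint with your own.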
To enable pre-commit testing, you need pre-commit.
Installing pre-commit with pip
pip install pre-commit
Installing pre-commit in your local repository. Keep in mind this creates a Git hook.
pre-commit install
Now every time you make a commit, it will run the tests defined in .pre-commit-config.yaml
before allowing your commit.
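A sketch of what the .pre-commit-config.yaml could contain, matching the black and pytest-check hooks shown in the example output (repository revisions are placeholders — pin them to the versions you actually use):

```yaml
# .pre-commit-config.yaml (sketch; revs are placeholders)
repos:
  - repo: https://github.com/psf/black
    rev: 22.3.0          # placeholder revision
    hooks:
      - id: black
  - repo: local
    hooks:
      - id: pytest-check
        name: pytest-check
        entry: pytest     # runs the test suite before each commit
        language: system
        pass_filenames: false
        always_run: true
```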
Example
$ git commit -m "Example commit"
black....................................................................Passed
pytest-check.............................................................Passed
Download the data from the DVC repository (analogous to git pull):
dvc pull
Reproduce the pipeline using DVC:
dvc repro
✂️ Preprocessing pipeline
dvc run -n preprocess -d ./src/preprocess_data.py -d data/weatherAUS.csv \
-o ./data/weatherAUS_processed.csv -o ./data/features.csv \
python3 ./src/preprocess_data.py ./data/weatherAUS.csv
📘 Training pipeline
dvc run -n train -d ./src/train.py -d ./data/weatherAUS_processed.csv \
-d ./src/model.py \
-o ./models/model.joblib \
python3 ./src/train.py ./data/weatherAUS_processed.csv ./src/model.py 200
📊 Evaluate pipeline
dvc run -n evaluate -d ./src/evaluate.py -d ./data/weatherAUS_processed.csv \
-d ./src/model.py -d ./models/model.joblib -o ./results/metrics.json \
-o ./results/precision_recall_curve.png -o ./results/roc_curve.png \
python3 ./src/evaluate.py ./data/weatherAUS_processed.csv ./src/model.py ./models/model.joblib
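The three `dvc run` stages above are recorded by DVC in a dvc.yaml file; a rough equivalent of what it would contain:

```yaml
# dvc.yaml equivalent of the preprocess, train, and evaluate stages above
stages:
  preprocess:
    cmd: python3 ./src/preprocess_data.py ./data/weatherAUS.csv
    deps:
      - ./src/preprocess_data.py
      - data/weatherAUS.csv
    outs:
      - ./data/weatherAUS_processed.csv
      - ./data/features.csv
  train:
    cmd: python3 ./src/train.py ./data/weatherAUS_processed.csv ./src/model.py 200
    deps:
      - ./src/train.py
      - ./data/weatherAUS_processed.csv
      - ./src/model.py
    outs:
      - ./models/model.joblib
  evaluate:
    cmd: python3 ./src/evaluate.py ./data/weatherAUS_processed.csv ./src/model.py ./models/model.joblib
    deps:
      - ./src/evaluate.py
      - ./data/weatherAUS_processed.csv
      - ./src/model.py
      - ./models/model.joblib
    outs:
      - ./results/metrics.json
      - ./results/precision_recall_curve.png
      - ./results/roc_curve.png
```

With this file in place, `dvc repro` re-runs only the stages whose dependencies have changed.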
🔐 IBM Credentials
Fill in the credentials_example.yaml
file and rename it to credentials.yaml
to be able to run the scripts that require IBM keys.
To use GitHub Actions to deploy your model, you'll need to encrypt the credentials file. To do that, run the command below and choose a strong password.
gpg --symmetric --cipher-algo AES256 credentials.yaml
Now, on the repository's GitHub page, go to Settings->Secrets
and add the following secrets:
AWS_ACCESS_KEY_ID (Bucket Credential)
AWS_SECRET_ACCESS_KEY (Bucket Credential)
IBM_CREDENTIALS_PASS (password for the encrypted file)
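Inside the deploy workflow, the encrypted file can then be decrypted with the IBM_CREDENTIALS_PASS secret. A sketch of such a step (the step name and the .gpg file name are assumptions matching the encryption command above):

```yaml
# Sketch of a GitHub Actions step that decrypts the credentials file
- name: Decrypt IBM credentials
  run: |
    gpg --quiet --batch --yes --decrypt \
        --passphrase="$IBM_CREDENTIALS_PASS" \
        --output credentials.yaml credentials.yaml.gpg
  env:
    IBM_CREDENTIALS_PASS: ${{ secrets.IBM_CREDENTIALS_PASS }}
```

The passphrase is passed via an environment variable populated from the repository secret, so it never appears in the workflow file or logs.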