Local reproducibility

We have a DVC pipeline defined in the `dvc.yaml` file.

The pipeline is composed of stages that run Python scripts defined in `src`:

```mermaid
flowchart TD
    node2[eval]
    node3[get-data]
    node4[split-data]
    node5[train]
    node3-->node4
    node4-->node2
    node4-->node5
    node5-->node2
```
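
Each arrow in the flowchart is a dependency: a stage runs only after the stages it reads from. As a rough illustration (not how DVC itself is implemented), the same execution order can be reproduced with Python's standard `graphlib` module, using the stage names from the flowchart in a hypothetical dependency map:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map built from the flowchart:
# each stage maps to the set of stages it depends on.
dependencies = {
    "get-data": set(),
    "split-data": {"get-data"},
    "train": {"split-data"},
    "eval": {"split-data", "train"},
}

# static_order() yields every stage only after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['get-data', 'split-data', 'train', 'eval']
```

DVC derives this graph from the `deps` and `outs` declared for each stage in `dvc.yaml`, and `dvc repro` additionally skips stages whose dependencies have not changed since the last run.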

We use DVC params, defined in `params.yaml`, to configure the pipeline.

The pipeline enables local reproducibility and can be run with `dvc repro` or `dvc exp run`:

```console
$ export GITHUB_TOKEN={YOUR_GITHUB_TOKEN}
$ export LOGURU_LEVEL=INFO
$ dvc exp run -S train.epochs=8
```
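
The `-S train.epochs=8` flag (short for `--set-param`) overrides a single nested value in `params.yaml` before the experiment runs. The sketch below only mimics that effect on a plain dictionary; the `set_param` helper and the `lr` key are hypothetical, and DVC's real value parsing follows YAML rather than the simple int/float fallback used here:

```python
import copy


def set_param(params: dict, assignment: str) -> dict:
    """Return a copy of params with a dotted-key override applied.

    Hypothetical helper: "train.epochs=8" sets params["train"]["epochs"] = 8.
    """
    key, raw_value = assignment.split("=", 1)
    *parents, leaf = key.split(".")

    # Simplified value parsing: try int, then float, else keep the string.
    for cast in (int, float):
        try:
            value = cast(raw_value)
            break
        except ValueError:
            continue
    else:
        value = raw_value

    # Work on a deep copy so the original dict is left untouched.
    updated = copy.deepcopy(params)
    node = updated
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value
    return updated


# Hypothetical contents of params.yaml, already loaded as a dict.
params = {"train": {"epochs": 4, "lr": 0.001}}
print(set_param(params, "train.epochs=8"))  # {'train': {'epochs': 8, 'lr': 0.001}}
```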

The pipeline generates DVC metrics and DVC plots to evaluate model performance. They are saved in `outs` and can be compared across experiments:

```console
$ dvc exp diff
$ dvc plots diff --open
```
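
To make the comparison concrete, here is a minimal sketch of the old/new/change view that `dvc exp diff` prints for metrics. The `metrics_diff` helper and the metric names and values are hypothetical, not read from this repo's `outs`:

```python
def metrics_diff(old: dict, new: dict) -> dict:
    """Map each metric name to (old, new, change); change is None if a side is missing."""
    diff = {}
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        change = after - before if before is not None and after is not None else None
        diff[name] = (before, after, change)
    return diff


# Hypothetical metrics from a baseline run and from a new experiment.
baseline = {"accuracy": 0.80, "loss": 0.50}
experiment = {"accuracy": 0.85, "loss": 0.40}

for name, (before, after, change) in metrics_diff(baseline, experiment).items():
    delta = "n/a" if change is None else f"{change:+.2f}"
    print(f"{name}: {before} -> {after} ({delta})")
```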

Because the metrics and plots files are small enough to be tracked by Git, we can share the results with others after running the pipeline:

```console
$ git add dvc.lock params.yaml outs
$ git commit -m "{YOUR_COMMIT_MESSAGE}"
$ git push
```

You can connect the repo to https://studio.iterative.ai/ to get a better visualization of the metrics, parameters, and plots associated with each commit:

https://studio.iterative.ai/user/daavoo/views/workshop-uncool-mlops-5fgmd70rkt

The remaining pipeline outputs, however, are gitignored because they are too large to be tracked by Git.

Bigger Boat