A framework built on top of Ploomber that allows code-first definition of pipelines. No YAML needed!
To get the minimum code needed to use the pipelines, install the package from PyPI:
pip install code-first-pipelines
import pandas as pd
from sklearn import datasets

from cf_pipelines import Pipeline

iris_pipeline = Pipeline("My Cool Pipeline")


@iris_pipeline.step("Data ingestion")
def data_ingestion():
    d = datasets.load_iris()
    df = pd.DataFrame(d["data"])
    df.columns = d["feature_names"]
    df["target"] = d["target"]
    return {"raw_data.csv": df}


iris_pipeline.run()
See the tutorial notebook for a more comprehensive example.
# The same ingestion step, registered through MLPipeline's data_ingestion decorator.
import pandas as pd
from sklearn import datasets

from cf_pipelines.ml import MLPipeline

iris_pipeline = MLPipeline("My Cool Pipeline")


@iris_pipeline.data_ingestion
def data_ingestion():
    d = datasets.load_iris()
    df = pd.DataFrame(d["data"])
    df.columns = d["feature_names"]
    df["target"] = d["target"]
    return {"raw_data.csv": df}


iris_pipeline.run()
See the tutorial notebook for a more comprehensive example.
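For readers new to the pattern, here is a rough sketch of how decorator-based step registration can work in plain Python. It is not the cf_pipelines implementation: `ToyPipeline`, its `_steps` list, and the way `run()` merges products are all assumptions made up for illustration, and the sample row is placeholder data rather than the iris dataset.

```python
from typing import Any, Callable, Dict, List, Tuple


class ToyPipeline:
    """Illustrative stand-in only; NOT how cf_pipelines is implemented."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Registered steps, kept in the order they were declared.
        self._steps: List[Tuple[str, Callable[[], Dict[str, Any]]]] = []

    def step(self, step_name: str) -> Callable:
        """Return a decorator that registers a function under step_name."""

        def register(func: Callable[[], Dict[str, Any]]) -> Callable:
            self._steps.append((step_name, func))
            return func  # the function stays callable on its own

        return register

    def run(self) -> Dict[str, Any]:
        """Call each registered step in order and collect its products."""
        products: Dict[str, Any] = {}
        for _step_name, func in self._steps:
            products.update(func())
        return products


toy = ToyPipeline("My Cool Pipeline")


@toy.step("Data ingestion")
def data_ingestion() -> Dict[str, Any]:
    # Placeholder rows standing in for a DataFrame.
    rows = [{"sepal length (cm)": 5.1, "target": 0}]
    return {"raw_data.csv": rows}


products = toy.run()
print(sorted(products))
```

The point of the sketch is that the pipeline is assembled as a side effect of decorating ordinary functions, so the definition lives in code rather than in a separate configuration file.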
Once installed, you can create a new pipeline template by running:
pipelines new [pipeline name]