If you haven't already, check out the quickstart guide on Feast's website (http://docs.feast.dev/quickstart), which
uses this repo. A quick view of what's in this repository's `feature_repo/` directory:
- `data/` contains raw demo parquet data
- `feature_repo/example_repo.py` contains demo feature definitions
- `feature_repo/feature_store.yaml` contains a demo setup configuring where data sources are
- `feature_repo/test_workflow.py` showcases how to run all key Feast commands, including defining, retrieving, and pushing features. You can run the overall workflow with `python test_workflow.py`.
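
For orientation, here is a minimal sketch of the kind of workflow `test_workflow.py` exercises, assuming the demo template's `driver_id` join key and `driver_hourly_stats` feature view (adjust the names to whatever `example_repo.py` defines):

```python
from datetime import datetime, timedelta

import pandas as pd
from feast import FeatureStore

# Load the feature store defined by feature_store.yaml in this directory.
store = FeatureStore(repo_path=".")

# Entity dataframe: which keys and timestamps to fetch features for.
entity_df = pd.DataFrame(
    {
        "driver_id": [1001, 1002],
        "event_timestamp": [datetime.now() - timedelta(days=1)] * 2,
    }
)

# Point-in-time correct training data from the offline store.
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
    ],
).to_df()
print(training_df.head())

# Load the latest feature values into the online store...
store.materialize_incremental(end_date=datetime.now())

# ...then read them back at low latency for serving.
online = store.get_online_features(
    features=["driver_hourly_stats:conv_rate"],
    entity_rows=[{"driver_id": 1001}],
).to_dict()
print(online)
```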
See more details in Running Feast in production:
- First: you should start with a different Feast template, which delegates to a more scalable offline store.
  - For example, run `feast init -t gcp`, `feast init -t aws`, or `feast init -t snowflake`.
  - You can see your options if you run `feast init --help`.
- `feature_store.yaml` points to a local file as a registry. You'll want to set up a remote file (e.g. in S3/GCS) or a SQL registry (see the configuration sketch after this list). See registry docs for more details.
- This example uses a file offline store to generate training data. It does not scale. We recommend instead using a data warehouse such as BigQuery, Snowflake, or Redshift. There is experimental support for Spark as well.
- Set up CI/CD and dev vs. staging vs. prod environments to automatically update the registry as you change Feast feature definitions. See docs.
- (optional) Schedule regular materialization runs (e.g. via Airflow) to power low-latency feature retrieval. See Batch data ingestion for more details.
- (optional) Deploy feature server instances with `feast serve` to expose endpoints to retrieve online features (see the HTTP example after this list).
  - See Python feature server for details.
  - Use cases can also directly call the Feast client to fetch features, as per Feature retrieval.
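
To make the registry and offline store points concrete, here is a hypothetical `feature_store.yaml` sketch for the GCP case, assuming a GCS-backed registry and BigQuery as the offline store (the bucket and project names are placeholders, not part of this repo):

```yaml
project: my_project                 # placeholder project name
provider: gcp
# Remote registry in GCS so every environment shares feature definitions.
registry: gs://my-feast-bucket/registry.db
offline_store:
  type: bigquery                    # scalable warehouse instead of local files
online_store:
  type: datastore
```

And a minimal sketch of hitting a running feature server over HTTP, assuming `feast serve` is listening on its default port (6566) and the demo template's feature names:

```python
import json

import requests

# POST to the feature server's online feature retrieval endpoint.
response = requests.post(
    "http://localhost:6566/get-online-features",
    data=json.dumps(
        {
            "features": ["driver_hourly_stats:conv_rate"],
            "entities": {"driver_id": [1001, 1002]},
        }
    ),
)
print(response.json())
```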