spaCy projects let you manage and share end-to-end spaCy workflows for different use cases and domains, and orchestrate training, packaging and serving your custom pipelines. You can start off by cloning a pre-defined project template, adjust it to fit your needs, load in your data, train a pipeline, export it as a Python package, upload your outputs to remote storage and share your results with your team.
⚠️ spaCy project templates require spaCy v3. You can install it from pip with `pip install spacy` or conda with `conda install spacy -c conda-forge`. Make sure to use a fresh virtual environment. See the `master` branch for the previous version of this repo.
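Before cloning a template, it can help to confirm which spaCy version the active environment actually has. A minimal sketch using only the standard library; `major_version` and `check_spacy_v3` are hypothetical helper names, not part of spaCy:

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the major component of a version string such as "3.7.2"."""
    return int(version_string.split(".")[0])


def check_spacy_v3() -> str:
    """Return the installed spaCy version, raising if it is missing or not v3."""
    try:
        installed = version("spacy")
    except PackageNotFoundError:
        raise RuntimeError("spaCy is not installed in this environment") from None
    if major_version(installed) != 3:
        raise RuntimeError(f"spaCy v3 is required, found {installed}")
    return installed
```

Calling `check_spacy_v3()` inside the fresh virtual environment returns the installed version string, or raises `RuntimeError` if spaCy is missing or is not a v3 release.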
| Name | Description |
| --- | --- |
| `pipelines` | Templates for training NLP pipelines with different components on different corpora. |
| `tutorials` | Templates that work through a specific NLP use case end-to-end. |
| `integrations` | Templates showing integrations with third-party libraries and tools for managing your data and experiments, iterating on demos and prototypes and shipping your models into production. |
| `benchmarks` | Templates to reproduce our benchmarks and produce quantifiable results that are easy to compare against other systems or versions of spaCy. |
| `experimental` | Experimental workflows and other cutting-edge stuff to use at your own risk. |
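The categories in the table correspond to top-level directories of this repo, which is why templates are cloned by paths such as `tutorials/ner_fashion_brands`. A small sketch that lists the templates in each category, assuming (as a simplification) that a template is any direct subdirectory containing a `project.yml`; `list_templates` is a hypothetical helper, not one of the repo's scripts:

```python
from __future__ import annotations

from pathlib import Path

# Category directory names, taken from the table of categories.
CATEGORIES = ("pipelines", "tutorials", "integrations", "benchmarks", "experimental")


def list_templates(repo_root: str) -> dict[str, list[str]]:
    """Map each existing category directory to the sorted template names it holds.

    Assumption: a template is a direct subdirectory that contains a project.yml.
    Categories missing from repo_root are skipped.
    """
    root = Path(repo_root)
    templates: dict[str, list[str]] = {}
    for category in CATEGORIES:
        category_dir = root / category
        if not category_dir.is_dir():
            continue
        templates[category] = sorted(
            child.name
            for child in category_dir.iterdir()
            if (child / "project.yml").is_file()
        )
    return templates
```

Each `category/name` pair in the result has the same shape as the argument passed to `python -m spacy project clone`.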
Projects can be used via the new `spacy project` CLI. To find out more about a command, add `--help`. For detailed instructions, see the usage guide.
- Clone the project template you want to use.
  ```bash
  python -m spacy project clone tutorials/ner_fashion_brands
  ```
- Fetch assets (data, weights) defined in the `project.yml`.
  ```bash
  cd ner_fashion_brands
  python -m spacy project assets
  ```
- Run a command defined in the `project.yml`.
  ```bash
  python -m spacy project run preprocess
  ```
- Run a workflow of multiple steps in order.
  ```bash
  python -m spacy project run all
  ```
- Adjust the template for your specific use case, load in your own data, adjust the settings and model and share the result with your team.
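A workflow such as `all` is a named, ordered list of commands from the `project.yml`, and each command has a list of script lines. The sketch below models how a name passed to `spacy project run` expands into the lines it would execute. It is a simplified model, not spaCy's implementation: the `PROJECT` dict and its script lines are made up for illustration, and the real CLI does more, such as substituting variables and checking each command's dependencies and outputs.

```python
from __future__ import annotations

# Hypothetical project.yml contents, written as the dict it would parse into.
# Commands carry a "script" list; workflows map a name to command names.
PROJECT = {
    "commands": [
        {"name": "preprocess", "script": ["python scripts/preprocess.py assets/train.jsonl corpus/train.spacy"]},
        {"name": "train", "script": ["python -m spacy train configs/config.cfg --output training/"]},
        {"name": "evaluate", "script": ["python -m spacy evaluate training/model-best corpus/dev.spacy"]},
    ],
    "workflows": {"all": ["preprocess", "train", "evaluate"]},
}


def resolve(project: dict, name: str) -> list[str]:
    """Return the script lines that running `name` would execute, in order.

    Workflows are checked first and expand to the scripts of their commands
    in the listed order; otherwise `name` must be a single command.
    """
    commands = {cmd["name"]: cmd for cmd in project.get("commands", [])}
    workflows = project.get("workflows", {})
    if name in workflows:
        steps = workflows[name]
    elif name in commands:
        steps = [name]
    else:
        raise KeyError(f"Unknown command or workflow: {name}")
    return [line for step in steps for line in commands[step]["script"]]
```

For example, `resolve(PROJECT, "all")` yields the preprocessing, training and evaluation lines in that order, mirroring `python -m spacy project run all`.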
To keep the project templates and their documentation up to date, this repo contains several scripts:
| Script | Description |
| --- | --- |
| `update_docs.py` | Update all auto-generated docs in the given root. Calls into `spacy project document` and only replaces the auto-generated sections, not any custom content before or after. |
| `update_category_docs.py` | Update the auto-generated `README.md` in the category directories listing the available project templates. |
| `update_configs.py` | Update and auto-fill all `config.cfg` files included in the repo, similar to `spacy init fill-config`. Can be used to keep the configs up to date with changes in spaCy. |
| `update_projects_jsonl.py` | Update the `projects.jsonl` file in the given root. Should be used at the root level of the repo. |
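`update_docs.py` only rewrites the auto-generated part of each README and leaves custom content before and after it alone. A minimal sketch of that idea, assuming the generated section is delimited by a pair of HTML comment markers; the `START`/`END` strings and the `replace_generated` helper are placeholders, not necessarily what `spacy project document` emits:

```python
# Placeholder markers around the generated section (assumed, not spaCy's exact strings).
START = "<!-- AUTO-GENERATED DOCS START -->"
END = "<!-- AUTO-GENERATED DOCS END -->"


def replace_generated(readme: str, generated: str) -> str:
    """Swap the text between START and END for `generated`, keeping the rest."""
    start = readme.find(START)
    end = readme.find(END, start)
    if start == -1 or end == -1:
        # No complete marker pair yet: append a fresh generated section.
        return f"{readme.rstrip()}\n\n{START}\n{generated}\n{END}\n"
    before = readme[: start + len(START)]  # custom content plus the start marker
    after = readme[end:]  # end marker plus any custom content after it
    return f"{before}\n{generated}\n{after}"
```

Because only the span between the markers is replaced, running the update repeatedly with the same generated text leaves the README unchanged.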