The database, website, and API for presenting HoloFood samples and unifying the datasets stored in supporting services.
HoloFood is a consortium and project focussed on understanding the biomolecular and physiological processes triggered by incorporating feed additives and novel sustainable feeds in farmed animals.
This codebase is the public website and API for browsing the Samples and datasets created by the project, which are stored in publicly-accessible data repositories.
The website is built with Django.
Management commands are used to import a cache of Samples and their metadata from ENA and Biosamples.
There is a normal Django Admin panel as well.
Install development tools (including pre-commit hooks to run Black code formatting).
pip install -r requirements-dev.txt
pre-commit install
Use Black. Use djLint. These are both configured if you install the pre-commit tools as above.
To run them manually: `black .` and `djlint . --extension=html --lint` (or `--reformat`).
Once a database is created and migrated (see below), there is a management command to fill the database with some minimal fake data, to make development easier.
python manage.py generate_dev_data
# You most likely need (see below):
# brew install chromedriver
pip install -r requirements-dev.txt
pytest
The web interface tests need the Chrome browser and `chromedriver` to communicate with the browser.
To install `chromedriver` on a Mac or Linux machine, use the Homebrew formula or any other sensible installation method. On GitHub Actions, a "Setup Chromedriver" action step exists for this.
On a Mac, you’ll probably hit Gatekeeper permission problems when running `chromedriver`; so:
which chromedriver # probably: /usr/local/bin/chromedriver
spctl --add /usr/local/bin/chromedriver
If this doesn't work, open `/usr/local/bin` in Finder, find `chromedriver`, right-click it, and choose Open.
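As an alternative (not from the project docs, just a common macOS workaround), clearing the quarantine attribute usually has the same effect:

```shell
# Remove macOS's quarantine flag from the chromedriver binary (assumes Homebrew put it on your PATH)
xattr -d com.apple.quarantine "$(which chromedriver)"
```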
We use Pydantic to formalise config files. Configuration is split between the following (a format sketch follows the list):

- `config/local.env`: a convenience for env vars.
- `config/data_config.json`: contains what are expected to be somewhat changeable config options.
- `config/secrets.env`: needed during development, whilst some data are private.
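As a sketch of the env file format (only `DJANGO_SECRET_KEY` is named elsewhere in this README, in the Kubernetes instructions; treat any other variable names as assumptions and check the Pydantic config classes for the real ones):

```shell
# config/secrets.env: illustrative sketch only; real variable names and values are project-specific
DJANGO_SECRET_KEY=some-local-development-secret
```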
source config/secrets.env
python manage.py migrate
python manage.py fetch_project_samples
python manage.py refresh_external_data
python manage.py runserver
TODO: update README once importer supports hierarchical samples
`refresh_external_data` has several options, for fetching data for some or all samples/projects, and for fetching data from some or all supporting APIs.
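As with any Django management command, the full set of options can be listed with `--help`:

```shell
python manage.py refresh_external_data --help
```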
To (re)fetch sample metadata from ENA and Biosamples only, for a specific sample:
python manage.py refresh_external_data --samples SAMEA7687881 --types METADATA
or, to refresh metagenomic data for all samples in a project:
python manage.py refresh_external_data --projects PRJEB39110 --types METAGENOMIC
(Note that it is much more efficient to fetch metagenomics data on a per-project basis than a per-sample basis – this is because it uses many fewer API calls and the MGnify API has strict rate-limits.)
Or to (re)fetch sample metadata and metagenomic existence data from ENA, Biosamples, and MGnify, for samples with accessions in a certain range:
python manage.py refresh_external_data --sample_filters accession__gt=SAMEA7687880 accession__lt=SAMEA7687900 --types METADATA METAGENOMIC
`--sample_filters ...` is expected to be useful when refreshing a large number of samples fails and you therefore need to retry from a certain accession onwards (samples are iterated through in ascending order of accession).
MAG and Viral catalogues can be imported using management commands. Each MAG catalogue must relate to a single public MAG catalogue on MGnify: the species representative of each MAG must already exist on MGnify. Each Viral catalogue must relate (biome-wise) to a single MAG catalogue on the portal. In other words, the order of data insertion needs to be:
1. public MAG catalogue into MGnify
2. MAG catalogue into this data portal
3. Viral catalogue into this data portal
The uploaders expect TSV files (and a folder of GFFs in the case of a viral catalogue).
For the format and column naming, inspect the files in `holofood/tests/static_fixtures`.
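For example (the exact fixture file names aren't listed here, so the glob is an assumption), the expected column names can be checked with:

```shell
# Print the header row of each fixture TSV to see the expected column naming
head -n 1 holofood/tests/static_fixtures/*.tsv
```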
Run `python manage.py import_mag_catalogue` or `import_viral_catalogue` with `--help` for full usage, but essentially:
python manage.py import_mag_catalogue hf-salmon-mags-v1 ./salmon.tsv "HoloFood Salmon V1" mgnify-salmon-v1-0 "Some:Biome:String" salmon
python manage.py import_viral_catalogue hf-salmon-vir-v1 './salmon_viral_cat.tsv' './salmon_viral_annotations.gff' --title="HoloFood Salmon Viruses V1" --related_mag_catalogue_id=hf-salmon-mags-v1
The `import_viral_catalogue` command can be run multiple times to populate the catalogue with several TSV/GFF combinations if needed; fragments are appended to the existing catalogue if it already exists.
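As a sketch (the file names here are hypothetical, and it is assumed the optional flags don't need repeating once the catalogue exists), appending a second fragment would look like:

```shell
# Re-running against an existing catalogue ID appends the new TSV/GFF fragments to it
python manage.py import_viral_catalogue hf-salmon-vir-v1 './salmon_viral_cat_part2.tsv' './salmon_viral_annotations_part2.gff'
```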
Superusers can do everything in the admin panel, including managing other users.
python manage.py createsuperuser
Superusers can go to e.g. http://localhost:8000/admin and create other users there.
"Staff" users can access the admin panel, but won't by default have permissions to do anything there. Add them to the "authors" user group to give them permissions to author "Analysis Summary" documents via the admin panel.
The Django application can be deployed to AWS Cloud via Elastic Beanstalk, a scalable web app deployment service.
There is an Elastic Beanstalk configuration in `.ebextensions/django.config`.
This config will migrate the db on deployment, as well as compile SCSS styles and collect static files.
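As a rough sketch of what that amounts to (the `migrate` and `collectstatic` commands are standard Django; `compilescss` is an assumption about which SCSS package is used, so check `.ebextensions/django.config` for the real commands):

```shell
# Sketch only: approximate deploy-time steps; the actual commands live in .ebextensions/django.config
python manage.py migrate --noinput          # apply database migrations
python manage.py compilescss                # assumed SCSS compilation command (package-dependent)
python manage.py collectstatic --noinput    # collect static files for serving
```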
Run `pip install -r requirements-aws.txt` to install the EB CLI tool.
Create an Elastic Beanstalk environment. You need a "Python 3.8 on Amazon Linux 2" platform, and an RDS Postgres database (a `db.t4g.micro` instance is fine).
Run `eb use <whatever-the-name-of-your-elastic-beanstalk-environment-is>`, e.g. `eb use holofood-data-portal-dev-env`.
Deploy the latest git commit with `eb deploy`.
To log into the lead instance, e.g. to run management commands:
eb ssh
cd /var/app/current
source ../venv/*/bin/activate
export $(/opt/elasticbeanstalk/bin/get-config --output YAML environment | sed -r 's/: /=/' | xargs)
# ^ this sources the env vars
python manage.py ...
Secret environment variables can be configured in the AWS EB Console.
If using the EBI AWS cloud, access is via an SSO login.
Refer to the Confluence page on AWS SSO for the SSO parameters. Use `aws configure sso --profile eb-cli` to sign in.
Occasionally you’ll need `aws sso login --profile eb-cli` to get a new token.
- Use `minikube` or `kind`.
- Make a secrets .env file at `k8s/secrets-k8s.env` with e.g. `DJANGO_SECRET_KEY`.
- `kubectl create secret generic holofood-secret --from-env-file=k8s/secrets-k8s.env`
- `minikube image build -t holofood -f k8s/Dockerfile .`
- `kubectl apply -f k8s`
- `kubectl get pods -A` and find the pod ID for `holofood-app-...`
- `kubectl exec --stdin --tty holofood-app-......... -- /bin/bash`
- `python manage.py migrate` will create the `/app/data/db.sqlite3` database
- `minikube service holofood`
- EBI operates a two-clusters-per-service policy (primary in the "HL" data centre, a.k.a. "HH" in some places; fallback in "HX"). The app needs to be deployed to both. There are stub configs in `k8s-hl` and `k8s-hx` for these.
- K8s cluster configurations are provided as YML files by EBI's webprod team. You need these to deploy.
- Build the image (with some customisation for EBI's NFS filesystem) with `docker build -f k8s-hl/Dockerfile -t quay.io/microbiome-informatics/holofood-data-portal:ebi-k8s-hl .`, then push it with `docker push quay.io/microbiome-informatics/holofood-data-portal:ebi-k8s-hl` (you need appropriate Quay credentials for this).
- Make a secrets .env file at `k8s-hl/secrets-k8s.env` with e.g. `DJANGO_SECRET_KEY=....`, and push it with e.g. `kubectl --kubeconfig ~/webprod-configs/mgnify-k8s-team-admin-hh.conf --namespace holofood-hl-prod create secret generic holofood-secret --from-env-file=k8s-hl/secrets-k8s.env`.
- Get authentication credentials for quay.io (the built image is private). You can get a Kubernetes secrets yaml file from your Quay.io user settings, in the "CLI Password" section. Download the secrets yaml, set `name: quay-pull-secret` in its metadata section, and put it into the `k8s-hl` folder.
- Deploy: `kubectl --kubeconfig ~/webprod-configs/mgnify-k8s-team-admin-hh.conf apply -f k8s-hl`. If the namespace doesn't exist, you might need to apply twice.
There is a Quarto-based documentation pack in the `docs/` folder, configured by `docs/_quarto.yml`.
This uses a mixture of Markdown and rendered Jupyter Notebooks. (This choice allows a code-based tutorial to be included in the docs as well as run by users.)
To develop documentation:
Just edit the `.qmd` (essentially just Markdown) files, and commit to GitHub.
GitHub Actions will render your changes to the GitHub Pages site (there is a `.github/workflows/docs.yml` action to do this).
Install Quarto on your system.
pip install -r requirements-docs.txt
Run `jupyter lab` and edit the `.qmd` or `.ipynb` files in `docs/`.
Run `quarto preview docs` to open a live preview of the documentation site that updates as you save changes.
If you add a new file (page), add it to the navigation by editing the `website.sidebar.contents` list in `docs/_quarto.yml`.
Note: any Jupyter Notebooks will be rendered to the documentation site exactly as you leave them.
This is because Quarto defaults to not executing Jupyter Notebooks during rendering, which is a good thing.
If you've added any executable Quarto docs other than Jupyter Notebooks (like an R script...), run `quarto render docs` and commit the `docs/_freeze/` dir.