From 4eb0414ff9e6003ebd28db001927ecfdb81a6e82 Mon Sep 17 00:00:00 2001
From: "Samuel E. Miller"
Date: Fri, 6 Sep 2024 16:18:13 -0500
Subject: [PATCH] add setup section

---
 anvio/docs/programs/anvi-reaction-network.md | 44 +++++++++++++++-----
 1 file changed, 34 insertions(+), 10 deletions(-)

diff --git a/anvio/docs/programs/anvi-reaction-network.md b/anvio/docs/programs/anvi-reaction-network.md
index 9f36880fb5..907ace8fa0 100644
--- a/anvio/docs/programs/anvi-reaction-network.md
+++ b/anvio/docs/programs/anvi-reaction-network.md
@@ -1,25 +1,49 @@
 This program **stores a metabolic %(reaction-network)s in a %(contigs-db)s or %(pan-db)s.**
 
-The network consists of data on biochemical reactions predicted to be encoded by the genome or pangenome, referencing the [KEGG Orthology (KO)](https://www.genome.jp/kegg/ko.html) and [ModelSEED Biochemistry](https://github.com/ModelSEED/ModelSEEDDatabase) databases.
+The network consists of data on biochemical reactions predicted to be encoded by the genome or pangenome.
 
-Information on the predicted reactions and the involved metabolites are stored in two tables of the %(contigs-db)s or %(pan-db)s. The program, %(anvi-get-metabolic-model-file)s, can be used to export the %(reaction-network)s from the database to a %(reaction-network-json)s file formatted for flux balance analysis.
+Information on the predicted reactions and the involved metabolites are stored in tables of the %(contigs-db)s or %(pan-db)s. The program, %(anvi-get-metabolic-model-file)s, can be used to export the %(reaction-network)s from the database to a %(reaction-network-json)s file formatted for input into programs for flux balance analysis.
 
-## Usage
+## Setup
 
-%(anvi-reaction-network)s takes a either a %(contigs-db)s OR a %(pan-db)s and %(genomes-storage-db)s as required input. Genes stored within the %(contigs-db)s or %(genomes-storage-db)s must have KO protein annotations, which can be assigned by %(anvi-run-kegg-kofams)s.
+%(anvi-setup-kegg-data)s downloads [binary relations files](https://www.genome.jp/brite/br08906) needed to construct a %(reaction-network)s from [KEGG Orthology (KO)](https://www.genome.jp/kegg/ko.html) sequence annotations. Make sure to run that program with the `--kegg-snapshot` option to use the newest snapshot of %(kegg-data)s, [`v2024-08-30`](https://figshare.com/articles/dataset/KEGG_build_2024-08-30/26880559?file=48903154), which includes binary relations files.
+
+{{ codestart }}
+anvi-setup-kegg-data --kegg-snapshot v2024-08-30
+{{ codestop }}
 
-The KO and ModelSEED Biochemistry databases must be set up and available to the program. By default, these are expected to be set up in default anvi'o data directories. %(anvi-setup-kegg-data)s and %(anvi-setup-modelseed-database)s must be run to set up these databases.
+%(anvi-setup-modelseed-database)s sets up the ModelSEED Biochemistry database, which harmonizes biochemical data from various reference databases, including KEGG.
 
 {{ codestart }}
-anvi-reaction-network -c /path/to/contigs-db
+anvi-setup-modelseed-database
 {{ codestop }}
 
-Custom locations for the reference databases can be provided with the flags, `--ko-dir` and `--modelseed-dir`.
+### Download newest available KEGG files
+
+Alternatively, KEGG data including binary relations files can be set up not from a snapshot but by downloading the newest files available from KEGG using the `-D` flag. In the following command, a higher number of download threads than the default of 1 is provided by `-T`, which significantly speeds up downloading.
 
 {{ codestart }}
-anvi-reaction-network -c /path/to/contigs-db \
-                      --ko-dir /path/to/set-up/ko-dir \
-                      --modelseed-dir /path/to/set-up/modelseed-dir
+anvi-setup-kegg-data -D -T 5
+{{ codestop }}
+
+### Install in non-default location
+
+At the moment, KEGG data that includes binary relations files does _not_ include "stray" KOs (see %(anvi-setup-kegg-data)s) due to changes in the available model files. To preserve KEGG data that you already have set up, for this reason or another, the new snapshot or download can be placed in a non-default location using the option, `--kegg-data-dir`.
+
+{{ codestart }}
+anvi-setup-kegg-data --kegg-snapshot v2024-08-30 --kegg-data-dir path/to/other/directory
+{{ codestop }}
+
+`anvi-reaction-network` requires a `--kegg-dir` argument to seek KEGG data in a non-default location.
+
+Likewise, different versions of the ModelSEED Biochemistry database can be set up in non-default locations and used with the `--modelseed-dir` argument.
+
+## Usage
+
+%(anvi-reaction-network)s takes a either a %(contigs-db)s OR a %(pan-db)s and %(genomes-storage-db)s as required input. Genes stored within the %(contigs-db)s or %(genomes-storage-db)s must have KO protein annotations, which can be assigned by %(anvi-run-kegg-kofams)s.
+
+{{ codestart }}
+anvi-reaction-network -c /path/to/contigs-db
 {{ codestop }}
 
 If a %(contigs-db)s already contains a %(reaction-network)s from a previous run of this program, the flag `--overwrite-existing-network` can overwrite the existing network with a new one. For example, if %(anvi-run-kegg-kofams)s is run again on a database using a newer version of KEGG, then %(anvi-reaction-network)s should be rerun to update the %(reaction-network)s derived from the KO annotations.
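Reviewer note, not part of the commit: the setup and usage steps documented in this patch can be strung together as one workflow. The following is a minimal dry-run sketch that only prints the commands instead of executing them. The function name `print_workflow` and the variables `CONTIGS_DB`, `KEGG_DIR`, and `MODELSEED_DIR` are hypothetical names introduced for illustration; the flags are the ones named in the documentation text above. The patch does not say which option `anvi-setup-modelseed-database` takes for a non-default location, so that command is printed without one.

```shell
#!/bin/sh
# Dry-run sketch: prints the documented commands, does not run anvi'o.
# print_workflow, CONTIGS_DB, KEGG_DIR and MODELSEED_DIR are hypothetical
# names used only for this illustration.

print_workflow() {
    contigs_db="$1"      # contigs-db whose genes already have KO annotations
    kegg_dir="$2"        # non-default KEGG data directory, "" for the default
    modelseed_dir="$3"   # non-default ModelSEED directory, "" for the default

    setup_cmd="anvi-setup-kegg-data --kegg-snapshot v2024-08-30"
    network_cmd="anvi-reaction-network -c $contigs_db"

    # A non-default KEGG location must be given to both programs:
    # --kegg-data-dir at setup time, --kegg-dir when building the network.
    if [ -n "$kegg_dir" ]; then
        setup_cmd="$setup_cmd --kegg-data-dir $kegg_dir"
        network_cmd="$network_cmd --kegg-dir $kegg_dir"
    fi

    # A non-default ModelSEED location is passed to anvi-reaction-network.
    if [ -n "$modelseed_dir" ]; then
        network_cmd="$network_cmd --modelseed-dir $modelseed_dir"
    fi

    echo "$setup_cmd"
    echo "anvi-setup-modelseed-database"
    echo "$network_cmd"
}

print_workflow "${CONTIGS_DB:-/path/to/contigs-db}" "${KEGG_DIR:-}" "${MODELSEED_DIR:-}"
```

Run with no variables set, the sketch prints the default-location commands; setting `KEGG_DIR` adds the matching `--kegg-data-dir` and `--kegg-dir` arguments, so the network is built from the same KEGG data that was just set up.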