A Python package for benchmarking pathway databases on functional enrichment and prediction tasks.
If you find pathway_forte useful for your work, please consider citing:
[1] | Mubeen, S., et al. (2019). The Impact of Pathway Database Choice on Statistical Enrichment Analysis and Predictive Modeling. Front. Genet., 10:1203. |
pathway_forte can be installed from PyPI with the following command in your terminal:
$ python3 -m pip install pathway_forte
The latest code can be installed from GitHub with:
$ python3 -m pip install git+https://github.com/pathwayforte/pathway-forte.git
For developers, the code can be installed with:
$ git clone https://github.com/pathwayforte/pathway-forte.git
$ cd pathway-forte
$ python3 -m pip install -e .
The table below lists the main commands of PathwayForte.
Command | Action |
---|---|
datasets | Lists of Cancer Datasets |
export | Export Gene Sets using ComPath |
ora | List of ORA Analyses |
fcs | List of FCS Analyses |
prediction | List of Prediction Methods |
- ora. Lists Over-Representation Analyses (e.g., one-tailed hypergeometric test).
- fcs. Lists Functional Class Score Analyses such as GSEA and ssGSEA using GSEApy.
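As an illustration of what a one-tailed hypergeometric test computes in an over-representation analysis, here is a minimal standard-library sketch. It is not the implementation used by the package. The function name `hypergeometric_ora_pvalue` and its parameters are hypothetical and chosen only for this example.

```python
from math import comb


def hypergeometric_ora_pvalue(n_universe, n_pathway, n_query, n_overlap):
    """Return P(X >= n_overlap) for a hypergeometric random variable X.

    n_universe: number of genes in the background set
    n_pathway:  number of background genes annotated to the pathway
    n_query:    number of genes in the query list (e.g., differentially expressed)
    n_overlap:  number of query genes that belong to the pathway

    All names are illustrative assumptions, not part of any package API.
    """
    max_overlap = min(n_pathway, n_query)
    # Number of ways to draw the query list from the background.
    total = comb(n_universe, n_query)
    # Sum the upper tail: overlaps at least as large as the observed one.
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_query - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 3 in the pathway, 3 queried, all 3 overlap.
p_value = hypergeometric_ora_pvalue(10, 3, 3, 3)
print(p_value)
```

A small p-value suggests that the pathway is over-represented in the query list. In practice, p-values from many pathways would also need a multiple-testing correction.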
pathway_forte provides three prediction methods (binary classification, multi-class classification with SVMs, and survival analysis) that use individualized pathway activity scores. The scores can be calculated for any pathway with a variety of tools (see [2]), using any pathway database whose gene sets can be exported.
- binary. Trains an elastic net model for a binary classification task (e.g., tumor vs. normal patients). Training uses nested cross-validation, and the number of folds in both loops can be selected. The model can easily be swapped, since most models in scikit-learn (the machine learning library used by this package) require the same input.
- subtype. Trains an SVM model for a multi-class classification task (e.g., predicting tumor subtypes). Training uses nested cross-validation, and the number of folds in both loops can be selected. As with the binary classification task, other models can quickly be substituted.
- survival. Trains a Cox proportional hazards model with an elastic net penalty. Training uses nested cross-validation with a grid search in the inner loop. This analysis requires pathway activity scores, patient classes, and patient lifetime information.
- export. Exports GMT files with current gene sets for the pathway databases included in ComPath [3].
- datasets. Lists the TCGA data sets [4] that are ready to run in pathway_forte.
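The nested cross-validation scheme mentioned for the prediction commands can be sketched with the standard library alone. This is a generic illustration, not the package's code, which relies on scikit-learn. The helpers `k_fold_indices` and `nested_cv`, and the `evaluate` callback, are hypothetical names introduced for this example.

```python
import random
from statistics import mean


def k_fold_indices(indices, k, seed=0):
    """Yield (train, test) index lists for a shuffled k-fold split."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def nested_cv(n_samples, param_grid, evaluate, outer_k=5, inner_k=3):
    """Run nested cross-validation over sample indices.

    evaluate(param, train_idx, test_idx) is a user-supplied callback (an
    assumption of this sketch) that fits a model on train_idx and returns
    a score on test_idx, where higher is better.
    """
    results = []
    for outer_train, outer_test in k_fold_indices(range(n_samples), outer_k):
        # Inner loop: pick the hyperparameter using only the outer training
        # samples, so the outer test fold never influences model selection.
        def inner_score(param):
            return mean(
                evaluate(param, train, valid)
                for train, valid in k_fold_indices(outer_train, inner_k, seed=1)
            )

        best_param = max(param_grid, key=inner_score)
        # Outer loop: estimate performance of the selected setting on held-out data.
        results.append((best_param, evaluate(best_param, outer_train, outer_test)))
    return results


# Toy callback whose score depends only on the hyperparameter.
scores = nested_cv(20, [0.1, 0.5, 0.9], lambda p, tr, te: -abs(p - 0.5),
                   outer_k=4, inner_k=2)
print(scores)
```

The number of folds in each loop is controlled here by `outer_k` and `inner_k`. Scanning `param_grid` in the inner loop plays the role of the grid search described for the survival analysis.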
[2] | Lim, S., et al. (2018). Comprehensive and critical evaluation of individualized pathway activity measurement tools on pan-cancer data. Briefings in Bioinformatics, bby125. |
[3] | Domingo-Fernández, D., et al. (2018). ComPath: An ecosystem for exploring, analyzing, and curating mappings across pathway databases. npj Syst Biol Appl., 4(1):43. |
[4] | Weinstein, J. N., et al. (2013). The Cancer Genome Atlas Pan-Cancer analysis project. Nature Genetics, 45(10), 1113. |
The Pathway Forte logo is derived from "Muscle Fat" by Lorc, used under CC BY 3.0.
PathwayForte is scientific software that has been developed in an academic capacity, and thus comes with no warranty or guarantee of maintenance, support, or back-up of data.