Contextualization of protein-protein interaction databases by cell line
$ git clone https://github.com/montilab/ppi-context
$ cd ppi-context
$ pip install -r requirements.txt
If you just want the data, it is easy to load into R…
$ R
library(magrittr)  # %>% and set_colnames()
library(ggplot2)   # labs(), scale_fill_viridis_c(), theme()
library(ggpubr)    # ggbarplot()
# Load the contextualized PPI table
ppi <- read.delim("data/v_1_00/PPI-Context.txt", header=TRUE, sep="\t", stringsAsFactors=FALSE)
# Plot the 30 cell lines with the most annotated PPIs
data.frame(sort(table(ppi$cell_name), decreasing=TRUE)) %>%
    set_colnames(c("var", "freq")) %>%
    head(30) %>%
    ggbarplot(x="var", y="freq", fill="freq") +
    labs(title="", x="Cell Line Name", y="PPI") +
    scale_fill_viridis_c(option="inferno", begin=0, end=0.8) +
    theme(legend.position="none",
          axis.text.x=element_text(angle=45, hjust=1, size=12, face="bold"))
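The same tally can be done without R. The following is a minimal Python sketch, assuming only what the R snippet shows: a tab-separated file with a header row and a `cell_name` column. The helper names `load_rows` and `count_ppis_by_cell_line` are hypothetical and are not part of ppictx.py.

```python
import csv
from collections import Counter


def load_rows(path):
    """Yield each record of a tab-separated file as a dict keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        yield from csv.DictReader(handle, delimiter="\t")


def count_ppis_by_cell_line(rows, column="cell_name"):
    """Count records per cell line, skipping records with no cell line value."""
    return Counter(row[column] for row in rows if row.get(column))


# Example usage (path taken from the R snippet):
#   counts = count_ppis_by_cell_line(load_rows("data/v_1_00/PPI-Context.txt"))
#   for name, n in counts.most_common(30):
#       print(f"{name}\t{n}")
```

`Counter.most_common(30)` returns the same top-30 ranking that the R pipeline passes to the bar plot.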
| PPI - Context (v1.0)
usage: ppictx.py [-h] [-r] [-d]
[-fh PATH_HIPPIE]
[-fp PATH_PUBTATOR]
[-fc PATH_CELLOSAURUS]
optional arguments:
-h, --help show this help message and exit
-r, --run run pipeline
-d, --download download raw data first
-fh PATH_HIPPIE path to downloaded Hippie data (optional)
-fp PATH_PUBTATOR path to downloaded Pubtator data (optional)
-fc PATH_CELLOSAURUS path to downloaded Cellosaurus data (optional)
In most cases you will need to download the latest bulk data first and then process it…
$ python ppictx.py --download --run
| PPI - Context (v1.0)
| Downloading raw data...
| Processing raw data
~ [PPI]
~ [PID -> CLA]
~ [CLA -> CID]
~ [PPI -> PID -> CLA -> CID]
In other cases, you might already have previous versions of the data that you want to process…
$ python ppictx.py --run \
-fh path/to/HIPPIE.mitab \
-fp path/to/PUBTATOR.gz \
-fc path/to/CELLOSAURUS.txt
- Cell lines that are used in research primarily for their efficiency as expression systems (e.g. HeLa, HEK, CHO, Sf9) may not be useful representations of cell-specific protein dynamics. It may therefore be useful to filter out PPIs annotated with these cell lines.
- Cellosaurus contains synonymous cell lines, so some annotations, such as HEK (CVCL_M624) and HEK293 (CVCL_0045), refer to the same cell line. Users should be aware of synonymous cell lines relevant to their interests and filter accordingly.
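Both notes above come down to filtering on the cell line annotation. The following is a minimal Python sketch, assuming each record is a dict with a `cell_name` key (the column used in the R example). The function names and the exclusion list are hypothetical, and the exact spellings of cell line names in the data file are an assumption, so check them against your copy before relying on the filter.

```python
# Hypothetical exclusion list built from the cell lines named in the notes above.
# HEK and HEK293 are both listed because Cellosaurus treats them as synonymous.
DEFAULT_EXCLUDED = ("HeLa", "HEK", "HEK293", "CHO", "Sf9")


def normalize(name):
    """Trim whitespace and fold case so that 'hek293' and ' HEK293 ' compare equal."""
    return name.strip().casefold()


def drop_cell_lines(rows, excluded=DEFAULT_EXCLUDED, column="cell_name"):
    """Return the records whose cell line is not in the exclusion list.

    Matching is exact after normalization, so a derivative line such as
    'HEK293T' is kept unless it is listed explicitly.
    """
    blocked = {normalize(name) for name in excluded}
    return [row for row in rows if normalize(row.get(column) or "") not in blocked]
```

Matching whole names rather than substrings is a deliberate choice here: it avoids silently dropping distinct lines whose names merely contain an excluded name. The cost is that every synonym must be listed explicitly.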
Federico A, Monti S (2021) Contextualized Protein-Protein Interactions. Patterns. https://doi.org/10.1016/j.patter.2020.100153.