This project aims to create a network out of a protein structure stored in the standard PDB format.
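The proximity network (in this context) of a protein is defined as a graph whose nodes are the residues and whose edges connect residues lying within a threshold distance of each other (8 by default), measured between their alpha carbons.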
First clone this repository locally & enter it:
git clone https://github.com/raysas/protein-structure-proximity-network.git
cd protein-structure-proximity-network
Install the following dependencies through pip:
- Biopython
- NetworkX
- Pyvis
- Pandas
- Numpy
- Seaborn
- Matplotlib
pip install -r requirements.txt
The workflow takes a PDB ID from the user and generates the network. All you need to do is run the following command in a terminal:
python code/generate_network.py <pdb_id>
Or if you wish to set the distance threshold to a different value, you can run:
python code/generate_network.py <pdb_id> <distance_threshold>
Make sure you input a valid PDB ID. Check here for more info. You can also provide a PDB file path instead of an ID; this option is still under testing, with successful trials so far.
If you wish, you can run each step separately, from extracting the PDB file to visualizing the network, by importing the Python module and calling its functions.
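As a quick sanity check before running the script, the format of an ID can be tested locally. This is a minimal sketch, assuming the classic four-character wwPDB format (one digit followed by three alphanumeric characters); the name `looks_like_pdb_id` is hypothetical and not part of this repository. It only checks the shape of the string and does not confirm that the entry exists in the PDB.

```python
import re

# Assumption: classic wwPDB IDs only (one digit + three alphanumerics, e.g. "6xdc").
PDB_ID_PATTERN = re.compile(r"[0-9][A-Za-z0-9]{3}")


def looks_like_pdb_id(text):
    """Return True if `text` has the shape of a classic four-character PDB ID."""
    return PDB_ID_PATTERN.fullmatch(text.strip()) is not None


for candidate in ("6xdc", "6xdc.pdb", "xdc6"):
    print(candidate, looks_like_pdb_id(candidate))
```

Anything that fails this check is more likely a file path or a typo than an ID.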
import sys
sys.path.append('<path-to-this-repo>/code')
from generate_network import *
You can also run the code in a Jupyter notebook (this file) on Google Colab. Either add this repository to your Google Drive or connect it to GitHub (this might cause issues if the PDB file is not saved anywhere).
P.S. This is a work in progress and the code is not yet optimized. More features, listed in the to-do list, will be added soon.
After a run, the data folder will contain a new directory named after the pdb_id:
data
└── pdb_id
    ├── output.log
    ├── pdb_id.pdb
    ├── pdb_id_contact_map.png
    ├── pdb_id_t_network.graphml
    └── pdb_id_t_network_viz.html
The output.log file contains the log of the process. The pdb_id.pdb file is the PDB structure of the protein. The pdb_id_contact_map.png file is the contact map of the protein. The pdb_id_t_network.graphml file is the network in GraphML format. The pdb_id_t_network_viz.html file is the interactive visualization of the network generated with Pyvis.
Example run: building a network for 6xdc
python code/generate_network.py 6xdc
The following changes will be made in the data folder:
After retrieving the PDB structure, the script extracts residue and coordinate information to generate a contact map. The contact map is a heatmap of the pairwise distances between residues. Note that the coordinates of each residue are defined by the coordinates of its alpha carbon.
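The distance computation behind the contact map can be sketched with the standard library alone. This is an illustrative sketch, not the repository's implementation (which relies on Biopython): the helper names `parse_ca_atoms` and `distance_matrix` are hypothetical, and the three residues are synthetic records written in the PDB fixed-column layout.

```python
import math

# Synthetic ATOM records in the PDB fixed-column layout (not real data).
ATOM_TEMPLATE = (
    "ATOM  {serial:>5d} {name:<4s} {res:>3s} {chain}{seq:>4d}    "
    "{x:8.3f}{y:8.3f}{z:8.3f}"
)
sample_lines = [
    ATOM_TEMPLATE.format(serial=1, name=" N", res="ALA", chain="A", seq=1, x=1.0, y=0.0, z=0.0),
    ATOM_TEMPLATE.format(serial=2, name=" CA", res="ALA", chain="A", seq=1, x=0.0, y=0.0, z=0.0),
    ATOM_TEMPLATE.format(serial=3, name=" CA", res="GLY", chain="A", seq=2, x=3.0, y=4.0, z=0.0),
    ATOM_TEMPLATE.format(serial=4, name=" CA", res="SER", chain="A", seq=3, x=3.0, y=4.0, z=12.0),
]


def parse_ca_atoms(pdb_lines):
    """Return [(label, (x, y, z)), ...] for every alpha carbon in ATOM records."""
    residues = []
    for line in pdb_lines:
        # Columns 13-16 hold the atom name; HETATM records (e.g. calcium ions) are skipped.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            label = f"{line[17:20].strip()}{line[22:26].strip()}:{line[21]}"
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            residues.append((label, xyz))
    return residues


def distance_matrix(residues):
    """Return the symmetric matrix of Euclidean distances between alpha carbons."""
    coords = [xyz for _, xyz in residues]
    return [[math.dist(a, b) for b in coords] for a in coords]


residues = parse_ca_atoms(sample_lines)
matrix = distance_matrix(residues)
for (label, _), row in zip(residues, matrix):
    print(label, [round(d, 1) for d in row])
```

The resulting matrix is what a heatmap function such as `seaborn.heatmap` would take as input to draw the contact map.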
The network is then generated (by default threshold distance=8) and saved in GraphML format. The network is also visualized in an interactive HTML file. The layout is set based on the x and y coordinates of each residue (plane z=0).
Subsequent analysis and visualization can be done from the generated .graphml network file using software and tools like NetworkX, PyG, Gephi and Cytoscape.
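The thresholding step can be sketched as follows. This is a sketch under stated assumptions rather than the repository's code: the function names `proximity_edges` and `to_networkx` are hypothetical, the comparison is assumed to be inclusive (`<=`), self-loops are assumed to be excluded, and the coordinates are synthetic.

```python
import math
from itertools import combinations


def proximity_edges(coords, threshold=8.0):
    """Return (i, j, distance) for every residue pair within the threshold."""
    edges = []
    for i, j in combinations(range(len(coords)), 2):
        d = math.dist(coords[i], coords[j])
        if d <= threshold:  # assumption: inclusive comparison
            edges.append((i, j, d))
    return edges


def to_networkx(labels, coords, edges):
    """Build a NetworkX graph, keeping the coordinates as node attributes."""
    # Imported here so the edge computation above stays dependency-free.
    import networkx as nx

    graph = nx.Graph()
    for label, (x, y, z) in zip(labels, coords):
        graph.add_node(label, x=x, y=y, z=z)  # x and y can drive a z=0 layout
    for i, j, d in edges:
        graph.add_edge(labels[i], labels[j], distance=d)
    return graph


# Synthetic alpha-carbon coordinates along a line (not real data).
labels = ["ALA1", "GLY2", "SER3", "LEU4"]
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
edges = proximity_edges(coords)
print([(labels[i], labels[j]) for i, j, _ in edges])
```

With NetworkX installed, the graph returned by `to_networkx` can be saved with `networkx.write_graphml(graph, path)`. The last synthetic residue ends up isolated because it lies more than 8 units from every other one.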
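GraphML is plain XML, so the generated file can also be inspected without any third-party package. The sketch below counts the degree of each node using only the standard library; the function name `degree_from_graphml` and the three-node sample document are illustrative assumptions, not output of this repository. In practice `networkx.read_graphml` loads the whole graph in one call.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Standard GraphML namespace, mapped to a short prefix for the searches below.
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Tiny hand-written GraphML document (not produced by the script).
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="undirected">
    <node id="ALA1"/>
    <node id="GLY2"/>
    <node id="SER3"/>
    <edge source="ALA1" target="GLY2"/>
    <edge source="GLY2" target="SER3"/>
  </graph>
</graphml>"""


def degree_from_graphml(source):
    """Count edges per node; `source` is a file path or a file-like object."""
    root = ET.parse(source).getroot()
    # Start every node at zero so isolated residues are reported too.
    degree = Counter({node.get("id"): 0 for node in root.iterfind(".//g:node", GRAPHML_NS)})
    for edge in root.iterfind(".//g:edge", GRAPHML_NS):
        degree[edge.get("source")] += 1
        degree[edge.get("target")] += 1
    return dict(degree)


degrees = degree_from_graphml(io.StringIO(SAMPLE))
print(degrees)
```

Passing the path of a generated .graphml file instead of the `io.StringIO` object should work the same way, assuming the file uses the standard GraphML namespace.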
Recent works have used protein structural networks to apply deep learning techniques like Graph Neural Networks (GNNs) and Graph Attention Networks (GATs) (nature paper reference).
Hamelryck, T., & Manderick, B. (2003). PDB parser and structure class implemented in Python. Bioinformatics, 19, 2308–2310.
Hagberg, A., Swart, P. J., & Schult, D. A. (2008). Exploring network structure, dynamics, and function using NetworkX (No. LA-UR-08-05495; LA-UR-08-5495). Los Alamos National Laboratory (LANL), Los Alamos, NM (United States).
This repository is licensed under the MIT License. Contributions are welcome!
Don't forget to cite this repository if you use it in your work:
@software{Adam_Protein_structure_proximity,
author = {Adam, Rayane},
license = {MIT},
title = {{Protein structure proximity network generator}},
url = {https://github.com/raysas/protein-structure-proximity-network},
version = {1.0.0}
}