Pathway_Feature_Identification.py analyzes microbial genomic data to identify KEGG pathway features associated with antimicrobial resistance (AMR). It takes the output files of MicrobeAnnotator, a tool for microbial genome annotation, as its input.
The script preprocesses the genomic data and KEGG pathway information, then fits a logistic regression model and uses each pathway's coefficient to gauge how strongly that pathway is associated with AMR.
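The coefficient-based ranking described above can be illustrated with a minimal sketch. It assumes a binary presence/absence matrix with organisms as rows and KEGG modules as columns, and it uses scikit-learn's `LogisticRegression` with default settings. The toy data and variable names are invented for illustration, and the script itself may configure the model differently.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: rows are organisms, columns are KEGG modules (1 = present).
X = pd.DataFrame(
    {
        "M00001": [1, 1, 1, 0, 0, 0],
        "M00002": [1, 0, 1, 0, 1, 0],
        "M00003": [0, 0, 1, 1, 1, 1],
    },
    index=[f"Organism{i}" for i in range(1, 7)],
)
# Assumed encoding: 1 = Carbapenem-Resistant, 0 = Carbapenem-Susceptible.
y = np.array([1, 1, 1, 0, 0, 0])

model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# One coefficient per module; the sign gives the direction of the association.
coefficients = pd.Series(model.coef_[0], index=X.columns)

# Rank modules by absolute coefficient, largest first.
ranked = coefficients.reindex(coefficients.abs().sort_values(ascending=False).index)
print(ranked)
```

With this encoding, a positive coefficient means the module is more common among resistant organisms, and a negative one means it is more common among susceptible organisms.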
- Python 3.x
- pandas
- numpy
- scikit-learn
- matplotlib
- seaborn
python Pathway_Feature_Identification.py
Input Files:
Susceptibility Groups File: input/susceptibility_groups.txt
KEGG Pathway Data File: input/KEGG_pathways.tab
Output Files:
Association Plot: output/module_association_plot_corrected_legend.pdf (a PDF plot showing the association of KEGG modules with carbapenem resistance, with a corrected legend)
Coefficients File: output/module_association_plot_corrected_legend_coefficients.txt (a text file containing the logistic regression coefficient for each KEGG module)
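As a rough sketch of how outputs of this kind could be produced, the example below writes a coefficient table and a bar plot with pandas and matplotlib. The coefficient values, file names, colours, column names, and legend labels are all hypothetical, and the output goes to a temporary directory rather than output/. The script's actual plot may look different.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.patches import Patch

# Hypothetical coefficients; real values would come from the fitted model.
coefficients = pd.Series(
    {"M00001": 0.8, "M00002": 0.1, "M00003": -0.5}, name="coefficient"
)
coefficients.index.name = "module"

out_dir = Path(tempfile.mkdtemp())  # stand-in for the output/ directory
coef_path = out_dir / "coefficients.txt"
plot_path = out_dir / "module_association_plot.pdf"

# Tab-separated table with one row per module.
coefficients.to_csv(coef_path, sep="\t")

# Horizontal bar plot, coloured by the sign of each coefficient.
ordered = coefficients.sort_values()
colors = ["tab:red" if value > 0 else "tab:blue" for value in ordered]
fig, ax = plt.subplots(figsize=(6, 3))
ax.barh(ordered.index, ordered.values, color=colors)
ax.axvline(0, color="black", linewidth=0.8)
ax.set_xlabel("Logistic regression coefficient")
# Legend labels assume resistant organisms were encoded as 1.
ax.legend(
    handles=[
        Patch(color="tab:red", label="Associated with resistance"),
        Patch(color="tab:blue", label="Associated with susceptibility"),
    ],
    loc="lower right",
)
fig.tight_layout()
fig.savefig(plot_path)
plt.close(fig)
```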
Susceptibility Groups File (input/susceptibility_groups.txt):
Organism Category
Organism1 Carbapenem-Resistant
Organism2 Carbapenem-Susceptible
...
KEGG Pathway Data File (input/KEGG_pathways.tab):
module    name          pathway group              Organism1    Organism2    ...
M00001    Glycolysis    Carbohydrate Metabolism    1            0            ...
M00002    TCA cycle     Energy Metabolism          1            1            ...
...
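The two formats above could be loaded and aligned along the following lines. This is a sketch under assumptions: both files are taken to be tab-delimited, inline strings stand in for the real files, and the variable names are illustrative. The script's own preprocessing may differ.

```python
import io

import pandas as pd

# Inline stand-ins for the two input files; the tab delimiter is an assumption.
groups_text = (
    "Organism\tCategory\n"
    "Organism1\tCarbapenem-Resistant\n"
    "Organism2\tCarbapenem-Susceptible\n"
)
kegg_text = (
    "module\tname\tpathway group\tOrganism1\tOrganism2\n"
    "M00001\tGlycolysis\tCarbohydrate Metabolism\t1\t0\n"
    "M00002\tTCA cycle\tEnergy Metabolism\t1\t1\n"
)

groups = pd.read_csv(io.StringIO(groups_text), sep="\t", index_col="Organism")
kegg = pd.read_csv(io.StringIO(kegg_text), sep="\t", index_col="module")

# Keep only the organism columns, then transpose so that rows are organisms.
features = kegg.drop(columns=["name", "pathway group"]).T

# Align labels to the feature rows; encode resistant as 1 and susceptible as 0.
labels = (groups.loc[features.index, "Category"] == "Carbapenem-Resistant").astype(int)

print(features)
print(labels)
```

To read the real files, replace the `io.StringIO(...)` arguments with the file paths listed under Input Files.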
This script analyzes microbial genomic data, identifies AMR-associated pathways using KEGG data, and applies logistic regression for feature selection.

We acknowledge the developers of MicrobeAnnotator for their contribution to microbial genomics research. MicrobeAnnotator is a valuable resource for genome annotation, and its documentation is available in the BMC Bioinformatics article "MicrobeAnnotator: a user-friendly, comprehensive microbial genome annotation pipeline". We also acknowledge the developers of KEGG for their valuable pathway analysis resource. KEGG documentation is available on the KEGG website.
If you use this script for your research, please consider citing it as follows:
Sharma, V. (2024). Pathway_Feature_Identification.py [Python script]. Retrieved from https://github.com/vsmicrogenomics/Pathway-Feature-Identification