Folder Structure:
- exploration: various versions of notebooks and data files used in exploration and model building
- ncbi_crawl: a discontinued attempt at building a Scrapy crawling service for the NCBI database/webpage
- service: stores the Dockerfile as well as the Streamlit UI files
File Description:
Data:
- valid_non_virulent_sequences.csv
- valid_virulent_sequences.csv
- pathogenic_sequences_table.csv
- non_pathogenic_sequences_table.csv
Generated File: virus_analysis.csv
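The two valid_*.csv files hold the virulent and non-virulent classes. The sketch below shows one way to load them into labeled (sequence, label) pairs for training. It assumes a column named "sequence", which is a hypothetical name, and the labels 1 = virulent and 0 = non-virulent. Check the real CSV headers and the notebook's own loading code before relying on it.

```python
import csv

# Hypothetical column name: the real header in the CSV files may differ.
SEQUENCE_COLUMN = "sequence"


def parse_rows(lines, label, seq_column=SEQUENCE_COLUMN):
    """Yield (sequence, label) pairs from CSV text lines, skipping empty sequences."""
    for record in csv.DictReader(lines):
        # Missing fields come back as None, so fall back to an empty string.
        sequence = (record.get(seq_column) or "").strip().upper()
        if sequence:
            yield sequence, label


def load_labeled_sequences(virulent_csv, non_virulent_csv, seq_column=SEQUENCE_COLUMN):
    """Combine both files into one list; label 1 = virulent, 0 = non-virulent (assumed)."""
    rows = []
    for path, label in ((virulent_csv, 1), (non_virulent_csv, 0)):
        with open(path, newline="", encoding="utf-8") as handle:
            rows.extend(parse_rows(handle, label, seq_column))
    return rows
```

Usage: load_labeled_sequences("valid_virulent_sequences.csv", "valid_non_virulent_sequences.csv"). The parsing lives in parse_rows, which accepts any iterable of lines. This makes it easy to test without files on disk.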
Code:
exploration/virus_exploration_classification_tool.ipynb : Holistic walk-through of the data exploration, preprocessing, model building, and performance evaluation/comparison.
sequence_preprocessing.py : Performs feature extraction from an input nucleotide or protein sequence.
structures.py : Defines basic bioinformatics structures.
style.css : CSS styling for the UI
virus_app.py : Generates the Streamlit UI
misclassification_log.json: Log file for misclassified data
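sequence_preprocessing.py extracts features from either a nucleotide or a protein sequence. The sketch below illustrates one common approach: normalized k-mer frequencies. This is not necessarily what the module actually computes. The alphabet check that decides between nucleotide and protein is a simple heuristic. A protein made only of A, C, G and T residues would be misread as DNA. All function names here are hypothetical.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = set("ACGTUN")
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def detect_sequence_type(sequence):
    """Heuristic guess: 'nucleotide' if only A/C/G/T/U/N appear, else 'protein'."""
    letters = set(sequence.upper()) - set("-*")  # ignore gaps and stop symbols
    return "nucleotide" if letters <= NUCLEOTIDES else "protein"


def extract_features(sequence, k=2):
    """Return normalized k-mer frequencies over a fixed alphabet (hypothetical features)."""
    seq = "".join(sequence.split()).upper()
    if detect_sequence_type(seq) == "nucleotide":
        seq = seq.replace("U", "T")  # treat RNA as DNA so both share one alphabet
        alphabet = "ACGT"
    else:
        alphabet = AMINO_ACIDS
    # Fixed k-mer order so every sequence maps to the same feature vector layout.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # k-mers containing ambiguous characters (e.g. N, X) are left out of the total.
    total = sum(counts[kmer] for kmer in kmers)
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in kmers}
```

With k=2 a nucleotide sequence maps to 16 features and a protein to 400. The fixed ordering lets the dictionary values be passed to a classifier as a feature vector.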
Model:
virus_extra_trees_model.joblib : Extra Trees classifier model (serialized with joblib)
Docker Files:
Dockerfile : Defines the steps for building the Docker image
requirements.txt : Specifies the Python library requirements for the Docker image.
Execution steps:
http://www.inteligems.com/virus-prediction-app
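If the service is down, open cmd or PowerShell, navigate to the folder containing the Dockerfile, and run:
- docker build . -t virus_app
- docker run -p 8501:8501 virus_app:latest : Start the container; the app is then available at http://localhost:8501
- docker ps : Check that the container is running (docker images lists the built images)
- docker tag virus_app:latest dockerhubname/imagename:tagname : Tag the local image for Docker Hub
- docker push dockerhubname/imagename:tagname : Push the tagged image to Docker Hub (after docker login)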
Test:
- ui_samples.fasta (Fasta File Upload)
- testingtext.csv (CSV File Upload)
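ui_samples.fasta is in FASTA format: each record starts with a ">" header line followed by one or more sequence lines. The stdlib sketch below parses such a file into (header, sequence) pairs so the samples can be inspected before uploading. It is only an illustration. The app itself may parse FASTA differently, for example with a library such as Biopython.

```python
def read_fasta(lines):
    """Parse FASTA text into (header, sequence) pairs; multi-line sequences are joined."""
    records = []
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and legacy ';' comment lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
        else:
            raise ValueError("sequence data found before the first '>' header")
    if header is not None:
        records.append((header, "".join(chunks)))  # flush the final record
    return records
```

Usage: with open("ui_samples.fasta") as handle: records = read_fasta(handle). The function accepts any iterable of lines, so it works on an open file or a list of strings.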