Easy data acquisition and PLM fine-tuning for biologists.

tyang816/ProFactory

# ProFactory

Recent News:

  • Welcome to ProFactory!

📑 Features

  • Various protein language models: ESM2, ESM-1b, ESM-1v, ProtBert, ProtT5, Ankh, etc.
  • Comprehensive supervised datasets: Localization, Fitness, Solubility, Stability, etc.
  • Easy and quick data collector: AlphaFold2 Database, RCSB, InterPro, UniProt, etc.
  • Experiment monitors: Wandb, Local
  • Friendly interface: Gradio UI

🤖 Supported Models

| Model | Model size | Template |
| --- | --- | --- |
| ESM2 | 8M/35M/150M/650M/3B/15B | facebook/esm2_t33_650M_UR50D |
| ESM-1b | 650M | facebook/esm1b_t33_650M_UR50S |
| ESM-1v | 650M | facebook/esm1v_t33_650M_UR90S_1 |
| ProtBert-Uniref100 | 420M | Rostlab/prot_bert |
| ProtBert-BFD100 | 420M | Rostlab/prot_bert_bfd |
| ProtT5-Uniref50 | 3B/11B | Rostlab/prot_t5_xl_uniref50 |
| ProtT5-BFD100 | 3B/11B | Rostlab/prot_t5_xl_bfd |
| Ankh | 450M/1.2B | ElnaggarLab/ankh-base |
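
The Template column lists Hugging Face Hub identifiers. As a minimal sketch, a few of these checkpoints could be loaded with the `transformers` library as shown below. The `MODEL_TEMPLATES` dictionary and the `load_plm` helper are hypothetical names used for illustration and are not part of ProFactory; the T5-based checkpoints may also need the `sentencepiece` package and an encoder-only model class.

```python
# Hypothetical mapping copied from a few rows of the Supported Models table.
MODEL_TEMPLATES = {
    "ESM2": "facebook/esm2_t33_650M_UR50D",
    "ESM-1b": "facebook/esm1b_t33_650M_UR50S",
    "ProtBert-BFD100": "Rostlab/prot_bert_bfd",
    "ProtT5-Uniref50": "Rostlab/prot_t5_xl_uniref50",
    "Ankh": "ElnaggarLab/ankh-base",
}


def load_plm(model_name):
    """Return (tokenizer, model) for a table entry.

    Illustrative helper, not ProFactory's API. Weights are downloaded
    from the Hugging Face Hub on first use.
    """
    # Imported lazily so the mapping can be inspected without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    template = MODEL_TEMPLATES[model_name]
    tokenizer = AutoTokenizer.from_pretrained(template)
    model = AutoModel.from_pretrained(template)
    return tokenizer, model
```

Calling `load_plm("ESM2")` would fetch the 650M ESM2 checkpoint; swapping the size in the identifier (for example to a smaller ESM2 variant) is the usual way to trade accuracy for memory.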

🔬 Supported Training Approaches

| Approach | Full-tuning | Freeze-tuning | LoRA | SES-Adapter |
| --- | --- | --- | --- | --- |
| Pre-Training | | | | |
| Supervised Fine-Tuning | | | | |

📚 Supported Datasets

Pre-training datasets
Supervised fine-tuning datasets (amino acid sequences / Foldseek sequences / SS8 sequences)

> [!TIP]
> Versions of the same dataset differ only in their structural sequences. For example, DeepLocBinary_ESMFold and DeepLocBinary_AlphaFold2 share the same amino acid sequences, so if you only need the `aa_seqs`, either one works.

Supervised fine-tuning datasets (amino acid sequences)

📈 Supported Metrics

| Metric Name | Full Name | Problem Type |
| --- | --- | --- |
| accuracy | Accuracy | single_label_classification / multi_label_classification |
| recall | Recall | single_label_classification / multi_label_classification |
| precision | Precision | single_label_classification / multi_label_classification |
| f1 | F1Score | single_label_classification / multi_label_classification |
| mcc | MatthewsCorrCoef | single_label_classification / multi_label_classification |
| auc | AUROC | single_label_classification / multi_label_classification |
| f1_max | F1ScoreMax | multi_label_classification |
| spearman_corr | SpearmanCorrCoef | regression |
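
Because each metric is valid only for certain problem types, it can help to check the combination before launching a run. The sketch below encodes the table as a dictionary and filters it by problem type; `SUPPORTED_METRICS` and `metrics_for` are hypothetical names for illustration, not ProFactory's own API.

```python
SINGLE = "single_label_classification"
MULTI = "multi_label_classification"
REGRESSION = "regression"

# Metric name -> problem types it applies to, transcribed from the table above.
SUPPORTED_METRICS = {
    "accuracy": {SINGLE, MULTI},
    "recall": {SINGLE, MULTI},
    "precision": {SINGLE, MULTI},
    "f1": {SINGLE, MULTI},
    "mcc": {SINGLE, MULTI},
    "auc": {SINGLE, MULTI},
    "f1_max": {MULTI},
    "spearman_corr": {REGRESSION},
}


def metrics_for(problem_type):
    """Return the metric names usable for a problem type, in table order."""
    return [
        name
        for name, problem_types in SUPPORTED_METRICS.items()
        if problem_type in problem_types
    ]


print(metrics_for(REGRESSION))
print(metrics_for(SINGLE))
```

For a regression task such as fitness prediction, only `spearman_corr` applies; `f1_max` is available only for multi-label classification.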

✈️ Requirement

Conda Environment

Please make sure you have installed Anaconda3 or Miniconda3.

Hardware

We recommend a GPU with at least 24 GB of memory (for example, an RTX 3090), but the actual requirement depends mainly on which PLM you choose.
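
To get a rough feel for how the choice of PLM affects memory, the sketch below estimates the footprint of full fine-tuning for the ESM2 sizes listed under Supported Models. It relies on a common rule of thumb that is an assumption here, not a ProFactory measurement: about 16 bytes per parameter for fp32 weights, gradients, and the two Adam moment buffers, with activations excluded. The parameter counts are the nominal sizes from the model names.

```python
# Assumption: fp32 weights (4 B) + gradients (4 B) + two Adam moments (8 B).
BYTES_PER_PARAM_FULL_TUNING = 16
GPU_MEMORY_GB = 24  # the recommended card size from this section

# Nominal ESM2 parameter counts taken from the model size labels.
ESM2_PARAMS = {
    "8M": 8_000_000,
    "35M": 35_000_000,
    "150M": 150_000_000,
    "650M": 650_000_000,
    "3B": 3_000_000_000,
    "15B": 15_000_000_000,
}


def full_tuning_gb(num_params):
    """Estimate memory in GB for weights, gradients, and optimizer state only."""
    return num_params * BYTES_PER_PARAM_FULL_TUNING / 1e9


for size, num_params in ESM2_PARAMS.items():
    gb = full_tuning_gb(num_params)
    verdict = "fits within" if gb < GPU_MEMORY_GB else "exceeds"
    print(f"ESM2-{size}: ~{gb:.1f} GB, {verdict} {GPU_MEMORY_GB} GB before activations")
```

Under these assumptions the 650M model needs roughly 10.4 GB before activations, while the 3B model needs about 48 GB, which suggests why freeze-tuning or LoRA may be preferable for the larger checkpoints.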

🧬 Get Started

Installation

Quick Start

🙌 Citation

Please cite our work if you have used our code or data.


🎊 Acknowledgement

Thanks to Liang's Lab for their support.
