- Clone this repository and navigate to the DrugAssist folder
git clone https://github.com/blazerye/DrugAssist.git
cd DrugAssist
- Install the required packages
conda create -n drugassist python=3.8 -y
conda activate drugassist
pip install -r requirements.txt
We release the dataset on Hugging Face at blazerye/MolOpt-Instructions, and you can use it for training.
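As a minimal sketch of fetching the dataset files locally, the snippet below uses `snapshot_download` from the `huggingface_hub` library. It assumes `huggingface_hub` is installed (it may not be listed in requirements.txt). The `download_dataset` and `list_files` helpers are illustrative names, not part of this repository.

```python
from pathlib import Path
from typing import List

DATASET_REPO = "blazerye/MolOpt-Instructions"  # dataset ID given above


def download_dataset(local_dir: str) -> Path:
    """Download every file of the dataset repository into local_dir."""
    # Imported lazily so the rest of the module works without the library.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=DATASET_REPO,
        repo_type="dataset",  # needed because the default repo type is "model"
        local_dir=local_dir,
    )
    return Path(path)


def list_files(root: Path) -> List[str]:
    """Return the visible files under root as sorted, POSIX-style relative paths."""
    files = []
    for p in root.rglob("*"):
        rel = p.relative_to(root)
        # Skip hidden entries such as download-cache metadata.
        if p.is_file() and not any(part.startswith(".") for part in rel.parts):
            files.append(rel.as_posix())
    return sorted(files)
```

Calling `list_files(download_dataset("data/MolOpt-Instructions"))` (the target directory here is a hypothetical choice) would show which files are available before you point the training script at them.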
You can use LoRA to fine-tune the Llama2-7B-Chat model on the MolOpt-Instructions dataset by running the following command:
sh run_sft_lora.sh
You can merge the LoRA weights into the base model to generate the full model weights using the following command:
python merge_model.py \
--base_model $BASE_MODEL_PATH \
--lora_model $LORA_MODEL_PATH \
--output_dir $OUTPUT_DIR \
--output_type huggingface \
--verbose
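Before running the merge, it can help to confirm that both input directories look the way the script is likely to expect. The sketch below is a hypothetical pre-flight check, not part of the repository. It assumes the base model is a Hugging Face checkpoint directory containing `config.json`, and that `run_sft_lora.sh` saved the adapter in PEFT format with an `adapter_config.json`. `build_merge_command` only reassembles the flags shown above.

```python
import sys
from pathlib import Path
from typing import List


def check_merge_inputs(base_model: str, lora_model: str) -> List[str]:
    """Return a list of problems; an empty list means the inputs look plausible."""
    problems = []
    # Assumption: a Hugging Face checkpoint directory carries a config.json.
    if not (Path(base_model) / "config.json").is_file():
        problems.append(f"no config.json in base model dir: {base_model}")
    # Assumption: the LoRA adapter was saved in PEFT format.
    if not (Path(lora_model) / "adapter_config.json").is_file():
        problems.append(f"no adapter_config.json in LoRA dir: {lora_model}")
    return problems


def build_merge_command(base_model: str, lora_model: str, output_dir: str) -> List[str]:
    """Build the merge_model.py invocation as an argument list."""
    return [
        sys.executable, "merge_model.py",
        "--base_model", base_model,
        "--lora_model", lora_model,
        "--output_dir", output_dir,
        "--output_type", "huggingface",
        "--verbose",
    ]
```

Once `check_merge_inputs` returns an empty list, the command can be passed to `subprocess.run(cmd, check=True)` from the repository root.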
Alternatively, you can download our DrugAssist model weights from blazerye/DrugAssist-7B.
You can use Gradio to launch a web demo by running the following command:
python gradio_service.py \
--base_model $FULL_MODEL_PATH \
--ip $IP \
--port $PORT
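Loading a 7B model can take a while, so the demo may not respond immediately after the command starts. As a minimal sketch, the hypothetical helper below polls the address passed via `--ip` and `--port` until it accepts TCP connections. It only checks that something is listening there, not that the Gradio app itself is healthy.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once host:port accepts a TCP connection, False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a server is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("127.0.0.1", 7860, timeout=300)` would wait up to five minutes for a locally launched demo; the port number here is only an illustration and should match the value of `$PORT`.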
To deploy the DrugAssist model on devices with limited hardware (such as personal laptops without GPUs), we used llama.cpp to apply 4-bit quantization to the DrugAssist-7B model, producing the DrugAssist-7B-4bit model. You can load and use this quantized model with the text-generation-webui tool. For details, see quantized_model_deploy.md.
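A back-of-the-envelope estimate of weight storage shows why 4-bit quantization matters on such devices. The sketch below assumes a nominal 7 billion parameters and uses 16-bit weights as the comparison point, which is a common storage format but an assumption here. It counts only the weights: real quantized files are somewhat larger because llama.cpp formats also store per-block scale factors, and inference needs extra memory for activations and the KV cache.

```python
def weight_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


NOMINAL_PARAMS = 7e9  # assumption: "7B" taken at face value

fp16_gb = weight_size_gb(NOMINAL_PARAMS, 16)
int4_gb = weight_size_gb(NOMINAL_PARAMS, 4)
print(f"16-bit weights: {fp16_gb:.1f} GB")  # 14.0 GB
print(f" 4-bit weights: {int4_gb:.1f} GB")  # 3.5 GB
```

Under these assumptions the quantized weights take roughly a quarter of the space, which is what brings the model within reach of ordinary laptop memory.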
After deploying the DrugAssist-7B model, you can refer to evaluate.md and run the evaluation script to verify the molecular optimization results.
If you find DrugAssist useful for your research and applications, please cite using this BibTeX:
@article{ye2023drugassist,
title={DrugAssist: A Large Language Model for Molecule Optimization},
author={Ye, Geyan and Cai, Xibao and Lai, Houtim and Wang, Xing and Huang, Junhong and Wang, Longyue and Liu, Wei and Zeng, Xiangxiang},
journal={arXiv preprint arXiv:2401.10334},
year={2023}
}
We appreciate LLaMA, Chinese-LLaMA-Alpaca-2, Alpaca, iDrug and many other related works for their open-source contributions.