This is the code for the AAAI 2024 Paper: Text-Guided Molecule Generation with Diffusion Language Model.
- Install Package
cd TGMDLMCODE; pip install -e improved-diffusion/; pip install -e transformers/
- Download SciBERT and put it into the folder scibert.
cd improved-diffusion; cd scripts
- Encode text input
python process_text.py -i train_val_256; python process_text.py -i test
- Train model for Phase One:
python train.py
- Train model for Phase Two:
python train_correct_withmask.py
Read the following notes before you train this model yourself!
- This model always needs more than 100,000 training steps before sampling gives normal results. Performance converges long after the loss does.
- The loss eventually converges to around 0.015. (This value depends on the number of trainable parameters; 0.015 is for the model in this code. Within a reasonable range, the bigger the model, the smaller the loss.) If the loss in your experiment does not converge to around 0.015 (below 0.02) and instead gets stuck at a relatively high value (such as 0.08), we suggest re-running training with another random seed. Normally, the loss should drop below 0.03 within 15,000 steps. If your loss doesn't behave this way, just try again :) (Thanks to @YHanJG, who reported this problem.)
We don't know why this problem shows up. I observed the loss getting stuck at a high value once, and another researcher contacted me after running my code and reported the same problem. This problem should be fixed now that we have halved the learning rate :)
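The thresholds in the notes above can be turned into a quick sanity check on a training run. The sketch below is illustrative only: `check_run` and `ready_to_sample` are hypothetical helper names, not part of this repository, and it assumes you have collected `(step, loss)` pairs yourself, since the format of the training logs is not described here.

```python
from typing import Iterable, Tuple

# Values taken from the notes above.
EARLY_STEP = 15_000          # the loss should have dropped by this step
EARLY_THRESHOLD = 0.03       # ... to below this value
MIN_SAMPLING_STEP = 100_000  # train for more than this many steps before sampling


def check_run(history: Iterable[Tuple[int, float]]) -> str:
    """Classify a run from (step, loss) pairs (assumed input format).

    Returns "ok" if the loss went below EARLY_THRESHOLD within EARLY_STEP steps,
    "retry" if the run is past EARLY_STEP without ever doing so (try another
    random seed), and "too_early" if there is not enough history to decide.
    """
    history = sorted(history)
    if not history:
        return "too_early"
    dropped_early = any(
        loss < EARLY_THRESHOLD for step, loss in history if step <= EARLY_STEP
    )
    if dropped_early:
        return "ok"
    last_step = history[-1][0]
    return "retry" if last_step >= EARLY_STEP else "too_early"


def ready_to_sample(step: int) -> bool:
    """True once training has run for more than MIN_SAMPLING_STEP steps."""
    return step > MIN_SAMPLING_STEP
```

A run that reports "retry" matches the stuck-loss behaviour described above, so restarting with a different random seed is the suggested response.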
python text_sample.py; python post_sample.py
The final file OURMODEL_OUTPUT.txt is our output.
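For convenience, the commands listed in this README (text encoding, Phase One, Phase Two, sampling) can be chained from a small Python driver. This is only a sketch: `run_steps` and `STEPS` are hypothetical names, and it assumes the scripts are run from `improved-diffusion/scripts`, as the `cd` step in this README does. Because a training run may need to be restarted with another seed, you may prefer to pass only a subset of the steps instead of running everything unattended.

```python
import subprocess
import sys
from pathlib import Path

# Commands copied from this README, in the order they are listed.
STEPS = [
    ["process_text.py", "-i", "train_val_256"],
    ["process_text.py", "-i", "test"],
    ["train.py"],                   # Phase One
    ["train_correct_withmask.py"],  # Phase Two
    ["text_sample.py"],
    ["post_sample.py"],
]


def run_steps(scripts_dir, steps=STEPS, runner=subprocess.run):
    """Run each step from scripts_dir in order, stopping at the first failure.

    `runner` defaults to subprocess.run; check=True makes it raise
    CalledProcessError as soon as a script exits with a non-zero status.
    """
    scripts_dir = Path(scripts_dir)
    for args in steps:
        cmd = [sys.executable, *args]
        print("running:", " ".join(cmd))
        runner(cmd, cwd=scripts_dir, check=True)
```

For example, `run_steps("improved-diffusion/scripts", steps=STEPS[-2:])` would run only the two sampling scripts.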
You can evaluate all metrics except Text2Mol by running ev.py. For Text2Mol, please see MolT5 for more details.
- python3
- pytorch 2.0
- transformers (Be careful to follow the readme installation exactly.)
Please cite our paper if you use the code:
@article{gong2024text,
title={Text-Guided Molecule Generation with Diffusion Language Model},
author={Gong, Haisong and Liu, Qiang and Wu, Shu and Wang, Liang},
volume={38},
url={https://ojs.aaai.org/index.php/AAAI/article/view/27761},
DOI={10.1609/aaai.v38i1.27761},
number={1},
journal={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2024},
month={Mar.},
pages={109-117}
}
This code is based on https://github.com/XiangLi1999/Diffusion-LM and https://github.com/blender-nlp/MolT5