MolGrapher: Graph-based Visual Recognition of Chemical Structures

MolGrapher

This is the repository for MolGrapher: Graph-based Visual Recognition of Chemical Structures.

Citation

If you find this repository useful, please consider citing:

@InProceedings{Morin_2023_ICCV,
    author = {Morin, Lucas and Danelljan, Martin and Agea, Maria Isabel and Nassar, Ahmed and Weber, Valery and Meijer, Ingmar and Staar, Peter and Yu, Fisher},
    title = {MolGrapher: Graph-based Visual Recognition of Chemical Structures},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month = {October},
    year = {2023},
    pages = {19552-19561}
}

Publication in ICCV (DOI: https://doi.org/10.1109/iccv51070.2023.01791)

Publication on arXiv (DOI: https://doi.org/10.48550/arXiv.2308.12234)

Installation

Create a virtual environment.

conda create -n molgrapher python=3.11
conda activate molgrapher

Install MolGrapher and MolDepictor for CPU.

pip install -e ".[cpu]"

Install MolGrapher and MolDepictor for GPU. (Tested on x86_64, Linux Ubuntu 20.04, CUDA 11.7, cuDNN 8.4.)

pip install -e ".[gpu]"

CUDA and cuDNN versions can be edited in setup.py.

To install and run MolGrapher using Docker, please refer to README_DOCKER.md.

Model

Models are available on Hugging Face.

wget https://huggingface.co/ds4sd/MolGrapher/resolve/main/models/graph_classifier/gc_gcn_model.ckpt -P ./data/models/graph_classifier/
wget https://huggingface.co/ds4sd/MolGrapher/resolve/main/models/graph_classifier/gc_no_stereo_model.ckpt -P ./data/models/graph_classifier/
wget https://huggingface.co/ds4sd/MolGrapher/resolve/main/models/graph_classifier/gc_stereo_model.ckpt -P ./data/models/graph_classifier/
wget https://huggingface.co/ds4sd/MolGrapher/resolve/main/models/keypoint_detector/kd_model.ckpt -P ./data/models/keypoint_detector/

After downloading, the models folder from Hugging Face should be located in ./data/ (the wget commands above already write the checkpoints to ./data/models/). Models can be selected by modifying attributes of GraphRecognizer in ./molgrapher/models/graph_recognizer.py (the steps to follow are detailed in this issue).
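Before running inference, it can help to confirm that all four checkpoints are where the pipeline expects them. The following is a minimal sketch, not part of MolGrapher itself: the helper name `missing_checkpoints` is hypothetical, and the relative paths are copied from the wget commands above.

```python
from pathlib import Path

# Relative checkpoint paths, taken from the wget commands in this README.
EXPECTED_CHECKPOINTS = [
    "models/graph_classifier/gc_gcn_model.ckpt",
    "models/graph_classifier/gc_no_stereo_model.ckpt",
    "models/graph_classifier/gc_stereo_model.ckpt",
    "models/keypoint_detector/kd_model.ckpt",
]


def missing_checkpoints(data_dir="./data"):
    """Return the expected checkpoint paths that are absent under data_dir.

    Hypothetical helper for illustration; MolGrapher does not ship it.
    """
    root = Path(data_dir)
    return [rel for rel in EXPECTED_CHECKPOINTS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:")
        for rel in missing:
            print(f"  ./data/{rel}")
    else:
        print("All four checkpoints found under ./data/models/.")
```

Run it from the repository root so that ./data resolves correctly. An empty result only means the files exist; it does not validate their contents.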

Inference

Place your input images in the folder ./data/benchmarks/default/, then run:

bash molgrapher/scripts/annotate/run.sh

Output predictions are saved in: ./data/predictions/default/.
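If your images live elsewhere, copying them into the input folder can be scripted. The sketch below is an illustration under stated assumptions: the helper `stage_images` is hypothetical, and the set of file extensions is a guess, since this README does not list the supported image formats.

```python
import shutil
from pathlib import Path

# Assumption: common raster formats; adjust to whatever your images use.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def stage_images(source_dir, target_dir="./data/benchmarks/default"):
    """Copy image files from source_dir into target_dir; return the copied paths.

    Hypothetical helper for illustration; MolGrapher does not ship it.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(source_dir).iterdir()):
        # Skip subdirectories and files that do not look like images.
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            copied.append(Path(shutil.copy2(path, target / path.name)))
    return copied
```

After staging, run bash molgrapher/scripts/annotate/run.sh as described above and look for the results in ./data/predictions/default/.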

USPTO-30K Benchmark

USPTO-30K is available on Hugging Face.

  • USPTO-10K contains 10,000 clean molecules, i.e. without any abbreviated groups.
  • USPTO-10K-abb contains 10,000 molecules with superatom groups.
  • USPTO-10K-L contains 10,000 clean molecules with more than 70 atoms.

Synthetic Dataset

The synthetic dataset is available on Hugging Face. Images and graphs are generated using MolDepictor.

Training

To train the keypoint detector:

python3 ./molgrapher/scripts/train/train_keypoint_detector.py

To train the node classifier (the graph classifier model):

python3 ./molgrapher/scripts/train/train_graph_classifier.py