With the advent of deep learning, drug development can be accelerated by learning the patterns that link a molecule's composition to its chemical properties. Promising drug candidates can then be searched for with a generative model, which can extrapolate to molecular structures it has not seen. This project uses one of the most popular generative models, the Generative Adversarial Network (GAN). The generator is a multilayer perceptron (MLP), and the discriminator is a relational graph convolutional network (R-GCN) followed by an MLP. Many open-source datasets can be used for this purpose, such as the QM9 (Quantum Machines 9) dataset. The GAN is trained on QM9, and its performance is assessed with molecular metrics: quantitative estimate of drug-likeness (QED), solubility (measured as the log octanol-water partition coefficient, logP), synthesizability, natural product likeness, drug candidate score, validity, uniqueness, novelty, and diversity.
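The generator and discriminator described above can be illustrated with a small NumPy sketch. This is not the repository's PyTorch implementation: it assumes an MLP that maps a latent vector z ~ N(0, I) to dense bond and atom probability tensors, and a critic made of one R-GCN layer (the per-relation propagation rule of Schlichtkrull et al.) followed by a sum readout and a linear layer. The tensor sizes, the "no bond" and "no atom" padding channels, the tanh activations, and the names `init_params`, `generator`, `rgcn_layer`, and `discriminator` are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical sizes. QM9 molecules have at most 9 heavy atoms (C, N, O, F).
N_ATOMS = 9
N_ATOM_TYPES = 5  # C, N, O, F + a "no atom" padding channel (assumption)
N_BONDS = 5       # channel 0 = "no bond", then single, double, triple, aromatic (assumption)
Z_DIM, HIDDEN = 8, 64


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def init_params(rng):
    """Random weights for a one-hidden-layer MLP generator and a one-layer R-GCN critic."""
    w = lambda *shape: rng.normal(scale=0.1, size=shape)
    return {
        "g_hidden": w(Z_DIM, HIDDEN),
        "g_edges": w(HIDDEN, N_ATOMS * N_ATOMS * N_BONDS),
        "g_nodes": w(HIDDEN, N_ATOMS * N_ATOM_TYPES),
        "d_self": w(N_ATOM_TYPES, HIDDEN),
        "d_rel": w(N_BONDS - 1, N_ATOM_TYPES, HIDDEN),  # one matrix per real bond type
        "d_out": w(HIDDEN),
    }


def generator(z, p):
    """MLP generator: latent z -> (bond probabilities [N, N, B], atom probabilities [N, T])."""
    h = np.tanh(z @ p["g_hidden"])
    edges = (h @ p["g_edges"]).reshape(N_ATOMS, N_ATOMS, N_BONDS)
    edges = (edges + edges.transpose(1, 0, 2)) / 2  # bonds are undirected
    nodes = (h @ p["g_nodes"]).reshape(N_ATOMS, N_ATOM_TYPES)
    return softmax(edges), softmax(nodes)


def rgcn_layer(adj, x, w_rel, w_self):
    """h_i' = tanh(W_0 h_i + sum_r sum_{j in N_i^r} W_r h_j / c_{i,r}), adj has shape [R, N, N]."""
    adj = np.asarray(adj, dtype=float)
    out = x @ w_self  # self-connection term W_0 h_i
    for r in range(adj.shape[0]):
        deg = adj[r].sum(axis=1, keepdims=True)  # c_{i,r}: (weighted) neighbours via bond type r
        norm = np.divide(adj[r], deg, out=np.zeros_like(adj[r]), where=deg > 0)
        out = out + norm @ (x @ w_rel[r])
    return np.tanh(out)


def discriminator(adj, x, p):
    """R-GCN + linear critic returning one unbounded score (sum readout is a simplification)."""
    h = rgcn_layer(adj, x, p["d_rel"], p["d_self"])
    return float(h.sum(axis=0) @ p["d_out"])


# Usage: score one generated graph.
params = init_params(np.random.default_rng(0))
z = np.random.default_rng(1).normal(size=Z_DIM)  # z ~ N(0, I)
bond_probs, atom_probs = generator(z, params)
# Drop the "no bond" channel and self-loops to get the [R, N, N] relation tensor.
adj = bond_probs[..., 1:].transpose(2, 0, 1) * (1 - np.eye(N_ATOMS))
print(discriminator(adj, atom_probs, params))
```

Symmetrizing the bond logits before the softmax keeps the adjacency tensor undirected, and slicing off channel 0 turns the generator output into the per-relation adjacency that the R-GCN consumes. The original MolGAN replaces the plain sum readout with a gated graph aggregation layer.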
All experiments are summarized in this notebook.
The table below shows the model's performance against the QM9 dataset over 6561 runs, using latent vectors sampled from a normal distribution.
Metrics | Score |
---|---|
QED | 0.406 |
Solubility | 0.317 |
Synthesizability | 0.344 |
Natural Product | 0.758 |
Drug Candidate | 0.478 |
Valid | 0.797 |
Unique | 0.033 |
Novel | 0.790 |
Diversity | 0.567 |
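Validity, uniqueness, and novelty depend only on which molecules were generated. The pure-Python sketch below assumes the common MolGAN-style definitions, in which uniqueness and novelty are measured among valid molecules only, and assumes each sample has already been converted upstream (for example with RDKit) to a canonical SMILES string, or to `None` when conversion failed. The function name `valid_unique_novel` is hypothetical, and the repository may compute these scores differently.

```python
from typing import AbstractSet, Optional, Sequence


def valid_unique_novel(generated: Sequence[Optional[str]], training: AbstractSet[str]) -> dict:
    """Validity, uniqueness and novelty of a batch of generated molecules (a sketch).

    generated: one entry per sampled graph: a canonical SMILES string, or None / "" when the
               graph could not be converted into a valid molecule (assumed done upstream)
    training:  canonical SMILES strings of the training set (QM9 here)
    """
    valid = [s for s in generated if s]
    n_valid = len(valid)
    if n_valid == 0:
        return {"valid": 0.0, "unique": 0.0, "novel": 0.0}
    return {
        # valid: share of all samples that decode to a valid molecule
        "valid": n_valid / len(generated),
        # unique: distinct valid molecules divided by all valid molecules
        "unique": len(set(valid)) / n_valid,
        # novel: share of valid molecules that do not occur in the training set
        "novel": sum(s not in training for s in valid) / n_valid,
    }


scores = valid_unique_novel(["CCO", "CCO", None, "C", "CN"], training={"C"})
print(scores)  # {'valid': 0.8, 'unique': 0.75, 'novel': 0.75}
```

Under these definitions, a uniqueness of 0.033 would mean that only about 3 of every 100 valid molecules are distinct, i.e. the generator keeps producing the same few molecules.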
Generator and discriminator loss curves during GAN training.
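The loss functions are not written out here. If the model follows the Wasserstein GAN with gradient penalty (WGAN-GP) objective used by MolGAN and by the credited WGAN-GP with R-GCN example (an assumption), the two curves would correspond to the critic loss $L_D$ and the generator loss $L_G$ below. Here $\hat{x} = \epsilon x + (1 - \epsilon) G(z)$ with $\epsilon \sim U[0, 1]$ interpolates between a real and a generated graph, and $\lambda$ weights the gradient penalty (10 in the original WGAN-GP paper):

$$
\begin{aligned}
L_D &= \mathbb{E}_{z \sim \mathcal{N}(0, I)}\big[D(G(z))\big] - \mathbb{E}_{x \sim p_{\text{data}}}\big[D(x)\big] + \lambda\,\mathbb{E}_{\hat{x}}\big[(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1)^2\big] \\
L_G &= -\mathbb{E}_{z \sim \mathcal{N}(0, I)}\big[D(G(z))\big]
\end{aligned}
$$

The original MolGAN additionally mixes a reinforcement-learning reward term into the generator loss; whether this project does so is not stated here.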
Below are some qualitative results from the model.
Qualitative results: generated molecules with their chemical structures, SMILES representations, and QED scores.
- Quantum Machines 9 Dataset
- Drug Molecule Generation with VAE
- WGAN-GP with R-GCN for the generation of small molecular graphs
- Modeling Relational Data with Graph Convolutional Networks
- (Paper) MolGAN: An implicit generative model for small molecular graphs
- (Code) MolGAN: An implicit generative model for small molecular graphs
- PyTorch-GAN
- RDKit
- PyTorch Lightning