CT-ADE

CT-ADE: An Evaluation Benchmark for Adverse Drug Event Prediction from Clinical Trial Results

Citation

@article{yazdani2024ct,
  title={CT-ADE: An Evaluation Benchmark for Adverse Drug Event Prediction from Clinical Trial Results},
  author={Yazdani, Anthony and Bornet, Alban and Zhang, Boya and Khlebnikov, Philipp and Amini, Poorya and Teodoro, Douglas},
  journal={arXiv preprint arXiv:2404.12827},
  year={2024}
}

Developed with

Operating System: Ubuntu 22.04.3 LTS
- Kernel: Linux 4.18.0-513.18.1.el8_9.x86_64
- Architecture: x86_64
Python:
- 3.10.12

Prerequisites

Set up your environment and install the necessary Python libraries as specified in requirements.txt. Note that you will need to install the development versions of certain libraries from their respective Git repositories.
Place your unzipped MedDRA files in the directory ./data/MedDRA_25_0_English and your DrugBank XML database in the directory ./data/drugbank.

Ensure you clone and install the following libraries directly from their Git repositories for the development versions:

Repository Structure

.
├── a0_download_clinical_trials.py
├── a1_extract_completed_or_terminated_interventional_results_clinical_trials.py
├── a2_extract_and_preprocess_monopharmacy_clinical_trials.py
├── b0_download_pubchem_cids.py
├── b1_download_pubchem_cid_details.py
├── c0_extract_drugbank_dbid_details.py
├── d0_extract_chembl_approved_CHEMBL_details.py
├── data
│   ├── MedDRA_25_0_English
│   │   └── empty.null
│   ├── chembl_approved
│   │   └── empty.null
│   ├── chembl_usan
│   │   └── empty.null
│   ├── clinicaltrials_gov
│   │   └── empty.null
│   ├── drugbank
│   │   └── empty.null
│   └── pubchem
│       └── empty.null
├── e0_extract_chembl_usan_CHEMBL_details.py
├── f0_create_unified_chemical_database.py
├── g0_create_ct_ade_raw.py
├── g1_create_ct_ade_meddra.py
├── g2_create_ct_ade_classification_datasets.py
├── modeling
│   ├── DLLMs
│   │   ├── config.py
│   │   ├── custom_metrics.py
│   │   ├── model.py
│   │   ├── train.py
│   │   └── utils.py
│   └── GLLMs
│       ├── config-llama3.py
│       ├── config-meditron.py
│       ├── config-openbiollm.py
│       ├── config.py
│       ├── train_S.py
│       ├── train_SG.py
│       └── train_SGE.py
├── requirements.txt
└── src
    └── meddra_graph.py

Download Publically Available CT-ADE-SOC and CT-ADE-PT Versions

You can download the publicly available CT-ADE-SOC and CT-ADE-PT versions from HuggingFace. These datasets contain standardized annotations from ClinicalTrials.gov:

The above datasets are identical to the SOC and PT versions you will produce in the Typical Pipeline from Checkpoint section.

Typical Pipeline from Checkpoint

Follow this procedure if you aim to recreate the dataset detailed in our paper for all levels (SOC, HLGT, HLT, and PT).

1. Place your licensed data

Place your unzipped MedDRA files in the directory ./data/MedDRA_25_0_English and your DrugBank XML database in the directory ./data/drugbank.

2. Download checkpoint from HuggingFace

Download chembl_approved, chembl_usan, clinicaltrials_gov, pubchem files and place them accordingly.

3. Extract DrugBank DBID Details

Extract drug details from the DrugBank database.

python c0_extract_drugbank_dbid_details.py

4. Create Unified Chemical Database

Create a unified database combining information from PubChem, DrugBank, and ChEMBL.

python f0_create_unified_chemical_database.py

5. Create Raw CT-ADE Dataset

Generate the raw CT-ADE dataset from the processed clinical trials data.

python g0_create_ct_ade_raw.py

6. Create MedDRA Annotations

Annotate the CT-ADE dataset with MedDRA terms.

python g1_create_ct_ade_meddra.py

7. Create Classification Datasets

Generate the final classification datasets for modeling.

python g2_create_ct_ade_classification_datasets.py

Training Models

Discriminative Models (DLLMs)

Navigate to the modeling/DLLMs directory and run the training scripts with the desired configuration.

cd modeling/DLLMs

For single-GPU training, use this command:

export CUDA_VISIBLE_DEVICES="0"; \
export MIXED_PRECISION="bf16"; \
FIRST_GPU=$(echo $CUDA_VISIBLE_DEVICES | cut -d ',' -f 1); \
BASE_PORT=29500; \
PORT=$(( $BASE_PORT + $FIRST_GPU )); \
accelerate launch \
--mixed_precision=$MIXED_PRECISION \
--num_processes=$(( $(echo $CUDA_VISIBLE_DEVICES | grep -o "," | wc -l) + 1 )) \
--num_machines=1 \
--dynamo_backend=no \
--main_process_port=$PORT \
train.py

For multi-GPU training, use this command:

export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"; \
export MIXED_PRECISION="bf16"; \
FIRST_GPU=$(echo $CUDA_VISIBLE_DEVICES | cut -d ',' -f 1); \
BASE_PORT=29500; \
PORT=$(( $BASE_PORT + $FIRST_GPU )); \
accelerate launch \
--mixed_precision=$MIXED_PRECISION \
--num_processes=$(( $(echo $CUDA_VISIBLE_DEVICES | grep -o "," | wc -l) + 1 )) \
--num_machines=1 \
--dynamo_backend=no \
--main_process_port=$PORT \
train.py

Generative Models (GLLMs)

Navigate to the modeling/GLLMs directory and run the training scripts for different configurations.

cd modeling/GLLMs

Example configurations for LLama3, OpenBioLLM, and Meditron are provided in the folder. You can copy the desired configuration into config.py and adjust it to your convenience. Next, you can execute the following for the SGE configuration:

python train_SGE.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CT-ADE

Citation

Developed with

Prerequisites

Repository Structure

Download Publically Available CT-ADE-SOC and CT-ADE-PT Versions

Typical Pipeline from Checkpoint

1. Place your licensed data

2. Download checkpoint from HuggingFace

3. Extract DrugBank DBID Details

4. Create Unified Chemical Database

5. Create Raw CT-ADE Dataset

6. Create MedDRA Annotations

7. Create Classification Datasets

Training Models

Discriminative Models (DLLMs)

Generative Models (GLLMs)

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 53 Commits
data		data
modeling		modeling
src		src
LICENSE		LICENSE
README.md		README.md
a0_download_clinical_trials.py		a0_download_clinical_trials.py
a1_extract_completed_or_terminated_interventional_results_clinical_trials.py		a1_extract_completed_or_terminated_interventional_results_clinical_trials.py
a2_extract_and_preprocess_monopharmacy_clinical_trials.py		a2_extract_and_preprocess_monopharmacy_clinical_trials.py
b0_download_pubchem_cids.py		b0_download_pubchem_cids.py
b1_download_pubchem_cid_details.py		b1_download_pubchem_cid_details.py
c0_extract_drugbank_dbid_details.py		c0_extract_drugbank_dbid_details.py
d0_extract_chembl_approved_CHEMBL_details.py		d0_extract_chembl_approved_CHEMBL_details.py
e0_extract_chembl_usan_CHEMBL_details.py		e0_extract_chembl_usan_CHEMBL_details.py
f0_create_unified_chemical_database.py		f0_create_unified_chemical_database.py
g0_create_ct_ade_raw.py		g0_create_ct_ade_raw.py
g1_create_ct_ade_meddra.py		g1_create_ct_ade_meddra.py
g2_create_ct_ade_classification_datasets.py		g2_create_ct_ade_classification_datasets.py
requirements.txt		requirements.txt

License

ds4dh/CT-ADE

Folders and files

Latest commit

History

Repository files navigation

CT-ADE

Citation

Developed with

Prerequisites

Repository Structure

Download Publically Available CT-ADE-SOC and CT-ADE-PT Versions

Typical Pipeline from Checkpoint

1. Place your licensed data

2. Download checkpoint from HuggingFace

3. Extract DrugBank DBID Details

4. Create Unified Chemical Database

5. Create Raw CT-ADE Dataset

6. Create MedDRA Annotations

7. Create Classification Datasets

Training Models

Discriminative Models (DLLMs)

Generative Models (GLLMs)

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages