About the Project | Dataset | Getting Started | Usage | Licence | Citation
This repository contains custom pipes and models for classifying tables from scientific publications in PubMed Open Access according to whether they contain pharmacokinetic (PK) parameter estimates from in vivo studies or associated study population characteristics.
- The main code is found in the root of the repository (see Usage below for more information).
├── annotation guidelines # used by annotators for annotating data in this project
├── configs # config files for training and inference arguments.
├── pk_tableclass # code for data preprocessing, post-processing, and prompt templates.
├── scripts # scripts for model training and inference.
├── .gitignore
├── LICENCE
├── README.md
├── requirements.txt
└── setup.py
The annotated PKTableClassification (PKTC) corpus can be downloaded from Zenodo. The data is available under an MIT licence. The code assumes the data is located in the data folder.
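Because the scripts assume a fixed data location, it can help to confirm the files are in place before training. The following is a minimal sketch, not part of the repository: the file names are taken from the commands in the Usage section, while the helper `missing_data_files` is a hypothetical name introduced here for illustration.

```python
from pathlib import Path

# File names taken from the Usage commands; the helper itself is hypothetical.
EXPECTED_FILES = ("train.pkl", "validation.pkl", "test.pkl")


def missing_data_files(data_dir="data", expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing from data/: " + ", ".join(missing))
    else:
        print("All expected data files found.")
```

Running this from the repository root lists any files that still need to be downloaded into the data folder.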
To clone the repo:
git clone https://github.com/PKPDAI/PKTabClassifier
To create a suitable environment:
conda create --name PKTabClassifier python==3.9
conda activate PKTabClassifier
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
pip install -e .
Using a GPU is recommended. Single-GPU training has been tested with:
- NVIDIA® GeForce RTX 30 series
- CUDA 12.2
Train the XGBoost classifier.
python scripts/train_xgb_classifier.py \
--path-to-config configs/config.json \
--train-data-path data/train.pkl \
--val-data-path data/validation.pkl \
--model-save-dir trained_models/
Evaluate the trained XGBoost classifier on the test set.
python scripts/evaluate_xgb_classifier.py \
--path-to-config configs/config.json \
--test-data-path data/test.pkl \
--path-to-trained-model trained_models/best_classifier.pkl
Evaluate zero-shot classification. Note that running this script requires an OpenAI API key and organization key, both of which must be added to the config file.
python scripts/evaluate_zero_shot_classifier.py \
--path-to-config configs/config.json \
--test-data-path data/test.pkl
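Since the zero-shot scripts depend on credentials stored in the config file, a quick pre-flight check can catch a missing entry before any API call is made. This is a sketch under stated assumptions: the key names `openai_api_key` and `openai_organization` are hypothetical placeholders, and the real names are whatever configs/config.json in the repository defines.

```python
import json

# Hypothetical key names; the real ones are defined by configs/config.json.
REQUIRED_KEYS = ("openai_api_key", "openai_organization")


def missing_config_keys(config_text, required=REQUIRED_KEYS):
    """Return required keys that are absent or empty in a JSON config string."""
    config = json.loads(config_text)
    return [key for key in required if not config.get(key)]
```

For example, `missing_config_keys(open("configs/config.json").read())` would return an empty list once both entries are filled in, assuming the config uses these key names.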
Evaluate combined supervised and zero-shot classification. Note that running this script requires an OpenAI API key and organization key, both of which must be added to the config file.
python scripts/evaluate_combined_approach.py \
--path-to-config configs/config.json \
--test-data-path data/test.pkl \
--path-to-trained-model trained_models/best_classifier.pkl \
--confidence-threshold 0.9
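The --confidence-threshold flag suggests that the combined approach routes tables between the two classifiers according to the supervised model's confidence. The sketch below illustrates one plausible reading, assuming (this is not confirmed by the source) that a supervised prediction is kept when its highest class probability is at or above the threshold and is otherwise handed to the zero-shot classifier. The names `combined_predict` and `fallback` are hypothetical and are not the repository's API.

```python
from typing import Callable, Sequence


def combined_predict(
    probabilities: Sequence[float],
    fallback: Callable[[], int],
    confidence_threshold: float = 0.9,
) -> int:
    """Return the supervised label if confident enough, else call the fallback.

    probabilities: class probabilities from the supervised classifier.
    fallback: stands in for the zero-shot classifier (hypothetical interface).
    """
    best_label = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best_label] >= confidence_threshold:
        return best_label
    return fallback()


# Confident supervised prediction: label 0 is kept.
print(combined_predict([0.95, 0.03, 0.02], fallback=lambda: 2))  # prints 0
# Low confidence: the fallback decides.
print(combined_predict([0.5, 0.3, 0.2], fallback=lambda: 2))  # prints 2
```

Under this reading, a higher threshold sends more tables to the zero-shot model, trading API cost for potentially better handling of uncertain cases.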
Run inference on new data.
python scripts/inference.py \
--path-to-config configs/config.json \
--path-to-trained-model trained_models/best_classifier.pkl \
--inference-data-path data/inference.pkl \
--confidence-threshold 0.9 \
--batch-size 500
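The --batch-size argument implies that inference data is processed in fixed-size chunks rather than all at once. As a rough illustration of that pattern only (the script's actual batching logic is not shown in this README, and `iter_batches` is a hypothetical helper), batching can be sketched as:

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = 500) -> Iterator[List[T]]:
    """Yield consecutive slices of items, each with at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 1200 items with a batch size of 500 give two full batches and one partial batch.
print([len(batch) for batch in iter_batches(list(range(1200)), 500)])  # prints [500, 500, 200]
```

Smaller batches reduce peak memory use at the cost of more iterations, which is the usual reason to expose this value as a command-line option.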
The codebase, including any sample code in the documentation, is released under the MIT Licence.
See LICENCE for more information.
Citation: to be confirmed.