PKPDAI/PKTabClassifier

PKTabClassifier

About the Project | Dataset | Getting Started | Usage | License | Citation

About the Project

This repository contains custom pipelines and models for classifying tables in scientific publications from PubMed Open Access according to whether they contain pharmacokinetic (PK) parameter estimates from in vivo studies or information on the characteristics of the associated study population.

Project Structure

  • The main code is found in the root of the repository (see Usage below for more information).
├── annotation guidelines # used by annotators for annotating data in this project
├── configs # config files for training and inference arguments. 
├── pk_tableclass # code for data preprocessing, post-processing, and prompt templates.
├── scripts  # scripts for model training and inference.
├── .gitignore
├── LICENCE
├── README.md
├── requirements.txt
└── setup.py

Built With

Python v3.9

Dataset

The annotated PKTableClassification (PKTC) corpus can be downloaded from Zenodo. The data are available under an MIT licence. The code assumes the data are located in the data folder.
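Before running any of the scripts, it can help to confirm that the files are where the code expects them. The sketch below is a hypothetical convenience, not part of the repository: `check_data_layout` is an illustrative name, and the file names are taken from the Usage commands, so the Zenodo archive may name its files differently.

```python
from pathlib import Path

# File names taken from the Usage commands in this README (assumption:
# the Zenodo archive may use different names and need renaming).
EXPECTED_FILES = ["train.pkl", "validation.pkl", "test.pkl"]


def check_data_layout(data_dir="data"):
    """Return the expected files that are missing from data_dir (hypothetical helper)."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = check_data_layout()
    if missing:
        print("Missing from data/:", ", ".join(missing))
    else:
        print("All expected data files found.")
```

data/inference.pkl is only needed for the inference script, so it is left out of the check.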

Getting Started

Installation

To clone the repo:

git clone https://github.com/PKPDAI/PKTabClassifier

To create a suitable environment:

  • conda create --name PKTabClassifier python=3.9
  • conda activate PKTabClassifier
  • conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
  • pip install -e . (run from the repository root, where setup.py is located)

GPU Support

Using a GPU is recommended. Single-GPU training has been tested with:

  • NVIDIA® GeForce RTX 30 series
  • CUDA 12.2
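To confirm that the PyTorch build installed above can see a GPU, a generic PyTorch check like the following may be useful. It is not a script shipped with this repository, and `cuda_status` is a hypothetical name.

```python
def cuda_status():
    """Return (available, device_name) for the first CUDA device, if any."""
    try:
        import torch  # installed in the conda step under Installation
    except ImportError:
        # PyTorch is not installed in this environment.
        return False, None
    if not torch.cuda.is_available():
        return False, None
    return True, torch.cuda.get_device_name(0)


if __name__ == "__main__":
    available, name = cuda_status()
    print(f"CUDA available: {available}" + (f" ({name})" if name else ""))
```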

Usage

Train the supervised classifier pipeline:

python scripts/train_xgb_classifier.py \
--path-to-config configs/config.json \
--train-data-path data/train.pkl \
--val-data-path data/validation.pkl \
--model-save-dir trained_models/

Evaluate the supervised classifier:

python scripts/evaluate_xgb_classifier.py \
--path-to-config configs/config.json \
--test-data-path data/test.pkl \
--path-to-trained-model trained_models/best_classifier.pkl

Evaluate zero-shot classification. To run this script, you will need an OpenAI API key and organization key, which must be added to the config file.

python scripts/evaluate_zero_shot_classifier.py \
--path-to-config configs/config.json \
--test-data-path data/test.pkl

Evaluate combined supervised & zero-shot classification. To run this script, you will need an OpenAI API key and organization key, which must be added to the config file.

python scripts/evaluate_combined_approach.py \
--path-to-config configs/config.json \
--test-data-path data/test.pkl \
--path-to-trained-model trained_models/best_classifier.pkl \
--confidence-threshold 0.9
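This README does not describe how --confidence-threshold is applied in the combined approach. One common pattern, assumed here purely for illustration, is to keep the supervised prediction when its highest class probability reaches the threshold and defer the remaining tables to the zero-shot classifier. The sketch below shows that routing; `route_by_confidence` is a hypothetical helper, not the repository's code.

```python
from typing import List, Sequence, Tuple


def route_by_confidence(
    probabilities: Sequence[Sequence[float]],
    threshold: float = 0.9,
) -> Tuple[List[int], List[int]]:
    """Split table indices into (keep_supervised, defer_to_zero_shot).

    Hypothetical helper: a table is kept when the largest class
    probability from the supervised model is >= threshold.
    """
    keep, defer = [], []
    for i, probs in enumerate(probabilities):
        (keep if max(probs) >= threshold else defer).append(i)
    return keep, defer


if __name__ == "__main__":
    # Illustrative per-class probabilities for four tables (not real outputs).
    probs = [[0.95, 0.03, 0.02], [0.50, 0.40, 0.10], [0.05, 0.92, 0.03], [0.30, 0.30, 0.40]]
    print(route_by_confidence(probs, threshold=0.9))  # ([0, 2], [1, 3])
```

Under this assumption, raising the threshold sends more tables to the zero-shot model (more API calls), while lowering it relies more on the supervised classifier.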

Inference with the supervised classifier:

python scripts/inference.py \
--path-to-config configs/config.json \
--path-to-trained-model trained_models/best_classifier.pkl \
--inference-data-path data/inference.pkl \
--confidence-threshold 0.9 \
--batch-size 500
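The --batch-size argument suggests that the inference data are processed in chunks. The script's internals are not documented here, so the following is only an illustration of fixed-size batching; `iter_batches` is a hypothetical name.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = 500) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


if __name__ == "__main__":
    tables = [f"table_{i}" for i in range(1203)]  # stand-in for loaded tables
    print([len(b) for b in iter_batches(tables, batch_size=500)])  # [500, 500, 203]
```

Smaller batches lower peak memory use at the cost of more iterations; the final batch simply holds the remainder.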

License

The codebase is released under the MIT Licence. This covers both the codebase and any sample code in the documentation.

See LICENSE for more information.

Citation

To be confirmed.
