PKTabClassifier

About the Project

This repository contains custom pipes and models to classify tables contained in scientific publications in PubMed Open Access, depending on whether they contain pharmacokinetic (PK) parameter estimates from in vivo studies or associated study population characteristic information.

Project Structure

The main code is found in the root of the repository (see Usage below for more information).

├── annotation_guidelines # used by annotators for annotating data in this project
├── configs # config files for training and inference arguments
├── pk_tableclass # code for data preprocessing, post-processing, and prompt templates
├── scripts # scripts for model training and inference
├── .gitignore
├── LICENCE
├── README.md
├── requirements.txt
└── setup.py

Built With

Dataset

The annotated PKTableClassification (PKTC) corpus can be downloaded from Zenodo. The data is available under an MIT licence. The code assumes the data is located in the data folder (for example, data/train.pkl).
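
Before running any of the scripts, it can help to confirm that the downloaded files are where the commands expect them. The following is a minimal sketch, not part of the repository: the function name missing_data_files is hypothetical, and the file names are simply those that appear in the commands under Usage.

```python
from pathlib import Path

# File names taken from the commands in the Usage section.
# Adjust this tuple if your copy of the corpus uses different names.
EXPECTED_FILES = ("train.pkl", "validation.pkl", "test.pkl")


def missing_data_files(data_dir="data", expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing from data/:", ", ".join(missing))
    else:
        print("All expected data files found.")
```

Run it from the repository root so that the relative data path resolves the same way as in the training and evaluation commands.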

Getting Started

Installation

To clone the repo:

git clone https://github.com/PKPDAI/PKTabClassifier

To create a suitable environment:

conda create --name PKTabClassifier python=3.9
conda activate PKTabClassifier
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
pip install -e .

GPU Support

Using GPU is recommended. Single-GPU training has been tested with:

NVIDIA® GeForce RTX 30 series
CUDA 12.2
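
To check whether the environment created above can actually see a GPU, a short script along these lines may be useful. It is a sketch rather than part of the repository: describe_gpu is a hypothetical helper, and it only uses standard PyTorch calls (torch.cuda.is_available, torch.cuda.get_device_name, torch.version.cuda).

```python
def describe_gpu():
    """Return a one-line summary of the GPU that PyTorch can see, if any."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment."
    if not torch.cuda.is_available():
        return "PyTorch is installed but no CUDA device is visible."
    # torch.version.cuda is the CUDA version PyTorch was built against,
    # which can differ from the version reported by the driver.
    return f"CUDA {torch.version.cuda} build, device 0: {torch.cuda.get_device_name(0)}"


if __name__ == "__main__":
    print(describe_gpu())
```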

Usage

Train the supervised classifier pipeline:

python scripts/train_xgb_classifier.py \
--path-to-config configs/config.json \
--train-data-path data/train.pkl \
--val-data-path data/validation.pkl \
--model-save-dir trained_models/

Evaluate the supervised classifier:

python scripts/evaluate_xgb_classifier.py \
--path-to-config configs/config.json \
--test-data-path data/test.pkl \
--path-to-trained-model trained_models/best_classifier.pkl

Evaluate zero-shot classification. To run this script you will need an OpenAI API key and organization key, both of which must be added to the config file.

python scripts/evaluate_zero_shot_classifier.py \
--path-to-config configs/config.json \
--test-data-path data/test.pkl

Evaluate combined supervised & zero-shot classification. To run this script you will need an OpenAI API key and organization key, both of which must be added to the config file.

python scripts/evaluate_combined_approach.py \
--path-to-config configs/config.json \
--test-data-path data/test.pkl \
--path-to-trained-model trained_models/best_classifier.pkl \
--confidence-threshold 0.9
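
The --confidence-threshold flag suggests that the supervised prediction is kept when the classifier is confident enough and that the remaining tables are passed to the zero-shot classifier. The sketch below illustrates that routing idea only. It is an assumption about the behaviour, not the script's actual code: route_by_confidence is a hypothetical name, the class indices are placeholders, and whether the real comparison is >= or > is not stated in this README.

```python
def route_by_confidence(probabilities, threshold=0.9):
    """Split tables into confident supervised predictions and deferred cases.

    probabilities: one sequence of class probabilities per table.
    Returns (accepted, deferred), where accepted is a list of
    (table_index, predicted_class_index) pairs and deferred is a list
    of table indices to send to a fallback classifier.
    """
    accepted, deferred = [], []
    for i, probs in enumerate(probabilities):
        # Index of the most probable class for this table.
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:  # assumption: threshold is inclusive
            accepted.append((i, best))
        else:
            deferred.append(i)
    return accepted, deferred


if __name__ == "__main__":
    example = [[0.95, 0.03, 0.02], [0.50, 0.40, 0.10], [0.05, 0.05, 0.90]]
    accepted, deferred = route_by_confidence(example, threshold=0.9)
    print("kept:", accepted)
    print("deferred:", deferred)
```

Under this reading, raising the threshold would send more tables to the zero-shot classifier, and lowering it would keep more supervised predictions.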

Inference with the supervised classifier:

python scripts/inference.py \
--path-to-config configs/config.json \
--path-to-trained-model trained_models/best_classifier.pkl \
--inference-data-path data/inference.pkl \
--confidence-threshold 0.9 \
--batch-size 500
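
The --batch-size flag presumably controls how many tables are processed at a time during inference. As a rough illustration of what batching in fixed-size chunks looks like, and not a copy of scripts/inference.py, here is a small sketch; iter_batches is a hypothetical helper.

```python
def iter_batches(items, batch_size=500):
    """Yield consecutive slices of items, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    tables = list(range(1200))  # stand-in for 1200 tables
    for batch in iter_batches(tables, batch_size=500):
        print(len(batch))
```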

License

The codebase is released under the MIT Licence. This covers both the codebase and any sample code in the documentation.

See LICENCE for more information.

Citation

tbc
