Hamed Babaei Giglou, Jennifer D'Souza, and Sören Auer
{hamed.babaei, jennifer.dsouza, auer}@tib.eu
TIB Leibniz Information Center for Science and Technology, Hannover, Germany
Accepted for publication at ISWC 2023 - Research Track
The LLMs4OL Challenge consists of 3 tasks:
- Task A - Term Typing: Discover the generalized type for a lexical term.
- Task B - Taxonomy Discovery: Discover the taxonomic hierarchy between type pairs.
- Task C - Non-Taxonomic Relation Extraction: Identify non-taxonomic, semantic relations between types.
More information can be found on the Challenge website, the Challenge GitHub page, and the Challenge Codalab page.
The deadline is July 18, 2024.
Ontology Learning (OL) addresses the challenge of knowledge acquisition and representation in a variety of domains. Motivated by recent advances in NLP and the emergence of Large Language Models (LLMs), which have shown a capability to crystallize knowledge and patterns from vast text sources, we introduce the LLMs4OL (Large Language Models for Ontology Learning) paradigm as an empirical study of LLMs for the automated construction of ontologies across various domains. The LLMs4OL paradigm tests the following question: does the capability of LLMs to capture intricate linguistic relationships translate effectively to OL, given that OL mainly relies on automatically extracting and structuring knowledge from natural language text?
- Repository Structure
- LLMs4OL Paradigm
- LLMs4OL Paradigm Setups
- Experiments
- Results Overview
- How to run tasks
- Citation
.
└── LLMs4OL <- root directory of the repository
├── tuning <- Few-Shot finetuning directory
│ └── ...
├── TaskA <- Term Typing task directory
│ └── ...
├── TaskB <- Type Taxonomy Discovery task directory
│ └── ...
├── TaskC <- Type Non-Taxonomic Relation Extraction task directory
│ └── ...
├── assets <- artifacts directory
│ ├── LLMs <- contains pretrained LLMs
│ ├── FSL <- contains fine-tuned LLMs (for training you should create this)
│ ├── WordNetDefinitions <- contains wordnet word definitions
│ └── CountryCodes <- GeoNames country codes
├── datasets <- contains datasets
│ ├── FSL <- contains few-shot learning training datasets
│ ├── TaskA <- contains directories for task A sources
│ ├── TaskB <- contains directories for task B sources
│ └── TaskC <- contains directories for task C sources
├── docs <- contains supplementary documents
│ └── Supplementary-Material.pdf <- supplementary material for the paper
├── images <- contains the figures
├── README.md <- README file documenting the repository
└── requirements.txt <- lists the Python requirements
The LLMs4OL paradigm offers a conceptual framework for automating, and thereby accelerating, ontology construction, a process traditionally carried out exclusively by domain experts. OL tasks are based on the following ontology primitives:
- Corpus preparation – selecting and collecting the source texts to build the ontology.
- Terminology extraction – identifying and extracting relevant terms from the source text.
- Term typing – grouping similar terms as conceptual types.
- Taxonomy construction – identifying the “is-a” hierarchies between types.
- Relationship extraction – identifying and extracting “non-is-a” or semantic relationships between types.
- Axiom discovery – discovering constraints and inference rules for the ontology.
Toward realizing LLMs4OL, we empirically ground three core OL tasks with LLMs as a foundational basis for future work. These tasks are:
- Term Typing
- Type Taxonomy Discovery
- Type Non-Taxonomic Relation Extraction
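To make the input and output of each task concrete, here is a minimal sketch that models them as plain Python data types. The class names, fields, and toy examples are hypothetical illustrations, not the repository's actual data format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TermTyping:
    """Task A: a lexical term and the generalized type(s) it belongs to."""
    term: str
    types: Tuple[str, ...]


@dataclass(frozen=True)
class TaxonomyPair:
    """Task B: does `parent` subsume `child` in an "is-a" hierarchy?"""
    parent: str
    child: str
    is_a: bool


@dataclass(frozen=True)
class NonTaxonomicRelation:
    """Task C: a semantic ("non-is-a") relation between two types."""
    head: str
    relation: str
    tail: str
    holds: bool


# Toy examples for illustration only (not taken from the task datasets).
examples = [
    TermTyping(term="Hannover", types=("city",)),
    TaxonomyPair(parent="place", child="city", is_a=True),
    NonTaxonomicRelation(head="drug", relation="treats", tail="disease", holds=True),
]

for ex in examples:
    print(type(ex).__name__, ex)
```

Task A is a labeling problem over terms, while Tasks B and C reduce to judging whether a candidate relation between two types holds.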
The LLMs4OL task paradigm is an end-to-end conceptual framework for learning ontologies in different knowledge domains, with the aim of automating ontology learning.
The tasks within the blue arrow in Figure 1 are the three OL tasks we empirically validated. For each task, we created a directory (TaskA, TaskB, and TaskC) with a detailed description of the task.
To comprehensively assess LLMs on the three OL tasks, we cover a variety of ontological knowledge domain sources: lexicosemantics (WN18RR, from WordNet), geography (GeoNames), biomedicine (the UMLS sources NCI, MEDICIN, and SNOMEDCT_US), and web content types (Schema.Org). The sources differ per task; the datasets used for each task are:
- Task A. Term Typing Datasets: GeoNames, NCI, MEDICIN, SNOMEDCT_US, and WN18RR
- Task B. Type Taxonomy Discovery Datasets: GeoNames, Schema.Org, and UMLS
- Task C. Type Non-Taxonomic Relation Extraction Datasets: UMLS
Task A is evaluated with mean average precision at k (MAP@K) with k = 1, and Tasks B and C are evaluated with the standard F1-score computed from precision and recall. Complete and detailed results for each task are presented in the following tables:
- Task A. Term Typing Detailed Results Table
- Task B. Type Taxonomy Discovery Detailed Results Table
- Task C. Type Non-Taxonomic Relation Extraction Detailed Results Table
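The two metrics can be sketched as follows. This is an illustrative implementation, not the repository's evaluation code. With k = 1, MAP@1 reduces to the fraction of terms whose top-ranked predicted type is a gold type. The F1 shown here is positive-class F1 over binary true/false relation judgments, which is an assumption; the repository may use a different averaging scheme (for example, macro-F1).

```python
from typing import Sequence, Set


def average_precision_at_k(gold: Set[str], ranked: Sequence[str], k: int = 1) -> float:
    """AP@k for one term: precision accumulated at each rank (<= k) that hits a gold type."""
    if not gold:
        return 0.0
    hits, score, seen = 0, 0.0, set()
    for rank, label in enumerate(ranked[:k], start=1):
        if label in gold and label not in seen:
            hits += 1
            score += hits / rank  # precision at this rank
        seen.add(label)
    return score / min(len(gold), k)


def map_at_k(golds: Sequence[Set[str]], rankings: Sequence[Sequence[str]], k: int = 1) -> float:
    """MAP@k: AP@k averaged over all evaluated terms."""
    scores = [average_precision_at_k(g, r, k) for g, r in zip(golds, rankings)]
    return sum(scores) / len(scores) if scores else 0.0


def f1(gold: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Positive-class F1 from binary true/false relation judgments."""
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum(p and not g for g, p in zip(gold, predicted))
    fn = sum(g and not p for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: the first two terms are typed correctly at rank 1, the third is not.
print(round(map_at_k([{"city"}, {"river", "stream"}, {"mountain"}],
                     [["city", "town"], ["stream"], ["hill"]]), 3))  # 0.667
print(f1([True, True, False, False], [True, False, True, False]))  # 0.5
```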
We ran experiments with LMs from three architecture families, described as follows:
- Encoder-Only:
  - BERT-Large with 340M parameters
  - PubMedBERT with 340M parameters
- Encoder-Decoder:
  - BART-Large with 400M parameters
  - Flan-T5-Large with 780M parameters
  - Flan-T5-XL with 3B parameters
- Decoder-Only:
First, we created prompt templates tailored to the experimental language models and their architectures: eight templates per source for Tasks A and B, and a single template for Task C. Next, we probed the LMs in a zero-shot testing setting. Finally, we attempted to boost the performance of two LLMs (Flan-T5-Large and Flan-T5-XL) through few-shot fine-tuning with a separate set of predefined prompt templates (different from the zero-shot ones), and then tested the fine-tuned models with the zero-shot testing prompt templates.
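To illustrate template-based prompting, the following sketch fills a cloze-style template for masked (encoder-only or BART) models and an instruction-style template for generative models. The template strings and the `[A]`/`[MASK]` placeholder convention are hypothetical; the actual templates are stored in the template files under `datasets/` and `tuning/`.

```python
from typing import List

# Hypothetical Task A templates: "[A]" marks where the term goes and
# "[MASK]" marks the slot a masked language model should fill.
CLOZE_TEMPLATES = [
    "[A] geographically is a [MASK].",
    "[A] is a kind of [MASK].",
]
# Hypothetical instruction-style template for generative models.
INSTRUCTION_TEMPLATE = "What is the type of '[A]'? Answer with a single word."


def fill_templates(term: str, templates: List[str], mask_token: str = "[MASK]") -> List[str]:
    """Instantiate every template for one term, using the model's own mask token."""
    return [t.replace("[A]", term).replace("[MASK]", mask_token) for t in templates]


# BERT-style tokenizers use "[MASK]"; BART's tokenizer uses "<mask>".
bert_prompts = fill_templates("Hannover", CLOZE_TEMPLATES)
bart_prompts = fill_templates("Hannover", CLOZE_TEMPLATES, mask_token="<mask>")
t5_prompt = INSTRUCTION_TEMPLATE.replace("[A]", "Hannover")

for prompt in bert_prompts + bart_prompts + [t5_prompt]:
    print(prompt)
```

Keeping several templates per source makes it possible to measure how sensitive each model is to prompt phrasing.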
Prompt templates for zero-shot testing are represented as follows:
Dataset | Task | Prompt templates path | Answer set mapper path |
---|---|---|---|
WN18RR | A | datasets/TaskA/WN18RR/templates.json | datasets/TaskA/WN18RR/label_mapper.json |
GeoNames | A | datasets/TaskA/Geonames/templates.json | datasets/TaskA/Geonames/label_mapper.json |
NCI, MEDICIN, SNOMEDCT_US | A | datasets/TaskA/UMLS/templates.json | datasets/TaskA/UMLS/label_mapper.json |
Schema.Org, UMLS, GeoNames | B | datasets/TaskB/templates.txt | datasets/TaskB/label_mapper.json |
UMLS | C | datasets/TaskC/templates.txt | datasets/TaskC/label_mapper.json |
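Free-form model answers must be mapped onto a dataset's answer set before scoring. Below is a minimal sketch of that step. It assumes that a label mapper is a JSON object from each gold label to a list of accepted answer words; the real schema of `label_mapper.json` may differ, and the helper names are hypothetical.

```python
import json
from typing import Dict, List, Optional

# Hypothetical label_mapper.json contents: gold label -> accepted answer words.
LABEL_MAPPER_JSON = """
{
  "city": ["city", "town", "municipality"],
  "river": ["river", "stream", "creek"]
}
"""


def build_reverse_index(mapper: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert label -> words into word -> label for constant-time lookups."""
    return {word.lower(): label for label, words in mapper.items() for word in words}


def map_answer(raw_answer: str, index: Dict[str, str]) -> Optional[str]:
    """Normalize a raw model answer and map it to a gold label (None if unmapped)."""
    word = raw_answer.strip().strip(".").lower()
    return index.get(word)


index = build_reverse_index(json.loads(LABEL_MAPPER_JSON))
print(map_answer(" Town.", index))    # city
print(map_answer("mountain", index))  # None
```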
Prompt templates for training the models are listed as follows:
Dataset | Task | prompt templates path |
---|---|---|
WN18RR, UMLS (NCI only), GeoNames, Schema.Org | A, B, C | tuning/templates.py |
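Few-shot fine-tuning of a seq2seq model such as Flan-T5 is a text-to-text setup: each labeled example becomes a (source, target) string pair. The sketch below assumes that format. `TRAIN_TEMPLATES` and `to_seq2seq_pairs` are hypothetical, and their wording is not taken from `tuning/templates.py`.

```python
from typing import Dict, List, Tuple

# Hypothetical training templates, one per task; "{...}" fields are filled per example.
TRAIN_TEMPLATES: Dict[str, str] = {
    "A": "Identify the type of the term: {term}",
    "B": "Is {parent} a superclass of {child}? Answer true or false.",
    "C": "Is there a '{relation}' relation from {head} to {tail}? Answer true or false.",
}


def to_seq2seq_pairs(task: str, examples: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Turn labeled examples into (source, target) strings for seq2seq fine-tuning."""
    template = TRAIN_TEMPLATES[task]
    pairs = []
    for ex in examples:
        fields = {k: v for k, v in ex.items() if k != "label"}
        pairs.append((template.format(**fields), ex["label"]))
    return pairs


pairs = to_seq2seq_pairs("B", [{"parent": "place", "child": "city", "label": "true"}])
print(pairs[0])
```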
Software Requirements:
- Python 3.9
- Python libraries listed in requirements.txt
Instructions:
First, install conda following the conda installation guide, then create and activate your environment as follows:
conda create -n yourenvname python=3.9
conda activate yourenvname
Next, clone the repository and install the requirements from requirements.txt in your environment:
git clone https://github.com/HamedBabaei/LLMs4OL.git
cd LLMs4OL
pip install -r requirements.txt
Next, add your OpenAI key to the .env file for experiments with OpenAI models. Finally, start the experiments as described in the task directories.
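The README does not name the variable that holds the key. The sketch below assumes it is called `OPENAI_API_KEY` and parses the `.env` file using only the standard library. The repository itself may load `.env` differently, for example through the python-dotenv package, and `load_dotenv_file` is a hypothetical helper.

```python
import os
from pathlib import Path


def load_dotenv_file(path: str = ".env") -> None:
    """Read KEY=VALUE lines from a .env file into os.environ (existing variables win)."""
    env_path = Path(path)
    if not env_path.exists():
        return
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and malformed lines
        key, value = line.split("=", 1)
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


load_dotenv_file()
# Hypothetical variable name; check which name the repository's code reads.
if os.environ.get("OPENAI_API_KEY") is None:
    print("No OpenAI key found; add it to .env before running OpenAI-model experiments.")
```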
To make each task behave as a separate, encapsulated module, we created separate directories for the datasets and for the tasks. Each task directory contains a test_auto.sh shell script that automatically runs zero-shot testing on all of the task's datasets and stores the results in the TaskX/results/DATASET_NAME/ directory. You can also run any model on a dataset of your choice with test_manual.sh, which asks for the dataset, where to store the output logs, the model name, and the device (CPU or GPU). For each of the important directories we provide a test.py script, which test_manual.sh and test_auto.sh call multiple times on different datasets. The structure of the TaskA, TaskB, and TaskC directories (LLMs4OL/TaskX) is as follows:
.
└── LLMs4OL
├── tuning
│ ├── ....
│ ├── trainer.py
│ └── train_eval.sh
├── TaskX
│ ├── ...
│ ├── results
│ │ ├── dataset1
│ │ └── ....
│ ├── ...
│ ├── test.py
│ ├── test_auto.sh
│ ├── test_manual.sh
│ └── README.md
...
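As a rough Python equivalent of what test_auto.sh is described as doing, the following dry-run sketch builds one test.py invocation per dataset, each writing to `TaskX/results/DATASET_NAME/`. The dataset names, the model name, and every test.py command-line flag shown are hypothetical. The commands are only printed, not executed.

```python
import shlex
from pathlib import Path
from typing import List


def plan_zero_shot_runs(task_dir: str, datasets: List[str], model: str,
                        device: str = "cpu") -> List[List[str]]:
    """Build one test.py command per dataset, each writing to <task_dir>/results/<dataset>."""
    commands = []
    for dataset in datasets:
        out_dir = Path(task_dir) / "results" / dataset
        commands.append([
            "python", str(Path(task_dir) / "test.py"),
            "--dataset", dataset,      # hypothetical flag
            "--model", model,          # hypothetical flag
            "--device", device,        # hypothetical flag
            "--output", str(out_dir),  # hypothetical flag
        ])
    return commands


for cmd in plan_zero_shot_runs("TaskA", ["wn18rr", "geonames"], model="flan-t5-large"):
    # Dry run: pass cmd to subprocess.run(cmd, check=True) to actually execute it.
    print(shlex.join(cmd))
```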
The train_eval.sh script in the tuning directory runs trainer.py on representative datasets, then walks through the TaskX directories and calls test.py to evaluate the trained models on each dataset. How to run the models is described in detail in the README.md file of each task directory.
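The "walk through the TaskX directories" step can be sketched with pathlib. The `Task*` naming pattern follows the repository structure shown above. Requiring a test.py inside each directory is an assumption, and `find_task_dirs` is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def find_task_dirs(root: str = ".") -> List[Path]:
    """Return the sorted TaskX directories under the repository root that contain a test.py."""
    return sorted(
        p for p in Path(root).glob("Task*")
        if p.is_dir() and (p / "test.py").is_file()
    )


if __name__ == "__main__":
    for task_dir in find_task_dirs("."):
        # This is where a fine-tuned model would be evaluated via the task's test.py.
        print(f"would evaluate with {task_dir / 'test.py'}")
```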
@InProceedings{10.1007/978-3-031-47240-4_22,
author="Babaei Giglou, Hamed
and D'Souza, Jennifer
and Auer, S{\"o}ren",
editor="Payne, Terry R.
and Presutti, Valentina
and Qi, Guilin
and Poveda-Villal{\'o}n, Mar{\'i}a
and Stoilos, Giorgos
and Hollink, Laura
and Kaoudi, Zoi
and Cheng, Gong
and Li, Juanzi",
title="LLMs4OL: Large Language Models for Ontology Learning",
booktitle="The Semantic Web -- ISWC 2023",
year="2023",
publisher="Springer Nature Switzerland",
address="Cham",
pages="408--427",
isbn="978-3-031-47240-4"
}