A Deep Learning Based Knowledge Extraction Toolkit
for Knowledge Graph Construction

DeepKE is a knowledge extraction toolkit for knowledge graph construction that supports cnSchema, low-resource, document-level and multimodal scenarios for entity, relation and attribute extraction. We provide documents, Google Colab tutorials, an online demo, a paper, slides and a poster for beginners.

Reading Materials:

Data-Efficient Knowledge Graph Construction, 高效知识图谱构建 (Tutorial on CCKS 2022) [slides]

Efficient and Robust Knowledge Graph Construction (Tutorial on AACL-IJCNLP 2022) [slides]

PromptKG Family: a Gallery of Prompt Learning & KG-related Research Works, Toolkits, and Paper-list [Resources]

Knowledge Extraction in Low-Resource Scenarios: Survey and Perspective [Survey][Paper-list]

Reasoning with Language Model Prompting [Survey][Paper-list]

Related Toolkit:

Doccano, MarkTool, LabelStudio: Data Annotation Toolkits

LambdaKG: A library and benchmark for PLM-based KG embeddings

What's New

Feb, 2023

We have added support for using LLMs (GPT-3) with in-context learning (based on Promptify) and for data generation, and added the NER model W2NER.

Nov, 2022

We have added data annotation instructions for entity recognition and relation extraction, added automatic labelling of weakly supervised data (entity extraction and relation extraction), and optimized multi-GPU training.

Sept, 2022

The paper DeepKE: A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population has been accepted by the EMNLP 2022 System Demonstration Track.

Aug, 2022

We have added data augmentation (Chinese, English) support for low-resource relation extraction.

June, 2022

We have added multimodal support for entity and relation extraction.

May, 2022

We have released DeepKE-cnschema with off-the-shelf knowledge extraction models.

Jan, 2022

We have released the paper DeepKE: A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population.

Dec, 2021

We have added a Dockerfile to create the environment automatically.

Nov, 2021

The demo of DeepKE, which supports real-time extraction without deployment or training, has been released.
The documentation of DeepKE, which covers details such as source code and datasets, has been released.

Oct, 2021

DeepKE can be installed with pip install deepke.
The code of deepke-v2.0 has been released.

Aug, 2019

The code of deepke-v1.0 has been released.

Aug, 2018

The DeepKE project was started and the code of deepke-v0.1 was released.

Prediction Demo

There is a demonstration of prediction. The GIF file is created by Terminalizer. Get the code.

Model Framework

DeepKE contains a unified framework for named entity recognition, relation extraction and attribute extraction, the three knowledge extraction functions.
Each task can be implemented in different scenarios. For example, we can achieve relation extraction in standard, low-resource (few-shot), document-level and multimodal settings.
Each application scenario comprises three components: Data (Tokenizer, Preprocessor and Loader), Model (Module, Encoder and Forwarder) and Core (Training, Evaluation and Prediction).
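
To make the three-part layout concrete, here is a minimal sketch in plain Python. Every class, method and field name below is hypothetical and chosen only to mirror the component names in the text; it is not DeepKE's actual API, and the "model" is a toy stand-in rather than a neural network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Data:
    """Data group: Tokenizer -> Preprocessor -> Loader (hypothetical names)."""
    tokenizer: Callable[[str], List[str]]
    vocab: Dict[str, int]

    def preprocess(self, tokens: List[str]) -> List[int]:
        # Map tokens to ids; unknown tokens share id 0.
        return [self.vocab.get(tok, 0) for tok in tokens]

    def load(self, sentences: List[str]) -> List[List[int]]:
        return [self.preprocess(self.tokenizer(s)) for s in sentences]


@dataclass
class Model:
    """Model group: a toy Encoder wrapped by a Forwarder."""
    labels: List[str]

    def encode(self, ids: List[int]) -> int:
        # Toy encoder: reduce a sentence to a single integer feature.
        return sum(ids)

    def forward(self, ids: List[int]) -> str:
        return self.labels[self.encode(ids) % len(self.labels)]


@dataclass
class Core:
    """Core group: Training and Evaluation are omitted; only Prediction is shown."""
    data: Data
    model: Model

    def predict(self, sentences: List[str]) -> List[str]:
        return [self.model.forward(ids) for ids in self.data.load(sentences)]


core = Core(
    data=Data(tokenizer=str.split, vocab={"DeepKE": 1, "extracts": 2, "relations": 4}),
    model=Model(labels=["no_relation", "has_relation"]),
)
predictions = core.predict(["DeepKE extracts relations", "unknown words only"])
```

The point of the split is that a scenario can swap one group (for example a different tokenizer or encoder) without touching the other two.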

Quick Start

DeepKE supports pip install deepke.
Take fully supervised relation extraction as an example.

Step1 Download the basic code

git clone --depth 1 https://github.com/zjunlp/DeepKE.git

Step2 Create a virtual environment using Anaconda and enter it.

❗NOTE: We provide a Dockerfile with tutorials. Please refer to the Tips to speed up installation.

conda create -n deepke python=3.8

conda activate deepke

Install DeepKE with source code (Recommended)

python setup.py install

python setup.py develop

Install DeepKE with pip
```
pip install deepke
```

Step3 Enter the task directory

cd DeepKE/example/re/standard

Step4 Download the dataset, or follow the annotation instructions to obtain data

wget 120.27.214.45/Data/re/standard/data.tar.gz

tar -xzvf data.tar.gz

Many data formats are supported; details are given in each part.

Step5 Training (Parameters for training can be changed in the conf folder)

We support visual parameter tuning by using wandb.

python run.py

Step6 Prediction (Parameters for prediction can be changed in the conf folder)

Modify the path of the trained model in predict.yaml. The absolute path of the model must be used, such as xxx/checkpoints/2019-12-03_17-35-30/cnn_epoch21.pth.

python predict.py
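
Because predict.yaml needs an absolute model path, a small helper can look it up instead of typing it by hand. The sketch below assumes that checkpoints are saved as .pth files somewhere under a checkpoints directory, as the example path in Step6 suggests; the function name latest_checkpoint is hypothetical and not part of DeepKE.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(checkpoint_dir: str) -> Optional[str]:
    """Return the absolute path of the most recently modified .pth file, or None."""
    root = Path(checkpoint_dir)
    if not root.is_dir():
        return None
    # Search all subfolders, e.g. one timestamped folder per training run (assumed layout).
    candidates = list(root.rglob("*.pth"))
    if not candidates:
        return None
    newest = max(candidates, key=lambda p: p.stat().st_mtime)
    return str(newest.resolve())
```

Calling latest_checkpoint("checkpoints") from the task directory returns a string that can be pasted into predict.yaml, or None if no checkpoint has been saved yet.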

❗NOTE: if you encounter any errors, please refer to the Tips or submit a GitHub issue.

Requirements

python == 3.8

torch == 1.5
hydra-core == 1.0.6
tensorboard == 2.4.1
matplotlib == 3.4.1
transformers == 3.4.0
jieba == 0.42.1
scikit-learn == 0.24.1
pytorch-transformers == 1.2.0
seqeval == 1.2.2
tqdm == 4.60.0
opt-einsum == 3.3.0
wandb == 0.12.7
ujson
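
Since the versions above are pinned exactly, it can help to compare them with what is installed before training. The following is a minimal sketch that uses only the standard library (importlib.metadata, available from Python 3.8); the helper names installed_version and find_mismatches are hypothetical, and treating "1.5" as matching "1.5.x" builds is an assumption of this sketch.

```python
from importlib import metadata
from typing import Callable, Dict, Optional

# Pins copied from the list above; ujson is listed there without a version.
PINNED: Dict[str, str] = {
    "torch": "1.5",
    "hydra-core": "1.0.6",
    "tensorboard": "2.4.1",
    "matplotlib": "3.4.1",
    "transformers": "3.4.0",
    "jieba": "0.42.1",
    "scikit-learn": "0.24.1",
    "pytorch-transformers": "1.2.0",
    "seqeval": "1.2.2",
    "tqdm": "4.60.0",
    "opt-einsum": "3.3.0",
    "wandb": "0.12.7",
}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(
    pinned: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Optional[str]]:
    """Map each unmet pin to the version actually found (None if not installed)."""
    mismatches: Dict[str, Optional[str]] = {}
    for package, wanted in pinned.items():
        found = lookup(package)
        # Accept an exact match, or a longer version that extends the pin (1.5 -> 1.5.0).
        if found is None or not (found == wanted or found.startswith(wanted + ".")):
            mismatches[package] = found
    return mismatches


if __name__ == "__main__":
    for package, found in find_mismatches(PINNED).items():
        print(f"{package}: wanted {PINNED[package]}, found {found}")
```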

Introduction of Three Functions

1. Named Entity Recognition

Named entity recognition seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, etc.

The data is stored in .txt files. Some examples follow (users can label data with Doccano or MarkTool, or use Weak Supervision with DeepKE to obtain data automatically):

Sentence	Person	Location	Organization
本报北京9月4日讯记者杨涌报道：部分省区人民日报宣传发行工作座谈会9月3日在4日在京举行。	杨涌	北京	人民日报
《红楼梦》由王扶林导演，周汝昌、王蒙、周岭等多位专家参与制作。	王扶林，周汝昌，王蒙，周岭
秦始皇兵马俑位于陕西省西安市,是世界八大奇迹之一。	秦始皇	陕西省，西安市
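
As an illustration of how rows like these relate to the text, the sketch below turns a sentence and its per-type mention lists into character spans. It assumes rows shaped like the table above, with mentions separated by a full-width comma; the actual .txt layout used by DeepKE may differ, and the function name mentions_to_spans is hypothetical.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end, entity type), end exclusive


def mentions_to_spans(sentence: str, mentions: Dict[str, str]) -> List[Span]:
    """Locate each listed mention in the sentence by its first occurrence."""
    spans: List[Span] = []
    for entity_type, cell in mentions.items():
        # Cells in the table separate several mentions with a full-width comma.
        for mention in filter(None, cell.split("，")):
            start = sentence.find(mention)
            if start >= 0:
                spans.append((start, start + len(mention), entity_type))
    return sorted(spans)


sentence = "秦始皇兵马俑位于陕西省西安市,是世界八大奇迹之一。"
spans = mentions_to_spans(
    sentence,
    {"Person": "秦始皇", "Location": "陕西省，西安市", "Organization": ""},
)
# spans == [(0, 3, 'Person'), (8, 11, 'Location'), (11, 14, 'Location')]
```

Matching by first occurrence is a simplification: a mention that appears several times in a sentence would need explicit offsets to be located unambiguously.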

Read the detailed process in the README of each scenario:
- STANDARD (Fully Supervised)
  
  We support LLM and provide the off-the-shelf model, DeepKE-cnSchema-NER, which will extract entities in cnSchema without training.
  
  Step1 Enter DeepKE/example/ner/standard. Download the dataset.
```
wget 120.27.214.45/Data/ner/standard/data.tar.gz

tar -xzvf data.tar.gz
```
  Step2 Training
  
  The dataset and parameters can be customized in the data folder and conf folder respectively.
```
python run.py
```
  Step3 Prediction
```
python predict.py
```
- FEW-SHOT
  
  Step1 Enter DeepKE/example/ner/few-shot. Download the dataset.
```
wget 120.27.214.45/Data/ner/few_shot/data.tar.gz

tar -xzvf data.tar.gz
```
  Step2 Training in the low-resource setting
  
  The directory from which the model is loaded and to which it is saved, as well as the configuration parameters, can be customized in the conf folder.
```
python run.py +train=few_shot
```
  Users can modify load_path in conf/train/few_shot.yaml to start from an existing model.
  
  Step3 Prediction: add - predict to conf/config.yaml, set load_path to the model path and write_path to the path where the predicted results are saved in conf/predict.yaml, and then run python predict.py
```
python predict.py
```
- MULTIMODAL
  
  Step1 Enter DeepKE/example/ner/multimodal. Download the dataset.
```
wget 120.27.214.45/Data/ner/multimodal/data.tar.gz

tar -xzvf data.tar.gz
```
  We use objects detected by RCNN and visual grounding objects from the original images as local visual information, where RCNN detection relies on faster_rcnn and visual grounding relies on onestage_grounding.
  
  Step2 Training in the multimodal setting
  - The dataset and parameters can be customized in the data folder and conf folder respectively.
  - To resume from the model trained last time, set load_path in conf/train.yaml to the path where that model was saved. The path for logs generated during training can be customized with log_dir.
```
python run.py
```
  Step3 Prediction
```
python predict.py
```

2. Relation Extraction

Relation extraction is the task of extracting semantic relations between entities from unstructured text.

The data is stored in .csv files. Some examples follow (users can label data with Doccano or MarkTool, or use Weak Supervision with DeepKE to obtain data automatically):

Sentence	Relation	Head	Head_offset	Tail	Tail_offset
《岳父也是爹》是王军执导的电视剧，由马恩然、范明主演。	导演	岳父也是爹	1	王军	8
《九玄珠》是在纵横中文网连载的一部小说，作者是龙马。	连载网站	九玄珠	1	纵横中文网	7
提起杭州的美景，西湖总是第一个映入脑海的词语。	所在城市	西湖	8	杭州	2
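
The rows above are consistent with zero-based character offsets into the sentence. A quick sanity check along these lines can catch misaligned annotations before training. This is a minimal sketch: the dictionary keys are lower-cased versions of the column names in the table, and the function name offset_errors is hypothetical.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, int]]

ROWS: List[Row] = [
    {"sentence": "《岳父也是爹》是王军执导的电视剧，由马恩然、范明主演。",
     "relation": "导演", "head": "岳父也是爹", "head_offset": 1,
     "tail": "王军", "tail_offset": 8},
    {"sentence": "《九玄珠》是在纵横中文网连载的一部小说，作者是龙马。",
     "relation": "连载网站", "head": "九玄珠", "head_offset": 1,
     "tail": "纵横中文网", "tail_offset": 7},
    {"sentence": "提起杭州的美景，西湖总是第一个映入脑海的词语。",
     "relation": "所在城市", "head": "西湖", "head_offset": 8,
     "tail": "杭州", "tail_offset": 2},
]


def offset_errors(row: Row) -> List[str]:
    """Return the roles ('head', 'tail') whose offset does not point at the mention."""
    sentence = str(row["sentence"])
    errors: List[str] = []
    for role in ("head", "tail"):
        mention = str(row[role])
        offset = int(row[f"{role}_offset"])
        # The mention must appear in the sentence exactly at the given offset.
        if sentence[offset:offset + len(mention)] != mention:
            errors.append(role)
    return errors


bad_rows = [row for row in ROWS if offset_errors(row)]
```

For the three sample rows, bad_rows is empty.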

❗NOTE: If there are multiple entity types for one relation, entity types can be prefixed with the relation as inputs.
Read the detailed process in the README of each scenario:
- STANDARD (Fully Supervised)
  
  We support LLM and provide the off-the-shelf model, DeepKE-cnSchema-RE, which will extract relations in cnSchema without training.
  
  Step1 Enter the DeepKE/example/re/standard folder. Download the dataset.
```
wget 120.27.214.45/Data/re/standard/data.tar.gz

tar -xzvf data.tar.gz
```
  Step2 Training
  
  The dataset and parameters can be customized in the data folder and conf folder respectively.
```
python run.py
```
  Step3 Prediction
```
python predict.py
```
- FEW-SHOT
  
  Step1 Enter DeepKE/example/re/few-shot. Download the dataset.
```
wget 120.27.214.45/Data/re/few_shot/data.tar.gz

tar -xzvf data.tar.gz
```
  Step2 Training
  - The dataset and parameters can be customized in the data folder and conf folder respectively.
  - To resume from the model trained last time, set train_from_saved_model in conf/train.yaml to the path where that model was saved. The path for logs generated during training can be customized with log_dir.
```
python run.py
```
  Step3 Prediction
```
python predict.py
```
- DOCUMENT
  
  Step1 Enter DeepKE/example/re/document. Download the dataset.
```
wget 120.27.214.45/Data/re/document/data.tar.gz

tar -xzvf data.tar.gz
```
  Step2 Training
  - The dataset and parameters can be customized in the data folder and conf folder respectively.
  - To resume from the model trained last time, set train_from_saved_model in conf/train.yaml to the path where that model was saved. The path for logs generated during training can be customized with log_dir.
```
python run.py
```
  Step3 Prediction
```
python predict.py
```
- MULTIMODAL
  
  Step1 Enter DeepKE/example/re/multimodal. Download the dataset.
```
wget 120.27.214.45/Data/re/multimodal/data.tar.gz

tar -xzvf data.tar.gz
```
  We use objects detected by RCNN and visual grounding objects from the original images as local visual information, where RCNN detection relies on faster_rcnn and visual grounding relies on onestage_grounding.
  
  Step2 Training
  - The dataset and parameters can be customized in the data folder and conf folder respectively.
  - To resume from the model trained last time, set load_path in conf/train.yaml to the path where that model was saved. The path for logs generated during training can be customized with log_dir.
```
python run.py
```
  Step3 Prediction
```
python predict.py
```

3. Attribute Extraction

Attribute extraction is the task of extracting attributes of entities from unstructured text.

The data is stored in .csv files. Some examples follow:

Sentence	Att	Ent	Ent_offset	Val	Val_offset
张冬梅，女，汉族，1968年2月生，河南淇县人	民族	张冬梅	0	汉族	6
诸葛亮，字孔明，三国时期杰出的军事家、文学家、发明家。	朝代	诸葛亮	0	三国时期	8
2014年10月1日许鞍华执导的电影《黄金时代》上映	上映时间	黄金时代	19	2014年10月1日	0
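
Each row corresponds to an (entity, attribute, value) triple. The sketch below reads rows like these with Python's csv module and builds such triples. The header names are taken from the table above and are an assumption about the file; the header in the downloaded dataset may be spelled differently, and read_triples is a hypothetical helper.

```python
import csv
import io
from typing import List, Tuple

# Sample rows copied from the table above; the header line is assumed.
SAMPLE_CSV = """Sentence,Att,Ent,Ent_offset,Val,Val_offset
张冬梅，女，汉族，1968年2月生，河南淇县人,民族,张冬梅,0,汉族,6
诸葛亮，字孔明，三国时期杰出的军事家、文学家、发明家。,朝代,诸葛亮,0,三国时期,8
2014年10月1日许鞍华执导的电影《黄金时代》上映,上映时间,黄金时代,19,2014年10月1日,0
"""


def read_triples(csv_text: str) -> List[Tuple[str, str, str]]:
    """Parse CSV text and return (entity, attribute, value) triples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["Ent"], row["Att"], row["Val"]) for row in reader]


triples = read_triples(SAMPLE_CSV)
```

The sample sentences use full-width commas, so they do not collide with the ASCII comma that separates columns; sentences containing ASCII commas would need to be quoted in the file.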

Read the detailed process in the README of each scenario:
- STANDARD (Fully Supervised)
  
  Step1 Enter the DeepKE/example/ae/standard folder. Download the dataset.
```
wget 120.27.214.45/Data/ae/standard/data.tar.gz

tar -xzvf data.tar.gz
```
  Step2 Training
  
  The dataset and parameters can be customized in the data folder and conf folder respectively.
```
python run.py
```
  Step3 Prediction
```
python predict.py
```

Notebook Tutorial

This toolkit provides many Jupyter Notebook and Google Colab tutorials. Users can study DeepKE with them.

- Standard Setting
  - NER Notebook
  - NER Colab
  - RE Notebook
  - RE Colab
  - AE Notebook
  - AE Colab
- Low-resource
  - NER Notebook
  - NER Colab
  - RE Notebook
  - RE Colab
- Document-level
  - RE Notebook
  - RE Colab
- Multimodal
  - NER Notebook
  - NER Colab
  - RE Notebook
  - RE Colab

Tips

1. Using the nearest mirror speeds up installation: for example, in China the THU mirror speeds up installing Anaconda and the aliyun mirror speeds up pip install XXX.

2. When encountering ModuleNotFoundError: No module named 'past', run pip install future.

3. Downloading pretrained language models online is slow. We recommend downloading the pretrained models before use and saving them in the pretrained folder. Read README.md in each task directory for the specific requirements for saving pretrained models.

4. The old version of DeepKE is in the deepke-v1.0 branch. Users can switch to that branch to use the old version. The old version has been fully migrated to standard relation extraction (example/re/standard).

5. We recommend installing DeepKE from source, because users may run into problems on Windows with pip, and modifications to the source code will not take effect with a pip installation; see the related issue.

6. More related work on low-resource knowledge extraction can be found in Knowledge Extraction in Low-Resource Scenarios: Survey and Perspective.

7. Make sure the installed packages match the exact versions listed in requirements.txt.

To do

In the next version, we plan to add event extraction to the toolkit.

Meanwhile, we will offer long-term maintenance to fix bugs, solve issues and meet new requests. If you have any problems, please open an issue.

Citation

Please cite our paper if you use DeepKE in your work:

@inproceedings{zhang-etal-2022-deepke,
    title = "{D}eep{KE}: A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population",
    author = "Zhang, Ningyu  and
      Xu, Xin  and
      Tao, Liankuan  and
      Yu, Haiyang  and
      Ye, Hongbin  and
      Qiao, Shuofei  and
      Xie, Xin  and
      Chen, Xiang  and
      Li, Zhoubo  and
      Li, Lei",
    booktitle = "Proceedings of the The 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, UAE",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.emnlp-demos.10",
    pages = "98--108",
    abstract = "We present an open-source and extensible knowledge extraction toolkit DeepKE, supporting complicated low-resource, document-level and multimodal scenarios in the knowledge base population. DeepKE implements various information extraction tasks, including named entity recognition, relation extraction and attribute extraction. With a unified framework, DeepKE allows developers and researchers to customize datasets and models to extract information from unstructured data according to their requirements. Specifically, DeepKE not only provides various functional modules and model implementation for different tasks and scenarios but also organizes all components by consistent frameworks to maintain sufficient modularity and extensibility. We release the source code at GitHub in https://github.com/zjunlp/DeepKE with Google Colab tutorials and comprehensive documents for beginners. Besides, we present an online system in http://deepke.openkg.cn/EN/re{\_}doc{\_}show.html for real-time extraction of various tasks, and a demo video.",
}

Contributors

Zhejiang University: Ningyu Zhang, Liankuan Tao, Xin Xu, Haiyang Yu, Hongbin Ye, Shuofei Qiao, Peng Wang, Xin Xie, Xiang Chen, Zhoubo Li, Lei Li, Xiaozhuan Liang, Yunzhi Yao, Shumin Deng, Wen Zhang, Guozhou Zheng, Huajun Chen

Community Contributors: thredreams, eltociear

Alibaba Group: Feiyu Xiong, Qiang Chen

DAMO Academy: Zhenru Zhang, Chuanqi Tan, Fei Huang

Other Knowledge Extraction Open-Source Projects
