KB_QA (Knowledge Base Question Answering)

Introduction

This is knowledge base QA task with the data from http://tcci.ccf.org.cn/conference/2017/taskdata.php (task 5: Open Domain Question Answering). Chinese version introduction: https://blog.csdn.net/m0_37531129/article/details/103321814

Repeat Task

1. prepare the environment

My development environment is ubuntu 14.04 with pytorch 1.2.0.
Please make sure 'mysql' was installed in your desktop/PC. There are many guidances for mysql installing in Ubuntu.

2. repeat the model

You can clone or download this 'KB_QA' repository.

Step-1: prepare data. (no parameter, just run by script in termial by cmd like 'python xxx.py')

please 'cd preProcessData'

1. run splitTest.py
There are train and test dataset in NLPCC2017 task5. We can spilt test data by 1:1 to get test and dev data.
1. run preCleanData.py
There are three functions in this script: getNERData, getDBData and getSimilarityData. You will get three folders named NERData, DBData, SIMData.
1. run uploadDB.py
(pls create a KB_QA database in mysql). Running the script, it will upload data in DBData folder to KB_QA database.

Step-2: train NER model.

1. Pls download Bert Pretraining model(pytorch version) adding them to subModel/BertPreTrainedModel. and pls make sure NERData folder existed under preProcessData/NERData, there should be three text files(train.txt, dev.txt, test.txt)
1. training NER model
  
  run NERMain.py --data_dir preProcessData/NERData --vocab_file BertPreTrainedModel/vocab.txt --model_config BertPreTrainedModel/conig.json --output_dir output_model --pre_train_model BertPreTrainedModel/pytorch_model.bin --max_seq_length 64 --do_train --train_batch_size 8 --eval_batch_szie 8 --gradient_accumulation_steps 16 --num_train_epochs 8

Step-3: train classification model.

1. Pls make sure Bert Pretraining model(pytorch version) under subModel/BertPreTrainedModel existed.
1. training classification model
run SIMMain.py --data_dir preProcessData/SIMData --vocab_file BertPreTrainedModel/vocab.txt --model_config BertPreTrainedModel/config.json --output_dir output_model --pre_train_model BertPreTrainedModel/pytorch_model.bin --max_seq_length 64 --do_train --train_epoch_size 8 --eval_batch_size 8 --gradient_accumulation_steps 16 -num_train_epochs 8

Step-4: Run model test.

python RunTask.py

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
preProcessData		preProcessData
subModels		subModels
LICENSE		LICENSE
README.md		README.md
RunTask.py		RunTask.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

KB_QA (Knowledge Base Question Answering)

Introduction

Repeat Task

1. prepare the environment

2. repeat the model

Step-1: prepare data. (no parameter, just run by script in termial by cmd like 'python xxx.py')

Step-2: train NER model.

Step-3: train classification model.

Step-4: Run model test.

About

Releases

Packages

Languages

License

jessie0624/Knowledge-Base-Question-Answering

Folders and files

Latest commit

History

Repository files navigation

KB_QA (Knowledge Base Question Answering)

Introduction

Repeat Task

1. prepare the environment

2. repeat the model

Step-1: prepare data. (no parameter, just run by script in termial by cmd like 'python xxx.py')

Step-2: train NER model.

Step-3: train classification model.

Step-4: Run model test.

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages