This repository contains an upgraded version of the source code of MPC-BERT: A Pre-Trained Language Model for Multi-Party Conversation Understanding (https://github.com/JasonForJoy/MPC-BERT). MPC-BERT-2.0 supports TensorFlow 2.
Python 3.8
TensorFlow 2.10.0
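A minimal environment setup sketch (the virtual environment name below is only an example; any Python 3.8 environment with TensorFlow 2.10.0 works):
python3.8 -m venv mpcbert-env
source mpcbert-env/bin/activate
pip install tensorflow==2.10.0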
- Download the BERT model released by Google Research and move it to the path ./uncased_L-12_H-768_A-12 (see the example commands after this list).
- The pre-trained MPC-BERT model is also released; download it and move it to the path ./uncased_L-12_H-768_A-12_MPCBERT. You only need to fine-tune it to reproduce the original results.
- Download the Hu et al. (2019) dataset used in the original paper and move it to the path ./data/ijcai2019/.
- Download the Ouchi and Tsuboi (2016) dataset used in the original paper and move it to the path ./data/emnlp2016/. Unzip the dataset and run the following commands:
cd data/emnlp2016/
python data_preprocess.py
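As an example of the first download step, the standard Google BERT-Base Uncased release can be fetched as follows (verify the URL against Google's BERT repository before use):
wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
unzip uncased_L-12_H-768_A-12.zip   # extracts to ./uncased_L-12_H-768_A-12/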
Create the pre-training data.
python create_pretraining_data.py
Run the pre-training process.
cd scripts/
bash run_pretraining.sh
The pre-trained model will be saved to the path ./uncased_L-12_H-768_A-12_MPCBERT. Modify the filenames in this folder so that they match those in Google's BERT release.
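A minimal renaming sketch, assuming the last pre-training checkpoint is model.ckpt-10000 (the step number and file suffixes are assumptions; adjust them to the checkpoint actually produced):
cd uncased_L-12_H-768_A-12_MPCBERT/
mv model.ckpt-10000.index bert_model.ckpt.index
mv model.ckpt-10000.data-00000-of-00001 bert_model.ckpt.data-00000-of-00001
# copy vocab.txt and bert_config.json from ./uncased_L-12_H-768_A-12 if they are not already present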
Take the task of addressee recognition as an example.
Create the fine-tuning data.
python create_finetuning_data_ar.py
Run the fine-tuning process.
cd scripts/
bash run_finetuning.sh
Modify the variable restore_model_dir in run_testing.sh.
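For example, in run_testing.sh (the path below is hypothetical; point it at the directory where your fine-tuned checkpoint was saved):
restore_model_dir=../output/addressee_recognition_finetuning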
Run the testing process.
cd scripts/
bash run_testing.sh
Replace these scripts and their corresponding data when evaluating on other downstream tasks.
create_finetuning_data_{ar, si, rs}.py
run_finetuning_{ar, si, rs}.py
run_testing_{ar, si, rs}.py
Specifically, for the task of response selection, an output_test.txt file recording the scores of each context-response pair will be saved to the path of restore_model_dir after testing.
Modify the variable test_out_filename in compute_metrics.py and then run python compute_metrics.py; various metrics will be shown.
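A minimal sketch of this step (the sed pattern assumes the assignment starts a line in compute_metrics.py, and the path is hypothetical; editing the file by hand works just as well):
sed -i 's|^test_out_filename.*|test_out_filename = "PATH_TO_RESTORE_MODEL_DIR/output_test.txt"|' compute_metrics.py
python compute_metrics.py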
If you use the code, please cite the following paper: "MPC-BERT: A Pre-Trained Language Model for Multi-Party Conversation Understanding" Jia-Chen Gu, Chongyang Tao, Zhen-Hua Ling, Can Xu, Xiubo Geng, Daxin Jiang. ACL (2021)
@inproceedings{gu-etal-2021-mpc,
title = "{MPC}-{BERT}: A Pre-Trained Language Model for Multi-Party Conversation Understanding",
author = "Gu, Jia-Chen and
Tao, Chongyang and
Ling, Zhen-Hua and
Xu, Can and
Geng, Xiubo and
Jiang, Daxin",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.acl-long.285",
pages = "3682--3692",
}
Thanks to Wenpeng Hu and Zhangming Chan for providing the processed Hu et al. (2019) dataset used in their paper.
Thanks to Ran Le for providing the processed Ouchi and Tsuboi (2016) dataset used in their paper.