We annotated a dialogue dataset, User Satisfaction Simulation (USS), that includes 6,800 dialogues. All user utterances in those dialogues, as well as the dialogues themselves, have been labeled on a 5-level satisfaction scale. See dataset.
These resources were developed as part of the following paper:
Weiwei Sun, Shuo Zhang, Krisztian Balog, Zhaochun Ren, Pengjie Ren, Zhumin Chen, Maarten de Rijke. "Simulating User Satisfaction for the Evaluation of Task-oriented Dialogue Systems". In SIGIR. Paper link
The dataset (see dataset) is provided in TXT format, where each line holds the following fields, separated by "\t":
- speaker role (USER or SYSTEM),
- text,
- action,
- satisfaction (repeated annotations are separated by ","),
- explanation text (only for JDDC at dialogue level; repeated annotations are separated by ";")
Sessions are separated by blank lines.
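The format above can be read with a few lines of Python. The following is a minimal sketch, not the authors' loader: the function name `parse_uss`, the dictionary keys, and the sample lines are hypothetical, and it assumes that satisfaction labels are integers and that missing trailing fields can be treated as empty.

```python
from typing import Dict, List


def parse_uss(text: str) -> List[List[Dict[str, object]]]:
    """Parse USS-style text into a list of sessions (lists of utterance records)."""
    sessions: List[List[Dict[str, object]]] = []
    current: List[Dict[str, object]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current session.
            if current:
                sessions.append(current)
                current = []
            continue
        fields = line.split("\t")
        # Assumption: pad missing trailing fields with empty strings.
        fields += [""] * (5 - len(fields))
        role, utterance, action, satisfaction, explanation = fields[:5]
        current.append({
            "role": role,
            "text": utterance,
            "action": action,
            # Repeated annotations are separated by ","; assumed to be integers.
            "satisfaction": [int(s) for s in satisfaction.split(",") if s.strip().isdigit()],
            # Repeated explanations are separated by ";".
            "explanation": [e.strip() for e in explanation.split(";") if e.strip()],
        })
    if current:
        sessions.append(current)
    return sessions


# Hypothetical sample lines, made up for illustration only.
sample = (
    "USER\tI need a hotel\tinform\t3,3,4\t\n"
    "SYSTEM\tWhich area?\trequest\t\t\n"
    "\n"
    "USER\tHi\tgreet\t2,3\t\n"
)
sessions = parse_uss(sample)
print(len(sessions), sessions[0][0]["satisfaction"])
```

To load a real file, read it with the appropriate encoding (for example `open(path, encoding="utf-8").read()`, assuming UTF-8) and pass the contents to `parse_uss`.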
Since the original ReDial dataset does not provide actions, we use the action annotations provided by IARD and include them in ReDial-action.txt.
The JDDC dataset provides the action of each user utterance, covering 234 categories. We compress them into 12 categories based on a manually defined classification method (see JDDC-ActionList.txt).
The USS dataset is based on five benchmark task-oriented dialogue datasets: JDDC, Schema Guided Dialogue (SGD), MultiWOZ 2.1, Recommendation Dialogues (ReDial), and Coached Conversational Preference Elicitation (CCPE).
Domain | JDDC | SGD | MultiWOZ | ReDial | CCPE |
---|---|---|---|---|---|
Language | Chinese | English | English | English | English |
#Dialogues | 3,300 | 1,000 | 1,000 | 1,000 | 500 |
Avg# Turns | 32.3 | 26.7 | 23.1 | 22.5 | 24.9 |
#Utterances | 54,517 | 13,833 | 12,553 | 11,806 | 6,860 |
Rating 1 | 120 | 5 | 12 | 20 | 10 |
Rating 2 | 4,820 | 769 | 725 | 720 | 1,472 |
Rating 3 | 45,005 | 11,515 | 11,141 | 9,623 | 5,315 |
Rating 4 | 4,151 | 1,494 | 669 | 1,490 | 59 |
Rating 5 | 421 | 50 | 6 | 34 | 4 |
The code for baseline reproduction can be found in /baselines.
@inproceedings{Sun:2021:SUS,
author = {Sun, Weiwei and Zhang, Shuo and Balog, Krisztian and Ren, Zhaochun and Ren, Pengjie and Chen, Zhumin and de Rijke, Maarten},
title = {Simulating User Satisfaction for the Evaluation of Task-oriented Dialogue Systems},
booktitle = {Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval},
series = {SIGIR '21},
year = {2021},
publisher = {ACM}
}
If you have any questions, please contact sunnweiwei@gmail.com