Implementation and comparison of several solutions for Dialogue Act Classification.
The Switchboard Dialog Act Corpus (SwDA) is used for training.
The swda GitHub repository is used to obtain the dataset.
The data is split into train, validation and test subsets following the split used in the NAACL 2016 paper "Sequential Short-Text Classification with Recurrent and Convolutional Neural Networks".
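The split in that paper is made at the conversation level, so all utterances from one call fall into a single subset and no dialogue leaks between train and test. A minimal sketch of applying such a split, assuming each utterance record carries a conversation ID and the three subsets are given as sets of IDs. The `Utterance` record, its field names and the `split_by_conversation` helper are hypothetical, not taken from this repository or the swda package, and the IDs in any usage are placeholders rather than the paper's lists.

```python
from typing import Iterable, NamedTuple


class Utterance(NamedTuple):
    # Hypothetical record layout; the real swda objects differ.
    conversation_id: str
    text: str
    act_tag: str


def split_by_conversation(
    utterances: Iterable[Utterance],
    train_ids: set[str],
    valid_ids: set[str],
    test_ids: set[str],
) -> dict[str, list[Utterance]]:
    """Assign every utterance to the subset that owns its conversation."""
    lookup: dict[str, str] = {}
    for name, ids in (("train", train_ids), ("valid", valid_ids), ("test", test_ids)):
        for cid in ids:
            # A conversation must belong to exactly one subset.
            if cid in lookup:
                raise ValueError(f"conversation {cid} is in both {lookup[cid]} and {name}")
            lookup[cid] = name

    splits: dict[str, list[Utterance]] = {"train": [], "valid": [], "test": []}
    for utt in utterances:
        name = lookup.get(utt.conversation_id)
        if name is not None:  # conversations outside all three lists are dropped
            splits[name].append(utt)
    return splits
```

Keeping utterances in their original order inside each subset matters for the context-aware models, which read a conversation as a sequence.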
| Model | Accuracy (%) |
|---|---:|
| TF-IDF + LightGBM without context | 63.89 |
| fastText + LightGBM without context | 66.57 |
| Pretrained BERT + LightGBM without context | 66.61 |
| fastText + Hierarchical RNN | 76.63 |
| Pretrained BERT + RNN | 76.56 |
| Fine-tuned BERT + RNN | 78.05 |
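In the "without context" baselines each utterance is featurized on its own, with no neighbouring utterances, before being passed to LightGBM. The sketch below is a from-scratch illustration of one common TF-IDF variant: raw term counts, smoothed IDF $\ln\frac{1+n}{1+\mathrm{df}} + 1$ and L2 normalization (the same formula as scikit-learn's `TfidfVectorizer` defaults). The notebooks may use a different tokenizer, weighting or library; `tokenize`, `fit_idf` and `tfidf_vector` are hypothetical names.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer (assumption; real pipelines differ).
    return text.lower().split()


def fit_idf(docs: list[str]) -> dict[str, float]:
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1 for every term seen in docs."""
    n = len(docs)
    df: Counter[str] = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(text: str, idf: dict[str, float]) -> dict[str, float]:
    """Sparse TF-IDF vector of one utterance, L2-normalized; unseen terms are dropped."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}
```

Fitting `fit_idf` on the training subset only keeps validation and test statistics out of the IDF weights.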
- Clone the repo:
  ```sh
  git clone --recurse-submodules https://github.com/JandJane/DialogueActClassification.git
  ```
- Unzip the data:
  ```sh
  unzip DialogueActClassification/swda/swda.zip -d DialogueActClassification/swda/swda
  ```
- Install the requirements:
  ```sh
  pip install -r DialogueActClassification/requirements.txt
  ```
- Run notebooks 01-07.