Bridging the Gap Between Relevance Matching and Semantic Matching for Short Text Similarity Modeling
This repo contains the code and data for our paper published at EMNLP'19.
If you use this code or dataset, please cite the paper below:
```
@inproceedings{rao2019bridging,
  title={Bridging the Gap Between Relevance Matching and Semantic Matching for Short Text Similarity Modeling},
  author={Rao, Jinfeng and Liu, Linqing and Tay, Yi and Yang, Wei and Shi, Peng and Lin, Jimmy},
  booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)},
  pages={5373--5384},
  year={2019}
}
```
- Python 2.7
- TensorFlow (tested on 1.9.0)
- Keras (tested on 2.1.5)
- Download our repo:
```
git clone https://github.com/jinfengr/hcan.git
cd hcan
```
- Install the TensorFlow and Keras dependencies:
```
pip install -r requirements.txt
```
- Install gdrive (a command-line client for Google Drive).
- Download the required data and word2vec embeddings, then generate the IDF weights:
```
chmod +x *.sh; ./download.sh
./generate_idf.sh
```
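IDF (inverse document frequency) weights give rare terms more weight than common ones; they are what the `-w query` option refers to. The exact formula used by `generate_idf.sh` is not documented in this README, so the following is only a generic sketch assuming the plain definition idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t. The function name `compute_idf` is hypothetical and not part of this repo.

```python
import math
from collections import Counter


def compute_idf(documents):
    """Return {term: idf} with idf(t) = log(N / df(t)).

    `documents` is a list of token lists. Generic sketch only; this is
    an assumed formula, not necessarily the one in generate_idf.sh.
    """
    n_docs = len(documents)
    df = Counter()
    for tokens in documents:
        # Count each term at most once per document (document frequency).
        df.update(set(tokens))
    # float() keeps the division correct under Python 2.7 as well.
    return {term: math.log(float(n_docs) / count) for term, count in df.items()}


docs = [["what", "is", "hcan"], ["what", "is", "idf"], ["hcan", "model"]]
idf = compute_idf(docs)
print(round(idf["what"], 4))   # appears in 2 of 3 docs: log(3/2)
print(round(idf["model"], 4))  # appears in 1 of 3 docs: log(3)
```

Since one weight is stored per vocabulary term, large corpora yield large IDF tables.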
- Run on the TrecQA, Quora, or TwitterURL datasets (TrecQA shown here):
```
CUDA_VISIBLE_DEVICES=0 python -u train.py --dataset TrecQA -j hcan
```
The paths to the best model and the output predictions are printed in the log.
- Run on the Twitter datasets (testing on trec-2013):
```
CUDA_VISIBLE_DEVICES=0 python -u train.py --dataset twitter -t trec-2013 -j hcan
```
Note: you may need around 40GB of memory to create the Twitter dataset, because the IDF weights are large. Please open an issue if you have any problems creating the dataset.
- Run a parameter sweep to find the best parameter set (make sure the dataset has been created before the sweep):
```
./param_sweep.sh TrecQA hcan 0 &
```
All outputs are saved under the `tune_logs` folder.
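`param_sweep.sh` is the supported way to run a sweep, and its grid is defined inside the script. Purely for illustration, here is a hypothetical Python sketch that enumerates a small grid over the documented `-b`, `-d`, and `--lr` options and prints one `train.py` command per grid point without running anything. The grid values are arbitrary examples, not the values used by `param_sweep.sh`, and `build_commands` is not part of this repo.

```python
import itertools

# Hypothetical grid: values chosen for illustration, not taken from param_sweep.sh.
GRID = {
    "-b": [32, 64],       # batch size
    "-d": [0.1, 0.3],     # dropout rate
    "--lr": [0.01, 0.05], # learning rate
}


def build_commands(dataset, model, grid):
    """Yield one train.py argv list per point in the Cartesian product of `grid`."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        argv = ["python", "-u", "train.py", "--dataset", dataset, "-j", model,
                "-l", "true"]  # load the dataset created before the sweep
        for flag, value in zip(keys, values):
            argv += [flag, str(value)]
        yield argv


if __name__ == "__main__":
    for argv in build_commands("TrecQA", "hcan", GRID):
        # Print only; launch with subprocess if you actually want to run them.
        print(" ".join(argv))
```

A full grid grows multiplicatively with each added option (here 2 × 2 × 2 = 8 runs), which is why sweeps are usually run in the background as above.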
option | input format | default | description |
---|---|---|---|
-l | [true, false] | false | whether to load the pre-created dataset (set to true once the data is ready) |
-j | [matching, biattention, hcan] | matching | attention choice: matching for relevance matching (Sec. 2.2), biattention for semantic matching (Sec. 2.3), hcan for the complete HCAN model |
-e | [deepconv, wideconv, bilstm] | deepconv | encoder choice, described in Sec. 2.1 |
-w | [none, query] | none | whether to apply IDF weighting: none disables it, query enables it |
--nb_layers | [1, n) | 5 | number of convolutional or BiLSTM layers |
--nb_filters | [1, n) | 256 | number of convolutional filters, or BiLSTM hidden dimension |
--model_option | [complete, word-only] | complete | input sources: complete uses both word and character-level n-gram representations, word-only uses only word representations |
--conv_option | [normal, ResNet] | normal | convolutional architecture: normal or ResNet |
--co-attention | [BiDAF, ESIM] | BiDAF | biattention implementation |
--highway | [true, false] | false | whether to include a highway layer |
-t | [trec-2011, trec-2012, trec-2013, trec-2014] | trec-2013 | test set (only used for the Twitter datasets) |
--load_model | [true, false] | false | whether to load a pre-trained model |
-b | [1, n) | 64 | batch size |
-d | [0, 1] | 0.1 | dropout rate |
-o | [sgd, adam, rmsprop] | sgd | optimizer |
--lr | [0, 1] | 0.05 | learning rate |
--epochs | [1, n) | 15 | number of training epochs |
--trainable | [true, false] | true | whether to update word embeddings during training |
--val_split | (0, 1) | 0.15 | fraction of the training set held out as the validation set |
-v | [0, 1, 2] | 1 | logging verbosity: 0 silent, 1 interactive, 2 per-epoch logging |
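To make the table concrete, here is a hypothetical `argparse` sketch that mirrors a subset of the options and defaults listed above. It is not the parser in `train.py`, which may handle values (for example the `[true, false]` strings) differently; `str2bool` and `build_parser` are illustrative names.

```python
import argparse


def str2bool(value):
    """Parse 'true'/'false' (case-insensitive) into a bool."""
    lowered = value.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    raise argparse.ArgumentTypeError("expected true or false, got %r" % value)


def build_parser():
    # Hypothetical: mirrors part of the options table, not train.py's own parser.
    p = argparse.ArgumentParser(description="HCAN training options (sketch)")
    p.add_argument("--dataset", required=True)
    p.add_argument("-l", type=str2bool, default=False, help="load pre-created dataset")
    p.add_argument("-j", choices=["matching", "biattention", "hcan"], default="matching")
    p.add_argument("-e", choices=["deepconv", "wideconv", "bilstm"], default="deepconv")
    p.add_argument("-w", choices=["none", "query"], default="none")
    p.add_argument("-t", choices=["trec-2011", "trec-2012", "trec-2013", "trec-2014"],
                   default="trec-2013")
    p.add_argument("-b", type=int, default=64)
    p.add_argument("-d", type=float, default=0.1)
    p.add_argument("-o", choices=["sgd", "adam", "rmsprop"], default="sgd")
    p.add_argument("--lr", type=float, default=0.05)
    p.add_argument("--epochs", type=int, default=15)
    return p


args = build_parser().parse_args(["--dataset", "TrecQA", "-j", "hcan", "-l", "true"])
print(args.j, args.l, args.b, args.lr)  # hcan True 64 0.05
```

Options that are not passed keep the defaults from the table, so a run only needs to specify what differs from them.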