This repository contains the Input-to-Output gate (IOG) proposed in "Input-to-Output Gate to Improve RNN Language Models", published at IJCNLP 2017. IOG is a method for improving an existing trained RNN language model; it has an extremely simple structure and can therefore be combined easily with any RNN language model.
In addition, this repository contains several RNN language models to use as base models, including LSTM with dropout, variational LSTM, and the Recurrent Highway Network.
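For intuition, the gate itself can be written in a few lines. The sketch below follows the formulation described in the paper: a gate computed from the current input word is applied element-wise to the final hidden state of the already-trained base language model before the softmax. This is an illustrative NumPy sketch, not the repository's Chainer implementation; all names and shapes are ours.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def iog_hidden(h_t, x_t, E_g, W_g, b_g):
    """Apply an Input-to-Output gate to a base LM's hidden state.

    h_t : hidden vector of the trained base RNN LM at time t, shape (d,)
    x_t : id of the input word at time t
    E_g : gate-specific word embedding matrix, shape (V, d_e)
    W_g, b_g : gate parameters, shapes (d, d_e) and (d,)
    """
    e_t = E_g[x_t]                          # embed the current input word
    g_t = sigmoid(np.dot(W_g, e_t) + b_g)   # input-to-output gate
    return g_t * h_t                        # gated state fed to the softmax layer
```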
If you use this code or our results in your research, please cite:
@inproceedings{iog4lm,
  title={{Input-to-Output Gate to Improve RNN Language Models}},
  author={Takase, Sho and Suzuki, Jun and Nagata, Masaaki},
  booktitle={Proceedings of the 8th International Joint Conference on Natural Language Processing (IJCNLP 2017)},
  year={2017}
}
The following tables show the performance (word-level perplexity) of IOG combined with several baselines, on Penn Treebank (first table) and WikiText-2 (second table). The paper describes more detailed experiments.
Model | Param | Valid | Test |
---|---|---|---|
LSTM (medium) | 20M | 86.2 | 82.7 |
LSTM (medium) + IOG | 26M | 84.1 | 81.1 |
Variational LSTM (medium) | 20M | 81.9 | 79.7 |
Variational LSTM (medium) + IOG | 26M | 81.2 | 78.1 |
Variational Recurrent Highway Network (depth 8) | 32M | 71.2 | 68.5 |
Variational Recurrent Highway Network (depth 8) + IOG | 38M | 69.2 | 66.5 |
Variational Recurrent Highway Network (depth 8) + WT | 23M | 69.2 | 66.3 |
Variational Recurrent Highway Network (depth 8) + WT + IOG | 29M | 67.0 | 64.4 |
Model | Param | Valid | Test |
---|---|---|---|
LSTM (medium) | 50M | 102.2 | 96.2 |
LSTM (medium) + IOG | 70M | 99.2 | 93.8 |
Variational LSTM (medium) | 50M | 97.2 | 91.8 |
Variational LSTM (medium) + IOG | 70M | 95.9 | 91.0 |
Variational Recurrent Highway Network + WT | 43M | 80.1 | 76.1 |
Variational Recurrent Highway Network + WT + IOG | 63M | 76.9 | 73.9 |
This code requires:
- Python 2
- Chainer

Note that this code cannot be run on Python 3.
- Run `preparedata.sh` to obtain the preprocessed Penn Treebank dataset and construct the .npz files (a small sanity-check snippet appears after these instructions).
- Train the base language model.
  - E.g., to obtain an LSTM language model with dropout, run `python learnLSTMLM.py -g 0 --train data/ptb/ptb.train.npz --valid data/ptb/ptb.valid.npz -v data/ptb/ptb.train.vocab --valid_with_batch --output mediumLSTMLM`
- Train the Input-to-Output gate for the above base language model.
  - E.g., run the following command to train the Input-to-Output gate for the LSTM language model in the above example: `python learnIOG.py -g 0 --train data/ptb/ptb.train.npz --valid data/ptb/ptb.valid.npz -s mediumLSTMLM.setting -m mediumLSTMLM.epoch39_fin.bin --output iog4mediumLSTMLM`
- Run `testLM.py` to compute the perplexity of the base language model.
  - E.g., for the LSTM language model in the above example, run `python testLM.py -g 0 -t data/ptb/ptb.test.npz -s mediumLSTMLM.setting -m mediumLSTMLM.epoch39_fin.bin`
- Run `testIOG.py` to compute the perplexity of the base language model combined with the Input-to-Output gate.
  - E.g., for the Input-to-Output gate in the above example, run `python testIOG.py -g 0 -t data/ptb/ptb.test.npz -s iog4mediumLSTMLM.setting -m iog4mediumLSTMLM.bin`
- If you want to use the cache mechanism, specify `--cachesize N` when running `testLM.py` or `testIOG.py`.
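As a quick way to confirm that `preparedata.sh` finished correctly (see the first step above), the snippet below peeks inside the generated .npz files. It only assumes they are standard NumPy archives at the paths used in the commands above; the array names stored inside are printed rather than assumed.

```python
import numpy as np

# List the arrays stored in each preprocessed file produced by preparedata.sh.
for path in ("data/ptb/ptb.train.npz",
             "data/ptb/ptb.valid.npz",
             "data/ptb/ptb.test.npz"):
    with np.load(path) as data:
        print("%s: %s" % (path, ", ".join(data.files)))
```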