This repository contains the example scripts for the following paper:
Context-aware Neural Machine Translation with Mini-batch Embedding
Makoto Morishita, Jun Suzuki, Tomoharu Iwata, Masaaki Nagata
https://www.aclweb.org/anthology/2021.eacl-main.214/
- Python 3
- PyTorch
- sentencepiece
- sacrebleu
  $ pip install "sacrebleu[ja]"
- NVIDIA GPU with CUDA
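Before going further, it may help to confirm that the requirements above are visible to your Python environment. The following is a minimal sketch, not part of this repository: `check_module` is a hypothetical helper name, and the module names `torch`, `sentencepiece`, and `sacrebleu` are assumed to be the import names of the packages listed above.

```shell
#!/usr/bin/env bash
# Report whether a Python module can be imported.
# check_module is a hypothetical helper, not part of this repository.
check_module() {
    if python3 -c "import $1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check each required package; keep going even if one is missing.
for module in torch sentencepiece sacrebleu; do
    check_module "$module" || true
done

# Ask PyTorch whether it can see a CUDA device (prints True or False).
python3 -c "import torch; print(torch.cuda.is_available())" || true
```

Any line starting with `missing:` points to a package that still needs to be installed.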
This will download the corpora and preprocess the files.
$ cd ./corpus
$ ./process.sh
Before you can run fairseq, you need to build and install the bundled copy.
$ cd ./tools/fairseq_doc
$ pip install --editable .
The training scripts are available in ./en-ja/.
You may need to change the PROJECT_DIR variable in the scripts.
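To see where PROJECT_DIR is set before editing, you could list every line that mentions it. This is only a sketch: `find_project_dir` is a hypothetical helper name, and how each script actually defines the variable is not shown here, so inspect the output before changing anything.

```shell
#!/usr/bin/env bash
# List every line mentioning PROJECT_DIR in the shell scripts of a directory.
# find_project_dir is a hypothetical helper, not part of this repository.
find_project_dir() {
    local dir="${1:-./en-ja}"
    # -n prints line numbers, -H always prints the file name.
    grep -nH "PROJECT_DIR" "$dir"/*.sh
}

find_project_dir ./en-ja || echo "no PROJECT_DIR lines found under ./en-ja"
```

Each output line has the form `file:line:content`, which shows which scripts need the path adjusted for your environment.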
The following example trains an MBE enc model.
$ cd ./en-ja
$ nohup ./train_model_mbe_enc.sh 1 &> train_model_mbe_enc.log &
If you have questions or find a problem, please open an issue on GitHub or contact us by email.
NTT Communication Science Laboratories
Makoto Morishita
makoto.morishita.gr -a- hco.ntt.co.jp