Skip to content

NLP2CT/Difficulty-Aware-MT-Evaluation

 
 

Repository files navigation

Difficulty-Aware Machine Translation Evaluation

License: MIT

Automatic Evaluation Metric described in the paper Difficulty-Aware Machine Translation Evaluation (ACL 2021).

Citation

@inproceedings{zhan-etal-2021-difficulty,
    title = "Difficulty-Aware Machine Translation Evaluation",
    author = "Zhan, Runzhe and Liu, Xuebo and Wong, Derek F. and Chao, Lidia S.",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)",
    year = "2021",
    publisher = "Association for Computational Linguistics",
    pages = "26--32"
 }

Acknowledgement

This repo wouldn't be possible without the awesome BERTScore, bert, fairseq, and transformers.

Installation

  • Python version >= 3.6
  • PyTorch version >= 1.0.0

Install it from the source by:

git clone https://github.com/NLP2CT/Difficulty-Aware-MT-Evaluation
cd Difficulty-Aware-MT-Evaluation
pip install --editable .

Usage

This package not only preserves the original functions of BERTScore (ver.0.3.7) but also can coexist with BERTScore.

Parameters

Parameters Descriptions
--with_diff Score multiple MT systems using difficulty information. Otherwise, switch to the original BERTScore implementation.
--cand_list Paths of system output files.
--save_path Path of result file

Please refer to BERTScore for other parameters.

CLI Example

For reproducing the the WMT19 En-De Top-6 scoring results, you can use the provided example files and CLI tool as follows:

WMT19_DATA_PATH=example_data/wmt19-ende

mkdir wmt2019_res/en-de/topk

da-bert-score --with_diff --batch_size 256 --lang de --ref ${WMT19_DATA_PATH}/ref/newstest2019-ende-ref.de \
    --cand_list ${WMT19_DATA_PATH}/sys/en-de/newstest2019.Facebook_FAIR.6862.en-de ${WMT19_DATA_PATH}/sys/en-de/newstest2019.Microsoft-WMT19-sentence_document.6974.en-de ${WMT19_DATA_PATH}/sys/en-de/newstest2019.Microsoft-WMT19-document-level.6808.en-de ${WMT19_DATA_PATH}/sys/en-de/newstest2019.MSRA.MADL.6926.en-de ${WMT19_DATA_PATH}/sys/en-de/newstest2019.UCAM.6731.en-de ${WMT19_DATA_PATH}/sys/en-de/newstest2019.NEU.6763.en-de \
    --save_path wmt2019_res/en-de/topk

For reprocuding other results, please download the WMT19 raw data from the official website.

This implementation follows the default behaviors (models, layers) of BERTScore when evaluating the different languages.

Variant

We will keep exploring the possible variants of DA-BERTScore.

The default parameter settings are used to reproduce the results reported in the paper. For achieving better correlation results across multiple languages, you can try one variant by enabling the parameter --ref_diff --softmax_norm --range_one. The explanations are as follows:

Parameters Descriptions
--ref_diff The weight of the hypothesis word could be unknown if there is no identical word in the reference. By enabling this parameter, it will use the weight of corresponding reference word whose similarity score is maximal.
--softmax_ norm Use the softmax function to smooth the distribution of difficulty weight.
--range_one Scale the score to range [0,1]

About

Difficulty-Aware Machine Translation Evaluation (ACL 2021)

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%