-
1.4.7 (2020-04-19)
- Added Google's extra wmt19/en-de refs (-t wmt19/google/{ar,arp,hqall,hqp,hqr,wmtp}) (Freitag, Grangier, & Caswell, 'BLEU might be Guilty but References are not Innocent', https://arxiv.org/abs/2004.06063)
- Restored SACREBLEU_DIR and smart_open to exports (thanks to Thomas Liao @tholiao)
-
1.4.6 (2020-03-28)
- Large internal reorganization as a module (thanks to Thamme Gowda @thammegowda)
-
1.4.5 (2020-03-28)
- Added Japanese MeCab tokenization (-tok ja-mecab) (thanks to Makoto Morishita @MorinoseiMorizo)
- Added wmt20/dev test sets (thanks to Martin Popel @martinpopel)
-
1.4.4 (2020-03-10)
- Smoothing changes (Sebastian Nickels @sn1c)
  - Fixed bug that only applied smoothing to n-grams for n > 2
  - Added default smoothing values for methods "floor" (0) and "add-k" (1)
- --list now returns a list of all language pairs for a task when combined with -t (e.g., sacrebleu -t wmt19 --list)
- added missing languages for IWSLT17
- Minor code improvements (Thomas Liao @tholiao)
-
1.4.3 (2019-12-02)
- Bugfix: handling of result object for CHRF
- Improved API example
-
1.4.2 (2019-10-11)
- Tokenization variant omitted from the chrF signature; it is relevant only for BLEU (thanks to Martin Popel)
- Bugfix: call to sentence_bleu (thanks to Rachel Bawden)
- Documentation example for Python API (thanks to Vlad Lyalin)
- Calls to corpus_chrf and sentence_chrf now return an object instead of a float (use result.score)
-
1.4.1 (2019-09-11)
- Added sentence-level scoring via -sl (--sentence-level)
-
1.4.0 (2019-09-10)
- Many thanks to Martin Popel for all the changes below!
- Added evaluation on concatenated test sets (e.g., -t wmt17,wmt18). Works as long as they all have the same language pair.
- Added sacrebleu --origlang (both for evaluation on a subset and for --echo). Note that while echoing prints just the subset, evaluation expects the complete test set (and just skips the irrelevant parts).
- Added sacrebleu --detail for breakdown by domain-specific subsets of the test sets. (Available for WMT19).
- Minor changes
  - Improved display of sacrebleu -h
  - Added sacrebleu --list
  - Code refactoring
  - Documentation and tests updates
  - Fixed a race condition bug (os.makedirs(outdir, exist_ok=True) instead of if os.path.exists)
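The race-condition fix in 1.4.0 can be illustrated with a minimal sketch. The function names and the directory layout below are hypothetical and are not taken from the sacrebleu source; only the os.makedirs(..., exist_ok=True) call is the one named in the entry.

```python
import os
import tempfile


def ensure_dir_racy(outdir):
    # Check-then-create: another process may create outdir between the
    # existence check and makedirs, which then raises FileExistsError.
    if not os.path.exists(outdir):
        os.makedirs(outdir)


def ensure_dir_safe(outdir):
    # Single call: succeeds whether or not the directory already exists.
    os.makedirs(outdir, exist_ok=True)


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    outdir = os.path.join(base, "wmt19", "en-de")  # hypothetical layout
    ensure_dir_safe(outdir)
    ensure_dir_safe(outdir)  # repeated call is a no-op, not an error
    print(os.path.isdir(outdir))
```

The second form matters when several evaluation jobs start at once and each tries to create the same data directory.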
-
1.3.7 (2019-07-12)
- Lazy loading of regexes cuts import time from ~1s to nearly nothing (thanks, @louismartin!)
- Added a simple (non-atomic) lock on downloading
- Can now read multiple refs from a single tab-delimited file. You need to pass --num-refs N to tell it to run the split. Only works with a single reference file passed from the command line.
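As a rough sketch of what splitting a tab-delimited reference file into N streams involves (split_refs is a hypothetical helper, not the sacrebleu implementation, and raising on a wrong field count is an assumption about error handling):

```python
def split_refs(lines, num_refs):
    """Split tab-delimited lines into num_refs parallel reference streams."""
    streams = [[] for _ in range(num_refs)]
    for lineno, line in enumerate(lines, 1):
        fields = line.rstrip("\n").split("\t")
        # Assumption: a line with the wrong number of fields is an error.
        if len(fields) != num_refs:
            raise ValueError(
                f"line {lineno}: expected {num_refs} fields, found {len(fields)}"
            )
        for stream, field in zip(streams, fields):
            stream.append(field)
    return streams


if __name__ == "__main__":
    lines = ["The cat sat.\tA cat was sitting.\n", "Hello.\tHi.\n"]
    for i, refs in enumerate(split_refs(lines, 2)):
        print(i, refs)
```

Each returned stream then plays the role of one reference file, with one segment per line.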
-
1.3.6 (2019-06-10)
- Removed another f-string for Python 3.5 compatibility
-
1.3.5 (2019-06-07)
- Restored Python 3.5 compatibility
-
1.3.4 (2019-05-28)
- Added MTNT 2019 test sets
- Added a BLEU object
-
1.3.3 (2019-05-08)
- Added WMT'19 test sets
-
1.3.2 (2019-04-24)
- Bugfix in test case (thanks to Adam Roberts, @adarob)
- Passing smoothing method through sentence_bleu
-
1.3.1 (2019-03-20)
- Added another smoothing approach (add-k) and a command-line option for choosing the smoothing method (--smooth exp|floor|add-n|none) and the associated value (--smooth-value), when relevant.
- Changed interface to some functions (backwards incompatible)
  - 'smooth' is now 'smooth_method'
  - 'smooth_floor' is now 'smooth_value'
-
1.2.21 (19 March 2019)
- Ctrl-M characters are now treated as normal characters; previously they were treated as newlines.
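The difference can be reproduced with Python's built-in open(). This is a sketch of the general behavior only; whether sacrebleu uses exactly this mechanism is an assumption, and read_lines and demo are hypothetical names.

```python
import os
import tempfile


def read_lines(path, newline):
    # newline=None enables universal newlines: "\r" also ends a line.
    # newline="\n" splits on "\n" only and leaves "\r" in the text.
    with open(path, encoding="utf-8", newline=newline) as f:
        return f.readlines()


def demo():
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"one\rstill one\ntwo\n")
    try:
        universal = read_lines(path, None)
        strict = read_lines(path, "\n")
    finally:
        os.remove(path)
    return universal, strict


if __name__ == "__main__":
    universal, strict = demo()
    print(len(universal), len(strict))
```

With universal newlines, a stray Ctrl-M inside a segment shifts every later line and breaks the alignment between system output and references.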
-
1.2.20 (28 February 2019)
- Tokenization now defaults to "zh" when language pair is known
-
1.2.19 (19 February 2019)
- Updated checksum for wmt19/dev (seems to have changed)
-
1.2.18 (19 February 2019)
- Fixed checksum for wmt17/dev (copy-paste error)
-
1.2.17 (6 February 2019)
- Added kk-en and en-kk to wmt19/dev
-
1.2.16 (4 February 2019)
- Added gu-en and en-gu to wmt19/dev
-
1.2.15 (30 January 2019)
- Added MD5 checksumming of downloaded files for all datasets.
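A minimal sketch of MD5 verification for a downloaded file, using only hashlib from the standard library. The helper names and the error type are hypothetical; the expected checksum would come from whatever value is recorded for the dataset.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=65536):
    # Read in chunks so large tarballs are not loaded into memory at once.
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def verify_download(path, expected_md5):
    # Assumption: a mismatch is reported by raising ValueError.
    actual = md5_of_file(path)
    if actual != expected_md5:
        raise ValueError(
            f"checksum mismatch for {path}: expected {expected_md5}, got {actual}"
        )


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"example payload")
    verify_download(path, hashlib.md5(b"example payload").hexdigest())
    os.remove(path)
    print("checksum ok")
```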
-
1.2.14 (22 January 2019)
- Added mtnt1.1/train, mtnt1.1/valid, and mtnt1.1/test data from MTNT
-
1.2.13 (22 January 2019)
- Added 'wmt19/dev' task for 'lt-en' and 'en-lt' (development data for new tasks).
- Added MD5 checksum for downloaded tarballs.
-
1.2.12 (8 November 2018)
- Now outputs only one digit after the decimal
-
1.2.11 (29 August 2018)
- Added a function for sentence-level, smoothed BLEU
-
1.2.10 (23 May 2018)
- Added wmt18 test set (with references)
-
1.2.9 (15 May 2018)
- Added zh-en, en-zh, tr-en, and en-tr datasets for wmt18/test-ts
-
1.2.8 (14 May 2018)
- Added wmt18/test-ts, the test sources (only) for WMT18
- Moved README out of sacrebleu.py and the CHANGELOG into a separate file
-
1.2.7 (10 April 2018)
- fixed another locale issue (with --echo)
- grudgingly enabled -tok none from the command line
-
1.2.6 (22 March 2018)
- added wmt17/ms (Microsoft's additional ZH-EN references). Try sacrebleu -t wmt17/ms --cite.
- --echo ref now pastes together all references, if there is more than one
-
1.2.5 (13 March 2018)
- added wmt18/dev datasets (en-et and et-en)
- fixed logic with --force
- locale-independent installation
- added "--echo both" (tab-delimited)
-
1.2.3 (28 January 2018)
- metrics (-m) are now printed in the order requested
- chrF now prints a version string (including the beta parameter, importantly)
- attempt to remove dependence on locale setting
-
1.2 (17 January 2018)
- added the chrF metric (-m chrf or -m bleu chrf for both). See 'CHRF: character n-gram F-score for automatic MT evaluation' by Maja Popovic (WMT 2015) [http://www.statmt.org/wmt15/pdf/WMT49.pdf]
- added IWSLT 2017 test and tuning sets for DE, FR, and ZH (Thanks to Mauro Cettolo and Marcello Federico).
- added --cite to produce the citation for easy inclusion in papers
- added --input (-i) to set input to a file instead of STDIN
- removed accent mark after objection from UN official
-
1.1.7 (27 November 2017)
- corpus_bleu() now raises an exception if input streams are different lengths
- thanks to Martin Popel for:
  - small bugfix in tokenization_13a (not affecting WMT references)
  - adding --tok intl (international tokenization)
- added wmt17/dev and wmt17/dev sets (for languages intro'd those years)
-
1.1.6 (15 November 2017)
- bugfix for tokenization warning
-
1.1.5 (12 November 2017)
- added -b option (only output the BLEU score)
- removed fi-en from list of WMT16/17 systems with more than one reference
- added WMT16/tworefs and WMT17/tworefs for scoring with both en-fi references
-
1.1.4 (10 November 2017)
- added effective order for sentence-level BLEU computation
- added unit tests from sockeye
-
1.1.3 (8 November 2017).
- Factored code a bit to facilitate API:
  - compute_bleu: works from raw stats
  - corpus_bleu for use from the command line
  - raw_corpus_bleu: turns off tokenization, command-line sanity checks, floor smoothing
- Smoothing (type 'exp', now the default) fixed to produce mteval-v13a.pl results
- Added 'floor' smoothing (adds 0.01 to 0 counts, more versatile via API), 'none' smoothing (via API)
- Small bugfixes, Windows compatibility (H/T Christian Federmann)
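To make the 'floor' and 'none' options concrete, here is a small sketch of how a floor value can stand in for a zero n-gram match count when computing precisions. The function name, argument names, and the handling of empty totals are illustrative assumptions, not the sacrebleu API.

```python
def ngram_precisions(correct, total, method="floor", floor_value=0.01):
    """Return per-order n-gram precisions (in percent) with optional floor smoothing.

    correct[n] and total[n] are the matched and proposed n-gram counts
    for order n + 1.
    """
    precisions = []
    for c, t in zip(correct, total):
        if t == 0:
            # Assumption: no n-grams of this order, so report 0.
            precisions.append(0.0)
            continue
        if c == 0 and method == "floor":
            # Replace a zero match count with the small floor value.
            c = floor_value
        precisions.append(100.0 * c / t)
    return precisions


if __name__ == "__main__":
    correct = [3, 1, 0, 0]
    total = [4, 3, 2, 1]
    print(ngram_precisions(correct, total, method="floor"))
    print(ngram_precisions(correct, total, method="none"))
```

With method="none", a single zero precision drives the geometric mean, and hence BLEU, to zero; a floor keeps short segments from scoring exactly 0.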
-
1.0.3 (4 November 2017).
- Contributions from Christian Federmann:
  - Added explicit support for encoding
  - Fixed Windows support
  - Bugfix in handling reference length with multiple refs
-
version 1.0.1 (1 November 2017).
- Small bugfix affecting some versions of Python.
- Code reformatting due to Ozan Çağlayan.
-
version 1.0 (23 October 2017).
- Support for WMT 2008--2017.
- Single tokenization (v13a) with lowercase fix (proper lower() instead of just A-Z).
- Chinese tokenization.
- Tested to match all WMT17 scores on all arcs.