CHANGELOG.md

VERSION HISTORY

  • 1.4.7 (2020-04-19)

    • Added Google's extra wmt19/en-de refs (-t wmt19/google/{ar,arp,hqall,hqp,hqr,wmtp}); see Freitag, Grangier, & Caswell, "BLEU might be Guilty but References are not Innocent", https://arxiv.org/abs/2004.06063
    • Restored SACREBLEU_DIR and smart_open to exports (thanks to Thomas Liao @tholiao)
  • 1.4.6 (2020-03-28)

    • Large internal reorganization as a module (thanks to Thamme Gowda @thammegowda)
  • 1.4.5 (2020-03-28)

    • Added Japanese MeCab tokenization (-tok ja-mecab) (thanks to Makoto Morishita @MorinoseiMorizo)
    • Added wmt20/dev test sets (thanks to Martin Popel @martinpopel)
  • 1.4.4 (2020-03-10)

    • Smoothing changes (Sebastian Nickels @sn1c)
      • Fixed bug that only applied smoothing to n-grams for n > 2
      • Added default smoothing values for methods "floor" (0) and "add-k" (1)
    • --list now returns a list of all language pairs for a task when combined with -t (e.g., sacrebleu -t wmt19 --list)
    • Added missing languages for IWSLT17
    • Minor code improvements (Thomas Liao @tholiao)
  • 1.4.3 (2019-12-02)

    • Bugfix: handling of result object for CHRF
    • Improved API example
  • 1.4.2 (2019-10-11)

    • Tokenization variant omitted from the chrF signature; it is relevant only for BLEU (thanks to Martin Popel)
    • Bugfix: call to sentence_bleu (thanks to Rachel Bawden)
    • Documentation example for Python API (thanks to Vlad Lyalin)
    • Calls to corpus_chrf and sentence_chrf now return an object instead of a float (use result.score)
  • 1.4.1 (2019-09-11)

    • Added sentence-level scoring via -sl (--sentence-level)
  • 1.4.0 (2019-09-10)

    • Many thanks to Martin Popel for all the changes below!
    • Added evaluation on concatenated test sets (e.g., -t wmt17,wmt18). Works as long as they all have the same language pair.
    • Added sacrebleu --origlang (both for evaluation on a subset and for --echo). Note that while echoing prints just the subset, evaluation expects the complete test set (and just skips the irrelevant parts).
    • Added sacrebleu --detail for breakdown by domain-specific subsets of the test sets. (Available for WMT19).
    • Minor changes
      • Improved display of sacrebleu -h
      • Added sacrebleu --list
      • Code refactoring
      • Documentation and tests updates
      • Fixed a race condition bug (os.makedirs(outdir, exist_ok=True) instead of if os.path.exists)
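
The race-condition fix listed under 1.4.0 can be illustrated with a minimal sketch. The helper name `ensure_dir` and the directory layout are hypothetical and only show the pattern named in the entry; this is not sacrebleu's actual code.

```python
import os
import tempfile


def ensure_dir(outdir: str) -> str:
    """Create outdir if needed, tolerating concurrent creation (hypothetical helper).

    The check-then-create variant is racy:
        if not os.path.exists(outdir):
            os.makedirs(outdir)
    Another process can create the directory between the check and the
    call, in which case os.makedirs raises FileExistsError.
    """
    # exist_ok=True makes the call succeed when the directory already exists.
    os.makedirs(outdir, exist_ok=True)
    return outdir


base = tempfile.mkdtemp()
target = os.path.join(base, "wmt19", "en-de")  # illustrative path only
ensure_dir(target)
ensure_dir(target)  # the second call does nothing instead of raising
created = os.path.isdir(target)
```

Letting the operating system decide whether the directory exists removes the window between the check and the creation.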
  • 1.3.7 (2019-07-12)

    • Lazy loading of regexes cuts import time from ~1s to nearly nothing (thanks, @louismartin!)
    • Added a simple (non-atomic) lock on downloading
    • Can now read multiple refs from a single tab-delimited file. You need to pass --num-refs N to tell it to run the split. Only works with a single reference file passed from the command line.
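
The tab-delimited reference format described under 1.3.7 might be split along the following lines. This is a sketch under assumptions: `split_refs` is a hypothetical helper, and the strict check on the number of fields is an assumption, since the entry does not say how malformed lines are handled.

```python
from typing import Iterable, List


def split_refs(lines: Iterable[str], num_refs: int) -> List[List[str]]:
    """Split tab-delimited lines into num_refs parallel reference streams."""
    streams: List[List[str]] = [[] for _ in range(num_refs)]
    for lineno, line in enumerate(lines, 1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != num_refs:
            # Assumption: reject lines that do not carry exactly num_refs fields.
            raise ValueError(
                "line {}: expected {} references, found {}".format(
                    lineno, num_refs, len(fields)
                )
            )
        for stream, ref in zip(streams, fields):
            stream.append(ref)
    return streams


sample = [
    "The cat sat.\tA cat was sitting.\n",
    "Hello there.\tHi there.\n",
]
refs = split_refs(sample, num_refs=2)
```

Each inner list of `refs` then holds one complete reference stream, in the same order as the input lines.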
  • 1.3.6 (2019-06-10)

    • Removed another f-string for Python 3.5 compatibility
  • 1.3.5 (2019-06-07)

    • Restored Python 3.5 compatibility
  • 1.3.4 (2019-05-28)

    • Added MTNT 2019 test sets
    • Added a BLEU object
  • 1.3.3 (2019-05-08)

    • Added WMT'19 test sets
  • 1.3.2 (2019-04-24)

    • Bugfix in test case (thanks to Adam Roberts, @adarob)
    • Passing smoothing method through sentence_bleu
  • 1.3.1 (2019-03-20)

    • Added another smoothing approach (add-k) and a command-line option for choosing the smoothing method (--smooth exp|floor|add-n|none) and the associated value (--smooth-value), when relevant.
    • Changed interface to some functions (backwards incompatible)
      • 'smooth' is now 'smooth_method'
      • 'smooth_floor' is now 'smooth_value'
  • 1.2.21 (19 March 2019)

    • Ctrl-M characters are now treated as normal characters; previously they were treated as newlines.
  • 1.2.20 (28 February 2019)

    • Tokenization now defaults to "zh" when language pair is known
  • 1.2.19 (19 February 2019)

    • Updated checksum for wmt19/dev (seems to have changed)
  • 1.2.18 (19 February 2019)

    • Fixed checksum for wmt17/dev (copy-paste error)
  • 1.2.17 (6 February 2019)

    • Added kk-en and en-kk to wmt19/dev
  • 1.2.16 (4 February 2019)

    • Added gu-en and en-gu to wmt19/dev
  • 1.2.15 (30 January 2019)

    • Added MD5 checksumming of downloaded files for all datasets.
  • 1.2.14 (22 January 2019)

    • Added mtnt1.1/train mtnt1.1/valid mtnt1.1/test data from MTNT
  • 1.2.13 (22 January 2019)

    • Added 'wmt19/dev' task for 'lt-en' and 'en-lt' (development data for new tasks).
    • Added MD5 checksum for downloaded tarballs.
  • 1.2.12 (8 November 2018)

    • Now outputs only one digit after the decimal point
  • 1.2.11 (29 August 2018)

    • Added a function for sentence-level, smoothed BLEU
  • 1.2.10 (23 May 2018)

    • Added wmt18 test set (with references)
  • 1.2.9 (15 May 2018)

    • Added zh-en, en-zh, tr-en, and en-tr datasets for wmt18/test-ts
  • 1.2.8 (14 May 2018)

    • Added wmt18/test-ts, the test sources (only) for WMT18
    • Moved README out of sacrebleu.py and the CHANGELOG into a separate file
  • 1.2.7 (10 April 2018)

    • fixed another locale issue (with --echo)
    • grudgingly enabled -tok none from the command line
  • 1.2.6 (22 March 2018)

    • added wmt17/ms (Microsoft's additional ZH-EN references). Try sacrebleu -t wmt17/ms --cite.
    • --echo ref now pastes together all references, if there is more than one
  • 1.2.5 (13 March 2018)

    • added wmt18/dev datasets (en-et and et-en)
    • fixed logic with --force
    • locale-independent installation
    • added "--echo both" (tab-delimited)
  • 1.2.3 (28 January 2018)

    • metrics (-m) are now printed in the order requested
    • chrF now prints a version string (including the beta parameter, importantly)
    • attempt to remove dependence on locale setting
  • 1.2 (17 January 2018)

    • added the chrF metric (-m chrf, or -m bleu chrf for both). See 'CHRF: character n-gram F-score for automatic MT evaluation' by Maja Popovic (WMT 2015) [http://www.statmt.org/wmt15/pdf/WMT49.pdf]
    • added IWSLT 2017 test and tuning sets for DE, FR, and ZH (Thanks to Mauro Cettolo and Marcello Federico).
    • added --cite to produce the citation for easy inclusion in papers
    • added --input (-i) to set input to a file instead of STDIN
    • removed accent mark after objection from UN official
  • 1.1.7 (27 November 2017)

    • corpus_bleu() now raises an exception if input streams are different lengths
    • thanks to Martin Popel for:
      • small bugfix in tokenization_13a (not affecting WMT references)
      • adding --tok intl (international tokenization)
    • added wmt16/dev and wmt17/dev sets (for languages intro'd those years)
  • 1.1.6 (15 November 2017)

    • bugfix for tokenization warning
  • 1.1.5 (12 November 2017)

    • added -b option (only output the BLEU score)
    • removed fi-en from list of WMT16/17 systems with more than one reference
    • added WMT16/tworefs and WMT17/tworefs for scoring with both en-fi references
  • 1.1.4 (10 November 2017)

    • added effective order for sentence-level BLEU computation
    • added unit tests from sockeye
  • 1.1.3 (8 November 2017)

    • Factored code a bit to facilitate API:
      • compute_bleu: works from raw stats
      • corpus_bleu for use from the command line
      • raw_corpus_bleu: turns off tokenization, command-line sanity checks, floor smoothing
    • Smoothing (type 'exp', now the default) fixed to produce mteval-v13a.pl results
    • Added 'floor' smoothing (adds 0.01 to 0 counts, more versatile via API) and 'none' smoothing (via API)
    • Small bugfixes, Windows compatibility (H/T Christian Federmann)
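
The 'floor' and 'none' options listed under 1.1.3 can be sketched on n-gram match counts. This is an illustration under assumptions, not sacrebleu's code: `smoothed_precisions` is a hypothetical helper, its parameter names follow the later `smooth_method` / `smooth_value` naming from 1.3.1, and 'exp' smoothing is left out.

```python
from typing import List, Sequence


def smoothed_precisions(
    correct: Sequence[int],
    total: Sequence[int],
    smooth_method: str = "floor",
    smooth_value: float = 0.01,
) -> List[float]:
    """Per-order n-gram precisions (in percent) with optional floor smoothing."""
    precisions = []
    for matches, candidates in zip(correct, total):
        if candidates == 0:
            # Assumption: no n-grams of this order in the hypothesis, so report 0.
            precisions.append(0.0)
        elif matches == 0 and smooth_method == "floor":
            # 'floor': a zero match count is replaced by smooth_value.
            precisions.append(100.0 * smooth_value / candidates)
        else:
            # 'none', or a non-zero match count: plain precision.
            precisions.append(100.0 * matches / candidates)
    return precisions


correct = [3, 1, 0, 0]  # matched 1- to 4-grams
total = [4, 3, 2, 1]    # hypothesis 1- to 4-grams
floored = smoothed_precisions(correct, total, "floor")
unsmoothed = smoothed_precisions(correct, total, "none")
```

With 'none', a single zero precision drives the geometric mean (and hence BLEU) to zero; 'floor' keeps it small but positive.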
  • 1.0.3 (4 November 2017)

    • Contributions from Christian Federmann:
      • Added explicit support for encoding
      • Fixed Windows support
      • Bugfix in handling reference length with multiple refs
  • 1.0.1 (1 November 2017)

    • Small bugfix affecting some versions of Python.
    • Code reformatting due to Ozan Çağlayan.
  • 1.0 (23 October 2017)

    • Support for WMT 2008--2017.
    • Single tokenization (v13a) with lowercase fix (proper lower() instead of just A-Z).
    • Chinese tokenization.
    • Tested to match all WMT17 scores on all arcs.