Releases: obss/jury

v2.1.2

14 Nov 12:08
2cf2de8

What's Changed

Full Changelog: 2.1.1...2.1.2

v2.1.1

10 Nov 12:53
c6db3e8

What's Changed

Full Changelog: 2.1.0...2.1.1

v2.1.0

25 Oct 15:10
8801eb1

What's New 🚀

Tasks 📝

We added a new task-based metric system that can evaluate different types of inputs, unlike the old system, which could only evaluate strings (generated text) and hence only language generation tasks. With this change, jury supports a broader set of metrics working with different input types.

In addition, the jury.Jury API now checks that the given metrics form a consistent set of tasks: Jury raises an error if any pair of metrics is inconsistent in terms of task (i.e. evaluation input type), as sketched below.
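
A minimal sketch of this behavior, based on the callable Jury API from the 2.0.0 notes further down; the exact exception raised for an inconsistent metric set is not specified in these notes.

```python
from jury import Jury

# Consistent set: both metrics evaluate generated text (language-generation task).
scorer = Jury(metrics=["bleu", "meteor"])

scores = scorer(
    predictions=["the cat sat on the mat"],
    references=["there is a cat on the mat"],
)

# Mixing tasks (e.g. a string-based metric with a class-label-based one) is
# expected to raise an error, per the note above.
```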

AutoMetric ✨

  • AutoMetric is introduced as the main factory class for automatically loading metrics. As a side note, load_metric is still available for backward compatibility and remains the preferred entry point (it uses AutoMetric under the hood).
  • Tasks are now distinguished within metrics. For example, precision can be used for the language-generation task or the sequence-classification task, where one evaluates strings (generated text) and the other evaluates integers (class labels).
  • In the configuration file, metrics can now be stated with HuggingFace datasets' metric initialization parameters. The keyword arguments used at computation time are now separated under a "compute_kwargs" key (see the sketch after this list).
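
A minimal sketch of this style of loading, assuming load_metric accepts resulting_name, task, and compute_kwargs keywords; the task string "language-generation" is taken from the note above, and the rest follows the project README's usage.

```python
from jury import Jury, load_metric

# compute_kwargs carries the keyword arguments forwarded to the metric's
# computation step, kept separate from its initialization parameters.
metrics = [
    load_metric("bleu", resulting_name="bleu_1", compute_kwargs={"max_order": 1}),
    load_metric("bleu", resulting_name="bleu_2", compute_kwargs={"max_order": 2}),
    load_metric("precision", task="language-generation"),  # task name as described above
]
scorer = Jury(metrics=metrics)
```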

Full Changelog: 2.0.0...2.1.0

v2.0.0

11 Oct 17:47
bca6ca4

Jury 2.0.0 is out 🎉🥳

New Metric System

  • The datasets package's Metric implementation is adopted (and extended) to provide high performance 💯 and a more unified interface 🤗.
  • Custom metric implementation has changed accordingly (three abstract methods must now be implemented).
  • The Jury class is now callable (it implements __call__(), which is the intended entry point), though the evaluate() method is still available for backward compatibility.
  • When evaluating with Jury, the predictions and references parameters must be passed as keyword arguments to prevent confusion and wrong computations (as with datasets' metrics).
  • MetricCollator is removed; its metric-related methods are attached directly to the Jury class. Metric addition and removal can now be performed on a Jury instance directly.
  • Jury now supports reading metrics from strings, lists, and dictionaries, making it more flexible about how metrics and their parameters are given (see the sketch below).
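
A minimal sketch of these points under the 2.0.0 API; the add_metric/remove_metric method names are inferred from the MetricCollator note above rather than confirmed signatures.

```python
from jury import Jury

scorer = Jury(metrics=["bleu", "rouge"])  # metrics given as plain strings

# predictions and references must be passed as keyword arguments.
scores = scorer(
    predictions=["the cat sat on the mat"],
    references=["there is a cat on the mat"],
)

# Metric management now lives on the Jury instance itself; these method
# names are assumed from the MetricCollator note above.
scorer.add_metric("meteor")
scorer.remove_metric("rouge")
```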

New metrics

  • Accuracy, F1, Precision, and Recall have been added to jury's metrics.
  • All metrics in the datasets package are still available in jury through jury.load_metric() (see the sketch below).
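
For instance (a sketch; the fallback to datasets' metrics for names jury does not implement follows the note above):

```python
from jury import load_metric

# Metrics newly implemented in jury.
f1 = load_metric("f1")
precision = load_metric("precision")

# Names not implemented in jury are still resolved from the datasets package.
squad = load_metric("squad")
```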

Development

  • Test cases are improved with fixtures, and the test structure is enhanced.
  • Expected outputs are now required for tests as a JSON file with a proper name.

v1.1.2

15 Sep 13:26
  • SQuAD bug fixed for evaluating with multiple references.
  • Test design & cases revised with fixtures (improvement).

v1.1.1

15 Aug 10:54
  • Fixed a malfunctioning multiple-prediction calculation caused by multiple-reference input for BLEU and SacreBLEU.
  • CLI Implementation is completed. 🎉

v1.0.1

13 Aug 17:22
  • Fix for nltk version (Colab is fixed as well).

v1.0.0

09 Aug 16:29

Release Notes

  • The new metric structure is completed.
    • Custom metric support is improved; custom metrics are no longer required to extend datasets.Metric and instead extend jury.metrics.Metric.
    • Metric usage is unified around compute, preprocess, and postprocess functions; the only required implementation for a custom metric is compute.
    • Both strings and Metric objects can now be passed to Jury(metrics=metrics), even mixed together.
    • The load_metric function was rearranged to capture end score results, and several metrics were added accordingly (e.g. load_metric("squad_f1") loads the SQuAD metric and returns the F1 score).
  • An example notebook has been added to the examples.
    • MT and QA tasks are illustrated.
    • Custom metric creation is included as an example (see the sketch after this list).
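
A minimal sketch of such a custom metric under the v1.0.0 structure described above; everything beyond "extend jury.metrics.Metric and implement compute" (the class name, the compute signature, the returned dict) is illustrative.

```python
from jury.metrics import Metric

class ExactMatch(Metric):
    """Hypothetical custom metric: only compute() is implemented here;
    preprocess and postprocess fall back to the unified defaults."""

    def compute(self, predictions, references):
        # Fraction of predictions that exactly match their reference.
        matches = sum(p == r for p, r in zip(predictions, references))
        return {"exact_match": matches / max(len(predictions), 1)}
```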

Acknowledgments

@fcakyon @cemilcengiz @devrimcavusoglu

v0.0.6

28 Jul 21:21
f06bba7
Release v0.0.6 (#28)

Co-authored-by: cemilcengiz <cemil.cengiz94@gmail.com>

v0.0.5

27 Jul 10:38
6e85666
Compare
Choose a tag to compare
Jury moved. black added. (#17)

* requirements.txt updated (tqdm version loosened).

* requirements.txt updated. Removed packages that `datasets` depends on.

* Jury moved under __init__.py. black package added to requirements-dev.txt.

* v0.0.5