v2.1.0

@devrimcavusoglu devrimcavusoglu released this 25 Oct 15:10
· 61 commits to main since this release
8801eb1

What's New 🚀

Tasks 📝

We added a new task-based metric system that can evaluate different types of input. The old system could only evaluate strings (generated text), and only for language generation tasks. As a result, jury now supports a broader set of metrics that work with different input types.

With this change, the jury.Jury API checks that the given set of metrics is consistent in terms of task. Jury will raise an error if any pair of metrics is inconsistent with respect to task (evaluation input).
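The consistency rule described above can be sketched in plain Python. This is a minimal illustration, not jury's actual implementation: `MetricSpec` and `check_task_consistency` are hypothetical names invented for this example, and only the task names come from these notes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MetricSpec:
    """Hypothetical stand-in for a loaded metric; not a jury class."""
    name: str
    task: str  # e.g. "language-generation" or "sequence-classification"


def check_task_consistency(metrics: List[MetricSpec]) -> str:
    """Return the task shared by all metrics, or raise if they disagree."""
    tasks = {m.task for m in metrics}
    if not tasks:
        raise ValueError("No metrics given.")
    if len(tasks) > 1:
        raise ValueError(f"Inconsistent tasks among metrics: {sorted(tasks)}")
    return next(iter(tasks))


# All metrics evaluate generated text, so the check passes.
consistent = [
    MetricSpec("bleu", "language-generation"),
    MetricSpec("rouge", "language-generation"),
]
shared_task = check_task_consistency(consistent)

# Adding a metric that evaluates class labels makes the set inconsistent.
mixed = consistent + [MetricSpec("precision", "sequence-classification")]
try:
    check_task_consistency(mixed)
    mixed_error = None
except ValueError as err:
    mixed_error = str(err)
```

Collecting the tasks into a set keeps the check independent of metric order and reports every conflicting task at once.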

AutoMetric ✨

  • AutoMetric is introduced as the main factory class for loading metrics automatically. Note that load_metric is still available for backward compatibility and remains the preferred entry point (it uses AutoMetric under the hood).
  • Tasks are now distinguished within metrics. For example, precision can be used for either the language-generation or the sequence-classification task: the former evaluates strings (generated text), while the latter evaluates integers (class labels).
  • In the configuration file, metrics can now be specified with the initialization parameters of HuggingFace datasets' metrics. The keyword arguments used at computation time are now kept separately under the "compute_kwargs" key.
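To illustrate the separation of initialization parameters from "compute_kwargs", here is a rough sketch that models a metrics configuration as a Python dict. Only the "compute_kwargs" key is taken from these notes. The "path" and "resulting_name" keys, the "max_order" argument, and the `split_metric_entry` helper are assumptions made for this example, so check the project's documentation for the actual schema.

```python
import json

# Assumed layout: a metric is either a bare name or a dict whose keys are
# initialization parameters, plus an optional "compute_kwargs" dict.
config = {
    "metrics": [
        "meteor",
        {
            "path": "bleu",                      # assumed init parameter
            "resulting_name": "bleu_1",          # assumed init parameter
            "compute_kwargs": {"max_order": 1},  # used only at compute time
        },
    ]
}


def split_metric_entry(entry):
    """Hypothetical helper: split an entry into (init_params, compute_kwargs)."""
    if isinstance(entry, str):
        return {"path": entry}, {}
    init_params = {k: v for k, v in entry.items() if k != "compute_kwargs"}
    return init_params, dict(entry.get("compute_kwargs", {}))


parsed = [split_metric_entry(e) for e in config["metrics"]]

# The same structure serialized as it might appear in a JSON config file.
config_json = json.dumps(config, indent=2)
```

Keeping computation arguments under their own key means an initialization parameter and a computation argument can never collide by name.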

Full Changelog: 2.0.0...2.1.0