v2.1.0
What's New

Tasks
We added a task-based metric system that can evaluate different types of inputs, replacing the old system, which could only evaluate strings (generated text) for language generation tasks. Jury is now able to support a broader set of metrics that work with different input types.
With this, the `jury.Jury` API now controls the consistency of the given set of metrics: Jury will raise an error if any pair of metrics is inconsistent in terms of task (evaluation input).
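As a rough illustration of what this consistency check does, here is a minimal self-contained sketch; the metric-to-task mapping and the function name are assumptions for illustration, not jury's actual internals.

```python
# Hypothetical mapping from metric name to the task it evaluates.
# (In jury, the task determines the expected input type.)
METRIC_TASKS = {
    "bleu": "language-generation",       # evaluates strings (generated text)
    "rouge": "language-generation",      # evaluates strings (generated text)
    "precision": "sequence-classification",  # evaluates integer class labels
}

def check_task_consistency(metric_names):
    """Return the shared task, or raise if the metrics' tasks disagree."""
    tasks = {METRIC_TASKS[name] for name in metric_names}
    if len(tasks) > 1:
        raise ValueError(f"Inconsistent tasks among metrics: {sorted(tasks)}")
    return tasks.pop()

# Consistent set: both metrics evaluate generated text, so this succeeds.
task = check_task_consistency(["bleu", "rouge"])
```

Mixing, say, `bleu` with a classification-task `precision` would raise an error in this sketch, mirroring the behavior described above.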
AutoMetric ✨
- AutoMetric is introduced as the main factory class for automatically loading metrics. As a side note, `load_metric` is still available for backward compatibility and is preferred (it uses AutoMetric under the hood).
- Tasks are now distinguished within metrics. For example, precision can be used for the `language-generation` or the `sequence-classification` task, where one evaluates from strings (generated text) while the other evaluates from integers (class labels).
- In the configuration file, metrics can now be stated with HuggingFace datasets' metric initialization parameters. The keyword arguments used for computation are now separated under the `"compute_kwargs"` key.
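To make the `"compute_kwargs"` separation concrete, here is a sketch of what such a configuration entry might look like and how initialization parameters could be split from compute-time keyword arguments; the exact field names (`metric_name`, `task`) are assumptions, not jury's exact schema.

```python
# Illustrative metric configuration entry: top-level keys are treated as
# initialization parameters (as accepted by HuggingFace datasets metrics),
# while arguments used at compute time live under "compute_kwargs".
config = [
    {
        "metric_name": "precision",              # init-time parameter (assumed name)
        "task": "sequence-classification",       # init-time parameter (assumed name)
        "compute_kwargs": {"average": "macro"},  # passed when computing the metric
    },
]

def split_metric_config(entry):
    """Separate init-time parameters from compute-time kwargs."""
    entry = dict(entry)  # avoid mutating the caller's config
    compute_kwargs = entry.pop("compute_kwargs", {})
    return entry, compute_kwargs

init_params, compute_kwargs = split_metric_config(config[0])
```

The design point is simply that loading a metric and computing with it take different arguments, so keeping them under separate keys avoids ambiguity.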
Full Changelog: 2.0.0...2.1.0