Commit
fix: Added way more training dataset annotations (#1765)
* fix: Leaderboard: `K` instead of `M`

Fixes #1752

* format
* fixed existing annotations to refer to task name instead of hf dataset
* added annotation to nvidia
* added voyage
* added uae annotations
* Added stella annotations
* sentence trf models
* added salesforce and e5
* jina
* bge + model2vec
* added llm2vec annotations
* add jasper
* format
* format
* Updated annotations and moved jina models
* fix: add even more training dataset annotations (#1793)
* fix: update max tokens for OpenAI (#1772)

update max tokens

* ci: skip AfriSentiLID for now (#1785)
* skip AfriSentiLID for now
* skip relevant test case instead

---------

Co-authored-by: Isaac Chung <isaac.chung@team.wrike.com>

* 1.28.7

Automatically generated by python-semantic-release

* ci: fix model loading test (#1775)
* pass base branch into the make command as an arg
* test a file that has custom wrapper
* what about overview
* just dont check overview
* revert instance check
* explicitly omit overview and init
* remove test change
* try on a lot of models
* revert test model file

---------

Co-authored-by: Isaac Chung <isaac.chung@team.wrike.com>

* feat: Update task filtering, fixing bug which included cross-lingual tasks in overly many benchmarks (#1787)
* feat: Update task filtering, fixing bug on MTEB

- Updated task filtering adding exclusive_language_filter and hf_subset
- fix bug in MTEB where cross-lingual splits were included
- added missing language filtering to MTEB(europe, beta) and MTEB(indic, beta)

The following code outlines the problems:

```py
import mteb
from mteb.benchmarks import MTEB_ENG_CLASSIC

task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
# was equivalent to:
task = mteb.get_task("STS22", languages=["eng"])
task.hf_subsets
# filtering to datasets that include English gives:
# ['en', 'de-en', 'es-en', 'pl-en', 'zh-en']
# However, it should be:
# ['en']

# with the changes it is:
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
task.hf_subsets
# ['en']
# equivalent to:
task = mteb.get_task("STS22", hf_subsets=["en"])
# which you can also obtain using the exclusive_language_filter
# (though not if there were multiple English splits):
task = mteb.get_task("STS22", languages=["eng"], exclusive_language_filter=True)
```

* format
* remove "en-ext" from AmazonCounterfactualClassification
* fixed mteb(deu)
* fix: simplify in a few areas
* fix: Add gritlm
* 1.29.0

Automatically generated by python-semantic-release

* fix: Added more annotations!
* fix: Added C-MTEB (#1786)

Added C-MTEB

* 1.29.1

Automatically generated by python-semantic-release

* docs: Add contact to MMTEB benchmarks (#1796)
* Add myself to MMTEB benchmarks
* lint
* fix: loading pre 11 (#1798)
* fix loading pre 11
* add similarity
* lint
* run all task types
* 1.29.2

Automatically generated by python-semantic-release

* fix: allow to load no revision available (#1801)
* fix allow to load no revision available
* lint
* add require_model_meta to leaderboard
* lint
* 1.29.3

Automatically generated by python-semantic-release

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
Co-authored-by: Isaac Chung <isaac.chung@team.wrike.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Márton Kardos <power.up1163@gmail.com>

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
Co-authored-by: Isaac Chung <isaac.chung@team.wrike.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Márton Kardos <power.up1163@gmail.com>
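The difference between the old and new STS22 subset lists in the message above can be illustrated without installing mteb. The following is a minimal sketch of one way an exclusive language filter could behave, not the library's implementation: `EVAL_LANGS`, `filter_subsets`, and the simplified language codes are hypothetical stand-ins introduced only for this illustration.

```python
# Hypothetical subset-to-languages mapping, loosely modelled on STS22.
# The codes are simplified assumptions and are not read from mteb.
EVAL_LANGS = {
    "en": ["eng"],
    "de-en": ["deu", "eng"],
    "es-en": ["spa", "eng"],
    "pl-en": ["pol", "eng"],
    "zh-en": ["cmn", "eng"],
    "de": ["deu"],
}


def filter_subsets(eval_langs, languages, exclusive=False):
    """Return the subset names selected by a language filter.

    Non-exclusive: keep a subset if it contains ANY requested language.
    Exclusive: keep a subset only if ALL of its languages were requested.
    """
    wanted = set(languages)
    selected = []
    for subset, langs in eval_langs.items():
        subset_langs = set(langs)
        if exclusive:
            keep = subset_langs <= wanted  # every language was requested
        else:
            keep = bool(subset_langs & wanted)  # at least one overlap
        if keep:
            selected.append(subset)
    return selected


inclusive = filter_subsets(EVAL_LANGS, ["eng"])
exclusive = filter_subsets(EVAL_LANGS, ["eng"], exclusive=True)
print(inclusive)  # ['en', 'de-en', 'es-en', 'pl-en', 'zh-en']
print(exclusive)  # ['en']
```

Under this assumed rule, the non-exclusive filter reproduces the cross-lingual leakage described in the message, while the exclusive variant keeps only the monolingual English subset.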