diff --git a/data/xml/2023.genbench.xml b/data/xml/2023.genbench.xml
index 6fa91433cf..2ceca54d61 100644
--- a/data/xml/2023.genbench.xml
+++ b/data/xml/2023.genbench.xml
@@ -151,7 +151,7 @@
 AmélieReymondUniversity of Washington
 ShaneSteinert-ThrelkeldUniversity of Washington
 143-151
- Language models achieve remarkable results on a variety of tasks, yet still struggle on compositional generalisation benchmarks. The majority of these benchmarks evaluate performance in English only, leaving us with the question of whether these results generalise to other languages. As an initial step to answering this question, we introduce mSCAN, a multilingual adaptation of the SCAN dataset. It was produced by a rule-based translation, developed in cooperation with native speakers. We then showcase this novel dataset on some in-context learning experiments, and GPT3.5 and the multilingual large language model BLOOM
+ Language models achieve remarkable results on a variety of tasks, yet still struggle on compositional generalisation benchmarks. The majority of these benchmarks evaluate performance in English only, leaving us with the question of whether these results generalise to other languages. As an initial step to answering this question, we introduce mSCAN, a multilingual adaptation of the SCAN dataset. It was produced by a rule-based translation, developed in cooperation with native speakers. We then showcase this novel dataset on some in-context learning experiments, and GPT3.5 and the multilingual large language model BLOOM as well as gpt3.5-turbo.
 2023.genbench-1.11
 reymond-steinert-threlkeld-2023-mscan
 10.18653/v1/2023.genbench-1.11