Skip to content

Latest commit

 

History

History
99 lines (85 loc) · 30.2 KB

File metadata and controls

99 lines (85 loc) · 30.2 KB
logo

Awesome Multilingual Large Language Models

A comprehensive collection of multilingual datasets and large language models, meticulously curated for evaluating and enhancing the performance of large language models across diverse languages and tasks.

contributors last update forks stars open issues


Datasets

Dataset Year Languages GitHub Download
Star
OMGEval : An Open Multilingual Generative Evaluation Benchmark for Large Language Models
2024 Chinese (zh) (🇨🇳), Russian (ru) (🇷🇺), French (fr) (🇫🇷), Spanish (es) (🇪🇸), Arabic (ar) (🇸🇦) Github Data
Star
MoZIP: A Multilingual Benchmark to Evaluate Large Language Models in Intellectual Property
2024 Chinese (zh) (🇨🇳), English (en) (🇬🇧), German (de) (🇩🇪), Japanese (ja) (🇯🇵), French (fr) (🇫🇷), Korean (ko) (🇰🇷), Russian (ru) (🇷🇺), Spanish (es) (🇪🇸), Portuguese (pt) (🇵🇹), Catalan (ca) (🇦🇩) Github Data
Star
MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models
2024 English (en) (🇬🇧), Chinese (zh) (🇨🇳), Japanese (ja) (🇯🇵), French (fr) (🇫🇷), German (de) (🇩🇪) Github Data
Star
M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models
2023 English (🇺🇸), Chinese (🇨🇳), Italian (🇮🇹), Portuguese (🇧🇷), Vietnamese (🇻🇳), Thai (🇹🇭), Swahili (🇰🇪), Afrikaans (🇿🇦), Javanese (🇮🇩) Github Data
Star
Language models are multilingual chain-of-thought reasoners
2023 Bengali (🇧🇩), Chinese (🇨🇳), French (🇫🇷), German (🇩🇪), Japanese (🇯🇵), Russian (🇷🇺), Spanish (🇪🇸), Swahili (🇰🇪), Telugu (🇮🇳), Thai (🇹🇭) Github Data
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
2023 English [🇬🇧], Russian [🇷🇺], Spanish [🇪🇸], German [🇩🇪], French [🇫🇷], Chinese [🇨🇳], Italian [🇮🇹], Portuguese [🇵🇹], Polish [🇵🇱], Japanese [🇯🇵], Vietnamese [🇻🇳], Dutch [🇳🇱], Arabic [🇸🇦], Turkish [🇹🇷], Czech [🇨🇿], Persian [🇮🇷], Hungarian [🇭🇺], Greek [🇬🇷], Romanian [🇷🇴], Swedish [🇸🇪], Ukrainian [🇺🇦], Finnish [🇫🇮], Korean [🇰🇷], Danish [🇩🇰], Bulgarian [🇧🇬], Norwegian [🇳🇴], Hindi [🇮🇳], Slovak [🇸🇰], Thai [🇹🇭], Lithuanian [🇱🇹], Catalan [🇪🇸], Indonesian [🇮🇩], Bangla [🇧🇩], Estonian [🇪🇪], Slovenian [🇸🇮], Latvian [🇱🇻], Hebrew [🇮🇱], Serbian [🇷🇸], Tamil [🇮🇳], Albanian [🇦🇱], Azerbaijani [🇦🇿] 🤗 Data
Star
Language models are multilingual chain-of-thought reasoners
2023 Bengali (🇧🇩), Chinese (🇨🇳), French (🇫🇷), German (🇩🇪), Japanese (🇯🇵), Russian (🇷🇺), Spanish (🇪🇸), Swahili (🇰🇪), Telugu (🇮🇳), Thai (🇹🇭) Github Data
Wiki-40B: Multilingual Language Model Dataset 2020 English (🇺🇸), German (🇩🇪), French (🇫🇷), Russian (🇷🇺), Spanish (🇪🇸), Italian (🇮🇹), Japanese (🇯🇵), Chinese Simplified (🇨🇳), Chinese Traditional (🇹🇼), Polish (🇵🇱), Ukrainian (🇺🇦), Dutch (🇳🇱), Swedish (🇸🇪), Portuguese (🇵🇹), Serbian (🇷🇸), Hungarian (🇭🇺), Catalan (🇪🇸), Czech (🇨🇿), Finnish (🇫🇮), Arabic (🇸🇦), Korean (🇰🇷), Persian (🇮🇷), Norwegian (🇳🇴), Vietnamese (🇻🇳), Hebrew (🇮🇱), Indonesian (🇮🇩), Romanian (🇷🇴), Turkish (🇹🇷), Bulgarian (🇧🇬), Estonian (🇪🇪), Malay (🇲🇾), Danish (🇩🇰), Slovak (🇸🇰), Croatian (🇭🇷), Greek (🇬🇷), Lithuanian (🇱🇹), Slovenian (🇸🇮), Thai (🇹🇭), Hindi (🇮🇳), Latvian (🇱🇻), Filipino (🇵🇭) 👁️ Data
Common Sense Beyond English: Evaluating and Improving Multilingual Language Models for Commonsense Reasoning 2021 English (🇺🇸), German (🇩🇪), French (🇫🇷), Russian (🇷🇺), Spanish (🇪🇸), Hindi (🇮🇳), Vietnamese (🇻🇳), Bulgarian (🇧🇬), Chinese (🇨🇳), Dutch (🇳🇱), Italian (🇮🇹), Japanese (🇯🇵), Polish (🇵🇱), Portuguese (🇵🇹), Arabic (🇸🇦), Swahili (🇹🇿), Urdu (🇵🇰) GitHub Data
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset 2022 Akan (🇬🇭), Arabic (🇸🇦), Assamese (🇮🇳), Bambara (🇲🇱), Basque (🇪🇸), Bengali (🇧🇩), Catalan (🇪🇸), Chichewa (🇲🇼), chiShona (🇿🇼), Chitumbuka (🇲🇼), English (🇬🇧), Fon (🇧🇯), French (🇫🇷), Gujarati (🇮🇳), Hindi (🇮🇳), Igbo (🇳🇬), Indonesian (🇮🇩), isiXhosa (🇿🇦), isiZulu (🇿🇦), Kannada (🇮🇳), Kikuyu (🇰🇪), Kinyarwanda (🇷🇼), Kirundi (🇧🇮), Lingala (🇨🇩), Luganda (🇺🇬), Malayalam (🇮🇳), Marathi (🇮🇳), Nepali (🇳🇵), Northern Sotho (🇿🇦), Odia (🇮🇳), Portuguese (🇵🇹), Punjabi (🇮🇳), Sesotho (🇱🇸), Setswana (🇧🇼), Simplified Chinese (🇨🇳), Spanish (🇪🇸), Swahili (🇰🇪), Tamil (🇮🇳), Telugu (🇮🇳), Traditional Chinese (🇹🇼), Twi (🇬🇭), Urdu (🇵🇰), Vietnamese (🇻🇳), Wolof (🇸🇳), Xitsonga (🇿🇦), Yoruba (🇳🇬), Programming Languages (💻) GitHub Data
GEOMLAMA: Geo-Diverse Commonsense Probing on Multilingual Pre-Trained Language Models 2022 English (🇺🇸), Chinese (🇨🇳), Hindi (🇮🇳), Persian (🇮🇷), Swahili (🇰🇪) GitHub 🔍

Models

Title Year Languages Code Demo
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
2024 Afrikaans [🇿🇦], Amharic [🇪🇹], Arabic [🇸🇦], Azerbaijani [🇦🇿], Belarusian [🇧🇾], Bengali [🇧🇩], Bulgarian [🇧🇬], Catalan [🇪🇸], Cebuano [🇵🇭], Czech [🇨🇿], Welsh [🏴], Danish [🇩🇰], German [🇩🇪], Greek [🇬🇷], English [🇬🇧], Esperanto [🇪🇸], Estonian [🇪🇪], Basque [🇪🇸], Finnish [🇫🇮], Tagalog [🇵🇭], French [🇫🇷], Western Frisian [🇳🇱], Scottish Gaelic [🏴], Irish [🇮🇪], Galician [🇪🇸], Gujarati [🇮🇳], Haitian Creole [🇭🇹], Hausa [🇳🇪], Hebrew [🇮🇱], Hindi [🇮🇳], Hungarian [🇭🇺], Armenian [🇦🇲], Igbo [🇳🇬], Indonesian [🇮🇩], Icelandic [🇮🇸], Italian [🇮🇹], Javanese [🇮🇩], Japanese [🇯🇵], Kannada [🇮🇳], Georgian [🇬🇪], Kazakh [🇰🇿], Khmer [🇰🇭], Kyrgyz [🇰🇬], Korean [🇰🇷], Kurdish [🇹🇷], Lao [🇱🇦], Latvian [🇱🇻], Latin [🇻🇦], Lithuanian [🇱🇹], Luxembourgish [🇱🇺], Malayalam [🇮🇳], Marathi [🇮🇳], Macedonian [🇲🇰], Malagasy [🇲🇬], Maltese [🇲🇹], Mongolian [🇲🇳], Maori [🇳🇿], Malay [🇲🇾], Burmese [🇲🇲], Nepali [🇳🇵], Dutch [🇳🇱], Norwegian [🇳🇴], Northern Sotho [🇿🇦], Chichewa [🇲🇼], Oriya [🇮🇳], Punjabi [🇮🇳], Persian [🇮🇷], Polish [🇵🇱], Portuguese [🇵🇹], Pashto [🇦🇫], Romanian [🇷🇴], Russian [🇷🇺], Sinhala [🇱🇰], Slovak [🇸🇰], Slovenian [🇸🇮], Samoan [🇼🇸], Shona [🇿🇼], Sindhi [🇵🇰], Somali [🇸🇴], Southern Sotho [🇱🇸], Spanish [🇪🇸], Albanian [🇦🇱], Serbian [🇷🇸], Sundanese [🇮🇩], Swahili [🇰🇪], Swedish [🇸🇪], Tamil [🇮🇳], Telugu [🇮🇳], Tajik [🇹🇯], Thai [🇹🇭], Turkish [🇹🇷], Twi [🇬🇭], Ukrainian [🇺🇦], Urdu [🇵🇰], Uzbek [🇺🇿], Vietnamese [🇻🇳], Xhosa [🇿🇦], Yiddish [🇮🇱], Yoruba [🇳🇬], Chinese [🇨🇳], Zulu [🇿🇦] Source 🤗
Star
LANGBRIDGE: Multilingual Reasoning Without Multilingual Supervision
2024 Arabic (ar) (🇸🇦), Bengali (bn) (🇧🇩), Chinese (zh) (🇨🇳), Danish (da) (🇩🇰), Dutch (nl) (🇳🇱), English (en) (🇬🇧), French (fr) (🇫🇷), German (de) (🇩🇪), Hindi (hi) (🇮🇳), Japanese (ja) (🇯🇵), Korean (ko) (🇰🇷), Marathi (mr) (🇮🇳), Punjabi (pa) (🇮🇳), Russian (ru) (🇷🇺), Spanish (es) (🇪🇸), Swahili (sw) (🇰🇪), Telugu (te) (🇮🇳), Turkish (tr) (🇹🇷), Urdu (ur) (🇵🇰) Github 🤗
Star
Orion-14B: Open-source Multilingual Large Language Models
2024 English [🇬🇧], Chinese [🇨🇳], Japanese [🇯🇵], Korean [🇰🇷], Spanish [🇪🇸], French [🇫🇷], German [🇩🇪], Arabic [🇸🇦] Github 🤗
Star
Baichuan 2: Open Large-scale Language Models
2023 Arabic (ar) (🇸🇦), Chinese (zh) (🇨🇳), English (en) (🇬🇧), French (fr) (🇫🇷), Russian (ru) (🇷🇺), Spanish (es) (🇪🇸), German (de) (🇩🇪), Japanese (ja) (🇯🇵) Github 🤗
Star
Larger-Scale Transformers for Multilingual Masked Language Modeling
2021 Afrikaans (🇿🇦), Albanian (🇦🇱), Amharic (🇪🇹), Arabic (🇸🇦), Armenian (🇦🇲), Assamese (🇮🇳), Azerbaijani (🇦🇿), Basque (🇪🇸), Belarusian (🇧🇾), Bengali (🇧🇩), Bengali Romanize (🇧🇩), Bosnian (🇧🇦), Breton (🏴), Bulgarian (🇧🇬), Burmese (🇲🇲), Burmese zawgyi font (🇲🇲), Catalan (🇪🇸), Chinese (Simplified) (🇨🇳), Chinese (Traditional) (🇹🇼), Croatian (🇭🇷), Czech (🇨🇿), Danish (🇩🇰), Dutch (🇳🇱), English (🇬🇧), Esperanto (🏴), Estonian (🇪🇪), Filipino (🇵🇭), Finnish (🇫🇮), French (🇫🇷), Galician (🇪🇸), Georgian (🇬🇪), German (🇩🇪), Greek (🇬🇷), Gujarati (🇮🇳), Hausa (🇳🇬), Hebrew (🇮🇱), Hindi (🇮🇳), Hindi Romanize (🇮🇳), Hungarian (🇭🇺), Icelandic (🇮🇸), Indonesian (🇮🇩), Irish (🇮🇪), Italian (🇮🇹), Japanese (🇯🇵), Javanese (🇮🇩), Kannada (🇮🇳), Kazakh (🇰🇿), Khmer (🇰🇭), Korean (🇰🇷), Kurdish (Kurmanji) (🇹🇷), Kyrgyz (🇰🇬), Lao (🇱🇦), Latin (🏛️), Latvian (🇱🇻), Lithuanian (🇱🇹), Macedonian (🇲🇰), Malagasy (🇲🇬), Malay (🇲🇾), Malayalam (🇮🇳), Marathi (🇮🇳), Mongolian (🇲🇳), Nepali (🇳🇵), Norwegian (🇳🇴), Oriya (🇮🇳), Oromo (🇪🇹), Pashto (🇦🇫), Persian (🇮🇷), Polish (🇵🇱), Portuguese (🇵🇹), Punjabi (🇮🇳), Romanian (🇷🇴), Russian (🇷🇺), Sanskrit (🇮🇳), Scottish Gaelic (🏴), Serbian (🇷🇸), Sindhi (🇵🇰), Sinhala (🇱🇰), Slovak (🇸🇰), Slovenian (🇸🇮), Somali (🇸🇴), Spanish (🇪🇸), Sundanese (🇮🇩), Swahili (🇰🇪), Swedish (🇸🇪), Tamil (🇮🇳), Tamil Romanize (🇮🇳), Telugu (🇮🇳), Telugu Romanize (🇮🇳), Thai (🇹🇭), Turkish (🇹🇷), Ukrainian (🇺🇦), Urdu (🇵🇰), Urdu Romanize (🇵🇰), Uyghur (🇨🇳), Uzbek (🇺🇿), Vietnamese (🇻🇳), Welsh (🏴), Western Frisian (🇳🇱), Xhosa (🇿🇦), Yiddish (🇮🇱) Github 🔍
Star
InternLM: A Multilingual Language Model with Progressively Enhanced Capabilities
2023 English (🇺🇸), Chinese (🇨🇳) Github 🔍
PolyLM: An Open Source Polyglot Large Language Model
2023 English (EN) [🇬🇧], Chinese (ZH) [🇨🇳], Russian (RU) [🇷🇺], Spanish (ES) [🇪🇸], German (DE) [🇩🇪], French (FR) [🇫🇷], Italian (IT) [🇮🇹], Portuguese (PT) [🇵🇹], Japanese (JA) [🇯🇵], Vietnamese (VI) [🇻🇳], Indonesian (ID) [🇮🇩], Polish (PL) [🇵🇱], Dutch (NL) [🇳🇱], Arabic (AR) [🇦🇪], Turkish (TR) [🇹🇷], Thai (TH) [🇹🇭], Hebrew (HE) [🇮🇱], Korean (KO) [🇰🇷] Model 🔍
Star
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
2023 Akan (🇬🇭), Arabic (🇸🇦), Assamese (🇮🇳), Bambara (🇲🇱), Basque (🇪🇸), Bengali (🇧🇩), Catalan (🇪🇸), Chichewa (🇲🇼), chiShona (🇿🇼), Chitumbuka (🇲🇼), English (🇬🇧), Fon (🇧🇯), French (🇫🇷), Gujarati (🇮🇳), Hindi (🇮🇳), Igbo (🇳🇬), Indonesian (🇮🇩), isiXhosa (🇿🇦), isiZulu (🇿🇦), Kannada (🇮🇳), Kikuyu (🇰🇪), Kinyarwanda (🇷🇼), Kirundi (🇧🇮), Lingala (🇨🇩), Luganda (🇺🇬), Malayalam (🇮🇳), Marathi (🇮🇳), Nepali (🇳🇵), Northern Sotho (🇿🇦), Odia (🇮🇳), Portuguese (🇵🇹), Punjabi (🇮🇳), Sesotho (🇱🇸), Setswana (🇧🇼), Simplified Chinese (🇨🇳), Spanish (🇪🇸), Swahili (🇰🇪), Tamil (🇮🇳), Telugu (🇮🇳), Traditional Chinese (🇹🇼), Twi (🇬🇭), Urdu (🇵🇰), Vietnamese (🇻🇳), Wolof (🇸🇳), Xitsonga (🇿🇦), Yoruba (🇳🇬), Programming Languages (💻) Github 🤗
Star
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages
2023 hbs_Latn (🇭🇷), mal_Mlym (🇮🇳), aze_Latn (🇦🇿), guj_Gujr (🇮🇳), ben_Beng (🇮🇳), kan_Knda (🇮🇳), tel_Telu (🇮🇳), mlt_Latn (🇲🇹), fra_Latn (🇫🇷), spa_Latn (🇪🇸), eng_Latn (🇬🇧), fil_Latn (🇵🇭), nob_Latn (🇳🇴), rus_Cyrl (🇷🇺), deu_Latn (🇩🇪), tur_Latn (🇹🇷), pan_Guru (🇮🇳), mar_Deva (🇮🇳), por_Latn (🇵🇹), nld_Latn (🇳🇱), ara_Arab (🇸🇦), zho_Hani (🇨🇳), ita_Latn (🇮🇹), ind_Latn (🇮🇩), ell_Grek (🇬🇷), bul_Cyrl (🇧🇬), swe_Latn (🇸🇪), ces_Latn (🇨🇿), isl_Latn (🇮🇸), pol_Latn (🇵🇱), ron_Latn (🇷🇴), dan_Latn (🇩🇰), hun_Latn (🇭🇺), tgk_Cyrl (🇹🇯), srp_Latn (🇷🇸), fas_Arab (🇮🇷), ceb_Latn (🇵🇭), heb_Hebr (🇮🇱), hrv_Latn (🇭🇷), glg_Latn (🇪🇸), fin_Latn (🇫🇮), slv_Latn (🇸🇮), vie_Latn (🇻🇳), mkd_Cyrl (🇲🇰), slk_Latn (🇸🇰), nor_Latn (🇳🇴), est_Latn (🇪🇪), ltz_Latn (🇱🇺), eus_Latn (🇪🇸), lit_Latn (🇱🇹), kaz_Cyrl (🇰🇿), lav_Latn (🇱🇻), bos_Latn (🇧🇦), epo_Latn (🇺🇸), cat_Latn (🇪🇸), tha_Thai (🇹🇭), ukr_Cyrl (🇺🇦), tgl_Latn (🇵🇭), sin_Sinh (🇱🇰), gle_Latn (🇮🇪), hin_Deva (🇮🇳), kor_Hang (🇰🇷), ory_Orya (🇮🇳), urd_Arab (🇵🇰), swa_Latn (🇰🇪), sqi_Latn (🇦🇱), bel_Cyrl (🇧🇾), afr_Latn (🇿🇦), nno_Latn (🇳🇴), tat_Cyrl (🇷🇺), asm_Beng (🇮🇳), hil_Latn (🇵🇭), nso_Latn (🇿🇦), ibo_Latn (🇳🇬), kin_Latn (🇷🇼), tpi_Latn (🇵🇬), twi_Latn (🇬🇭), kir_Cyrl (🇰🇬), nep_Deva (🇳🇵), azj_Latn (🇦🇿), bcl_Latn (🇵🇭), xho_Latn (🇿🇦), cym_Latn (🏴), gaa_Latn (🇬🇭), ton_Latn (🇹🇴), tah_Latn (🇵🇫), lat_Latn (🇻🇦), srn_Latn (🇸🇷), ewe_Latn (🇬🇭), bem_Latn (🇿🇲), orm_Latn (🇪🇹), haw_Latn (🇺🇸), hmo_Latn (🇵🇬), kat_Geor (🇬🇪), pag_Latn (🇵🇭), loz_Latn (🇿🇲), fry_Latn (🇳🇱), mya_Mymr (🇲🇲), nds_Latn (🇩🇪), run_Latn (🇧🇮), pnb_Arab (🇵🇰), rar_Latn (🇨🇰), fij_Latn (🇫🇯), wls_Latn (🇼🇸), ckb_Arab (🇮🇶), ven_Latn (🇿🇦), zsm_Latn (🇲🇾), chv_Cyrl (🇷🇺), lua_Latn (🇨🇩), que_Latn (🇵🇪), sag_Latn (🇨🇫), guw_Latn (🇬🇼), bre_Latn (🇫🇷), toi_Latn (🇨🇫), pus_Arab (🇦🇫), che_Cyrl (🇷🇺), pis_Latn (🇸🇧), kon_Latn (🇨🇩), oss_Cyrl (🇷🇺), hyw_Armn (🇦🇲), iso_Latn (🇻🇺), nan_Latn (🇹🇼), lub_Latn (🇨🇩), lim_Latn (🇳🇱), tuk_Latn (🇹🇲), tir_Ethi (🇪🇹), tgk_Latn (🇹🇯), yua_Latn (🇲🇽), min_Latn (🇮🇩), lue_Latn (🇨🇩), khm_Khmr (🇰🇭), tum_Latn (🇲🇼), tll_Latn (🇳🇦), ekk_Latn (🇪🇪), lug_Latn (🇺🇬), niu_Latn (🇳🇺), tzo_Latn (🇲🇽), mah_Latn (🇲🇭), tvl_Latn (🇹🇻), jav_Latn (🇮🇩), hau_Latn (🇳🇬), som_Latn (🇸🇴), uzb_Latn (🇺🇿), sot_Latn (🇿🇦), uzb_Cyrl (🇺🇿), cos_Latn (🇫🇷), als_Latn (🇦🇱), amh_Ethi (🇪🇹), sun_Latn (🇮🇩), war_Latn (🇵🇭), div_Thaa (🇲🇻), yor_Latn (🇳🇬), fao_Latn (🇫🇴), uzn_Cyrl (🇺🇿), smo_Latn (🇼🇸), bak_Cyrl (🇷🇺), ilo_Latn (🇵🇭), tso_Latn (🇿🇦), mri_Latn (🇳🇿), hmn_Latn (🇺🇸), nau_Latn (🇳🇷), asm_Beng (🇮🇳), hil_Latn (🇵🇭), nso_Latn (🇿🇦), ibo_Latn (🇳🇬), kin_Latn (🇷🇼), tpi_Latn (🇵🇬), twi_Latn (🇬🇭), kir_Cyrl (🇰🇬), pap_Latn (🇳🇱), aze_Latn (🇦🇿), qvi_Latn (🇵🇪), cak_Latn (🇬🇹), kbp_Latn (🇧🇫), kri_Latn (🇸🇱), mau_Latn (🇲🇽), scn_Latn (🇮🇹), tyv_Cyrl (🇷🇺), ina_Latn (🇧🇪), btx_Latn (🇮🇩), nch_Latn (🇲🇽), ncj_Latn (🇲🇽), pau_Latn (🇵🇼), toj_Latn (🇲🇽), pcm_Latn (🇳🇬), dyu_Latn (🇧🇫), kss_Latn (🇳🇬), afb_Arab (🇸🇦), urh_Latn (🇳🇬), quc_Latn (🇬🇹), new_Deva (🇳🇵), yao_Latn (🇲🇼), ngl_Latn (🇲🇿), nyu_Latn (🇲🇿), kab_Latn (🇩🇿), tuk_Cyrl (🇹🇲), xmf_Geor (🇬🇪), ndc_Latn (🇲🇿), san_Deva (🇮🇳), nba_Latn (🇳🇬), bpy_Beng (🇮🇳), ncx_Latn (🇲🇽), qug_Latn (🇵🇪), rmn_Latn (🇮🇳), cjk_Latn (🇬🇹), arb_Arab (🇸🇦), kea_Latn (🇨🇻), mck_Latn (🇨🇩), arn_Latn (🇨🇱), pdt_Latn (🇩🇪), her_Latn (🇳🇦), tlh_Latn (🇺🇸), suz_Deva (🇮🇳), kat_Geor (🇬🇪), kmr_Cyrl (🇷🇺), gcr_Latn (🇬🇵), jbo_Latn (🇺🇸), tbz_Latn (🇵🇼), bam_Latn (🇲🇱), prk_Latn (🇸🇮), jam_Latn (🇯🇲), twx_Latn (🇹🇼), sme_Latn (🇫🇮), gom_Latn (🇮🇳), bum_Latn (🇨🇲), mgr_Latn (🇲🇼), ahk_Latn (🇵🇰), kur_Arab (🇮🇶), bas_Latn (🇨🇲), bin_Latn (🇳🇬), tsz_Latn (🇲🇽), sid_Latn (🇪🇹), diq_Latn (🇹🇷), srd_Latn (🇮🇹), tcf_Latn (🇲🇽), bzj_Latn (🇮🇳), udm_Cyrl (🇷🇺), cce_Latn (🇨🇲), meu_Latn (🇨🇩), chw_Latn (🇨🇲), cbk_Latn (🇵🇭), ibg_Latn (🇮🇩), bhw_Latn (🇮🇩), ngu_Latn (🇲🇽), nyy_Latn (🇹🇿), szl_Latn (🇵🇱), ish_Latn (🇹🇿), naq_Latn (🇳🇦), toh_Latn (🇳🇿), ttj_Latn (🇰🇪), nse_Latn (🇳🇬), ami_Latn (🇹🇼), alz_Latn (🇸🇩), apc_Arab (🇸🇾), vls_Latn (🇳🇱), mhr_Cyrl (🇷🇺), djk_Latn (🇩🇪), prs_Arab (🇦🇫), san_Latn (🇮🇳), som_Arab (🇸🇴), uig_Latn (🇨🇳), hau_Arab (🇳🇬) Github 🔍
Star
Few-shot Learning with Multilingual Generative Language Models
2022 English (🇺🇸), Russian (🇷🇺), Chinese (🇨🇳), German (🇩🇪), Spanish (🇪🇸), French (🇫🇷), Japanese (🇯🇵), Italian (🇮🇹), Portuguese (🇵🇹), Greek (🇬🇷), Romanian (🇷🇴), Ukrainian (🇺🇦), Hungarian (🇭🇺), Korean (🇰🇷), Polish (🇵🇱), Norwegian (🇳🇴), Dutch (🇳🇱), Finnish (🇫🇮), Danish (🇩🇰), Indonesian (🇮🇩), Croatian (🇭🇷), Turkish (🇹🇷), Arabic (🇸🇦), Vietnamese (🇻🇳), Thai (🇹🇭), Bulgarian (🇧🇬), Persian (🇮🇷), Swedish (🇸🇪), Malay (🇲🇾), Hebrew (🇮🇱), Czech (🇨🇿), Slovak (🇸🇰), Catalan (🇪🇸), Lithuanian (🇱🇹), Slovene (🇸🇮), Hindi (🇮🇳), Estonian (🇪🇪), Latvian (🇱🇻), Tagalog (🇵🇭), Albanian (🇦🇱), Serbian (🇷🇸), Azerbaijani (🇦🇿), Bengali (🇧🇩), Tamil (🇮🇳), Urdu (🇵🇰), Kazakh (🇰🇿), Armenian (🇦🇲), Georgian (🇬🇪), Icelandic (🇮🇸), Belarusian (🇧🇾), Bosnian (🇧🇦), Malayalam (🇮🇳), Macedonian (🇲🇰), Swahili (🇹🇿), Afrikaans (🇿🇦), Telugu (🇮🇳), Arabic Romanized (🇸🇦), Mongolian (🇲🇳), Latin (🇮🇹), Nepali (🇳🇵), Sinhalese (🇱🇰), Marathi (🇮🇳), Kannada (🇮🇳), Somali (🇸🇴), Welsh (🏴), Javanese (🇮🇩), Pashto (🇦🇫), Uzbek (🇺🇿), Gujarati (🇮🇳), Khmer (🇰🇭), Urdu Romanized (🇵🇰), Amharic (🇪🇹), Bengali Romanized (🇧🇩), Punjabi (🇮🇳), Galician (🇪🇸), Hausa (🇳🇬), Sanskrit (🇮🇳), Basque (🇪🇸), Burmese (🇲🇲), Sundanese (🇮🇩), Oriya (🇮🇳), Haitian (🇭🇹), Lao (🇱🇦), Kyrgyz (🇰🇬), Breton (🇫🇷), Irish (🇮🇪), Yoruba (🇳🇬), Esperanto (🌐), Tamil Romanized (🇮🇳), Zulu (🇿🇦), Tigrinya (🇪🇷), Telugu Romanized (🇮🇳), Kurdish (🇹🇷), Oromo (🇪🇹), Xhosa (🇿🇦), Scottish Gaelic (🇬🇧), Igbo (🇳🇬), Assamese (🇮🇳), Ganda (🇺🇬), Wolof (🇸🇳), Western Frisian (🇳🇱), Tswana (🇧🇼), Fula (🇸🇳), Guaraní (🇵🇾), Sindhi (🇵🇰), Lingala (🇨🇩), Bambara (🇲🇱), Inuktitut (🇨🇦), Kongo (🇨🇩), Quechua (🇵🇪), Swati (🇸🇿), Unassigned (🌐) Github 🔍
Introducing L2M3, A Multilingual Medical Large Language Model to Advance Health Equity in Low-Resource Regions
2024 English (🇺🇸), Chinese (🇨🇳), Telugu (🇮🇳), Hindi (🇮🇳), Arabic (🇸🇦), Swahili (🇹🇿), Bengali (🇧🇩) 🔍 🔍
Adapting Pre-trained Language Models to African Languages via Multilingual Adaptive Fine-Tuning
2022 Afrikaans (🇿🇦), Amharic (🇪🇹), Hausa (🇳🇬), Igbo (🇳🇬), Malagasy (🇲🇬), Chichewa (🇲🇼), Oromo (🇪🇹), Naija (🇳🇬), Kinyarwanda (🇷🇼), Kirundi (🇧🇮), Shona (🇿🇼), Somali (🇸🇴), Sesotho (🇱🇸), Swahili (🇹🇿), isiXhosa (🇿🇦), Yoruba (🇳🇬), isiZulu (🇿🇦), English (🇬🇧), French (🇫🇷), Arabic (🇸🇦), Lingala (🇨🇩), Luganda (🇺🇬), Luo (🇰🇪), Wolof (🇸🇳) GitHub 🤗
MuRIL: Multilingual Representations for Indian Languages
2021 Assamese (🇮🇳), Bengali (🇧🇩), Gujarati (🇮🇳), Hindi (🇮🇳), Kannada (🇮🇳), Kashmiri (🇮🇳), Malayalam (🇮🇳), Marathi (🇮🇳), Nepali (🇳🇵), Oriya (🇮🇳), Punjabi (🇮🇳), Sanskrit (🇮🇳), Sindhi (🇵🇰), Tamil (🇮🇳), Telugu (🇮🇳), Urdu (🇮🇳), English (🇬🇧) 🔍 🔍
From English to Foreign Languages: Transferring Pretrained Language Models
2020 French (🇫🇷), Russian (🇷🇺), Arabic (🇦🇪), Chinese (🇨🇳), Hindi (🇮🇳), Vietnamese (🇻🇳) 🔍 🔍

Multilingual Vision Language Models

Survey / Review Papers