BHASA: A Holistic Southeast Asian Linguistic and Cultural Evaluation Suite for Large Language Models

This repository accompanies the paper BHASA: A Holistic Southeast Asian Linguistic and Cultural Evaluation Suite for Large Language Models. It is a work in progress, and more materials will be added over time.

The repository currently contains:

Indonesian & Tamil LINDSEA linguistic diagnostic dataset
Indonesian & Tamil cultural representation dataset

Folder Structure

.
├── LICENSE
├── README.md
├── culture
│   └── representation
│       ├── README.md
│       ├── id            # Data for Indonesian cultural representation
│       └── ta            # Data for Tamil cultural representation
└── lindsea
    ├── README.md
    ├── id
    │   ├── pragmatics    # Data for Indonesian pragmatic reasoning (scalar implicatures/presuppositions)
    │   ├── prompts.yaml  # Prompts (English & Translated) for LINDSEA (Indonesian)
    │   ├── semantics     # Data for Indonesian semantic tests (coreference/translation)
    │   └── syntax        # Data for Indonesian syntactic tests (minimal pairs)
    └── ta
        ├── pragmatics    # Data for Tamil pragmatic reasoning (scalar implicatures/presuppositions)
        ├── prompts.yaml  # Prompts (English & Translated) for LINDSEA (Tamil)
        ├── semantics     # Data for Tamil semantic tests (coreference/translation)
        └── syntax        # Data for Tamil syntactic tests (minimal pairs)
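
As a quick sanity check after cloning, the layout above can be verified with a short script. This is only a minimal sketch, not part of the repository: the names `LANGUAGES`, `LINDSEA_ENTRIES`, `expected_paths` and `missing_paths` are hypothetical, and the script assumes it is run from the repository root. It checks only that the listed folders and files exist and makes no assumption about their contents.

```python
from pathlib import Path

# Language codes taken from the folder tree: id = Indonesian, ta = Tamil.
LANGUAGES = ("id", "ta")
# Entries listed under each lindsea/<lang> folder in the tree.
LINDSEA_ENTRIES = ("pragmatics", "prompts.yaml", "semantics", "syntax")


def expected_paths(root):
    """Return the paths that the folder tree says should exist under root."""
    root = Path(root)
    paths = []
    for lang in LANGUAGES:
        paths.append(root / "culture" / "representation" / lang)
        for entry in LINDSEA_ENTRIES:
            paths.append(root / "lindsea" / lang / entry)
    return paths


def missing_paths(root):
    """Return the expected paths that are absent from a local checkout."""
    return [p for p in expected_paths(root) if not p.exists()]


if __name__ == "__main__":
    # "." assumes the current directory is the repository root.
    for path in missing_paths("."):
        print(f"missing: {path}")
```

If the script prints nothing, every folder and file named in the tree is present in the local copy.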

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Citation

Please cite our paper if you use our data:

@misc{leong2023bhasa,
      title={BHASA: A Holistic Southeast Asian Linguistic and Cultural Evaluation Suite for Large Language Models},
      author={Wei Qi Leong and Jian Gang Ngui and Yosephine Susanto and Hamsawardhini Rengarajan and Kengatharaiyer Sarveswaran and William Chandra Tjhi},
      year={2023},
      eprint={2309.06085},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
