
aisingapore/BHASA


BHASA: A Holistic Southeast Asian Linguistic and Cultural Evaluation Suite for Large Language Models

CC BY 4.0

This is the repository for the paper BHASA: A Holistic Southeast Asian Linguistic and Cultural Evaluation Suite for Large Language Models. This is a work in progress and more materials will be added over time.

The repository currently contains:

  • Indonesian & Tamil LINDSEA linguistic diagnostic datasets
  • Indonesian & Tamil cultural representation datasets

Folder Structure

.
├── LICENSE
├── README.md
├── culture
│   └── representation
│       ├── README.md
│       ├── id            # Data for Indonesian cultural representation
│       └── ta            # Data for Tamil cultural representation
└── lindsea
    ├── README.md
    ├── id
    │   ├── pragmatics    # Data for Indonesian pragmatic reasoning (scalar implicatures/presuppositions)
    │   ├── prompts.yaml  # Prompts (English & Translated) for LINDSEA (Indonesian)
    │   ├── semantics     # Data for Indonesian semantic tests (coreference/translation)
    │   └── syntax        # Data for Indonesian syntactic tests (minimal pairs)
    └── ta
        ├── pragmatics    # Data for Tamil pragmatic reasoning (scalar implicatures/presuppositions)
        ├── prompts.yaml  # Prompts (English & Translated) for LINDSEA (Tamil)
        ├── semantics     # Data for Tamil semantic tests (coreference/translation)
        └── syntax        # Data for Tamil syntactic tests (minimal pairs)
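The folder layout above can also be navigated programmatically. The sketch below assumes a local clone of this repository at a path you supply. The helper names (`lindsea_paths`, `culture_path`, `list_data_files`) are hypothetical and not part of the repository. Because the README does not specify whether each data entry is a single file or a directory, or what format it uses, the sketch only resolves and lists paths and does not parse any contents.

```python
from pathlib import Path

# Language codes and LINDSEA categories as listed in the folder structure.
LANGUAGES = ("id", "ta")
LINDSEA_CATEGORIES = ("pragmatics", "semantics", "syntax")


def _check_lang(lang):
    """Reject language codes that have no data in this repository."""
    if lang not in LANGUAGES:
        raise ValueError(f"unsupported language code: {lang!r}")


def lindsea_paths(repo_root, lang):
    """Return the expected LINDSEA paths for one language (hypothetical helper)."""
    _check_lang(lang)
    base = Path(repo_root) / "lindsea" / lang
    paths = {category: base / category for category in LINDSEA_CATEGORIES}
    paths["prompts"] = base / "prompts.yaml"  # English & translated prompts
    return paths


def culture_path(repo_root, lang):
    """Return the cultural representation data path for one language."""
    _check_lang(lang)
    return Path(repo_root) / "culture" / "representation" / lang


def list_data_files(entry):
    """List data files for an entry, which may be a file or a directory.

    Returns an empty list if the entry does not exist locally.
    """
    entry = Path(entry)
    if entry.is_file():
        return [entry]
    if entry.is_dir():
        return sorted(p for p in entry.rglob("*") if p.is_file())
    return []


if __name__ == "__main__":
    repo_root = "BHASA"  # assumption: path to your local clone
    for lang in LANGUAGES:
        for name, path in lindsea_paths(repo_root, lang).items():
            print(lang, name, len(list_data_files(path)), "file(s)")
        print(lang, "culture", len(list_data_files(culture_path(repo_root, lang))), "file(s)")
```

Keeping the language codes and category names in module-level tuples means that if more languages are added to the repository later, only those tuples need updating.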

License

This work is licensed under a Creative Commons Attribution 4.0 International License.


Citation

Please cite our paper if you use our data:

@misc{leong2023bhasa,
      title={BHASA: A Holistic Southeast Asian Linguistic and Cultural Evaluation Suite for Large Language Models},
      author={Wei Qi Leong and Jian Gang Ngui and Yosephine Susanto and Hamsawardhini Rengarajan and Kengatharaiyer Sarveswaran and William Chandra Tjhi},
      year={2023},
      eprint={2309.06085},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
