This repository contains the resources related to our research on English-Sinhala word embedding alignment.
- alignment_matrices/ contains the alignment matrices obtained using different alignment techniques in different directions (i.e. Si --> En and En --> Si).
- all_data/ contains all the datasets we used for the supervised alignment. The datasets have been created using the large datasets provided in this repository.
- muse_content/ contains the scripts used for iterative Procrustes alignment which have been adopted from the MUSE repository by facebookresearch.
- rcsls_content/ contains the scripts used for RCSLS alignment which have been adopted from the fastText repository by facebookresearch.
- vecmap_content/ contains the scripts used for VecMap alignment which have been adopted from the VecMap repository.
- contrastive_bli_content/ contains the scripts used for ContrastiveBLI alignment which have been adopted from the ContrastiveBLI repository.
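
To illustrate what an alignment matrix does, here is a minimal sketch of a single orthogonal Procrustes step in NumPy. It is not taken from the scripts in this repository. The function names (`procrustes_align`, `nearest_target`) and the synthetic data are assumptions made for the example, and the file format of the matrices in alignment_matrices/ is not assumed here. The iterative variant repeats this step on a refined dictionary, which the sketch does not cover.

```python
import numpy as np


def procrustes_align(X, Y):
    """Return the orthogonal matrix W that minimises ||X @ W - Y||_F.

    X, Y: (n, d) arrays whose i-th rows are the source and target
    embeddings of the i-th dictionary pair.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def nearest_target(mapped, targets):
    """For each mapped source vector, return the index of the most
    cosine-similar row in `targets`."""
    a = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    b = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    return np.argmax(a @ b.T, axis=1)


# Synthetic stand-in for real embeddings (assumption: the target space is
# an exact rotation of the source space, which real embeddings are not).
rng = np.random.default_rng(0)
d = 8
src = rng.standard_normal((50, d))
true_rot, _ = np.linalg.qr(rng.standard_normal((d, d)))
tgt = src @ true_rot

# Learn W from the first 30 "dictionary" pairs, then translate the rest.
W = procrustes_align(src[:30], tgt[:30])
pred = nearest_target(src[30:] @ W, tgt)
```

In this toy setting `W` recovers the rotation, so each held-out source vector retrieves its own counterpart. With real embeddings the mapping is only approximate, and retrieval quality depends on the dictionary and the retrieval criterion used.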
Alignment results obtained for Sinhala-English alignment from further studies (publication is under review):
If you use this work, please cite the following papers.
@INPROCEEDINGS{10253560,
author={Wickramasinghe, Kasun and De Silva, Nisansa},
booktitle={2023 IEEE 17th International Conference on Industrial and Information Systems (ICIIS)},
title={Sinhala-English Parallel Word Dictionary Dataset},
year={2023},
volume={},
number={},
pages={61-66},
keywords={Dictionaries;Annotations;Pipelines;Machine translation;Task analysis;Information systems;parallel corpus;alignment;English-Sinhala dictionary;word embedding alignment;lexicon induction},
doi={10.1109/ICIIS58898.2023.10253560}}
@inproceedings{wickramasinghe-de-silva-2023-sinhala,
title = "{S}inhala-{E}nglish Word Embedding Alignment: Introducing Datasets and Benchmark for a Low Resource Language",
author = "Wickramasinghe, Kasun and
de Silva, Nisansa",
editor = "Huang, Chu-Ren and
Harada, Yasunari and
Kim, Jong-Bok and
Chen, Si and
Hsu, Yu-Yin and
Chersoni, Emmanuele and
A, Pranav and
Zeng, Winnie Huiheng and
Peng, Bo and
Li, Yuxi and
Li, Junlin",
booktitle = "Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation",
month = dec,
year = "2023",
address = "Hong Kong, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.paclic-1.42",
pages = "424--435",
}