Amharic-Word-Embedding-Word2vec

Amharic-Word-Embedding-Word2vec provides pre-trained distributed word representations (word embeddings) for Amharic, free for Amharic NLP researchers to use. The repository contains code that lets users train embeddings on their own datasets and compute similarity between words or phrases. It also includes two pre-trained models built under different dataset settings: one trained on a stemmed dataset and one on an unstemmed dataset. In addition, the repository includes "wordsim100", a collection of Amharic word pairs with human-annotated relatedness scores. The pairs were collected from potential users and were used to evaluate the word embedding models.
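The similarity and evaluation steps described above can be sketched in plain Python. This is a minimal sketch, not the repository's code, and it rests on several assumptions. Phrase similarity is computed as the cosine between averaged word vectors, which is similar in spirit to gensim's KeyedVectors.n_similarity. wordsim100 is assumed to be already loaded as (word1, word2, score) tuples, because its file format is not described here. Agreement with the human scores is measured with Spearman's rank correlation, the usual metric for wordsim-style benchmarks. The function names and the toy vectors and scores are hypothetical. `vectors` can be any word-to-vector mapping, including a loaded gensim KeyedVectors object.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def phrase_vector(vectors, words):
    """Average the vectors of the in-vocabulary words of a phrase (OOV words are ignored)."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        raise KeyError("no word of the phrase is in the vocabulary")
    return [sum(col) / len(known) for col in zip(*known)]

def phrase_similarity(vectors, phrase1, phrase2):
    """Cosine similarity between the mean vectors of two whitespace-separated phrases."""
    return cosine(phrase_vector(vectors, phrase1.split()),
                  phrase_vector(vectors, phrase2.split()))

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def evaluate_wordsim(vectors, pairs):
    """Correlate model cosine scores with human scores, skipping out-of-vocabulary pairs."""
    model_scores, human_scores, skipped = [], [], 0
    for w1, w2, human in pairs:
        if w1 in vectors and w2 in vectors:
            model_scores.append(cosine(vectors[w1], vectors[w2]))
            human_scores.append(human)
        else:
            skipped += 1
    if len(model_scores) < 2:
        raise ValueError("need at least two in-vocabulary pairs")
    return spearman(model_scores, human_scores), skipped

# Hypothetical 2-d toy vectors and relatedness scores, for illustration only.
toy_vectors = {"ቤት": [1.0, 0.0], "ቤቶች": [0.9, 0.1], "ውሃ": [0.0, 1.0], "መኪና": [0.5, 0.5]}
toy_pairs = [("ቤት", "ቤቶች", 9.0), ("ቤት", "መኪና", 5.0), ("ቤት", "ውሃ", 1.0), ("ቤት", "ሰማይ", 2.0)]
rho, skipped = evaluate_wordsim(toy_vectors, toy_pairs)
print(f"Spearman rho = {rho:.3f}, skipped pairs = {skipped}")  # Spearman rho = 1.000, skipped pairs = 1
```

Pairs with an out-of-vocabulary word are skipped and counted rather than scored as zero, so the reported correlation should be read together with the number of skipped pairs. scipy.stats.spearmanr computes the same coefficient if SciPy is available.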

Note:

- To run the code, you need the gensim Python module (pip install gensim).
- You can also download the pre-trained embeddings described above from my Google Drive:
  - Amharic word embedding_skipgram with stemmed data: https://drive.google.com/file/d/1f-AAdiu_caxAfEL7Ll8dOkNOiEUjMVC5/view?usp=share_link
  - Amharic word embedding_skipgram with unstemmed data: https://drive.google.com/file/d/1SFTeMQALxKH3rsER60rXmobgr2D2Msd1/view?usp=share_link
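A minimal sketch of training skip-gram embeddings with gensim and loading downloaded ones follows. It is not the repository's code, and it rests on several assumptions:

- The corpus is a UTF-8 text file with one sentence or document per line.
- Tokenization splits only on whitespace and punctuation. It does no stemming and is not the repository's preprocessing.
- The hyperparameters (vector_size=300, window=5, min_count=5) are placeholders. The settings used for the published models are not stated here.
- The on-disk format of the Google Drive files is not stated, so load_embeddings covers both gensim's native format and the word2vec format.

The names tokenize, read_corpus, train_skipgram and load_embeddings are hypothetical. gensim is imported inside the functions, so the tokenizer works without it.

```python
import re

# Split on whitespace, the Ethiopic punctuation range U+1361-U+1368
# (wordspace ፡, full stop ።, comma ፣, semicolon ፤, colon ፥, preface colon ፦,
# question mark ፧, paragraph separator ፨) and common ASCII punctuation.
_SEPARATORS = re.compile(r"[\s\u1361-\u1368!?.,;:()\[\]]+")

def tokenize(line):
    """Split one line of Amharic text into word tokens (no stemming)."""
    return [tok for tok in _SEPARATORS.split(line) if tok]

def read_corpus(path):
    """Yield one token list per non-empty line of a UTF-8 text file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            tokens = tokenize(line)
            if tokens:
                yield tokens

def train_skipgram(corpus_path, out_path, vector_size=300, window=5, min_count=5):
    """Train a skip-gram Word2Vec model (sg=1) and save it in gensim's native format."""
    from gensim.models import Word2Vec  # parameter names follow gensim >= 4.0
    # Word2Vec makes several passes over the corpus, so the generator is materialized.
    sentences = list(read_corpus(corpus_path))
    model = Word2Vec(sentences=sentences, vector_size=vector_size, window=window,
                     min_count=min_count, sg=1, workers=4)
    model.save(out_path)
    return model

def load_embeddings(path, word2vec_format=False, binary=True):
    """Load word vectors from a gensim .model file or from a word2vec-format file."""
    from gensim.models import KeyedVectors, Word2Vec
    if word2vec_format:
        return KeyedVectors.load_word2vec_format(path, binary=binary)
    return Word2Vec.load(path).wv

# Hypothetical usage; the file names are placeholders.
# model = train_skipgram("amharic_corpus.txt", "amharic_skipgram.model")
# wv = load_embeddings("amharic_skipgram.model")
# print(wv.most_similar("ቤት", topn=5))
# print(wv.similarity("ቤት", "ቤቶች"))
```

The generator is wrapped in list() because Word2Vec iterates over the corpus more than once: once to build the vocabulary and then once per training epoch. For corpora that do not fit in memory, a class whose __iter__ re-opens and re-reads the file achieves the same without holding everything in a list.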