Amharic-Word-Embedding-Word2vec

Amharic-Word-Embedding-Word2vec provides pre-trained distributed word representations (word embeddings) that are free to use for Amharic NLP researchers. The repository contains code that lets users train their own embeddings on their own datasets and compute similarity between words or phrases, along with two pre-trained models built under different dataset settings (one trained on stemmed data, one on unstemmed data). In addition, the repository includes "wordsim100", a collection of Amharic word pairs with human-annotated relatedness scores, collected from potential users and used to evaluate the word embedding models.

Note: - To run the code, you need to install and import gensim (a Python module).

  - You can also download the embeddings above from my drive. Here are the links:
  
   Amharic word embedding_skipgram with stemmed data: https://drive.google.com/file/d/1f-AAdiu_caxAfEL7Ll8dOkNOiEUjMVC5/view?usp=share_link
  
   Amharic word embedding_skipgram with unstemmed data: https://drive.google.com/file/d/1SFTeMQALxKH3rsER60rXmobgr2D2Msd1/view?usp=share_link
