Machine Translation for Middle Egyptian-English

This repo preprocesses several corpora of Middle Egyptian transliterations into a common format, then trains machine translation models with OpenNMT using supervised and semi-supervised learning. Results are evaluated with token accuracy, perplexity, cross-entropy, and BLEU score.
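As a rough illustration of how the first three metrics relate, here is a minimal sketch in plain Python. It is not the repo's evaluation code: the function names and toy inputs are hypothetical, and it assumes the usual definitions, where cross-entropy is the average negative log-probability the model assigns to the gold tokens (in nats) and perplexity is its exponential.

```python
import math

def token_accuracy(predicted, gold):
    """Percentage of positions where the predicted token equals the gold token."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)

def cross_entropy(gold_token_probs):
    """Average negative log-probability (in nats) assigned to the gold tokens."""
    return -sum(math.log(p) for p in gold_token_probs) / len(gold_token_probs)

def perplexity(gold_token_probs):
    """Perplexity is the exponential of the per-token cross-entropy."""
    return math.exp(cross_entropy(gold_token_probs))

# Toy inputs (hypothetical, not taken from the corpus).
print(token_accuracy(["the", "king", "went", "north"],
                     ["the", "king", "sailed", "north"]))
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

A model that spreads its probability evenly over four candidate tokens at every step has a perplexity of 4, which is why lower perplexity indicates a more confident model.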

Supervised case:
Corpus size: 12,938 aligned sentences
Current max BLEU score = 42.22

Semi-supervised case:
Corpus size: 50,457 monolingual sentences + 12,938 aligned sentences
Current max BLEU score = 41.78
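
For readers unfamiliar with the BLEU numbers above, the following is a minimal sketch of corpus-level BLEU on a 0-100 scale. It is an illustration only, not the scoring tool used in this repo: it assumes whitespace tokenization, a single reference per hypothesis, n-grams up to length 4, and no smoothing, so its output will not necessarily match the scores reported here. The function names and example sentences are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100), one reference per hypothesis, no smoothing."""
    matched = [0] * max_n  # clipped n-gram matches per order
    total = [0] * max_n    # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total[n - 1] += sum(hyp_counts.values())
    if min(matched) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    # Geometric mean of the n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty discourages hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)

print(corpus_bleu(["the king goes to a temple"], ["the king goes to the temple"]))
```

Because the score is a geometric mean over n-gram orders, a single wrong word in a short sentence lowers it sharply, which is worth keeping in mind when comparing scores across corpora of different sentence lengths.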

In-progress:

Parse the Pyramid Texts from PDF to add ~5k aligned sentences
Preprocess the newly added aligned sentences
Update the machine translation notebook with the new BLEU score once the corpus is expanded
Semi-supervised machine translation pipeline

Repository contents:

compiled_corpora
onmt-files/run
pyramidtext-parsing
texts
word2vec
.gitattributes
EgyptianTranslation.ipynb
Generate_Word2Vec.ipynb
README.md
SemiSupervised.ipynb
egyptiandatasetgenerator.ipynb

fayrose/EgyptianTranslation

Folders and files

Latest commit

History

Repository files navigation

Machine Translation for Middle Egyptian-English

About

Topics

Resources

Stars

Watchers

Forks

Languages