Multilingual Machine Translation with Transformers

This project was for learning purposes only, so the focus was on getting decent results rather than building an alternative to existing multilingual models.

- Implemented a 7M-parameter model.
- Trained a BERT-style tokenizer.
- Trained on the OPUS-100 dataset, using the en-hi and en-te subsets.
- The full walkthrough is available on Kaggle.

ENGLISH ----> HINDI
          |
          --> TELUGU

Working

The model knows which language to translate into from the beginning-of-sentence (BOS) token that starts each sentence:
- English sentences start with the <s-en> token
- Hindi sentences start with the <s-hi> token
- Telugu sentences start with the <s-te> token
- all sentences end with the </s> token

The model is trained as a sequence-to-sequence transformer with an encoder-decoder architecture. The encoder handles English, and the decoder handles both Hindi and Telugu.
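
The token scheme above can be sketched in a few lines. This is only an illustration, not the code from model.py or the notebook: the helper names (`wrap`, `make_pair`, `decoder_prompt`) are hypothetical, and the teacher-forcing shift is the usual seq2seq convention, assumed here rather than confirmed by the README. Only the special tokens themselves come from the description above.

```python
# Special tokens as listed in the README; everything else is an assumption.
BOS = {"en": "<s-en>", "hi": "<s-hi>", "te": "<s-te>"}
EOS = "</s>"


def wrap(tokens, lang):
    """Surround a tokenized sentence with its language BOS token and the shared EOS token."""
    if lang not in BOS:
        raise ValueError(f"unsupported language: {lang!r}")
    return [BOS[lang], *tokens, EOS]


def make_pair(src_tokens, tgt_tokens, tgt_lang):
    """Build one training example: English for the encoder, Hindi/Telugu for the decoder."""
    encoder_input = wrap(src_tokens, "en")
    target = wrap(tgt_tokens, tgt_lang)
    # Assumed teacher forcing: the decoder sees target[:-1] and predicts target[1:].
    decoder_input = target[:-1]
    labels = target[1:]
    return encoder_input, decoder_input, labels


def decoder_prompt(tgt_lang):
    """At inference time the decoder starts from the target-language BOS token alone."""
    return [BOS[tgt_lang]]


enc, dec_in, labels = make_pair(["how", "are", "you", "?"], ["आप", "कैसे", "हैं", "?"], "hi")
print(enc)                    # ['<s-en>', 'how', 'are', 'you', '?', '</s>']
print(dec_in)                 # ['<s-hi>', 'आप', 'कैसे', 'हैं', '?']
print(decoder_prompt("te"))   # ['<s-te>']
```

Because the first decoder token differs per language, the same decoder weights can be steered towards Hindi or Telugu just by choosing which BOS token to start from.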

Model Config

config = {
    'dim': 128,
    'n_heads': 4,
    'attn_dropout': 0.1,
    'mlp_dropout': 0.1,
    'depth': 8,
    'vocab_size': 30000,
    'max_len': 128
}
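
As a rough sanity check on the 7M figure, the parameter count can be estimated from this config. This is a back-of-the-envelope sketch, not the repository's model.py. It assumes that `depth` applies to both the encoder and the decoder, that the MLP uses a 4x expansion, that positional embeddings are learned, and that a single embedding table is shared by the encoder, the decoder and the output layer. None of these are confirmed by the README, and `estimate_params` is a hypothetical helper.

```python
config = {
    'dim': 128,
    'n_heads': 4,
    'attn_dropout': 0.1,
    'mlp_dropout': 0.1,
    'depth': 8,
    'vocab_size': 30000,
    'max_len': 128,
}


def estimate_params(cfg, mlp_ratio=4):
    """Estimate the parameter count of an encoder-decoder transformer (assumptions in the text)."""
    d = cfg['dim']
    hidden = mlp_ratio * d
    # Q, K, V and output projections with biases; n_heads does not change this count.
    attn = 4 * (d * d + d)
    # Two linear layers with biases: d -> hidden -> d.
    mlp = (d * hidden + hidden) + (hidden * d + d)
    # One LayerNorm: scale and shift.
    norm = 2 * d
    # Encoder layer: self-attention + MLP, each with a LayerNorm.
    enc_layer = attn + mlp + 2 * norm
    # Decoder layer: self-attention + cross-attention + MLP, each with a LayerNorm.
    dec_layer = 2 * attn + mlp + 3 * norm
    # Shared token embeddings plus learned positional embeddings (assumed).
    embeddings = cfg['vocab_size'] * d + cfg['max_len'] * d
    return embeddings + cfg['depth'] * (enc_layer + dec_layer)


total = estimate_params(config)
print(f"{total / 1e6:.2f}M parameters")  # 7.56M parameters
```

Under these assumptions roughly half of the parameters sit in the embedding table, and the total lands in the same ballpark as the 7M quoted above. A different layer layout or untied embeddings would shift the number.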

Inference Results

python inference.py --text 'how are you?' -l hi -s
>>> आप कैसे हैं?

python inference.py --text 'please call me' -l hi   
>>> कृपया मुझे पुकारो

python inference.py --text 'what are you doing?' -l te -s -t 0.5
>>> మీరు ఏం చేస్తున్నారు?

python inference.py --text "what's wrong?" -l te -s
>>> ఏమి తప్పు?

The results are kinda hilarious, but at least it works.

If you really want good-quality multilingual Indic translation, here's the SOTA model: ai4bharat/indictrans2-indic-en-1B. It's even used officially by the Govt. of India.

I have refrained my feet from every evil way,
That I might keep thy word.
                                Psalm 119:101

shreydan/multilingual-translation
