Google Translate Backtranslation for NLP data augmentation

By Jean-Philippe Corbeil and Hadi Abdi Ghadivel

This script was written to augment NLP corpora for the paraphrase identification task, and it can easily be adapted to other NLP tasks. It relies on the Google Translate API, so you need to activate that API and provide your own Google API token in your own .env file.

Figure 1. Data augmentation in NLP with backtranslation procedure.
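
The procedure in Figure 1 can be sketched roughly as follows. This is a minimal illustration, not the contents of backtranslate.py: the function names, the `GOOGLE_API_KEY` variable name, and the choice of calling the Cloud Translation v2 REST endpoint directly with the standard library are all assumptions made for the example. The translator is passed in as a parameter so the round trip can be tried without network access.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Callable, List

# Cloud Translation v2 (Basic) REST endpoint, which accepts an API key.
API_URL = "https://translation.googleapis.com/language/translate/v2"

# A translator takes (text, source_language, target_language) and returns text.
Translator = Callable[[str, str, str], str]


def google_translate(text: str, source: str, target: str) -> str:
    """Translate one string through the v2 REST API (hypothetical helper)."""
    # Assumption: the token from the .env file is exported as GOOGLE_API_KEY.
    query = urllib.parse.urlencode(
        {
            "key": os.environ["GOOGLE_API_KEY"],
            "q": text,
            "source": source,
            "target": target,
            "format": "text",
        }
    )
    with urllib.request.urlopen(f"{API_URL}?{query}") as response:
        payload = json.load(response)
    return payload["data"]["translations"][0]["translatedText"]


def backtranslate(
    text: str,
    pivots: List[str],
    translate: Translator,
    source: str = "en",
) -> List[str]:
    """Return one unfiltered paraphrase candidate per pivot language."""
    candidates = []
    for pivot in pivots:
        # Forward pass: source language -> pivot language.
        pivot_text = translate(text, source, pivot)
        # Backward pass: pivot language -> source language.
        candidates.append(translate(pivot_text, pivot, source))
    return candidates
```

With a valid key in the environment, a call such as `backtranslate("How are you?", ["fr", "de"], google_translate)` would return one candidate per pivot language. Any other callable with the same three-argument signature can be substituted for offline experiments.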

No filtering is applied in this part of the code; we leave that to later processing steps.
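
As an illustration of what such a later step might look like, the sketch below drops candidates that are empty, that match the source sentence after case and whitespace normalization, or that duplicate an earlier candidate. It is not part of this repository; the function names and the filtering rule are hypothetical, and a real pipeline may want a stricter similarity criterion.

```python
from typing import Iterable, List


def normalize(sentence: str) -> str:
    """Lowercase and collapse whitespace so near-identical strings compare equal."""
    return " ".join(sentence.lower().split())


def drop_trivial_candidates(source: str, candidates: Iterable[str]) -> List[str]:
    """Keep candidates that differ from the source and from each other."""
    seen = {normalize(source)}
    kept = []
    for candidate in candidates:
        key = normalize(candidate)
        # Skip empty strings, copies of the source, and repeated candidates.
        if key and key not in seen:
            seen.add(key)
            kept.append(candidate)
    return kept
```

The original casing and spacing of each kept candidate are preserved; normalization is only used for the comparison.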

Install dependencies

Simply use the requirements.txt file (preferably inside a virtual environment):

pip install -r requirements.txt

Last update: June 8th, 2020.


jpcorb20/google-translate-backtranslation-da
