OCR Translator (Linux OS)

Keywords: OCR, Tesseract-OCR, Google Translate, Shell Script, Linux

1. Introduction: OCR Translator

Immigrants often struggle to understand letters they receive by mail in a foreign language. OCR Translator aims to overcome this language barrier by combining Tesseract-OCR (text recognition) with Google Translate (translation).

2. Workflow

Note: the preferred input is a scan from a flatbed scanner. Camera-based input will be added in a future release.

3. Config

  1. Install Tesseract OCR. At the time of writing, tesseract 4.0.0-beta.1 was used as the OCR engine.

  2. Install dependencies (using a conda environment)

    # navigate to ./anaconda
    cd ./anaconda
    conda env create --file environment.yml
    
    # activate OCR_Translator_env (conda 4.4+ also accepts `conda activate`)
    source activate OCR_Translator_env
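
After both steps, it can help to confirm that the required tools are actually on the PATH. The following is a minimal sketch, not part of this repository: the function names `have_cmd` and `check_tools` are hypothetical, and it only assumes a POSIX shell.

```shell
# Hypothetical helper: succeed if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: report each missing tool on stderr and
# return non-zero if at least one of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! have_cmd "$tool"; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A typical call would be `check_tools tesseract conda`. If that succeeds, `tesseract --version` shows the installed engine version and `tesseract --list-langs` shows which language data is available.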

Notes:

  • currently supported file types: PDF, PNG
  • single-page documents only (multi-page PDFs are not supported)
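
The restrictions above could be checked before OCR is run. The sketch below is an illustration under stated assumptions, not the project's actual script: `input_kind` and `ocr_page` are hypothetical names, and rendering a PDF page with `pdftoppm` (from poppler-utils) is an assumed approach that this README does not confirm.

```shell
# Hypothetical helper: classify an input file by its extension.
# Prints "pdf", "png", or "unsupported".
input_kind() {
    # Lower-case the name so "SCAN.PDF" is treated like "scan.pdf".
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$name" in
        *.pdf) echo pdf ;;
        *.png) echo png ;;
        *)     echo unsupported ;;
    esac
}

# Hypothetical helper: run OCR on one PNG or on the first page of a PDF.
# Usage: ocr_page <input-file> <output-base> <tesseract-language-code>
# The recognized text ends up in <output-base>.txt.
ocr_page() {
    src=$1
    out=$2
    lang=$3
    case "$(input_kind "$src")" in
        png)
            tesseract "$src" "$out" -l "$lang"
            ;;
        pdf)
            # Assumption: render only page 1 to <output-base>.png at 300 dpi,
            # then pass that image to tesseract.
            pdftoppm -png -singlefile -f 1 -l 1 -r 300 "$src" "$out" &&
                tesseract "$out.png" "$out" -l "$lang"
            ;;
        *)
            echo "unsupported file type: $src" >&2
            return 1
            ;;
    esac
}
```

For example, `ocr_page letter.png letter deu` would write the recognized German text to `letter.txt`, provided the `deu` language data is installed. Rejecting unsupported files up front keeps the failure message clearer than an error from the OCR engine itself.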

License

OCR_Translator_license