OCRmyPDF now supports EasyOCR - so you can use EasyOCR on PDFs! #1094

jbarlow83 · 2023-07-25T07:54:21Z

OCRmyPDF is the leading command line "PDF to OCR-PDF" tool. Most image-to-text tools, such as EasyOCR and Tesseract, focus on image-to-text conversion and avoid the complexities of PDF, so if you have a PDF that you want to apply OCR to, you have to manually convert it to some other format first. OCRmyPDF takes care of all those conversion details, using lossless conversions and meticulous attention to edge cases.

To date, OCRmyPDF has used Tesseract OCR almost exclusively. I've now created a plugin that adds support for using EasyOCR as the OCR engine.

When the plugin is installed in a virtual environment that contains OCRmyPDF, EasyOCR will be used instead of Tesseract where possible.
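As a rough illustration of the point above, here is a minimal Python sketch that checks whether the plugin looks installed in the current environment and builds the ordinary `ocrmypdf INPUT OUTPUT` command. The module name `ocrmypdf_easyocr` is an assumption on my part (it is not stated in this post), and the helper names `plugin_available` and `build_command` are purely hypothetical. No engine-selection flag is passed, since per the description the plugin takes over when it is present.

```python
import importlib.util
from typing import List

# Assumption: the plugin's importable module name. Not stated in the post;
# adjust it if the installed package uses a different name.
ASSUMED_PLUGIN_MODULE = "ocrmypdf_easyocr"


def plugin_available(module_name: str = ASSUMED_PLUGIN_MODULE) -> bool:
    """Return True if the named top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


def build_command(input_pdf: str, output_pdf: str) -> List[str]:
    """Build the basic `ocrmypdf INPUT OUTPUT` invocation (hypothetical helper)."""
    return ["ocrmypdf", input_pdf, output_pdf]


if __name__ == "__main__":
    if plugin_available():
        print("Plugin module found; EasyOCR should be used where possible.")
    else:
        print("Plugin module not found; OCRmyPDF would fall back to Tesseract.")
    # Pass this list to subprocess.run(..., check=True) to actually run OCR.
    print(" ".join(build_command("input.pdf", "output.pdf")))
```

The check only confirms that the module is importable from the active virtual environment; it does not verify that OCRmyPDF has actually loaded the plugin.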

Currently, Tesseract is still used for page deskew determination, page rotation detection, and a few other functions. If anyone has thoughts on how to use EasyOCR or related ML models for the functions above without Tesseract, I'd be happy to incorporate them.
