OCRmyPDF now supports EasyOCR - so you can use EasyOCR on PDFs! #1094
jbarlow83 started this conversation in Show and tell
OCRmyPDF is the leading command line "PDF to OCR-PDF" tool. Most OCR tools, like EasyOCR and Tesseract, focus on image to text conversion and avoid the complexities of PDF - so if you have a PDF that you want to apply OCR to, you have to manually convert it to some other format first. OCRmyPDF takes care of all those conversion details, using lossless conversions and meticulous attention to edge cases.
To date, OCRmyPDF has used Tesseract OCR almost exclusively. I've now created a plugin that adds support for using EasyOCR as the OCR engine.
When the plugin is installed in a virtual environment that contains OCRmyPDF, EasyOCR will be used instead of Tesseract where possible.
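One way to confirm that the plugin is visible in the same environment as OCRmyPDF is to list the entry points installed there. This is only a minimal sketch: it assumes OCRmyPDF discovers plugins through an entry point group named `ocrmypdf`, and `installed_plugins` is a hypothetical helper, not part of OCRmyPDF or EasyOCR.

```python
from importlib.metadata import entry_points


def installed_plugins(group="ocrmypdf"):
    """Return the sorted names of entry points registered under `group`.

    The group name "ocrmypdf" is an assumption about how plugins are
    discovered; adjust it if your installation differs.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry_points() returns a selectable collection.
        selected = eps.select(group=group)
    else:
        # Older Pythons: entry_points() returns a dict keyed by group.
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


if __name__ == "__main__":
    # Run this with the virtual environment's interpreter.
    print(installed_plugins())
```

If the list printed from inside the virtual environment is empty, the plugin is probably installed somewhere other than the environment that contains OCRmyPDF.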
Currently, Tesseract is still used for determining page deskew, detecting page rotation, and a few other functions. If anyone has thoughts on how to use EasyOCR or related ML models for these functions without Tesseract, I'd be happy to incorporate them.
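Since Tesseract is still needed for those functions, it may be worth checking that its executable can be found before running OCRmyPDF with the plugin. The sketch below uses only the Python standard library; `missing_tools` is a hypothetical helper, and it assumes the Tesseract executable is named `tesseract` on your system.

```python
import shutil


def missing_tools(tools=("tesseract",)):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Tesseract was found on PATH.")
```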