Requesting a version of PDF OCR that runs Tesseract only on the images embedded in a PDF, instead of capturing and OCRing each whole page.
Many of my professors distribute PowerPoint slides converted to PDF. The text in those files is already real text, but the screen grabs they include have no text layer and would benefit from OCR.
I believe this could save time for others as well, since many PDF documents are not purely images but a combination of text and images.
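To make the request concrete, here is a rough sketch of the workflow I have in mind. It is not how PDF OCR works today. It assumes the `pdfimages` (Poppler) and `tesseract` command-line tools are installed and on the PATH, and the function names are made up for illustration.

```python
import subprocess
import tempfile
from pathlib import Path


def build_extract_command(pdf_path, out_prefix):
    # With -png, pdfimages writes the embedded images as <out_prefix>-NNN.png.
    return ["pdfimages", "-png", str(pdf_path), str(out_prefix)]


def build_ocr_command(image_path, lang="eng"):
    # Using "stdout" as the output base makes tesseract print the recognized text.
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def ocr_embedded_images(pdf_path, lang="eng"):
    """Return {image file name: recognized text} for the images embedded in a PDF.

    Hypothetical helper: only embedded images are OCRed, pages are never rendered.
    """
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "img"
        subprocess.run(build_extract_command(pdf_path, prefix), check=True)
        for image in sorted(Path(tmp).glob("img-*.png")):
            done = subprocess.run(
                build_ocr_command(image, lang),
                check=True,
                capture_output=True,
                text=True,
            )
            results[image.name] = done.stdout.strip()
    return results
```

Calling `ocr_embedded_images("lecture.pdf")` (file name made up) would return the recognized text per extracted image. The existing text layer is left untouched, and this sketch does not attempt to place the OCR text back on the page.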
T-Dane changed the title from "Running OCR on embedded images of PDF using Poppler pdfimages or ImageMapping instead of whole pdf?" to "Running OCR on embedded images of PDF using Poppler pdfimages or ImageMapping instead of whole pdf pages converted to png?" on Oct 30, 2024
Interesting idea, but inserting the OCRed text back into the existing text layer for hybrid pages might be challenging.
I'm not familiar with ImageMapping. Can you provide a link?
Thanks. We're currently looking into reducing the dependencies on external programs, so I'm not sure we'll use your suggestion, but we'll keep this in mind.