Replies: 2 comments
-
IIRC we also wanted to validate image formats here, i.e.
The idea was a fail-early-and-loud mechanism, but opt-in (by using this processor as the first step in the workflow). Another idea might be to repair images automatically where possible. (This applies to DPI metadata as well.)
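To make the fail-early idea concrete, here is a minimal sketch of a format check that sniffs magic bytes using only the standard library. The names `sniff_format`, `validate_format` and `ImageFormatError` are hypothetical, and the accepted formats (PNG, JPEG, TIFF) are an assumption for illustration, not a list agreed on in this thread.

```python
# Magic-byte prefixes for a few common raster formats.
# ASSUMPTION: this set of formats is illustrative only.
SIGNATURES = {
    "png": (b"\x89PNG\r\n\x1a\n",),
    "jpeg": (b"\xff\xd8\xff",),
    "tiff": (b"II*\x00", b"MM\x00*"),  # little-endian and big-endian headers
}


class ImageFormatError(ValueError):
    """Raised when an image is not in one of the accepted formats."""


def sniff_format(header: bytes):
    """Return the format name matching the leading bytes, or None."""
    for name, prefixes in SIGNATURES.items():
        if header.startswith(prefixes):
            return name
    return None


def validate_format(path, allowed=("png", "jpeg", "tiff")):
    """Fail loudly if the file at `path` is not in an allowed format."""
    with open(path, "rb") as f:
        header = f.read(16)
    fmt = sniff_format(header)
    if fmt not in allowed:
        raise ImageFormatError(
            f"{path}: unsupported or unrecognized image format ({fmt!r})"
        )
    return fmt
```

Run as the first step of a workflow, a check like this would raise before any expensive processing starts. A repair mode would need an actual image library and is left out here.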
-
BTW, DPI estimation is non-trivial and IMO already implies a dependency on some kind of line segmentation (like ocrolib). Strategically, we could approach this as a processor in core with some base functionality (no DPI estimation, or only a very crude one) but with hooks for easy overriding. Other modules (like ocrd_cis) could then subclass the image characterization in core to provide their own.
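A rough sketch of what such a hook could look like, in plain Python and without any OCR-D API. All names here (`ImageInfo`, `ImageCharacterizer`, `FixedGuessCharacterizer`) are hypothetical, and the subclass is only a stand-in for a real estimator such as one based on line segmentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageInfo:
    width_px: int
    height_px: int
    dpi: Optional[float] = None  # None when the file carries no DPI metadata


class ImageCharacterizer:
    """Hypothetical base class: reports what it can, estimates nothing."""

    def estimate_dpi(self, info: ImageInfo) -> Optional[float]:
        """Hook for subclasses. The base implementation does no estimation."""
        return None

    def characterize(self, info: ImageInfo) -> dict:
        # Prefer DPI metadata; fall back to the (overridable) estimator.
        dpi = info.dpi if info.dpi else self.estimate_dpi(info)
        return {
            "width_px": info.width_px,
            "height_px": info.height_px,
            "dpi": dpi,
            "dpi_estimated": not info.dpi and dpi is not None,
        }


class FixedGuessCharacterizer(ImageCharacterizer):
    """Stand-in for a module-provided subclass with its own estimator."""

    def __init__(self, guess: float = 300.0):
        self.guess = guess  # ASSUMPTION: placeholder value, not a real estimate

    def estimate_dpi(self, info: ImageInfo) -> Optional[float]:
        return self.guess
```

The point of the design is that `characterize` stays in one place while only `estimate_dpi` is swapped out, so core carries no segmentation dependency.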
-
As discussed in our Open Tech Call, there is interest in the community in a pre-processor that validates certain assumptions about the images and possibly offers fixes where those assumptions are violated. This arose from a discussion around reliable calculation of DPI (cf. #676).
Maybe we can use this thread to gather requirements before moving forward with the implementation.
Functionality such a processor should have:
min-dpi
: Lower threshold to reject an image

estimate-dpi
: Whether to use heuristics on image size and page size to estimate DPI

scale
: Whether to scale images to fake a higher DPI

Feedback and proposals are welcome!
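To help the requirements discussion, here is one possible reading of how the three parameters could interact, as a small decision function. Everything in it is an assumption for illustration: the names `check_image`, `Decision` and `estimate_dpi_from_page`, the default threshold of 150, the A4 page size in inches, and the choice to upscale exactly to `min_dpi`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Decision:
    dpi: Optional[float]   # metadata or estimated DPI, None if unknown
    estimated: bool        # True if dpi came from the heuristic
    scale_factor: float    # 1.0 means "leave the image as it is"
    accepted: bool


def estimate_dpi_from_page(width_px: int, height_px: int,
                           page_inch: Tuple[float, float]) -> float:
    """Crude heuristic: pixels per inch along each axis, averaged."""
    page_w, page_h = page_inch
    return (width_px / page_w + height_px / page_h) / 2


def check_image(width_px: int, height_px: int, dpi: Optional[float], *,
                min_dpi: float = 150.0,          # ASSUMPTION: example threshold
                estimate_dpi: bool = False,
                scale: bool = False,
                page_inch: Tuple[float, float] = (8.27, 11.69)  # ASSUMPTION: A4
                ) -> Decision:
    estimated = False
    if not dpi and estimate_dpi:
        dpi = estimate_dpi_from_page(width_px, height_px, page_inch)
        estimated = True
    if not dpi:
        # No metadata and no estimation requested: reject.
        return Decision(None, False, 1.0, False)
    if dpi >= min_dpi:
        return Decision(dpi, estimated, 1.0, True)
    if scale:
        # Upscale just enough to reach the threshold.
        return Decision(dpi, estimated, min_dpi / dpi, True)
    return Decision(dpi, estimated, 1.0, False)
```

For example, `check_image(1000, 1000, 75.0, scale=True)` is accepted with a scale factor of 2.0, while the same call without `scale=True` is rejected. Whether a missing DPI should reject or merely warn is one of the open questions for this thread.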