Warn if there is text missing in the ReadingOrder #59

Open
mikegerber opened this issue May 21, 2021 · 1 comment
Assignees
Labels
enhancement New feature or request

Comments

@mikegerber
Member

For 00451941.gt.xml, dinglehopper-extract does not extract the header's text DE L'ESPRIT DE L'HOMME.

@mikegerber
Member Author

mikegerber commented May 21, 2021

The header is in TextRegion r3, but the ReadingOrder only includes the main text in r1, so dinglehopper extracts only the main text. In other words, the file is buggy, not dinglehopper.

However, we can do better by warning when a region is not included in the extracted text.
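
A rough sketch of what such a check could look like, not dinglehopper's actual code. The function names `regions_missing_from_reading_order` and `warn_about_missing_regions` are hypothetical. It uses only the standard library and assumes that regions are skipped only when a ReadingOrder is present, so a file without any ReadingOrder produces no warning. Nested TextRegions are not treated specially here.

```python
import warnings
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace so that any PAGE schema version is accepted."""
    return tag.rsplit("}", 1)[-1]


def regions_missing_from_reading_order(page_xml):
    """Return the ids of TextRegions that the ReadingOrder does not reference.

    Hypothetical helper. Returns an empty list if the file has no
    ReadingOrder at all (assumption: nothing is skipped in that case).
    """
    root = ET.fromstring(page_xml)
    has_reading_order = False
    referenced = set()
    region_ids = []
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "ReadingOrder":
            has_reading_order = True
        elif name in ("RegionRef", "RegionRefIndexed"):
            referenced.add(elem.get("regionRef"))
        elif name == "TextRegion":
            region_ids.append(elem.get("id"))
    if not has_reading_order:
        return []
    return [rid for rid in region_ids if rid not in referenced]


def warn_about_missing_regions(page_xml):
    """Emit one warning per TextRegion that the ReadingOrder leaves out."""
    for rid in regions_missing_from_reading_order(page_xml):
        warnings.warn(
            f"TextRegion {rid!r} is not in the ReadingOrder; "
            "its text is not part of the extracted text"
        )
```

For a file like the one above, with r1 in the ReadingOrder and r3 outside it, this would report r3.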

@mikegerber mikegerber changed the title Text missing Warn if there is text missing in the ReadingOrder May 21, 2021
@mikegerber mikegerber self-assigned this May 21, 2021
@mikegerber mikegerber added the enhancement New feature or request label May 21, 2021