The header is in TextRegion r3, but the ReadingOrder only includes the main text in r1, so dinglehopper only extracts the main text. This means the file is buggy, not dinglehopper.
However, we can do better by warning when a region is not included in the extracted text.
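A minimal sketch of such a check, not dinglehopper's actual implementation: it collects every `regionRef` found under `ReadingOrder` and reports each `TextRegion` whose `id` is never referenced. The function names are hypothetical, and the sketch assumes that a file without any `ReadingOrder` needs no warning. Nested regions are not treated specially.

```python
import warnings
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace so the check works across PAGE schema versions."""
    return tag.rsplit("}", 1)[-1]


def regions_missing_from_reading_order(root):
    """Return the ids of TextRegions that no ReadingOrder entry refers to.

    Hypothetical helper. Assumption: if the file has no ReadingOrder at all,
    there is nothing to compare against, so an empty list is returned.
    """
    reading_orders = [e for e in root.iter() if local_name(e.tag) == "ReadingOrder"]
    if not reading_orders:
        return []

    # RegionRef and RegionRefIndexed both carry the target id in "regionRef".
    referenced = set()
    for reading_order in reading_orders:
        for ref in reading_order.iter():
            region_id = ref.get("regionRef")
            if region_id:
                referenced.add(region_id)

    return [
        e.get("id")
        for e in root.iter()
        if local_name(e.tag) == "TextRegion" and e.get("id") not in referenced
    ]


def warn_about_missing_regions(root):
    """Emit one warning per TextRegion that is absent from the ReadingOrder."""
    missing = regions_missing_from_reading_order(root)
    for region_id in missing:
        warnings.warn(
            f"TextRegion {region_id!r} is not in the ReadingOrder; "
            "its text will not be extracted"
        )
    return missing
```

For a file like the one in this issue, calling `warn_about_missing_regions(ET.parse(path).getroot())` would flag r3, provided the ReadingOrder references only r1.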
mikegerber changed the title from "Text missing" to "Warn if there is text missing in the ReadingOrder" on May 21, 2021
For 00451941.gt.xml, dinglehopper-extract does not extract the header's text "DE L'ESPRIT DE L'HOMME".