Store OCR in the index itself #182

Answered by jbaiter
paroar asked this question in Q&A
Jun 14, 2021 · 1 comment · 4 replies

Hey @paroar, sorry for responding so late, I didn't see the thread in the Discussions tab!

The answer is pretty simple: you just put the raw OCR markup into the field :-)

So if you have some hOCR, your document would look like this:

{
  "id": "1",
  "ocr_text": "<html><body><div class=\"ocr_page\">...</div></body></html>"
}

And that's it: the plugin will extract the text from the markup for indexing and use the stored markup for highlighting.
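
If you build the document programmatically, a rough sketch in Python could look like the one below. It only assumes the `ocr_text` field name from the example above; the helper names (`build_doc`, `to_update_payload`, `post_to_solr`) and the update URL in the comment are hypothetical and need to be adapted to your own core and schema. The point it illustrates is that a JSON serializer takes care of escaping the quotes inside the markup, so the hOCR can go into the field unchanged.

```python
import json
import urllib.request


def build_doc(doc_id: str, hocr_markup: str) -> dict:
    """Build a Solr document that carries the raw hOCR markup in `ocr_text`."""
    return {"id": doc_id, "ocr_text": hocr_markup}


def to_update_payload(docs: list[dict]) -> bytes:
    """Serialize documents as a JSON array, as accepted by Solr's JSON update handler."""
    # json.dumps escapes the double quotes inside the markup for us.
    return json.dumps(docs).encode("utf-8")


def post_to_solr(payload: bytes, update_url: str) -> int:
    """POST the payload to Solr and return the HTTP status code.

    `update_url` is an assumption, e.g.
    "http://localhost:8983/solr/<your-core>/update?commit=true".
    """
    request = urllib.request.Request(
        update_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Same placeholder markup as in the example document above.
hocr = '<html><body><div class="ocr_page">...</div></body></html>'
payload = to_update_payload([build_doc("1", hocr)])
print(payload.decode("utf-8"))
```

Calling `post_to_solr(payload, update_url)` with your own update URL would then send the document; it is left out of the script above so the snippet runs without a Solr instance.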

Hope that answers your questions! Do you have suggestions on how this could be made clearer in the documentation?

jbaiter Aug 16, 2021
Collaborator

Answer selected by jbaiter