Hi, I was following the GitHub Pages documentation and I can't grasp how to do this.
I send a POST request to Solr that updates the document via JSON, but I don't know what to put in the ocr_text field:
{
"id": "1",
"ocr_text": ???
}
Thanks for this amazing plugin!
Hey @paroar, sorry for responding so late, I didn't see the thread in the Discussions tab!
The answer is pretty simple: you just put the raw OCR markup into the field :-)
So let's say you have some hOCR, your document would look like this:
{
"id": "1",
"ocr_text": "<html><body><div class=\"ocr_page\">...</div></body></html>"
}
And that's it. The plugin will extract the text from the markup for indexing and use the stored markup for highlighting.
Hope that answers your question! Do you have suggestions on how this could be made clearer in the documentation?
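In case it helps, here is a rough Python sketch of building that update request. It only relies on the standard library and on Solr's regular JSON update handler. The URL, the core name `ocr`, and the helper names `build_ocr_document` and `build_update_request` are assumptions made up for illustration, so adjust them to your setup. The hOCR string is the same truncated placeholder as above, not real OCR output.

```python
import json
import urllib.request


def build_ocr_document(doc_id: str, hocr_markup: str) -> dict:
    """Return a Solr document with the raw OCR markup in the OCR field."""
    # The markup goes in as a plain string; json.dumps handles the escaping
    # of the double quotes inside the hOCR attributes.
    return {"id": doc_id, "ocr_text": hocr_markup}


def build_update_request(solr_update_url: str, docs: list) -> urllib.request.Request:
    """Prepare (but do not send) a POST request with a JSON list of documents."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        solr_update_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder markup, truncated exactly like the example above.
hocr = '<html><body><div class="ocr_page">...</div></body></html>'
doc = build_ocr_document("1", hocr)

# Hypothetical URL: host, port and core name depend on your installation.
request = build_update_request(
    "http://localhost:8983/solr/ocr/update?commit=true", [doc]
)

# Uncomment to actually send the request to a running Solr instance:
# urllib.request.urlopen(request)
```

Building the body with `json.dumps` instead of concatenating strings by hand avoids the two easy mistakes here: forgetting to escape the quotes in the markup and leaving a trailing comma in the JSON.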