Expand to add support for Vespa? #473

Open
jbaiter opened this issue Nov 15, 2024 · 0 comments

jbaiter commented Nov 15, 2024

With its versatile support for hybrid search (combining "classic" Lucene-like term search with vector search), Vespa is becoming popular in many contexts. It would be great to be able to use OCR highlighting with it, at least for term search.

Vespa supports using CharFilter implementations from Lucene, so at least the indexing side should be a simple matter of writing the appropriate wrappers to expose the functionality to it.

For rendering the responses, Vespa supports custom "Result Renderers". With these, it should be simple to add an ocrHighlighting field to the response, assuming the Result object has offset information associated with it. I haven't yet found out how to access this information, but it's definitely available, at least internally, for the highlighting of fields and summaries (called "bolding" in Vespa). Hopefully there's a way to access it from the Renderer implementation.

Looks like it's going to be more complicated: the Java side of Vespa does not have access to offset information. This is all handled in the C++ backend, which passes the result on to the Java side as a text sequence with highlight markers, i.e. the offset information is lost. From what I could gather from the documentation, the Java side of Vespa is the only place where we can add extra functionality, so a straight 1:1 port of the approach used for Solr won't work in Vespa.
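
To illustrate what is still recoverable on the Java side, here is a minimal sketch in plain Java (no Vespa classes involved). It assumes the highlight markers are the literal strings `<hi>` and `</hi>`; the class and method names are hypothetical. It strips the markers and recovers the match spans, but only as offsets into the marker-free text that was handed over. If that text is an excerpt rather than the full field, those offsets say nothing about positions in the indexed OCR source, which is what the Solr approach relies on.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: recovers match spans from text that contains highlight markers. */
public class HighlightOffsets {
    // Assumed marker strings; adjust if the backend emits something else.
    static final String OPEN = "<hi>";
    static final String CLOSE = "</hi>";

    /** A match as [start, end) character offsets into the marker-free text. */
    public static final class Span {
        public final int start;
        public final int end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    /** Marker-free text plus the spans recovered from the markers. */
    public static final class Stripped {
        public final String text;
        public final List<Span> spans;
        Stripped(String text, List<Span> spans) { this.text = text; this.spans = spans; }
    }

    public static Stripped strip(String marked) {
        StringBuilder plain = new StringBuilder();
        List<Span> spans = new ArrayList<>();
        int pos = 0;
        int spanStart = -1; // -1 means "not inside a highlighted span"
        while (pos < marked.length()) {
            if (marked.startsWith(OPEN, pos)) {
                spanStart = plain.length();
                pos += OPEN.length();
            } else if (marked.startsWith(CLOSE, pos)) {
                // A close marker without a preceding open marker is dropped silently.
                if (spanStart >= 0) {
                    spans.add(new Span(spanStart, plain.length()));
                    spanStart = -1;
                }
                pos += CLOSE.length();
            } else {
                plain.append(marked.charAt(pos));
                pos++;
            }
        }
        // An open marker that is never closed produces no span.
        return new Stripped(plain.toString(), spans);
    }

    public static void main(String[] args) {
        Stripped s = strip("the <hi>quick</hi> brown <hi>fox</hi>");
        System.out.println(s.text);
        for (Span span : s.spans) {
            System.out.println(span.start + ".." + span.end + " = "
                    + s.text.substring(span.start, span.end));
        }
    }
}
```

Mapping such spans to OCR coordinates would still need a separate way to locate the returned text within the original OCR document, which this sketch does not attempt.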
