OpenSearch indexing and queries #2834
@@ -68,6 +68,7 @@ public OpenLineageResource(
  public void create(@Valid @NotNull BaseEvent event, @Suspended final AsyncResponse asyncResponse)
      throws JsonProcessingException, SQLException {
    if (event instanceof LineageEvent) {
      serviceFactory.getSearchService().indexEvent((LineageEvent) event);
If `search.enabled=false`, will the index call fail/error?
No, that case is handled in the `indexEvent` method:
public void indexEvent(@Valid @NotNull LineageEvent event) {
  if (!searchConfig.isEnabled()) {
    log.debug("Search is disabled, skipping indexing");
    return;
  }
  UUID runUuid = runUuidFromEvent(event.getRun());
  log.debug("Indexing event {}", event);
  if (event.getInputs() != null) {
    indexDatasets(event.getInputs(), runUuid, event);
  }
  if (event.getOutputs() != null) {
    indexDatasets(event.getOutputs(), runUuid, event);
  }
  indexJob(runUuid, event);
}
Ok, when we follow up with a `SearchService` interface, we'll want to bind search to an engine (`psql`, `opensearch`) and can do away with the flag.
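One way the follow-up could look, as a rough sketch only: the interface and class names below (`SearchService` as an interface, `NoopSearchService`, `InMemorySearchService`, `SearchServices.forEngine`) are hypothetical and not part of this PR, and the event is reduced to a `String` to keep the example self-contained. The idea is that the configured engine selects the implementation, so callers no longer check an enabled flag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these types exist in the PR in this form.
interface SearchService {
  /** Indexes one event. The event is a plain String here for brevity. */
  void indexEvent(String event);
}

/** Used when no search engine is configured: indexing is a no-op. */
final class NoopSearchService implements SearchService {
  @Override
  public void indexEvent(String event) {
    // Intentionally empty, so callers never need to check a flag.
  }
}

/** Stand-in for an engine-backed implementation; it only records events in memory. */
final class InMemorySearchService implements SearchService {
  final List<String> indexed = new ArrayList<>();

  @Override
  public void indexEvent(String event) {
    indexed.add(event);
  }
}

final class SearchServices {
  private SearchServices() {}

  /** Picks an implementation from an (assumed) engine name taken from config. */
  static SearchService forEngine(String engine) {
    if (engine == null) {
      return new NoopSearchService();
    }
    switch (engine) {
      case "opensearch":
      case "psql":
        // A real binding would construct the engine-specific client here.
        return new InMemorySearchService();
      default:
        return new NoopSearchService();
    }
  }
}
```

With this shape, the resource could call `indexEvent` unconditionally, and a deployment without search would simply be bound to the no-op implementation.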
 * SPDX-License-Identifier: Apache-2.0
 */

package marquez.api;
I would move the pkg to `marquez.api.v2beta.SearchResource` until we promote the API to `v2`.
build.gradle (Outdated)
@@ -64,6 +64,8 @@ subprojects {

  dependencies {
    implementation "org.projectlombok:lombok:${lombokVersion}"
    implementation 'org.opensearch.client:opensearch-rest-client:2.15.0'
    implementation 'org.opensearch.client:opensearch-java:2.6.0'
Move to api/build.gradle
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #2834 +/- ##
============================================
- Coverage 84.77% 83.28% -1.49%
- Complexity 1470 1477 +7
============================================
Files 256 259 +3
Lines 6626 6785 +159
Branches 308 313 +5
============================================
+ Hits 5617 5651 +34
- Misses 856 977 +121
- Partials 153 157 +4
# Conflicts: # .env.example # docker-compose.web.yml
run_id: string
name: string
namespace: string
eventType: string
How's `eventType` used?
Amazing work, @phixMe! We definitely have some follow up work, but great to see the progress on search 💯 🚀 🥇
Signed-off-by: wslulciuc <willy@datakin.com>
🔍 ❤️
* Elasticsearch code.
* Adding basic responses for elasticsearch.
* Saving highlights.
* Saving code cleanup.
* Adding EsSearch.
* Saving partial progress.
* Refinements.
* Small bug fixes.
* Fixing alignment
* Migrating es jobs naming to be specific.
* Adding boilerplate for dataset es search
* Adding datasets.
* Adding polish for more data.
* Empty state and other small enhancements.
* Adding arrow key functionality.
* Removing console log
* Spotless
* Refinements to queries.
* Adding debounce.
* Fixing alignment issues.
* Saving updates for password setting via env config for elasticsearch.
* Setting up startup scripts and adding corresponding waits.
* Adding logs and more fields for jobs.
* Resolving jackson serialization issue.
* Small updates for search display.
* Adding onClick handlers.
* Fixing null cases, adding more search options for datasets.
* Handling enter key.
* Fixing minor encoding and layout issues for spark related open lineage events.
* Additional fixes for text overflow on names and namespaces.
* Fixing indexing problem.
* Transitioning to opensearch.
* Removing elasticsearch references.
* Isolation of search code, calling services.
* Adding config to support multiple instances.
* Spotless
* Adding helm files.
* Adding in stronger password for search.
* Handling debouncing.
* Adding "ADVANCED_SEARCH" configurable variable for web.
* Fixing some tests.
* Moving indexing down a row.
* Spotless
* Putting back removed code.
* Merge spotless resolution.
* Skipping over search for db migration tests.
* Adding search back to migration
* Trying out ci config setting.
* Removing search from base config as a whole.
* Pushing out header updates.
* Review comment on search service init.
* Fixing up dependencies in docker to apply migrations.
* Code review updates and naming changes.
* newline
* Updating for beta vs. non beta endpoints in search resource.
* Moving search resource to its own place.
* Removing prints.
* Removing all helm changes for this work stream.
* Adding back lock file contents.
* Adding header
* Adding middleware proxy.
* Code review updates.
* Moving from outer gradle to api gradle.
* Removing extra containers.
* Removing extra containers.
* Set timeout for seed container to 60s
  Signed-off-by: wslulciuc <willy@datakin.com>
* Fixing `--no-search` and frontend config.
* Add check before indexing ol event
  Signed-off-by: wslulciuc <willy@datakin.com>
* Fix db migration CI job
  Signed-off-by: wslulciuc <willy@datakin.com>
---------
Signed-off-by: wslulciuc <willy@datakin.com>
Co-authored-by: phix <peter.hicks@astronomer.io>
Co-authored-by: wslulciuc <willy@datakin.com>
Signed-off-by: Isa Inalcik <isa.inalcik@gmail.com>
Problem
Our search currently does not support nested queries on OpenLineage facets, code, linked entities, and ids. We want
search to be the best place to explore and catalog OpenLineage-based data.
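To make "nested queries" concrete, here is a minimal sketch of building an OpenSearch `nested` query body in plain Java. The path `facets` and field `name` are illustrative assumptions, not the index mapping used in this PR (which is not shown here), and the helper class `NestedQuerySketch` is hypothetical. In OpenSearch, a `nested` query only matches fields that are mapped with the `nested` type.

```java
// Hypothetical helper; not part of the PR.
final class NestedQuerySketch {
  private NestedQuerySketch() {}

  /** Escapes backslashes and double quotes so a value is safe inside a JSON string. */
  static String escape(String s) {
    return s.replace("\\", "\\\\").replace("\"", "\\\"");
  }

  /**
   * Builds a search request body that matches `value` against `path.field`,
   * where `path` is assumed to be mapped as a nested object in the index.
   */
  static String nestedMatch(String path, String field, String value) {
    return "{\"query\":{\"nested\":{\"path\":\"" + escape(path) + "\","
        + "\"query\":{\"match\":{\"" + escape(path + "." + field) + "\":\""
        + escape(value) + "\"}}}}}";
  }
}
```

For example, `NestedQuerySketch.nestedMatch("facets", "name", "schema")` yields a body that could be sent to an index's `_search` endpoint, assuming such a mapping exists.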
Opensearch.Demo.mov
Includes
To Follow Up
Checklist
CHANGELOG.md (Depending on the change, this may not be necessary).
.sql database schema migration according to Flyway's naming convention (if relevant)