feat(ingest/dremio): Dremio Source Ingestion #11598

Open
wants to merge 43 commits into base: master

sagar-salvi-apptware
Contributor

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added labels on Oct 11, 2024: ingestion (PR or Issue related to the ingestion of metadata), docs (Issues and Improvements to docs), product (PR or Issue related to the DataHub UI/UX), devops (PR or Issue related to DataHub backend & deployment)
@sagar-salvi-apptware sagar-salvi-apptware changed the title from "feat: Dremio Source Ingestion" to "feat(ingest/dremio): Dremio Source Ingestion" on Oct 11, 2024
Collaborator

@mayurinehate mayurinehate left a comment

Can you please use report.warning and/or report.error for cases where the user needs to see the error details, such as a login failure caused by a missing PAT or an incorrect password?
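To illustrate the request, here is a minimal sketch of surfacing a credential problem through the report as well as the log. `SketchReport` and `check_credentials` are hypothetical stand-ins written for this comment, not DataHub's report class or the PR's actual login code, and the parameter names are assumptions.

```python
import logging
from dataclasses import dataclass, field
from typing import List, Optional

logger = logging.getLogger(__name__)


@dataclass
class SketchReport:
    """Hypothetical stand-in for a source report; NOT DataHub's real class."""

    warnings: List[str] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)

    def warning(self, message: str) -> None:
        self.warnings.append(message)

    def error(self, message: str) -> None:
        self.errors.append(message)


def check_credentials(
    report: SketchReport,
    authentication_method: str,
    username: Optional[str] = None,
    password: Optional[str] = None,
    personal_access_token: Optional[str] = None,
) -> bool:
    """Validate credentials up front and record problems on the report.

    The parameter names are assumptions made for this sketch.
    """
    if authentication_method == "PAT":
        if not personal_access_token:
            message = "Dremio login failed: PAT authentication selected but no token provided"
            logger.error(message)
            # Recording on the report makes the failure visible in the run summary.
            report.error(message)
            return False
        return True

    if not (username and password):
        message = "Dremio login failed: username or password is missing"
        logger.error(message)
        report.error(message)
        return False
    return True
```

The point is only that the user-facing report receives the same detail as the logger, so a failed run explains itself without anyone digging through logs.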

dataset_pattern: AllowDenyPattern = Field(
    default=AllowDenyPattern.allow_all(),
    description="Regex patterns for schemas to filter",
)
Collaborator

Looks like schema_pattern uses the TABLE_SCHEMA column, which is fully qualified, so this is fine.
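For context, a rough sketch of why matching against a fully qualified name matters for allow/deny filtering. `is_allowed` is a hypothetical helper that approximates allow/deny regex semantics with the standard `re` module; it is not the `AllowDenyPattern` implementation, and the space and schema names are made up.

```python
import re
from typing import List


def is_allowed(name: str, allow: List[str], deny: List[str]) -> bool:
    """Return True if `name` matches no deny pattern and at least one allow pattern.

    Assumption for this sketch: patterns are matched from the start of the
    string, case-insensitively.
    """
    if any(re.match(pattern, name, re.IGNORECASE) for pattern in deny):
        return False
    return any(re.match(pattern, name, re.IGNORECASE) for pattern in allow)


# Patterns written against the fully qualified schema path (made-up names).
allow = [r"^prod_space\.sales(\..*)?$"]
deny = [r".*\.scratch$"]

print(is_allowed("prod_space.sales", allow, deny))
print(is_allowed("prod_space.sales.scratch", allow, deny))
print(is_allowed("sales", allow, deny))
```

Under these assumptions, a pattern anchored on the full path only behaves as intended when the value being tested is itself fully qualified; the bare name `sales` does not match.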

Collaborator

@mayurinehate mayurinehate left a comment

Mostly good to get in. A few minor things around tests and a decision pending on profiling.

logger = logging.getLogger(__name__)


class DremioProfiler:
Collaborator

It would be good to integrate DatahubGEProfiler here instead, to make use of the other improvements around it (multithreading, query optimisation, etc.). Not sure if this needs to be done right away. cc: @hsheth2 to weigh in.

4 participants