Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aws.securityhub_findings: Add fields to _source as needed by CDR workflows. #11608

Merged
merged 3 commits into from
Nov 1, 2024

Conversation

kcreddy
Copy link
Contributor

@kcreddy kcreddy commented Oct 31, 2024

Proposed commit message

Add cloud.provider, event.kind, and observer.vendor fields to
_source as needed by CDR workflows.

The commit here removed the fields from _source. But the fields are required to be
present in _source for Cloud Detection and Response (CDR) workflows. This PR reverts
the changes made in that commit and re-adds the fields into the ingest pipeline.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

How to test this PR locally

cd packages/aws && elastic-package stack down && elastic-package build && elastic-package stack up --version=8.16.0-SNAPSHOT -d -v && eval "$(elastic-package stack shellinit)" && elastic-package test pipeline --generate -v --data-streams=securityhub_findings

--- Test results for package: aws - START ---
╭─────────┬──────────────────────┬───────────┬──────────────────────────────────────────────────────────┬────────┬──────────────╮
│ PACKAGE │ DATA STREAM          │ TEST TYPE │ TEST NAME                                                │ RESULT │ TIME ELAPSED │
├─────────┼──────────────────────┼───────────┼──────────────────────────────────────────────────────────┼────────┼──────────────┤
│ aws     │ securityhub_findings │ pipeline  │ (ingest pipeline warnings test-securityhub-findings.log) │ PASS   │ 322.486291ms │
│ aws     │ securityhub_findings │ pipeline  │ test-securityhub-findings.log                            │ PASS   │ 274.104459ms │
╰─────────┴──────────────────────┴───────────┴──────────────────────────────────────────────────────────┴────────┴──────────────╯
--- Test results for package: aws - END   ---
Done

Related issues

Sample documents after the change:

  1. Source index: _source-updated-document-destination-index.json
  2. Destination index: _source-updated-document-source-index.json

@kcreddy kcreddy changed the title aws.securityhub_findings: Add fields to _source as needed by CDR workflows. aws.securityhub_findings: Add fields to _source as needed by CDR workflows. Oct 31, 2024
@kcreddy kcreddy self-assigned this Oct 31, 2024
@kcreddy kcreddy added bugfix Pull request that fixes a bug issue Team:Security-Service Integrations Security Service Integrations Team [elastic/security-service-integrations] labels Oct 31, 2024
@kcreddy kcreddy marked this pull request as ready for review October 31, 2024 16:58
@kcreddy kcreddy requested review from a team as code owners October 31, 2024 16:58
@elasticmachine
Copy link

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@andrewkroh
Copy link
Member

@maxcold What is the reason that the CDR workflows depend on _source instead of fields? We should not need to sacrifice on storage efficiency.

@maxcold
Copy link
Contributor

maxcold commented Oct 31, 2024

@andrewkroh in general there is quite a lot of code in Cloud Security that relies on the _source , eg. https://github.com/elastic/kibana/blob/main/x-pack/plugins/cloud_security_posture/public/pages/configurations/latest_findings/latest_findings_table.tsx#L55
I don't know if there is any specific reason for that or was it just smth that worked for us as we only relied on the data coming from cloudbeat and our native integration. For us to switch from this in our codebase won't be a small thing. I can bring it up with the team and have a github issue to track the effort of not relying directly on the _source

@andrewkroh
Copy link
Member

andrewkroh commented Oct 31, 2024

I can bring it up with the team and have a github issue to track the effort of not relying directly on the _source

I think that would be the prudent course of action. We can add these fields into _source as a temporary workaround.

Another important reason to avoid _source is synthetic source, which is now enabled by default in Serverless as part of LogsDB and will become the default in Elasticsearch in the next major. Every query for _source incurs the penalty of synthesizing the _source value from the doc values.

@elasticmachine
Copy link

💚 Build Succeeded

cc @kcreddy

@efd6
Copy link
Contributor

efd6 commented Oct 31, 2024

To provide context, the relevant PR and comment are here.

Copy link
Contributor

@efd6 efd6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving as a work-around while the world is brought into order.

@kcreddy kcreddy merged commit 676f9e2 into elastic:main Nov 1, 2024
5 checks passed
@elastic-vault-github-plugin-prod

Package aws - 2.31.1 containing this change is available at https://epr.elastic.co/search?package=aws

@maxcold
Copy link
Contributor

maxcold commented Nov 1, 2024

@andrewkroh here is the issue to track getting rid of _source in our queries:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix Pull request that fixes a bug issue Integration:aws AWS Team:Security-Service Integrations Security Service Integrations Team [elastic/security-service-integrations]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants