must-gather doesn't anonymize many things when running Scylla in GCP #2015

Open
gdubicki opened this issue Jul 12, 2024 · 7 comments
Labels
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
  • priority/awaiting-more-evidence: Lowest priority. Possibly useful, but not yet enough support to actually get it done.

Comments

@gdubicki
Contributor

gdubicki commented Jul 12, 2024

What happened?

I did a run of must-gather as documented at https://operator.docs.scylladb.com/stable/support/must-gather.html.

After grepping the collected data for some names, I noticed that several things were not anonymized:

  • GCS bucket names used for backups,
  • GCP project names,
  • GCR image names.

What did you expect to happen?

I was expecting these names to be anonymized.
Instead I had to do a bunch of recursive find and replace (grep -rl old . | xargs sed -i "" -e 's/old/new/g') myself...
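
For anyone in the same situation, the manual find-and-replace above could be scripted roughly as follows. This is only a sketch and not part of must-gather: the function name `redact_tree`, the directory name, and the placeholder strings are all hypothetical, and it assumes the archive has already been unpacked into a plain directory of text files.

```python
import os


def redact_tree(root, replacements):
    """Replace each sensitive string with its placeholder in every text file under root.

    `replacements` maps the original string to the placeholder (hypothetical values).
    Returns the sorted list of files that were modified.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                # newline="" keeps the original line endings untouched.
                with open(path, encoding="utf-8", newline="") as f:
                    text = f.read()
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            redacted = text
            for old, new in replacements.items():
                redacted = redacted.replace(old, new)
            if redacted != text:
                with open(path, "w", encoding="utf-8", newline="") as f:
                    f.write(redacted)
                changed.append(path)
    return sorted(changed)
```

A call such as `redact_tree("must-gather-output", {"my-backup-bucket": "REDACTED_BUCKET"})` would rewrite the files in place, so it is safer to run it on a copy of the gathered data.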

At a minimum, the warning in the docs about checking the gathered data should be emphasized and turned into a required step.

How can we reproduce it (as minimally and precisely as possible)?

  1. Deploy Scylla in GCP
  2. Configure backups in GCS
  3. Run some other workloads in the same cluster with images from GCR
  4. Run must-gather
  5. grep for the aforementioned names in the result directory
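
The last step can be approximated with a small script that reports which files still contain which names. This is a sketch under assumptions: `find_leaks` is a hypothetical helper, the names passed to it are placeholders for your own bucket, project, and image names, and the result directory is assumed to be unpacked on disk.

```python
import os


def find_leaks(root, names):
    """Map each name to the sorted list of files under root that contain it."""
    hits = {name: [] for name in names}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                # errors="replace" lets us scan files that are not valid UTF-8.
                with open(path, encoding="utf-8", errors="replace") as f:
                    text = f.read()
            except OSError:
                continue  # unreadable file, nothing to report
            for name in names:
                if name in text:
                    hits[name].append(path)
    return {name: sorted(paths) for name, paths in hits.items()}
```

An empty list for every name suggests that none of the listed names appear in the result directory; it says nothing about names that were not listed.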

Scylla Operator version

1.13.0

Kubernetes platform name and version

$ kubectl version
Client Version: v1.29.6
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.5-gke.1192000

Please attach the must-gather archive.

I can't attach the non-anonymized archive because that's the point here. A version that I additionally anonymized by hand is in #2016.

Anything else we need to know?

No response

Contributor

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 30d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out

/lifecycle stale

@scylla-operator-bot added the lifecycle/stale label on Aug 12, 2024
Contributor

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 30d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out

/lifecycle rotten

@scylla-operator-bot added the lifecycle/rotten label and removed the lifecycle/stale label on Sep 12, 2024
@gdubicki
Contributor Author

Still valid! I just had to anonymize manually again in #2133.

@gdubicki
Contributor Author

/remove-lifecycle stale

@gdubicki
Contributor Author

/remove-lifecycle rotten

@scylla-operator-bot removed the lifecycle/rotten label on Oct 10, 2024
@zimnx added the priority/important-longterm label on Oct 10, 2024
@scylla-operator-bot removed the needs-priority label on Oct 10, 2024
@tnozicka
Member

> After grepping for some names I noticed that the data collected didn't anonymize some things:
>
>   • GCS bucket names used for backups,
>   • GCP project names,
>   • GCR image names,

None of these are part of the API, so I don't think we can redact them automatically, and sometimes some of them may matter.

That said, I struggle to see what is secret about, say, GCR image names.

> At the minimum, the warning in the docs about checking the gathered data should be emphasized and turned into a required step.

The docs already say you may want to review it. I don't think "required" is fitting here. In Kubernetes, secret data is supposed to be stored in Secrets.

> By default, all collected Secrets are censored to avoid sending sensitive data. That said, you can always review the archive before you attach it to an issue or your support request.

https://operator.docs.scylladb.com/v1.14/support/must-gather.html#gathering-data-with-must-gather
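
Since these names are not API fields, one way to support the manual review is to look for common GCP identifier shapes rather than known field names. The sketch below is illustrative only and is not something must-gather does: the helper `gcp_identifiers` is hypothetical, and the patterns assume that buckets appear as `gs://<bucket>` or `gcs:<bucket>` and that images appear as `[region.]gcr.io/<project>/...`. Other spellings would be missed.

```python
import re

# Assumed shapes: "gs://bucket" (GCS URI) or "gcs:bucket" (backup location string).
BUCKET = re.compile(r"(?:gs://|gcs:)([a-z0-9][a-z0-9._-]{1,61}[a-z0-9])")
# Assumed shape: optional regional host prefix, then gcr.io/<project-id>/...
GCR_PROJECT = re.compile(r"\b(?:[a-z]+\.)?gcr\.io/([a-z][a-z0-9-]{4,28}[a-z0-9])/")


def gcp_identifiers(text):
    """Return the distinct bucket names and GCR project IDs found in text."""
    return {
        "buckets": sorted(set(BUCKET.findall(text))),
        "projects": sorted(set(GCR_PROJECT.findall(text))),
    }
```

The returned names are candidates for review, not a guarantee that everything sensitive was found.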

@tnozicka added the priority/awaiting-more-evidence and kind/feature labels and removed the priority/important-longterm and kind/bug labels on Oct 11, 2024
Contributor

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 30d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out

/lifecycle stale

@scylla-operator-bot added the lifecycle/stale label on Nov 11, 2024

3 participants