SCT tests for Operator are failing during K8s cluster creation while installing dynamic local volume provisioner (AWS) #2180

Open
grzywin opened this issue Nov 5, 2024 · 3 comments
Labels: kind/bug, priority/critical-urgent, triage/accepted

Comments


grzywin commented Nov 5, 2024

What happened?

Example argus run: https://argus.scylladb.com/tests/scylla-cluster-tests/9d9291b1-27b4-4cd4-9e3c-2ca69700b2ae
Failing step:

Command: 'kubectl --kubeconfig=/home/ubuntu/sct-results/20241103-010504-970249/functional-master-9d9291b1/.kube/config --cache-dir=/home/ubuntu/sct-results/20241103-010504-970249/.kube/http-cache--functional-master-9d9291b1 --namespace=local-csi-driver rollout status daemonset.apps/local-csi-driver'
Stdout:
Waiting for daemon set spec update to be observed...
Waiting for daemon set "local-csi-driver" rollout to finish: 3 out of 4 new pods have been updated...
Waiting for daemon set "local-csi-driver" rollout to finish: 0 of 4 updated pods are available...

In the must-gather logs, we can see an error in the NodeConfig status:

  - lastTransitionTime: "2024-11-03T01:27:27Z"
    message: 'MountControllerNodeip-10-0-0-28.eu-north-1.compute.internalDegraded:
      can''t create mounts: can''t ensure units: can''t get unit statuses: can''t
      list units by names "mnt-persistent\\x2dvolumes.mount": Unknown method ''ListUnitsByNames''
      or interface ''org.freedesktop.systemd1.Manager''.'
    observedGeneration: 1
    reason: MountControllerNodeip-10-0-0-28.eu-north-1.compute.internalDegraded_Error
    status: "True"
    type: Nodeip-10-0-0-28.eu-north-1.compute.internalDegraded
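
When triaging a must-gather archive, conditions like the one above can be picked out programmatically. The following is a minimal sketch, assuming the NodeConfig `status.conditions` list has already been loaded into Python dictionaries (for example with a YAML parser). The helper name `degraded_conditions` and the second sample condition are hypothetical and only there for illustration.

```python
from typing import Dict, List


def degraded_conditions(conditions: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return conditions whose type ends in 'Degraded' and whose status is 'True'."""
    return [
        c
        for c in conditions
        if c.get("type", "").endswith("Degraded") and c.get("status") == "True"
    ]


conditions = [
    # Copied from the condition reported in this issue (message shortened).
    {
        "type": "Nodeip-10-0-0-28.eu-north-1.compute.internalDegraded",
        "status": "True",
        "reason": "MountControllerNodeip-10-0-0-28.eu-north-1.compute.internalDegraded_Error",
        "message": "can't create mounts: ... Unknown method 'ListUnitsByNames' ...",
    },
    # Hypothetical healthy condition, included only for contrast.
    {"type": "Available", "status": "True", "reason": "AsExpected", "message": ""},
]

for cond in degraded_conditions(conditions):
    print(cond["type"], "->", cond["message"])
```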

What did you expect to happen?

The cluster is created without any error in NodeConfig.

How can we reproduce it (as minimally and precisely as possible)?

Rerun the Argus job linked above.

Scylla Operator version

v1.15.0-alpha.0-93-g24fd817-latest

However, I have seen this error occurring since version v1.15.0-alpha.0-56-g8bdba4f-latest.

Kubernetes platform name and version

k8s: 1.27
platform: AWS/EKS
manager: 3.3
scylla: 2024.1.12

Please attach the must-gather archive.

The must-gather archive can be downloaded from the Logs tab here:
https://argus.scylladb.com/tests/scylla-cluster-tests/9d9291b1-27b4-4cd4-9e3c-2ca69700b2ae

Anything else we need to know?

No response

@grzywin grzywin added the kind/bug Categorizes issue or PR as related to a bug. label Nov 5, 2024
@scylla-operator-bot scylla-operator-bot bot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Nov 5, 2024
@grzywin grzywin changed the title SCT tests for Operator are failing during K8s cluster creation while installing dynamic local volume provisioner SCT tests for Operator are failing during K8s cluster creation while installing dynamic local volume provisioner (AWS) Nov 5, 2024

rzetelskik commented Nov 5, 2024

@grzywin which EKS AMI images are you using in these tests? Apparently ListUnitsByNames should be available in systemd v230+ (systemd/systemd#3182), and that release is already several years old.

@rzetelskik

@tnozicka looks like we should prioritise adding EKS periodics btw


rzetelskik commented Nov 5, 2024

OK, looks like that's Amazon Linux 2, which is sitting at systemd v219: https://docs.aws.amazon.com/AL2/latest/relnotes/relnotes-20240916.html

We'll have to add a fallback to ListUnits if ListUnitsByNames is unsupported.
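
The fallback idea could look roughly like the sketch below. It is written in Python purely for illustration and is not the operator's actual code. `DBusError`, `UnitStatus`, `Manager`, `OldSystemd` and `get_unit_statuses` are hypothetical stand-ins for whatever D-Bus client is in use. The only real identifier assumed here is the standard D-Bus error name `org.freedesktop.DBus.Error.UnknownMethod`.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol

# Standard D-Bus error name returned when a method does not exist.
UNKNOWN_METHOD = "org.freedesktop.DBus.Error.UnknownMethod"


class DBusError(Exception):
    """Hypothetical stand-in for a D-Bus client library's error type."""

    def __init__(self, name: str, message: str = "") -> None:
        super().__init__(f"{name}: {message}")
        self.name = name


@dataclass(frozen=True)
class UnitStatus:
    name: str
    active_state: str


class Manager(Protocol):
    """Hypothetical wrapper around org.freedesktop.systemd1.Manager."""

    def list_units_by_names(self, names: List[str]) -> List[UnitStatus]: ...

    def list_units(self) -> List[UnitStatus]: ...


def get_unit_statuses(manager: Manager, names: Iterable[str]) -> List[UnitStatus]:
    """Prefer ListUnitsByNames; fall back to ListUnits plus filtering."""
    wanted = list(names)
    try:
        return manager.list_units_by_names(wanted)
    except DBusError as err:
        # Only swallow "unknown method"; any other error is a real failure.
        if err.name != UNKNOWN_METHOD:
            raise
    wanted_set = set(wanted)
    return [u for u in manager.list_units() if u.name in wanted_set]


class OldSystemd:
    """Fake manager mimicking systemd v219, which lacks ListUnitsByNames."""

    def __init__(self, units: List[UnitStatus]) -> None:
        self._units = units

    def list_units_by_names(self, names: List[str]) -> List[UnitStatus]:
        raise DBusError(UNKNOWN_METHOD, "Unknown method 'ListUnitsByNames'")

    def list_units(self) -> List[UnitStatus]:
        return list(self._units)


mount_unit = r"mnt-persistent\x2dvolumes.mount"
fake = OldSystemd(
    [UnitStatus(mount_unit, "active"), UnitStatus("sshd.service", "active")]
)
statuses = get_unit_statuses(fake, [mount_unit])
print([u.name for u in statuses])
```

One caveat with this approach: as far as I understand, ListUnits only reports units that are currently loaded, so a requested name that is absent from the filtered result would need to be treated by the caller as "not loaded" rather than as an error.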

@rzetelskik rzetelskik added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 5, 2024