SCT tests for Operator are failing during K8s cluster creation while installing dynamic local volume provisioner (AWS) #2180

grzywin · 2024-11-05T11:37:47Z

What happened?

Example Argus run: https://argus.scylladb.com/tests/scylla-cluster-tests/9d9291b1-27b4-4cd4-9e3c-2ca69700b2ae
Failing step:

Command: 'kubectl --kubeconfig=/home/ubuntu/sct-results/20241103-010504-970249/functional-master-9d9291b1/.kube/config --cache-dir=/home/ubuntu/sct-results/20241103-010504-970249/.kube/http-cache--functional-master-9d9291b1 --namespace=local-csi-driver rollout status daemonset.apps/local-csi-driver'
Stdout:
Waiting for daemon set spec update to be observed...
Waiting for daemon set "local-csi-driver" rollout to finish: 3 out of 4 new pods have been updated...
Waiting for daemon set "local-csi-driver" rollout to finish: 0 of 4 updated pods are available...

In the must-gather logs, we can see an error in the NodeConfig status:

  - lastTransitionTime: "2024-11-03T01:27:27Z"
    message: 'MountControllerNodeip-10-0-0-28.eu-north-1.compute.internalDegraded:
      can''t create mounts: can''t ensure units: can''t get unit statuses: can''t
      list units by names "mnt-persistent\\x2dvolumes.mount": Unknown method ''ListUnitsByNames''
      or interface ''org.freedesktop.systemd1.Manager''.'
    observedGeneration: 1
    reason: MountControllerNodeip-10-0-0-28.eu-north-1.compute.internalDegraded_Error
    status: "True"
    type: Nodeip-10-0-0-28.eu-north-1.compute.internalDegraded

What did you expect to happen?

The cluster is created without any error in NodeConfig.

How can we reproduce it (as minimally and precisely as possible)?

You can rerun the Argus job.

Scylla Operator version

v1.15.0-alpha.0-93-g24fd817-latest

However, I have seen this error occurring since version v1.15.0-alpha.0-56-g8bdba4f-latest.

Kubernetes platform name and version

k8s: 1.27
platform: AWS/EKS
manager: 3.3
scylla: 2024.1.12

Please attach the must-gather archive.

Must-gather archive can be downloaded from here (Logs tab):
https://argus.scylladb.com/tests/scylla-cluster-tests/9d9291b1-27b4-4cd4-9e3c-2ca69700b2ae

Anything else we need to know?

No response


rzetelskik · 2024-11-05T13:23:34Z

@grzywin what EKS AMI images are you using in these tests? Apparently ListUnitsByNames should be available for systemd v230+ (systemd/systemd#3182), but that's a release from 7 years ago.

rzetelskik · 2024-11-05T13:24:34Z

@tnozicka looks like we should prioritise adding EKS periodics btw

rzetelskik · 2024-11-05T14:08:25Z

OK, it looks like that's Amazon Linux 2, which is still on systemd v219: https://docs.aws.amazon.com/AL2/latest/relnotes/relnotes-20240916.html

We'll have to add a fallback to ListUnits if ListUnitsByNames is unsupported.

grzywin added the kind/bug Categorizes issue or PR as related to a bug. label Nov 5, 2024

grzywin assigned rzetelskik Nov 5, 2024

scylla-operator-bot bot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Nov 5, 2024

grzywin mentioned this issue Nov 5, 2024

SCT tests for Operator are failing during K8s cluster creation while installing dynamic local volume provisioner #2181

Closed

grzywin changed the title ~~SCT tests for Operator are failing during K8s cluster creation while installing dynamic local volume provisioner~~ SCT tests for Operator are failing during K8s cluster creation while installing dynamic local volume provisioner (AWS) Nov 5, 2024

tnozicka mentioned this issue Nov 25, 2024

Structurize, update and extend the documentation #2188

Merged

1 task
