OpenStack: Add openstack-provision-etcd-disk-speed step #58941

Open: mandre wants to merge 2 commits into master

mandre
Member

@mandre mandre commented Nov 18, 2024

This step patches the etcd cluster to configure disk speed. It defaults
to being a no-op.

We know our OpenStack envs are slow. Let's make etcd more tolerant of
disk latency by enabling the slow profile for all OpenStack clusters.
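
The behaviour described above (a no-op by default, otherwise a patch to the etcd cluster resource) could be sketched roughly as follows. This is not the script from this PR. The `ETCD_DISK_SPEED` and `DRY_RUN` variables and both function names are hypothetical, and the sketch assumes the step sets the `controlPlaneHardwareSpeed` field of the `etcd/cluster` operator resource, whose documented non-default values are `Standard` and `Slower`.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical variable: empty (the default) means "leave etcd untouched".
ETCD_DISK_SPEED="${ETCD_DISK_SPEED:-}"

# Build the JSON merge patch for the requested hardware speed profile.
build_patch() {
  printf '{"spec":{"controlPlaneHardwareSpeed":"%s"}}' "$1"
}

configure_etcd_disk_speed() {
  local speed="$1"
  if [ -z "$speed" ]; then
    echo "ETCD_DISK_SPEED is unset, leaving etcd untouched"
    return 0
  fi
  case "$speed" in
    Standard|Slower) ;;
    *) echo "unsupported speed: $speed" >&2; return 1 ;;
  esac
  # DRY_RUN=1 (hypothetical) prints the command instead of calling oc.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "oc patch etcd/cluster --type=merge -p $(build_patch "$speed")"
  else
    oc patch etcd/cluster --type=merge -p "$(build_patch "$speed")"
  fi
}

configure_etcd_disk_speed "$ETCD_DISK_SPEED"
```

Keeping the default empty means jobs that do not opt in are unaffected, which matches the "defaults to being a no-op" intent of the first commit.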

@@ -0,0 +1,44 @@
#!/usr/bin/env bash

This step could be shared with other platforms in the future to avoid duplication, but per-platform steps seem to be a common pattern in this repo.

@EmilienM
Member

/approve

@mandre
Member Author

mandre commented Nov 18, 2024

/pj-rehearse periodic-ci-shiftstack-ci-release-4.18-e2e-openstack-ovn-etcd-scaling

@openshift-ci-robot
Contributor

@mandre: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci openshift-ci bot added the approved label Nov 18, 2024
@MiguelCarpio
Contributor

NOTE: The etcd slow profile only works on OCP 4.16 and above.

@EmilienM
Member

Fixed the metadata by running make update.

@EmilienM
Member

/lgtm
/hold
I'll let you pj-rehearse ack once you think this is ready.

@openshift-ci openshift-ci bot added the do-not-merge/hold label Nov 18, 2024
@openshift-ci openshift-ci bot added the lgtm label Nov 18, 2024
Contributor

openshift-ci bot commented Nov 18, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: EmilienM, mandre

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot removed the lgtm label Nov 18, 2024
Contributor

openshift-ci bot commented Nov 18, 2024

New changes are detected. LGTM label has been removed.

@mandre
Member Author

mandre commented Nov 18, 2024

/pj-rehearse periodic-ci-shiftstack-ci-release-4.18-e2e-openstack-ovn-etcd-scaling

@openshift-ci-robot
Contributor

@mandre: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@mandre
Member Author

mandre commented Nov 19, 2024

OK, the 4.18-e2e-openstack-ovn-etcd-scaling job might not be representative after all. It's running with etcd on RAMFS already and should not fail due to disk latency.

According to the logs, we successfully patched the cluster (the heartbeat interval changed to 500, as seen here), but our version check failed because the bc binary could not be found. This needs to be fixed.
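
One way to avoid depending on bc is to compare the major and minor components as integers in the shell itself. The following is a minimal sketch, not the fix applied in this PR: `version_ge` is a hypothetical helper name, it assumes versions of the form major.minor with optional trailing components, and the 4.16 threshold comes from the note earlier in this conversation. Integer comparison also sidesteps a pitfall of treating versions as decimals, where 4.9 would sort above 4.16.

```bash
# version_ge HAVE WANT: succeeds when HAVE >= WANT, looking only at the
# major and minor components (for example "4.18.0-rc.1" is read as 4 and 18).
# Assumes both arguments contain at least "major.minor".
version_ge() {
  local have="$1" want="$2"
  local have_major="${have%%.*}" want_major="${want%%.*}"
  local have_rest="${have#*.}" want_rest="${want#*.}"
  local have_minor="${have_rest%%.*}" want_minor="${want_rest%%.*}"

  if [ "$have_major" -gt "$want_major" ]; then return 0; fi
  if [ "$have_major" -lt "$want_major" ]; then return 1; fi
  [ "$have_minor" -ge "$want_minor" ]
}

# Example gate, using a literal version string for illustration.
if version_ge "4.18" "4.16"; then
  echo "etcd slow profile is supported"
fi
```

Only shell parameter expansion and the `[` builtin are used, so the check works in minimal CI images that ship no extra binaries.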

Looking at the Prow results, the 4.18 ovn-etcd-scaling jobs are very unstable on every platform. This is also the case for the 4.17 ovn-etcd-scaling jobs. The 4.16 ovn-etcd-scaling jobs are fine, however.

Contributor

openshift-ci bot commented Nov 19, 2024

@mandre: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/step-registry-shellcheck | 4a2c535 | link | true | /test step-registry-shellcheck |
| ci/rehearse/periodic-ci-shiftstack-ci-release-4.18-e2e-openstack-ovn-etcd-scaling | 2648409 | link | unknown | /pj-rehearse periodic-ci-shiftstack-ci-release-4.18-e2e-openstack-ovn-etcd-scaling |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

mandre and others added 2 commits November 19, 2024 09:42

This step patches the etcd cluster to configure disk speed. It defaults
to being a no-op.

Co-Authored-By: Miguel Carpio <mcarpio@redhat.com>

We know our OpenStack envs are slow. Let's tweak etcd to be more
tolerant to disk latency.

@openshift-ci-robot
Contributor

[REHEARSALNOTIFIER]
@mandre: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
| --- | --- | --- | --- |
| pull-ci-openshift-cluster-storage-operator-master-e2e-openstack-cinder-csi | openshift/cluster-storage-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-storage-operator-release-4.19-e2e-openstack-cinder-csi | openshift/cluster-storage-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-storage-operator-release-4.18-e2e-openstack-cinder-csi | openshift/cluster-storage-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-storage-operator-master-e2e-openstack-manila-csi | openshift/cluster-storage-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-storage-operator-release-4.19-e2e-openstack-manila-csi | openshift/cluster-storage-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-storage-operator-release-4.18-e2e-openstack-manila-csi | openshift/cluster-storage-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-storage-operator-master-e2e-openstack | openshift/cluster-storage-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-storage-operator-master-e2e-openstack-parallel | openshift/cluster-storage-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-storage-operator-release-4.19-e2e-openstack | openshift/cluster-storage-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-storage-operator-release-4.19-e2e-openstack-parallel | openshift/cluster-storage-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-storage-operator-release-4.18-e2e-openstack | openshift/cluster-storage-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-storage-operator-release-4.18-e2e-openstack-parallel | openshift/cluster-storage-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-storage-operator-release-4.17-e2e-openstack | openshift/cluster-storage-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-storage-operator-release-4.17-e2e-openstack-parallel | openshift/cluster-storage-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-storage-operator-release-4.16-e2e-openstack | openshift/cluster-storage-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-storage-operator-release-4.16-e2e-openstack-parallel | openshift/cluster-storage-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-storage-operator-release-4.15-e2e-openstack | openshift/cluster-storage-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-storage-operator-release-4.15-e2e-openstack-parallel | openshift/cluster-storage-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-storage-operator-release-4.14-e2e-openstack | openshift/cluster-storage-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-storage-operator-release-4.14-e2e-openstack-parallel | openshift/cluster-storage-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-storage-operator-release-4.13-e2e-openstack | openshift/cluster-storage-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-storage-operator-release-4.13-e2e-openstack-parallel | openshift/cluster-storage-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-storage-operator-release-4.12-e2e-openstack | openshift/cluster-storage-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-storage-operator-release-4.12-e2e-openstack-parallel | openshift/cluster-storage-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-storage-operator-release-4.11-e2e-openstack | openshift/cluster-storage-operator | presubmit | Registry content changed |

A total of 898 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs.

A full list of affected jobs can be found here

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@mandre
Member Author

mandre commented Nov 19, 2024

/pj-rehearse periodic-ci-shiftstack-ci-release-4.16-e2e-openstack-ccpmso-zone

@openshift-ci-robot
Contributor

@mandre: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-merge-robot openshift-merge-robot added the needs-rebase label Nov 19, 2024
@openshift-merge-robot
Contributor

PR needs rebase.

Labels
approved, do-not-merge/hold, needs-rebase
5 participants