[Bug]: On Cloud Service (Managed) ROSA Cluster data_science_pipelines_application_apiserver_ready alert is not firing for Operator v1.34 #131

Open
asanzgom opened this issue Oct 24, 2023 · 0 comments
Labels
kind/bug Something isn't working

asanzgom commented Oct 24, 2023

ODH Component

Data Science Pipelines

Current Behavior

On a Cloud Service (Managed) ROSA cluster running Operator v1.34, the data_science_pipelines_application_apiserver_ready alert does not fire and its metric query returns "Empty query result".

Expected Behavior

The data_science_pipelines_application_apiserver_ready alert should fire and its metric query should return data.

Steps To Reproduce

  1. In Dashboard, create a project e.g. 'test-dspa-alerts' and deploy an example pipeline
  2. Verify the following metrics:
    data_science_pipelines_application_ready{dspa_namespace="test-dspa-alerts"}
    data_science_pipelines_application_apiserver_ready{dspa_namespace="test-dspa-alerts"}
    data_science_pipelines_application_persistenceagent_ready{dspa_namespace="test-dspa-alerts"}
    data_science_pipelines_application_scheduledworkflow_ready{dspa_namespace="test-dspa-alerts"}

Expected result: All metrics should have value = 1

Result: data_science_pipelines_application_apiserver_ready{dspa_namespace="test-dspa-alerts"} returns "Empty query result"; the other three queries return 1:

image
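The metric check above can also be scripted. The following is a minimal sketch, not the project's own tooling: it builds the four queries from the step above and interprets a response in the JSON shape returned by the Prometheus HTTP API for instant queries. The helper names `build_queries` and `instant_value` are hypothetical, and fetching the response (URL, token) is left out because those details are not given in this report.

```python
from typing import List, Optional

# Metric names copied from the reproduction step above.
METRICS = [
    "data_science_pipelines_application_ready",
    "data_science_pipelines_application_apiserver_ready",
    "data_science_pipelines_application_persistenceagent_ready",
    "data_science_pipelines_application_scheduledworkflow_ready",
]


def build_queries(namespace: str) -> List[str]:
    """Return one instant query per readiness metric for the namespace."""
    return [f'{metric}{{dspa_namespace="{namespace}"}}' for metric in METRICS]


def instant_value(response: dict) -> Optional[float]:
    """Return the first sample value, or None when the result vector is empty.

    An empty vector is what the console reports as "Empty query result".
    """
    result = response.get("data", {}).get("result", [])
    if not result:
        return None
    # Each vector sample looks like {"metric": {...}, "value": [timestamp, "1"]}.
    return float(result[0]["value"][1])
```

With this sketch, a healthy metric yields 1.0, a scaled-down component yields 0.0, and the behavior reported here for the apiserver metric would show up as None.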

  3. Provoke a disruption in the service providing the Data Science Pipelines API in the user's namespace

    Workloads > Deployments
    Project: test-dspa-alerts
    Scale down to 0 pods:
    ds-pipeline-persistenceagent-pipelines-definition
    ds-pipeline-pipelines-definition
    ds-pipeline-scheduledworkflow-pipelines-definition
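
The scale-down above is done through the console; an equivalent from the command line could look like the sketch below, which assumes the `oc` CLI is installed and logged in to the cluster. The function names `scale_commands` and `apply` are hypothetical. By default the commands are only printed, so nothing is changed on the cluster unless `dry_run=False` is passed.

```python
import subprocess
from typing import List

# Deployment names copied from the step above.
DEPLOYMENTS = [
    "ds-pipeline-persistenceagent-pipelines-definition",
    "ds-pipeline-pipelines-definition",
    "ds-pipeline-scheduledworkflow-pipelines-definition",
]


def scale_commands(namespace: str, replicas: int = 0) -> List[List[str]]:
    """Build one `oc scale` argument list per deployment."""
    return [
        ["oc", "scale", f"deployment/{name}", f"--replicas={replicas}", "-n", namespace]
        for name in DEPLOYMENTS
    ]


def apply(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands, or run them when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # Assumption: `oc` is on PATH and has access to the namespace.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    apply(scale_commands("test-dspa-alerts"))
```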

  4. Verify metrics:
    data_science_pipelines_application_ready{dspa_namespace="test-dspa-alerts"}
    data_science_pipelines_application_apiserver_ready{dspa_namespace="test-dspa-alerts"}
    data_science_pipelines_application_persistenceagent_ready{dspa_namespace="test-dspa-alerts"}
    data_science_pipelines_application_scheduledworkflow_ready{dspa_namespace="test-dspa-alerts"}

Expected result: All metrics should have value = 0

Result: data_science_pipelines_application_apiserver_ready{dspa_namespace="test-dspa-alerts"} returns "Empty query result"; the other three queries return 0

  5. Verify that after 5 minutes of disruption the following alerts are firing:

    Data Science Pipeline Application Unavailable
    Data Science Pipeline APIServer Unavailable
    Data Science Pipeline PersistenceAgent Unavailable
    Data Science Pipeline ScheduledWorkflows Unavailable
    Data Science Pipelines Application Route Error Burn Rate (for 2m)

Result: 4 of the 5 alerts were firing; "Data Science Pipeline APIServer Unavailable" was not:
image

  6. Verify that the alerts are also firing in Alertmanager:

Result: 4 of the 5 alerts were firing; "Data Science Pipeline APIServer Unavailable" was not:

image

  7. Verify that the alerts can be seen in OpenShift Cluster Monitoring Prometheus:

    OpenShift Console > Monitoring > Metrics
    Run this query: ALERTS{namespace=~"redhat-ods-applications|redhat-ods-monitoring|redhat-ods-operator|rhods-notebooks"}

Expected result: Alerts should be active

Actual result: 4 of the 5 alerts were active; "Data Science Pipeline APIServer Unavailable" was not:

image
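
The comparison done by eye in the last steps can be sketched in code. The example below takes the JSON result of the `ALERTS{...}` query above (in the Prometheus HTTP API instant-query shape) and lists which expected alerts are not in the `firing` state. It relies on the standard `alertname` and `alertstate` labels of the `ALERTS` series; that the `alertname` values match the display names listed in this report is an assumption, and `missing_alerts` is a hypothetical helper name.

```python
from typing import List

# Alert names as listed in the steps above (assumed to equal the alertname label).
EXPECTED_ALERTS = [
    "Data Science Pipeline Application Unavailable",
    "Data Science Pipeline APIServer Unavailable",
    "Data Science Pipeline PersistenceAgent Unavailable",
    "Data Science Pipeline ScheduledWorkflows Unavailable",
    "Data Science Pipelines Application Route Error Burn Rate",
]


def missing_alerts(response: dict, expected: List[str] = EXPECTED_ALERTS) -> List[str]:
    """Return the expected alerts that have no series with alertstate="firing"."""
    firing = {
        sample["metric"].get("alertname")
        for sample in response.get("data", {}).get("result", [])
        if sample["metric"].get("alertstate") == "firing"
    }
    return [name for name in expected if name not in firing]
```

For the behavior described in this report, the function would be expected to return only the APIServer alert.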

@asanzgom asanzgom added the kind/bug Something isn't working label Oct 24, 2023
@asanzgom asanzgom changed the title [Bug]: On Cloud Service (Managed) Cluster data_science_pipelines_application_apiserver_ready alert is not firing for Operator v1.34 [Bug]: On Cloud Service (Managed) ROSA Cluster data_science_pipelines_application_apiserver_ready alert is not firing for Operator v1.34 Oct 24, 2023