Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extracted DB and Object Storage Error Logs to display on DSPA #487

Merged
merged 1 commit into from
Dec 1, 2023

Conversation

amadhusu
Copy link
Contributor

@amadhusu amadhusu commented Nov 22, 2023

The issue resolved by this Pull Request:

Resolves #451

Description of your changes:

Extracted DB and Object Storage Error Logs which were only visible in DSPO to display on DSPA Conditions with a brief message of any error.

Testing instructions

  1. Deploy DSPO
  2. Deploy a Minio in a self-signed external OCP cluster (preferablly different from the dsp host cluster)
oc new-project minio
oc -n minio apply -f https://gist.githubusercontent.com/HumairAK/f1358278ab70dde2ee074d1a35f8f058/raw/f234ba638f408c3aff8ed836858d0d1e332f43be/minio.yaml

# Retrieve the route
$MINIO_ROUTE=$(oc get routes -n minio minio-secure --template={{.spec.host}})
  1. On DSP host cluster, deploy a dspa, first without the cabundle:
kind: Secret
apiVersion: v1
metadata:
  name: s3secret
  namespace: dspa
stringData:
  customaccessKey: accesskey
  customsecretKey: secretkey
type: Opaque
---
apiVersion: datasciencepipelinesapplications.opendatahub.io/v1alpha1
kind: DataSciencePipelinesApplication
metadata:
  name: sample
  namespace: dspa
spec:
  apiServer:
    enableSamplePipeline: false
  objectStorage:
    externalStorage:
      bucket: mlpipeline
      host: $MINIO_ROUTE  # <------------- add the route retrieved earlier !
      s3CredentialsSecret:
        accessKey: customaccessKey
        secretKey: customsecretKey
        secretName: s3secret
      scheme: https
  mlpipelineUI:
    image: 'quay.io/opendatahub/odh-ml-pipelines-frontend-container:beta-ui'
  1. Check DSPO logs, you should see an error mentioning x509 and instructions on adding a cabundle.
  2. Verify in the Conditions for sample DSPA instance that Condition Type ObjectStoreAvailable shows the same error mentioned in DSPO logs.
    Screenshot from 2023-11-23 16-45-35

Checklist

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

@dsp-developers
Copy link
Contributor

A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-487
An OCP cluster where you are logged in as cluster admin is required.

To use this image run the following:

cd $(mktemp -d)
git clone git@github.com:opendatahub-io/data-science-pipelines-operator.git
cd data-science-pipelines-operator/
git fetch origin pull/487/head
git checkout -b pullrequest 0f483ae689090cb85936aa973847d10df6e73b2a
make deploy IMG="quay.io/opendatahub/data-science-pipelines-operator:pr-487"

More instructions here on how to deploy and test a Data Science Pipelines Application.

@amadhusu amadhusu changed the title [Draft] Extracted DB and Object Storage Error Logs to display on DSPA [WIP] Extracted DB and Object Storage Error Logs to display on DSPA Nov 22, 2023
@dsp-developers
Copy link
Contributor

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-487

@amadhusu amadhusu changed the title [WIP] Extracted DB and Object Storage Error Logs to display on DSPA Extracted DB and Object Storage Error Logs to display on DSPA Nov 23, 2023
@dsp-developers
Copy link
Contributor

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-487

@hbelmiro
Copy link
Contributor

/lgtm

@@ -37,47 +38,49 @@ var mariadbTemplates = []string{
}

// extract to var for mocking in testing
var ConnectAndQueryDatabase = func(host, port, username, password, dbname string, dbConnectionTimeout time.Duration) bool {
var ConnectAndQueryDatabase = func(host, port, username, password, dbname string, dbConnectionTimeout time.Duration) (bool, string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the intended purpose of message is for error handling, why don't we just return errors rather than strings and let the caller of this function get the message from errror.Error()? This is much closer to what is conventional in golang

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took your suggestion and did the same, but I decided to return errors where custom messaging was provided which makes it easier to read on the Web UI.

controllers/database.go Outdated Show resolved Hide resolved
}

decodePass, _ := b64.StdEncoding.DecodeString(params.DBConnection.Password)
dbConnectionTimeout := config.GetDurationConfigWithDefault(config.DBConnectionTimeoutConfigName, config.DefaultDBConnectionTimeout)

log.V(1).Info(fmt.Sprintf("Database Heath Check connection timeout: %s", dbConnectionTimeout))

dbHealthCheckPassed := ConnectAndQueryDatabase(params.DBConnection.Host,
dbHealthCheckPassed, message := ConnectAndQueryDatabase(params.DBConnection.Host,
Copy link
Member

@gmfrasca gmfrasca Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternatively/additionally to above, we could just return an error, and then assume success/failure below based on err != nil, which is a standard and common check

Copy link
Contributor Author

@amadhusu amadhusu Nov 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the function itself returns a bool apart from the error , I let it be and decided to assume the success/failure below based on err != nil though.

@dsp-developers
Copy link
Contributor

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-487

@dsp-developers
Copy link
Contributor

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-487

@dsp-developers
Copy link
Contributor

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-487

@dsp-developers
Copy link
Contributor

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-487

@dsp-developers
Copy link
Contributor

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-487

@dsp-developers
Copy link
Contributor

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-487

controllers/database.go Outdated Show resolved Hide resolved
controllers/storage.go Outdated Show resolved Hide resolved
Copy link
Contributor

@hbelmiro hbelmiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Nov 30, 2023
@dsp-developers
Copy link
Contributor

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-487

return true
infoMessage := "Database health check disabled, assuming database is available and ready."
log.V(1).Info(infoMessage)
return true, errors.New(infoMessage)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure we should return the infoMessage as an error rather than just log a message and return nil. Seems a bit questionable to have a 'happy error', per se, esp since this is mostly a dev enablement feature, but interested what others think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @gmfrasca.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that was the case before I converted from string to error which makes sense to me as well.

Signed-off-by: Achyut Madhusudan <amadhusu@redhat.com>
@dsp-developers
Copy link
Contributor

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-487

@hbelmiro
Copy link
Contributor

hbelmiro commented Dec 1, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Dec 1, 2023
Copy link
Contributor

openshift-ci bot commented Dec 1, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: diegolovison, gmfrasca, hbelmiro

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Dec 1, 2023
@openshift-merge-bot openshift-merge-bot bot merged commit 325b3e8 into opendatahub-io:main Dec 1, 2023
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Surface DB/S3 health check connection errors to DSPA cr
5 participants