Support disallowing inconsistent metadata in cli-migrations images (fixes #10599) #10602

Open: wants to merge 4 commits into master

@chardo commented Nov 15, 2024

Description

The Hasura CLI's hasura metadata apply command supports a --disallow-inconsistent-metadata flag, which catches breaking metadata changes before they are applied, rather than leaving them to be discovered later through hasura metadata ic list or, worse, through runtime application errors. However, production deployments commonly run graphql-engine from the cli-migrations Docker image and do not expose the metadata API at all. In that setup, CI/CD workflows have no guaranteed way to keep inconsistent metadata out of production. The risk is compounded because any already-running instances automatically pick up metadata changes, even when that metadata is inconsistent.

This change attempts to address that by exposing an optional HASURA_GRAPHQL_DISALLOW_INCONSISTENT_METADATA environment variable that can be passed to the cli-migrations Docker images to enable the corresponding --disallow-inconsistent-metadata flag on the hasura metadata apply step. When this variable is set and the metadata is inconsistent, metadata application fails, the docker-entrypoint.sh script exits early, and the container fails to start.
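A minimal sketch of that fail-fast path, assuming a POSIX sh entrypoint. The function start_container and the HASURA_CLI override (which lets the real CLI be swapped for a stub) are hypothetical names for illustration, not code from the PR:

```shell
# Minimal sketch of the fail-fast startup path; not the actual docker-entrypoint.sh.
# HASURA_CLI is a hypothetical override so the CLI can be stubbed out in tests.
HASURA_CLI="${HASURA_CLI:-hasura-cli}"

start_container() {
  # Extra arguments (e.g. --disallow-inconsistent-metadata) go straight to the CLI.
  # If the CLI rejects the metadata it exits non-zero, and we stop here with the
  # same code, so the server below is never started.
  $HASURA_CLI metadata apply "$@" || return $?

  # Only reached when metadata applied cleanly. The real script would go on to
  # start graphql-engine here; this sketch just reports it.
  echo "starting graphql-engine"
}
```

In an entrypoint, a call such as start_container --disallow-inconsistent-metadata || exit $? would make the container exit with the CLI's own status code instead of serving the rejected metadata.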

Also, just to say: I know that I opened this PR before I got any traction on the associated issue. If there's good reason for not implementing this change, I will understand and won't mind throwing this work away.

Changelog

Component : build

Type: feature

Product: community-edition

Short Changelog

Add support for disallowing inconsistent metadata in cli-migrations image

Long Changelog

Related Issues

#10599

Solution and Design

This follows the existing pattern for configuration env vars in the docker-entrypoint.sh script(s), though it is strict in requiring the value of the new variable to be "true" (case-insensitive) rather than merely being set to any value.
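One way the strict "true" check could look in POSIX sh is sketched below. The function name disallow_flag is hypothetical; the sketch only illustrates the case-insensitive comparison and is not the script's actual code:

```shell
# Sketch: turn HASURA_GRAPHQL_DISALLOW_INCONSISTENT_METADATA into the CLI flag.
# Only the value "true", in any letter case, enables it; unset, empty, or any
# other value (e.g. "1" or "yes") leaves the flag off.
disallow_flag() {
  value=$(printf '%s' "${HASURA_GRAPHQL_DISALLOW_INCONSISTENT_METADATA:-}" | tr '[:upper:]' '[:lower:]')
  if [ "$value" = "true" ]; then
    printf '%s\n' "--disallow-inconsistent-metadata"
  fi
}
```

Calling hasura-cli metadata apply $(disallow_flag) with the substitution left unquoted means that, when the variable is not "true", no extra argument is passed at all.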

In this first draft, I've made no effort to fail gracefully in the event of inconsistent metadata. Ideally, I think we would capture the exit code, shut down the temporary graphql-engine server, and then exit with the original code. That said, this implementation is consistent with how the script already handles non-zero exit codes from the hasura-cli commands (e.g. if the server is unreachable, or the metadata contains invalid YAML).
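The graceful variant described above could be sketched as follows, assuming the entrypoint records the temporary server's PID in a variable. TEMP_SERVER_PID, HASURA_CLI and apply_metadata_or_shutdown are hypothetical names, not part of the PR:

```shell
# Sketch of a graceful failure path: capture the CLI's exit code, stop the
# temporary graphql-engine server, and hand the original code back to the caller.
# HASURA_CLI and TEMP_SERVER_PID are hypothetical names for this sketch.
HASURA_CLI="${HASURA_CLI:-hasura-cli}"

apply_metadata_or_shutdown() {
  rc=0
  # Running the CLI on the left of || also keeps `set -e` from aborting here.
  $HASURA_CLI metadata apply "$@" || rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "metadata apply failed (exit $rc); stopping temporary server" >&2
    kill "$TEMP_SERVER_PID" 2>/dev/null || true
    # Reap the server so it does not outlive the entrypoint.
    wait "$TEMP_SERVER_PID" 2>/dev/null || true
  fi
  return "$rc"
}
```

The entrypoint would then exit with the returned status, for example apply_metadata_or_shutdown $flag || exit $?, so the container still fails with the CLI's original code but without leaving the temporary server running.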

Zooming out a little, it may also be worth mentioning that I've deliberately chosen to isolate this feature to the cli-migrations image, rather than making it a server-level config variable that would change graphql-engine's default behavior when receiving new metadata updates. The latter seems a bit far-reaching, and I'd rather leverage an existing API than broaden or complicate its scope significantly.

Steps to test and verify

I updated the existing test scripts so that, after confirming the "good" behavior works as intended, they also attempt to apply some inconsistent metadata and then confirm that the docker image is unable to start up.
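The negative check could be expressed as a small helper that treats a startup failure as a pass. This is a sketch, assuming GNU coreutils timeout is available on the test machine; expect_startup_failure is a hypothetical name, and the command it wraps would be whatever the test scripts already use to start the container:

```shell
# Sketch of the negative test: the container must *fail* to start.
# Runs the given command under a time limit and succeeds only if the command
# exits on its own with a non-zero status. Requires GNU coreutils `timeout`.
expect_startup_failure() {
  limit=$1
  shift
  rc=0
  timeout "$limit" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    # timeout had to stop it: the container started and kept running.
    echo "still running after ${limit}s: startup was not blocked" >&2
    return 1
  elif [ "$rc" -eq 0 ]; then
    echo "exited cleanly: expected a startup failure" >&2
    return 1
  elif [ "$rc" -ge 125 ] && [ "$rc" -le 127 ]; then
    # timeout's own error codes (command not runnable or not found).
    echo "command could not be run (exit $rc): test harness error" >&2
    return 1
  fi
  echo "startup failed as expected (exit $rc)"
  return 0
}
```

Treating exit code 124 as a failure is what distinguishes "the container refused to start" from "the container started and kept serving".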

I couldn't find any tools/docs for running tests locally, but I was able to get both test scripts running and passing locally by:

  • installing a copy of the hasura CLI into a local hasura-cli
  • building the cli-migrations images manually
  • setting the necessary test env vars
  • running the test scripts locally

I set the new HASURA_GRAPHQL_DISALLOW_INCONSISTENT_METADATA variable to "true" in both test docker-compose.yaml files so that I could simply extend the existing test files. If you'd prefer this test to run in a separate, isolated file with a different env configuration, I'm happy to do that too; this just seemed like a simpler first revision.

Limitations, known bugs & workarounds

Server checklist

Catalog upgrade

Does this PR change Hasura Catalog version?

  • No
  • Yes
    • Updated docs with SQL for downgrading the catalog

Metadata

n/a

Does this PR add a new Metadata feature?

  • No
  • Yes
    • Does run_sql automatically manage the new metadata through schema diffing?
      • Yes
      • Not required
    • Does run_sql automatically manage the definitions of metadata on renaming?
      • Yes
      • Not required
    • Does export_metadata/replace_metadata support the new metadata added?
      • Yes
      • Not required

GraphQL

  • No new GraphQL schema is generated
  • New GraphQL schema is being generated:
    • New types and typenames are correlated

Breaking changes

  • No Breaking changes

  • There are breaking changes:

    1. Metadata API

      Existing query types:

      • Modify args payload which is not backward compatible
      • Behavioural change of the API
      • Change in response JSON schema
      • Change in error code
    2. GraphQL API

      Schema Generation:

      • Change in any NamedType
      • Change in table field names

      Schema Resolve:

      • Change in treatment of null value for any input fields
    3. Logging

      • Log JSON schema has changed
      • Log type names have changed

@chardo requested a review from a team as a code owner on November 15, 2024 at 17:42
@CLAassistant commented Nov 15, 2024

CLA assistant check
All committers have signed the CLA.

@@ -77,7 +87,7 @@ if [ -d "$HASURA_GRAPHQL_METADATA_DIR" ]; then
 echo "version: 3" > config.yaml
 echo "endpoint: http://localhost:$HASURA_GRAPHQL_MIGRATIONS_SERVER_PORT" >> config.yaml
 echo "metadata_directory: metadata" >> config.yaml
-hasura-cli metadata apply
+hasura-cli metadata apply $HASURA_GRAPHQL_DISALLOW_INCONSISTENT_METADATA
@chardo (Author) commented Nov 15, 2024

Question for the code owners: is there a good reason that the v3 image applies metadata updates before applying DB migrations, while v2 does them the other way around?

I suppose either order might result in temporary metadata inconsistencies if DB updates and metadata updates are bundled into the same release. If you're dropping a column or table, you probably want metadata applied first; if you're adding a column or table, you probably want migrations applied first. So we probably need to accept some unavoidable inconsistencies either way. I'm just checking whether there's a good reason that we pick one order for v2 and the other for v3.

@chardo (Author) commented

Update on this: I went ahead and modified this script so that it uses the same ordering (and thus has the same behavior) as v2. The need for this became clearer after my updates to the test scripts, which revealed that the v2 test could pass while the v3 test failed under the same circumstances, because the current v3 setup relies on DB migrations being run before metadata is applied in order to remain strictly consistent.
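The resulting order (migrations first, then metadata) can be sketched as below. This is not the PR's diff: HASURA_GRAPHQL_MIGRATIONS_DIR is assumed to be the migrations-directory variable the entrypoint reads alongside HASURA_GRAPHQL_METADATA_DIR, migrate apply --all-databases is assumed to be how a config v3 project applies its migrations, and apply_project and HASURA_CLI are hypothetical names:

```shell
# Sketch of the migrations-then-metadata ordering (the v2 order, which the v3
# script now follows). HASURA_CLI is a hypothetical override for stubbing.
HASURA_CLI="${HASURA_CLI:-hasura-cli}"

apply_project() {
  flag=$1
  # 1. Schema changes first, so newly added tables and columns exist...
  if [ -d "$HASURA_GRAPHQL_MIGRATIONS_DIR" ]; then
    $HASURA_CLI migrate apply --all-databases || return $?
  fi
  # 2. ...before metadata that references them is checked for consistency.
  #    $flag is unquoted on purpose: an empty flag adds no argument.
  if [ -d "$HASURA_GRAPHQL_METADATA_DIR" ]; then
    $HASURA_CLI metadata apply $flag || return $?
  fi
}
```

Because each step returns on failure, a failed migration also prevents the metadata step from running, so the container never starts with half-applied changes.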

Labels
None yet
Projects
None yet
2 participants