Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADAP-1061] [Regression] Source freshness never sends query and never finishes/fail if loaded_at_field is not specified #1044

Closed
1 of 3 tasks
rloredo opened this issue Dec 6, 2023 · 8 comments
Assignees
Labels
bug Something isn't working High Severity bug with significant impact that should be resolved in a reasonable timeframe regression support

Comments

@rloredo
Copy link

rloredo commented Dec 6, 2023

Is this a new bug in dbt-bigquery?

  • I believe this is a new bug in dbt-bigquery
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

With 1.7.2 if I remove loaded_at_field in a source freshness test, it logs that it started running the freshness test but never sends the query to BigQuery nor fails.

Expected Behavior

Fail and raise an error, or don't run that freshness test.

Steps To Reproduce

Using dbt-bigquery 1.7.2

With a source specified as:

tables:
      - name: source_something
         freshness:
          error_after: { count: 1, period: hour }
          filter: something_at >= '{{ var("data_interval_start") }}'

Running

dbt source freshness

will run forever.

In dbt-bigquery 1.6.5 it just doesn't run.

Relevant log output

Logs show nothing.

Environment

- OS: MacOS 13.6 (22G120), Debian 11
- Python: 3.10.9
- dbt-core: 1.7.3
- dbt-bigquery: 1.7.2

Additional Context

@mikealfare: Per @dbeatty10's comment:

Tasks

  1. content improvement priority: high size: x-small source freshness
    matthewshaver
@rloredo rloredo added bug Something isn't working triage labels Dec 6, 2023
@github-actions github-actions bot changed the title [Bug] Source freshness never sends query and never finishes/fail if loaded_at_field is not specified [ADAP-1061] [Bug] Source freshness never sends query and never finishes/fail if loaded_at_field is not specified Dec 6, 2023
@mikealfare
Copy link
Contributor

Thanks for submitting this issue @rloredo! I've reproduced the behavior in a test case and attached that PR to this issue.

@Fleid @dbeatty10 What would be the desired behavior in this scenario? Should we raise an error? Skip the test? Something else? We'll need to know this prior to resolving the issue.

@mikealfare mikealfare added refinement Product or leadership input needed and removed triage labels Dec 6, 2023
@dbeatty10
Copy link
Contributor

@Fleid @dbeatty10 What would be the desired behavior in this scenario? Should we raise an error? Skip the test? Something else? We'll need to know this prior to resolving the issue.

@mikealfare It sounds like the behavior in 1.6 is essentially "skip the source freshness test if loaded_at_field config is missing"?

If so, it would be reasonable to preserve that behavior in 1.7+.

The alternative of essentially making loaded_at_field a required field by raising an error could also be useful, but would have the downside of potentially breaking projects that are currently working.

So it seems like the most desireable behavior would be:

  • raise a warning whenever loaded_at_field is missing
  • skip any source freshness tests where loaded_at_field is missing

@mikealfare mikealfare removed the refinement Product or leadership input needed label Dec 6, 2023
@pgoslatara
Copy link
Contributor

The alternative of essentially making loaded_at_field a required field by raising an error could also be useful, but would have the downside of potentially breaking projects that are currently working.

@dbeatty10 From a long-term perspective, is this not the desired behaviour for this scenario? Because if someone has a source with a freshness check defined but without a loaded_at_field then I feel it's very unlikely that they are happy (or aware) that the check is skipped. In terms of encouraging best practices and promoting rigouous testing it would be better for this scenario to result in an error (similar to mis-configured models) rather than the expectation that a test is run, but in reality isn't.

Related, I agree that making a change of this nature is not what a patch release is for. In that case, is it better to have two actions as a result of this issue?

  • Short-term: patch release that restores the existing behaviour of 1.6.
  • Long-term: minor release of 1.8 that includes a breaking change related to an error being raised when loaded_at_field is not defined.

@dbeatty10
Copy link
Contributor

Thanks for bringing up your thoughts on short-term and long-term approaches @pgoslatara 🧠 !

  • Short-term: patch release that restores the existing behaviour of 1.6.

Yep, this is what we have queued up (along with explicitly raising a warning when loaded_at_field is missing) 👍

  • Long-term: minor release of 1.8 that includes a breaking change related to an error being raised when loaded_at_field is not defined.

We don't want to take introducing a breaking change lightly, and so it largely depends on the trade-offs.

Trade-offs

Since at least May 2020, our documentation for source freshness has said something like this:

In the freshness block, one or both of warn_after and error_after can be provided. If neither is provided, then dbt will not calculate freshness snapshots for the tables in this source.

Additionally, the loaded_at_field is required to calculate freshness for a table. If a loaded_at_field is not provided, then dbt will not calculate freshness for the table.

In one case, a user could be surprised that no source freshness is executed when loaded_at_field is omitted.

In another case, a user could have intentionally omitted one of the required fields and relied on the behavior described in the documentation.

Decision

Although we may re-assess this again at some point in the future, we'd rather not introduce a change that forces a breaking change at this point in time.

Rather, our approach to reduce the impact of the "surprise factor" is by raising a warning when loaded_at_field is not present.

One driving factor is that the current behavior has been described in the documentation for several years. Another is that adding a warning can reduce the impact of accidentally omitting this field (which is in line with other warnings that dbt raises in similar scenarios).

A lesser factor is the assumption that the developer adding the freshness check verified that they were able to induce both the warning and error states in their testing prior to merging. Otherwise, how would they know that count, period, filter, etc. are configured and working as expected?

@mikealfare
Copy link
Contributor

I was also reminded after this that we specifically removed loaded_at_field as a required field in 1.7 because we introduced the option of building source freshness based on database metadata tables. This enables the reporting of source freshness without inspecting the data in the table. This is both faster (no aggregation required and can collect for many sources at once) and more accurate (sources may be fresher than the data reflects if the aren't updated).

This context further supports @dbeatty10's direction above. Hopefully this helps.

@dbeatty10
Copy link
Contributor

Opened dbt-labs/docs.getdbt.com#4615 to clarify calculating source freshness from metadata vs. the loaded_at_field.

Acceptance criteria

For adapters that do not support calculating source freshness from metadata:

  • skip any source freshness tests where loaded_at_field is missing (to restore behavior of 1.6)
  • raise a warning whenever loaded_at_field is missing

For adapters that do support calculating source freshness from metadata, this should be a non-issue.

@dbeatty10 dbeatty10 changed the title [ADAP-1061] [Bug] Source freshness never sends query and never finishes/fail if loaded_at_field is not specified [ADAP-1061] [Regression] Source freshness never sends query and never finishes/fail if loaded_at_field is not specified Dec 8, 2023
@mikealfare mikealfare added backport 1.7.latest and removed bug Something isn't working labels Dec 13, 2023
@mikealfare mikealfare removed their assignment Dec 13, 2023
@mikealfare
Copy link
Contributor

@dbeatty10 I edited the issue description to include your comments above so they don't get overlooked. Given that this issue is particular to dbt-bigquery, I omitted your last comment. However, that does pertain to the docs issue that you opened since that would be general, and not specific to dbt-bigquery.

@martynydbt martynydbt added bug Something isn't working High Severity bug with significant impact that should be resolved in a reasonable timeframe support labels Feb 23, 2024
@mikealfare
Copy link
Contributor

Since we've implemented metadata-based source freshness. This bug is effectively resolved. While it previously should have thrown an error or warning, it now produces a result since dbt-bigquery has metadata-based source freshness enabled for 1.7+. See #1131 for the test demonstrating this functionality. Please let us know if you are still experiencing this problem.

Closing as resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working High Severity bug with significant impact that should be resolved in a reasonable timeframe regression support
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants