[ADAP-1061] [Regression] Source freshness never sends query and never finishes/fail if loaded_at_field is not specified #1044

rloredo · 2023-12-06T14:27:25Z

Is this a new bug in dbt-bigquery?

I believe this is a new bug in dbt-bigquery
I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

With 1.7.2 if I remove loaded_at_field in a source freshness test, it logs that it started running the freshness test but never sends the query to BigQuery nor fails.

Expected Behavior

Fail and raise an error, or don't run that freshness test.

Steps To Reproduce

Using dbt-bigquery 1.7.2

With a source specified as:

tables:
      - name: source_something
         freshness:
          error_after: { count: 1, period: hour }
          filter: something_at >= '{{ var("data_interval_start") }}'

Running

dbt source freshness

will run forever.

In dbt-bigquery 1.6.5 it just doesn't run.

Relevant log output

Logs show nothing.

Environment

- OS: MacOS 13.6 (22G120), Debian 11
- Python: 3.10.9
- dbt-core: 1.7.3
- dbt-bigquery: 1.7.2

Additional Context

@mikealfare: Per @dbeatty10's comment:

Tasks

Give feedback

Skip any source freshness tests where loaded_at_field is missing (to restore behavior of 1.6)
Raise a warning whenever loaded_at_field is missing
loaded_at_field not required for source freshness checks as-of 1.7 docs.getdbt.com#4615

content improvement priority: high size: x-small source freshness
Options

The text was updated successfully, but these errors were encountered:

mikealfare · 2023-12-06T19:27:44Z

Thanks for submitting this issue @rloredo! I've reproduced the behavior in a test case and attached that PR to this issue.

@Fleid @dbeatty10 What would be the desired behavior in this scenario? Should we raise an error? Skip the test? Something else? We'll need to know this prior to resolving the issue.

dbeatty10 · 2023-12-06T19:37:42Z

@Fleid @dbeatty10 What would be the desired behavior in this scenario? Should we raise an error? Skip the test? Something else? We'll need to know this prior to resolving the issue.

@mikealfare It sounds like the behavior in 1.6 is essentially "skip the source freshness test if loaded_at_field config is missing"?

If so, it would be reasonable to preserve that behavior in 1.7+.

The alternative of essentially making loaded_at_field a required field by raising an error could also be useful, but would have the downside of potentially breaking projects that are currently working.

So it seems like the most desireable behavior would be:

raise a warning whenever loaded_at_field is missing
skip any source freshness tests where loaded_at_field is missing

pgoslatara · 2023-12-07T12:32:41Z

The alternative of essentially making loaded_at_field a required field by raising an error could also be useful, but would have the downside of potentially breaking projects that are currently working.

@dbeatty10 From a long-term perspective, is this not the desired behaviour for this scenario? Because if someone has a source with a freshness check defined but without a loaded_at_field then I feel it's very unlikely that they are happy (or aware) that the check is skipped. In terms of encouraging best practices and promoting rigouous testing it would be better for this scenario to result in an error (similar to mis-configured models) rather than the expectation that a test is run, but in reality isn't.

Related, I agree that making a change of this nature is not what a patch release is for. In that case, is it better to have two actions as a result of this issue?

Short-term: patch release that restores the existing behaviour of 1.6.
Long-term: minor release of 1.8 that includes a breaking change related to an error being raised when loaded_at_field is not defined.

dbeatty10 · 2023-12-07T15:33:14Z

Thanks for bringing up your thoughts on short-term and long-term approaches @pgoslatara 🧠 !

Short-term: patch release that restores the existing behaviour of 1.6.

Yep, this is what we have queued up (along with explicitly raising a warning when loaded_at_field is missing) 👍

Long-term: minor release of 1.8 that includes a breaking change related to an error being raised when loaded_at_field is not defined.

We don't want to take introducing a breaking change lightly, and so it largely depends on the trade-offs.

Trade-offs

Since at least May 2020, our documentation for source freshness has said something like this:

In the freshness block, one or both of warn_after and error_after can be provided. If neither is provided, then dbt will not calculate freshness snapshots for the tables in this source.

Additionally, the loaded_at_field is required to calculate freshness for a table. If a loaded_at_field is not provided, then dbt will not calculate freshness for the table.

In one case, a user could be surprised that no source freshness is executed when loaded_at_field is omitted.

In another case, a user could have intentionally omitted one of the required fields and relied on the behavior described in the documentation.

Decision

Although we may re-assess this again at some point in the future, we'd rather not introduce a change that forces a breaking change at this point in time.

Rather, our approach to reduce the impact of the "surprise factor" is by raising a warning when loaded_at_field is not present.

One driving factor is that the current behavior has been described in the documentation for several years. Another is that adding a warning can reduce the impact of accidentally omitting this field (which is in line with other warnings that dbt raises in similar scenarios).

A lesser factor is the assumption that the developer adding the freshness check verified that they were able to induce both the warning and error states in their testing prior to merging. Otherwise, how would they know that count, period, filter, etc. are configured and working as expected?

mikealfare · 2023-12-07T16:47:11Z

I was also reminded after this that we specifically removed loaded_at_field as a required field in 1.7 because we introduced the option of building source freshness based on database metadata tables. This enables the reporting of source freshness without inspecting the data in the table. This is both faster (no aggregation required and can collect for many sources at once) and more accurate (sources may be fresher than the data reflects if the aren't updated).

This context further supports @dbeatty10's direction above. Hopefully this helps.

dbeatty10 · 2023-12-08T22:38:00Z

Opened dbt-labs/docs.getdbt.com#4615 to clarify calculating source freshness from metadata vs. the loaded_at_field.

Acceptance criteria

For adapters that do not support calculating source freshness from metadata:

skip any source freshness tests where loaded_at_field is missing (to restore behavior of 1.6)
raise a warning whenever loaded_at_field is missing

For adapters that do support calculating source freshness from metadata, this should be a non-issue.

mikealfare · 2023-12-13T18:08:29Z

@dbeatty10 I edited the issue description to include your comments above so they don't get overlooked. Given that this issue is particular to dbt-bigquery, I omitted your last comment. However, that does pertain to the docs issue that you opened since that would be general, and not specific to dbt-bigquery.

mikealfare · 2024-03-02T01:40:33Z

Since we've implemented metadata-based source freshness. This bug is effectively resolved. While it previously should have thrown an error or warning, it now produces a result since dbt-bigquery has metadata-based source freshness enabled for 1.7+. See #1131 for the test demonstrating this functionality. Please let us know if you are still experiencing this problem.

Closing as resolved.

rloredo added bug Something isn't working triage labels Dec 6, 2023

github-actions bot changed the title ~~[Bug] Source freshness never sends query and never finishes/fail if loaded_at_field is not specified~~ [ADAP-1061] [Bug] Source freshness never sends query and never finishes/fail if loaded_at_field is not specified Dec 6, 2023

mikealfare mentioned this issue Dec 6, 2023

Fix hanging dbt source freshness command when loaded_at_field is omitted #1046

Closed

4 tasks

mikealfare added refinement Product or leadership input needed and removed triage labels Dec 6, 2023

mikealfare removed the refinement Product or leadership input needed label Dec 6, 2023

dbeatty10 mentioned this issue Dec 8, 2023

loaded_at_field not required for source freshness checks as-of 1.7 dbt-labs/docs.getdbt.com#4615

Closed

1 task

dbeatty10 added the regression label Dec 8, 2023

dbeatty10 changed the title ~~[ADAP-1061] [Bug] Source freshness never sends query and never finishes/fail if loaded_at_field is not specified~~ [ADAP-1061] [Regression] Source freshness never sends query and never finishes/fail if loaded_at_field is not specified Dec 8, 2023

nathaniel-may assigned mikealfare Dec 13, 2023

mikealfare added backport 1.7.latest and removed bug Something isn't working labels Dec 13, 2023

mikealfare removed their assignment Dec 13, 2023

martynydbt assigned mikealfare Feb 21, 2024

martynydbt added bug Something isn't working High Severity bug with significant impact that should be resolved in a reasonable timeframe support labels Feb 23, 2024

mikealfare mentioned this issue Mar 2, 2024

add test demonstrating source freshness works with no loaded at field #1131

Closed

mikealfare closed this as completed Mar 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ADAP-1061] [Regression] Source freshness never sends query and never finishes/fail if loaded_at_field is not specified #1044

[ADAP-1061] [Regression] Source freshness never sends query and never finishes/fail if loaded_at_field is not specified #1044

rloredo commented Dec 6, 2023 •

edited by mikealfare

Loading

Tasks

mikealfare commented Dec 6, 2023

dbeatty10 commented Dec 6, 2023

pgoslatara commented Dec 7, 2023

dbeatty10 commented Dec 7, 2023

mikealfare commented Dec 7, 2023

dbeatty10 commented Dec 8, 2023

mikealfare commented Dec 13, 2023

mikealfare commented Mar 2, 2024

[ADAP-1061] [Regression] Source freshness never sends query and never finishes/fail if loaded_at_field is not specified #1044

[ADAP-1061] [Regression] Source freshness never sends query and never finishes/fail if loaded_at_field is not specified #1044

Comments

rloredo commented Dec 6, 2023 • edited by mikealfare Loading

Is this a new bug in dbt-bigquery?

Current Behavior

Expected Behavior

Steps To Reproduce

Relevant log output

Environment

Additional Context

Tasks

mikealfare commented Dec 6, 2023

dbeatty10 commented Dec 6, 2023

pgoslatara commented Dec 7, 2023

dbeatty10 commented Dec 7, 2023

Trade-offs

Decision

mikealfare commented Dec 7, 2023

dbeatty10 commented Dec 8, 2023

Acceptance criteria

mikealfare commented Dec 13, 2023

mikealfare commented Mar 2, 2024

rloredo commented Dec 6, 2023 •

edited by mikealfare

Loading