From a2d5c4cec9cba83d4f29d9b57a44ab5ea7bb7854 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Thu, 17 Oct 2024 13:05:00 +0100 Subject: [PATCH 1/2] add page --- website/docs/docs/build/snapshots.md | 2 + .../resource-configs/dbt_valid_to_current.md | 99 +++++++++++++++++++ website/sidebars.js | 1 + 3 files changed, 102 insertions(+) create mode 100644 website/docs/reference/resource-configs/dbt_valid_to_current.md diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index f5321aa626a..e6e5eafeff5 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -70,6 +70,7 @@ snapshots: [updated_at](/reference/resource-configs/updated_at): column_name [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes): true | false [snapshot_meta_column_names](/reference/resource-configs/snapshot_meta_column_names): dictionary + [dbt_valid_to_current](/reference/resource-configs/dbt_valid_to_current): string ``` @@ -88,6 +89,7 @@ The following table outlines the configurations available for snapshots: | [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at | | [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source and set `dbt_valid_to` to current time if the record no longer exists | No | True | | [snapshot_meta_column_names](/reference/resource-configs/snapshot_meta_column_names) | Customize the names of the snapshot meta fields | No | dictionary | +| [dbt_valid_to_current](/reference/resource-configs/dbt_valid_to_current) | Set a custom future date for `dbt_valid_to` in new snapshot columns. When set, dbt will use this specified value instead of `NULL` for `dbt_valid_to` in the snapshot table. | No | string | - In versions prior to v1.9, the `target_schema` (required) and `target_database` (optional) configurations defined a single schema or database to build a snapshot across users and environment. This created problems when testing or developing a snapshot, as there was no clear separation between development and production environments. In v1.9, `target_schema` became optional, allowing snapshots to be environment-aware. By default, without `target_schema` or `target_database` defined, snapshots now use the `generate_schema_name` or `generate_database_name` macros to determine where to build. Developers can still set a custom location with [`schema`](/reference/resource-configs/schema) and [`database`](/reference/resource-configs/database) configs, consistent with other resource types. - A number of other configurations are also supported (for example, `tags` and `post-hook`). For the complete list, refer to [Snapshot configurations](/reference/snapshot-configs). diff --git a/website/docs/reference/resource-configs/dbt_valid_to_current.md b/website/docs/reference/resource-configs/dbt_valid_to_current.md new file mode 100644 index 00000000000..7f0bc5a2855 --- /dev/null +++ b/website/docs/reference/resource-configs/dbt_valid_to_current.md @@ -0,0 +1,99 @@ +--- +resource_types: [snapshots] +description: "Snapshot dbt_valid_to_current custom date" +datatype: "{}" +default_value: {"dbt_valid_from": "dbt_valid_from", "dbt_valid_to": "dbt_valid_to", "dbt_scd_id": "dbt_scd_id", "dbt_updated_at": "dbt_updated_at"} +id: "dbt_valid_to_current" +--- + +Available in 1.9 or with [Versionless](/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless) dbt Cloud. + + + +```yaml: +snapshots: + my_project: + +dbt_valid_to_current: "to_date('9999-12-31')" + +``` + + + + + +```sql +{{ + config( + unique_key='id', + strategy='timestamp', + updated_at='updated_at', + dbt_valid_to_current='to_date('9999-12-31')' + ) +}} + + +``` + + + + + +```yml +snapshots: + [](/reference/resource-configs/resource-path): + +dbt_valid_to_current: "to_date('9999-12-31')" +``` + + + +## Description + +Use the `dbt_valid_to_current` config to set a custom future date for `dbt_valid_to` in new snapshot columns. When set, dbt will use this specified value instead of `NULL` for `dbt_valid_to` in the snapshot table. + +This simplifies makes it easier to assign date and range-based filtering with a concrete end date. + +## Default + +By default, `dbt_valid_to` is set to `NULL` for current (most recent) records in your snapshot table. This indicates that these records are still valid and have no defined end date. + +If you prefer to use a specific value instead of `NULL` for `dbt_valid_to` in current and future records (such as a date set in the far future, '9999-12-31'), you can use the `dbt_valid_to_current` configuration option. + +The value assigned to `dbt_valid_to_current` should be a string representing a valid date or timestamp, depending on your database's requirements. + +:::info +For existing records — To avoid any unintentional data modification, dbt will **not** automatically adjust the current value in the existing `dbt_valid_to` column. Existing current records will still have `dbt_valid_to` set to `NULL`. + +For new records — Any new records inserted after applying the `dbt_valid_to_current` configuration will have `dbt_valid_to` set to the specified value (for example, '9999-12-31'), instead of `NULL`. + +This means your snapshot table will have current records with `dbt_valid_to` values of both `NULL` (from existing data) and the new specified value (from new data). If you'd rather have consistent `dbt_valid_to` values for current records, you can either manually update existing records in your snapshot table where `dbt_valid_to` is `NULL` to match your `dbt_valid_to_current` value or rebuild your snapshot table. +::: + +## Example + + + +```yaml +snapshots: + - name: my_snapshot + config: + strategy: timestamp + updated_at: updated_at + dbt_valid_to_current: "to_date('9999-12-31')" + columns: + - name: dbt_valid_from + description: The timestamp when the record became valid. + - name: dbt_valid_to + description: > + The timestamp when the record ceased to be valid. For current records, + this is either `NULL` or the value specified in `dbt_valid_to_current` + (e.g., `'9999-12-31'`). +``` + + + +The resulting snapshot table contains the configured dbt_valid_to column value: + +| id | dbt_scd_id | dbt_updated_at | dbt_valid_from | dbt_valid_to | +| -- | -------------------- | -------------------- | -------------------- | -------------------- | +| 1 | 60a1f1dbdf899a4dd... | 2024-10-02 ... | 2024-10-02 ... | 9999-12-31 ... | +| 2 | b1885d098f8bcff51... | 2024-10-02 ... | 2024-10-02 ... | 9999-12-31 ... | diff --git a/website/sidebars.js b/website/sidebars.js index d31f6d03f57..13535a29df3 100644 --- a/website/sidebars.js +++ b/website/sidebars.js @@ -976,6 +976,7 @@ const sidebarSettings = { "reference/resource-configs/updated_at", "reference/resource-configs/invalidate_hard_deletes", "reference/resource-configs/snapshot_meta_column_names", + "reference/resource-configs/dbt_valid_to_current", ], }, { From 3d54fee75f7b16c9b0bbf849e9f4fcc478b267dd Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Thu, 17 Oct 2024 16:01:15 +0100 Subject: [PATCH 2/2] add custom future date --- website/docs/docs/build/snapshots.md | 37 +++++++++++-------- .../docs/docs/dbt-versions/release-notes.md | 2 +- .../resource-configs/dbt_valid_to_current.md | 19 ++++------ 3 files changed, 31 insertions(+), 27 deletions(-) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index e6e5eafeff5..d4256cfcbf0 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -36,12 +36,6 @@ This order is now in the "shipped" state, but we've lost the information about w ## Configuring snapshots -:::info Previewing or compiling snapshots in IDE not supported - -It is not possible to "preview data" or "compile sql" for snapshots in dbt Cloud. Instead, [run the `dbt snapshot` command](#how-snapshots-work) in the IDE. - -::: - - To configure snapshots in versions 1.8 and earlier, refer to [Configure snapshots in versions 1.8 and earlier](#configure-snapshots-in-versions-18-and-earlier). These versions use an older syntax where snapshots are defined within a snapshot block in a `.sql` file, typically located in your `snapshots` directory. @@ -71,7 +65,6 @@ snapshots: [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes): true | false [snapshot_meta_column_names](/reference/resource-configs/snapshot_meta_column_names): dictionary [dbt_valid_to_current](/reference/resource-configs/dbt_valid_to_current): string - ``` @@ -88,14 +81,14 @@ The following table outlines the configurations available for snapshots: | [check_cols](/reference/resource-configs/check_cols) | If using the `check` strategy, then the columns to check | Only if using the `check` strategy | ["status"] | | [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at | | [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source and set `dbt_valid_to` to current time if the record no longer exists | No | True | -| [snapshot_meta_column_names](/reference/resource-configs/snapshot_meta_column_names) | Customize the names of the snapshot meta fields | No | dictionary | | [dbt_valid_to_current](/reference/resource-configs/dbt_valid_to_current) | Set a custom future date for `dbt_valid_to` in new snapshot columns. When set, dbt will use this specified value instead of `NULL` for `dbt_valid_to` in the snapshot table. | No | string | +| [snapshot_meta_column_names](/reference/resource-configs/snapshot_meta_column_names) | Customize the names of the snapshot meta fields | No | dictionary | + - In versions prior to v1.9, the `target_schema` (required) and `target_database` (optional) configurations defined a single schema or database to build a snapshot across users and environment. This created problems when testing or developing a snapshot, as there was no clear separation between development and production environments. In v1.9, `target_schema` became optional, allowing snapshots to be environment-aware. By default, without `target_schema` or `target_database` defined, snapshots now use the `generate_schema_name` or `generate_database_name` macros to determine where to build. Developers can still set a custom location with [`schema`](/reference/resource-configs/schema) and [`database`](/reference/resource-configs/database) configs, consistent with other resource types. - A number of other configurations are also supported (for example, `tags` and `post-hook`). For the complete list, refer to [Snapshot configurations](/reference/snapshot-configs). - You can configure snapshots from both the `dbt_project.yml` file and a `config` block. For more information, refer to the [configuration docs](/reference/snapshot-configs). - ### Add a snapshot to your project To add a snapshot to your project follow these steps. For users on versions 1.8 and earlier, refer to [Configure snapshots in versions 1.8 and earlier](#configure-snapshots-in-versions-18-and-earlier). @@ -114,6 +107,7 @@ To add a snapshot to your project follow these steps. For users on versions 1.8 unique_key: id strategy: timestamp updated_at: updated_at + dbt_valid_to_current: "to_date('9999-12-31')" # Specifies that current records should have `dbt_valid_to` set to `'9999-12-31'` instead of `NULL`. ``` @@ -174,6 +168,15 @@ This strategy handles column additions and deletions better than the `check` str + + + +By default, `dbt_valid_to` is `NULL` for current records. However, if you set the [`dbt_valid_to_current` configuration](/reference/resource-configs/dbt_valid_to_current) (available in Versionless and 1.9 and higher), `dbt_valid_to` will be set to your specified value (such as `9999-12-31`) for current records. + +This simplifies your SQL queries by avoiding `NULL` checks and allowing for straightforward date range filtering. + + + The unique key is used by dbt to match rows up, so it's extremely important to make sure this key is actually unique! If you're snapshotting a source, I'd recommend adding a uniqueness test to your source ([example](https://github.com/dbt-labs/jaffle_shop/blob/8e7c853c858018180bef1756ec93e193d9958c5b/models/staging/schema.yml#L26)). @@ -206,12 +209,14 @@ Snapshots can't be rebuilt. Because of this, it's a good idea to put snapshots i ### How snapshots work When you run the [`dbt snapshot` command](/reference/commands/snapshot): -* **On the first run:** dbt will create the initial snapshot table — this will be the result set of your `select` statement, with additional columns including `dbt_valid_from` and `dbt_valid_to`. All records will have a `dbt_valid_to = null`. +* **On the first run:** dbt will create the initial snapshot table — this will be the result set of your `select` statement, with additional columns including `dbt_valid_from` and `dbt_valid_to`. All records will have a `dbt_valid_to = null` or the value specified in [`dbt_valid_to_current`](/reference/resource-configs/dbt_valid_to_current) if configured. * **On subsequent runs:** dbt will check which records have changed or if any new records have been created: - - The `dbt_valid_to` column will be updated for any existing records that have changed - - The updated record and any new records will be inserted into the snapshot table. These records will now have `dbt_valid_to = null` + - The `dbt_valid_to` column will be updated for any existing records that have changed. + - The updated record and any new records will be inserted into the snapshot table. These records will now have `dbt_valid_to = null` or the value configured in `dbt_valid_to_current`. -Note, these column names can be customized to your team or organizational conventions using the [snapshot_meta_column_names](#snapshot-meta-fields) config. +#### Note +- These column names can be customized to your team or organizational conventions using the [snapshot_meta_column_names](#snapshot-meta-fields) config. +- If you have set the `dbt_valid_to_current` configuration option, then instead of `NULL`, the `dbt_valid_to` field in future records will be set to your specified value (such as `9999-12-31`). Snapshots can be referenced in downstream models the same way as referencing models — by using the [ref](/reference/dbt-jinja-functions/ref) function. @@ -419,12 +424,14 @@ Basically – keep your query as simple as possible! Some reasonable exceptions Snapshot tables will be created as a clone of your source dataset, plus some additional meta-fields*. -Starting in 1.9 or with [dbt Cloud Versionless](/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless), these column names can be customized to your team or organizational conventions via the [`snapshot_meta_column_names`](/reference/resource-configs/snapshot_meta_column_names) config. +Starting in 1.9 or with [dbt Cloud Versionless](/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless): +- These column names can be customized to your team or organizational conventions using the [`snapshot_meta_column_names`](/reference/resource-configs/snapshot_meta_column_names) config. +- Use the [`dbt_valid_to_current`](/reference/resource-configs/dbt_valid_to_current) config to set a custom future date for `dbt_valid_to` in new snapshot columns. When set, dbt will use this specified value instead of `NULL` for `dbt_valid_to` in the snapshot table. | Field | Meaning | Usage | | -------------- | ------- | ----- | | dbt_valid_from | The timestamp when this snapshot row was first inserted | This column can be used to order the different "versions" of a record. | -| dbt_valid_to | The timestamp when this row became invalidated. | The most recent snapshot record will have `dbt_valid_to` set to `null`. | +| dbt_valid_to | The timestamp when this row became invalidated.
For current records, this is `NULL` by default or the value specified in `dbt_valid_to_current`. | The most recent snapshot record will have `dbt_valid_to` set to `NULL` or the specified value. | | dbt_scd_id | A unique key generated for each snapshotted record. | This is used internally by dbt | | dbt_updated_at | The updated_at timestamp of the source record when this snapshot row was inserted. | This is used internally by dbt | diff --git a/website/docs/docs/dbt-versions/release-notes.md b/website/docs/docs/dbt-versions/release-notes.md index 662fd0f381a..71865974b6d 100644 --- a/website/docs/docs/dbt-versions/release-notes.md +++ b/website/docs/docs/dbt-versions/release-notes.md @@ -44,7 +44,7 @@ Release notes are grouped by month for both multi-tenant and virtual private clo
- +- **New**: Use the [`dbt_valid_to_current`](/reference/resource-configs/dbt_valid_to_current) config to set a custom future date for `dbt_valid_to` in new snapshot columns. When set, dbt will use this specified value instead of `NULL` for `dbt_valid_to` in the snapshot table. - **New**: The [dbt Semantic Layer Python software development kit](/docs/dbt-cloud-apis/sl-python) is now [generally available](/docs/dbt-versions/product-lifecycles). It provides users with easy access to the dbt Semantic Layer with Python and enables developers to interact with the dbt Semantic Layer APIs to query metrics/dimensions in downstream tools. - **Enhancement**: You can now add a description to a singular data test in dbt Cloud Versionless. Use the [`description` property](/reference/resource-properties/description) to document [singular data tests](/docs/build/data-tests#singular-data-tests). You can also use [docs block](/docs/build/documentation#using-docs-blocks) to capture your test description. The enhancement will be included in upcoming dbt Core 1.9 release. - **New**: Introducing the [microbatch incremental model strategy](/docs/build/incremental-microbatch) (beta), available in dbt Cloud Versionless and will soon be supported in dbt Core 1.9. The microbatch strategy allows for efficient, batch-based processing of large time-series datasets for improved performance and resiliency, especially when you're working with data that changes over time (like new records being added daily). To enable this feature in dbt Cloud, set the `DBT_EXPERIMENTAL_MICROBATCH` environment variable to `true` in your project. diff --git a/website/docs/reference/resource-configs/dbt_valid_to_current.md b/website/docs/reference/resource-configs/dbt_valid_to_current.md index 7f0bc5a2855..d2af1847470 100644 --- a/website/docs/reference/resource-configs/dbt_valid_to_current.md +++ b/website/docs/reference/resource-configs/dbt_valid_to_current.md @@ -10,7 +10,7 @@ Available in 1.9 or with [Versionless](/docs/dbt-versions/upgrade-dbt-version-in -```yaml: +```yaml snapshots: my_project: +dbt_valid_to_current: "to_date('9999-12-31')" @@ -30,8 +30,6 @@ snapshots: dbt_valid_to_current='to_date('9999-12-31')' ) }} - - ``` @@ -50,23 +48,22 @@ snapshots: Use the `dbt_valid_to_current` config to set a custom future date for `dbt_valid_to` in new snapshot columns. When set, dbt will use this specified value instead of `NULL` for `dbt_valid_to` in the snapshot table. -This simplifies makes it easier to assign date and range-based filtering with a concrete end date. +This simplifies makes it easier to assign date, work in a join, and range-based filtering with an end date. ## Default -By default, `dbt_valid_to` is set to `NULL` for current (most recent) records in your snapshot table. This indicates that these records are still valid and have no defined end date. +By default, `dbt_valid_to` is set to `NULL` for current (most recent) records in your snapshot table. This means that these records are still valid and have no defined end date. -If you prefer to use a specific value instead of `NULL` for `dbt_valid_to` in current and future records (such as a date set in the far future, '9999-12-31'), you can use the `dbt_valid_to_current` configuration option. +If you prefer to use a specific value instead of `NULL` for `dbt_valid_to` in current and future records, you can use the `dbt_valid_to_current` configuration option. For example, setting a date in the far future, `9999-12-31`. The value assigned to `dbt_valid_to_current` should be a string representing a valid date or timestamp, depending on your database's requirements. -:::info -For existing records — To avoid any unintentional data modification, dbt will **not** automatically adjust the current value in the existing `dbt_valid_to` column. Existing current records will still have `dbt_valid_to` set to `NULL`. +### Managing records +- For existing records — To avoid any unintentional data modification, dbt will _not_ automatically adjust the current value in the existing `dbt_valid_to` column. Existing current records will still have `dbt_valid_to` set to `NULL`. -For new records — Any new records inserted after applying the `dbt_valid_to_current` configuration will have `dbt_valid_to` set to the specified value (for example, '9999-12-31'), instead of `NULL`. +- For new records — Any new records inserted after applying the `dbt_valid_to_current` configuration will have `dbt_valid_to` set to the specified value (for example, '9999-12-31'), instead of `NULL`. This means your snapshot table will have current records with `dbt_valid_to` values of both `NULL` (from existing data) and the new specified value (from new data). If you'd rather have consistent `dbt_valid_to` values for current records, you can either manually update existing records in your snapshot table where `dbt_valid_to` is `NULL` to match your `dbt_valid_to_current` value or rebuild your snapshot table. -::: ## Example @@ -86,7 +83,7 @@ snapshots: description: > The timestamp when the record ceased to be valid. For current records, this is either `NULL` or the value specified in `dbt_valid_to_current` - (e.g., `'9999-12-31'`). + (like `'9999-12-31'`). ```