Update databricks-configs.md to include MV/ST (#4045)
## What are you changing in this pull request and why?
Editing the Databricks configuration reference:

* Expanded configuration table to indicate that we now support all of
the main table config options in Python as well as SQL, with the
exception of Liquid Clustering, which is not yet part of PySpark.

* Added section describing support for materialized_view and
streaming_table.

## Checklist
- [x] Review the [Content style
guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md)
and [About
versioning](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version)
so my content adheres to these guidelines.
- [x] Add a checklist item for anything that needs to happen before this
PR is merged, such as "needs technical review" or "change base branch."
dataders authored Sep 19, 2023
2 parents bf40649 + 703b6f9 commit 6d63f3a
Showing 1 changed file with 163 additions and 10 deletions.
173 changes: 163 additions & 10 deletions website/docs/reference/resource-configs/databricks-configs.md
@@ -7,21 +7,39 @@ id: "databricks-configs"

When materializing a model as `table`, you may include several optional configs that are specific to the dbt-databricks plugin, in addition to the standard [model configs](/reference/model-configs).

| Option | Description | Required? | Example |
|---------|------------------------------------------------------------------------------------------------------------------------------------|-------------------------|--------------------------|
| file_format | The file format to use when creating tables (`parquet`, `delta`, `hudi`, `csv`, `json`, `text`, `jdbc`, `orc`, `hive` or `libsvm`). | Optional | `delta`|
| location_root | The created table uses the specified directory to store its data. The table alias is appended to it. | Optional | `/mnt/root` |
| partition_by | Partition the created table by the specified columns. A directory is created for each partition. | Optional | `date_day` |
| liquid_clustered_by | Cluster the created table by the specified columns. Clustering method is based on [Delta's Liquid Clustering feature](https://docs.databricks.com/en/delta/clustering.html). Available since dbt-databricks 1.6.2. | Optional | `date_day` |
| clustered_by | Each partition in the created table will be split into a fixed number of buckets by the specified columns. | Optional | `country_code` |
| buckets | The number of buckets to create while clustering | Required if `clustered_by` is specified | `8` |
<VersionBlock lastVersion="1.5">

| Option | Description | Required? | Example |
|---------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------|----------------|
| file_format | The file format to use when creating tables (`parquet`, `delta`, `hudi`, `csv`, `json`, `text`, `jdbc`, `orc`, `hive` or `libsvm`). | Optional | `delta` |
| location_root | The created table uses the specified directory to store its data. The table alias is appended to it. | Optional | `/mnt/root` |
| partition_by | Partition the created table by the specified columns. A directory is created for each partition. | Optional | `date_day` |
| liquid_clustered_by | Cluster the created table by the specified columns. Clustering method is based on [Delta's Liquid Clustering feature](https://docs.databricks.com/en/delta/clustering.html). Available since dbt-databricks 1.6.2. | Optional | `date_day` |
| clustered_by | Each partition in the created table will be split into a fixed number of buckets by the specified columns. | Optional | `country_code` |
| buckets | The number of buckets to create while clustering | Required if `clustered_by` is specified | `8` |

</VersionBlock>

<VersionBlock firstVersion="1.6">

| Option | Description | Required? | Model Support | Example |
|---------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------|---------------|----------------|
| file_format | The file format to use when creating tables (`parquet`, `delta`, `hudi`, `csv`, `json`, `text`, `jdbc`, `orc`, `hive` or `libsvm`). | Optional | SQL, Python | `delta` |
| location_root | The created table uses the specified directory to store its data. The table alias is appended to it. | Optional | SQL, Python | `/mnt/root` |
| partition_by | Partition the created table by the specified columns. A directory is created for each partition. | Optional | SQL, Python | `date_day` |
| liquid_clustered_by | Cluster the created table by the specified columns. Clustering method is based on [Delta's Liquid Clustering feature](https://docs.databricks.com/en/delta/clustering.html). Available since dbt-databricks 1.6.2. | Optional | SQL | `date_day` |
| clustered_by | Each partition in the created table will be split into a fixed number of buckets by the specified columns. | Optional | SQL, Python | `country_code` |
| buckets | The number of buckets to create while clustering | Required if `clustered_by` is specified | SQL, Python | `8` |

</VersionBlock>
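
For illustration only (this combined example is not part of the original reference), a table model could set several of these options together; the values simply mirror the examples in the table above, and `events` stands in for any upstream model:

<File name='example_table_model.sql'>

```sql
{{ config(
    materialized = 'table',
    file_format = 'delta',
    location_root = '/mnt/root',
    partition_by = 'date_day',
    clustered_by = 'country_code',
    buckets = 8
) }}

-- illustrative only: select everything from a hypothetical upstream model
select * from {{ ref('events') }}
```

</File>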

## Incremental models

dbt-databricks plugin leans heavily on the [`incremental_strategy` config](/docs/build/incremental-models#about-incremental_strategy). This config tells the incremental materialization how to build models in runs beyond their first. It can be set to one of three values:
dbt-databricks plugin leans heavily on the [`incremental_strategy` config](/docs/build/incremental-models#about-incremental_strategy). This config tells the incremental materialization how to build models in runs beyond their first. It can be set to one of four values:
- **`append`** (default): Insert new records without updating or overwriting any existing data.
- **`insert_overwrite`**: If `partition_by` is specified, overwrite partitions in the <Term id="table" /> with new data. If no `partition_by` is specified, overwrite the entire table with new data.
- **`merge`** (Delta and Hudi file format only): Match records based on a `unique_key`; update old records, insert new ones. (If no `unique_key` is specified, all new data is inserted, similar to `append`.)
- **`merge`** (Delta and Hudi file format only): Match records based on a `unique_key`, updating old records, and inserting new ones. (If no `unique_key` is specified, all new data is inserted, similar to `append`.)
- **`replace_where`** (Delta file format only): Match records based on `incremental_predicates`, replacing all records that match the predicates from the existing table with records matching the predicates from the new data. (If no `incremental_predicates` are specified, all new data is inserted, similar to `append`.)

Each of these strategies has its pros and cons, which we'll discuss below. As with any model config, `incremental_strategy` may be specified in `dbt_project.yml` or within a model file's `config()` block.
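
As an illustrative sketch (not part of the original page), the strategy is typically set alongside the other table options in a model's `config()` block; the model and column names here are hypothetical:

<File name='merge_incremental_example.sql'>

```sql
{{ config(
    materialized = 'incremental',
    incremental_strategy = 'merge',
    file_format = 'delta',
    unique_key = 'user_id'
) }}

select
    user_id,
    max(date_day) as last_seen
from {{ ref('events') }}
{% if is_incremental() %}
  -- only scan recent data on incremental runs
  where date_day >= date_add(current_date, -1)
{% endif %}
group by 1
```

</File>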

@@ -248,6 +266,96 @@ merge into analytics.merge_incremental as DBT_INTERNAL_DEST
</TabItem>
</Tabs>

### The `replace_where` strategy

The `replace_where` incremental strategy requires:
- `file_format: delta`
- Databricks Runtime 12.0 and above

dbt will run an [atomic `replace where` statement](https://docs.databricks.com/en/delta/selective-overwrite.html#arbitrary-selective-overwrite-with-replacewhere) which selectively overwrites data matching one or more `incremental_predicates` specified as a string or array. Only rows matching the predicates will be inserted. If no `incremental_predicates` are specified, dbt will perform an atomic insert, as with `append`.
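
If you prefer the array form, a minimal sketch (with hypothetical predicates) looks like this:

<File name='replace_where_array_example.sql'>

```sql
-- hypothetical predicates; each element becomes a condition of the `replace where` clause
{{ config(
    materialized = 'incremental',
    file_format = 'delta',
    incremental_strategy = 'replace_where',
    incremental_predicates = ["user_id >= 10000", "date_day >= '2023-01-01'"]
) }}

select * from {{ ref('events') }}
```

</File>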

:::caution

`replace_where` inserts data into columns in the order provided, rather than by column name. If you reorder columns and the data is compatible with the existing schema, you may silently insert values into an unexpected column. If the incoming data is incompatible with the existing schema, you will instead receive an error.

:::

<Tabs
defaultValue="source"
values={[
{ label: 'Source code', value: 'source', },
{ label: 'Run code', value: 'run', },
]
}>
<TabItem value="source">

<File name='replace_where_incremental.sql'>

```sql
-- never replace users with ids < 10000
{{ config(
    materialized = 'incremental',
    file_format = 'delta',
    incremental_strategy = 'replace_where',
    incremental_predicates = 'user_id >= 10000'
) }}

with new_events as (

select * from {{ ref('events') }}

{% if is_incremental() %}
where date_day >= date_add(current_date, -1)
{% endif %}

)

select
user_id,
max(date_day) as last_seen

from new_events
group by 1
```

</File>
</TabItem>
<TabItem value="run">

<File name='target/run/replace_where_incremental.sql'>

```sql
create temporary view replace_where__dbt_tmp as

with new_events as (

select * from analytics.events


where date_day >= date_add(current_date, -1)


)

select
user_id,
max(date_day) as last_seen

from new_events
group by 1

;

insert into analytics.replace_where_incremental
replace where user_id >= 10000
table `replace_where__dbt_tmp`
```

</File>

</TabItem>
</Tabs>


## Persisting model descriptions

Relation-level docs persistence is supported in dbt v0.17.0. For more
@@ -281,3 +389,48 @@ snapshots:
```
</File>
<VersionBlock firstVersion="1.6">
## Materialized views and streaming tables

Starting with version 1.6.0, the dbt-databricks adapter supports [materialized views](https://docs.databricks.com/en/sql/user/materialized-views.html) and [streaming tables](https://docs.databricks.com/en/sql/load-data-streaming-table.html) as alternatives to incremental tables. Both are powered by [Delta Live Tables](https://docs.databricks.com/en/delta-live-tables/index.html).
See [What are Delta Live Tables?](https://docs.databricks.com/en/delta-live-tables/index.html#what-are-delta-live-tables-datasets) for more information and use cases.
These features are still in preview, and their support in the dbt-databricks adapter should, for now, be considered _experimental_.

To adopt these materializations, you will need a workspace that is enabled for Unity Catalog and serverless SQL warehouses.

<File name='materialized_view.sql'>

```sql
{{ config(
materialized = 'materialized_view'
) }}
```

</File>

or

<File name='streaming_table.sql'>

```sql
{{ config(
materialized = 'streaming_table'
) }}
```

</File>

When dbt detects a pre-existing relation of one of these types, it issues a `REFRESH` [command](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-refresh-full.html).
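
For illustration (the relation names are hypothetical; dbt derives the real ones from your schema and model alias), the refresh issued on a subsequent run looks roughly like:

```sql
REFRESH MATERIALIZED VIEW analytics.my_materialized_view;
REFRESH STREAMING TABLE analytics.my_streaming_table;
```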

### Limitations

As mentioned above, support for these materializations in the Databricks adapter is still limited.
At this time the following configuration options are not available:

* Specifying a refresh schedule for these materializations
* Specifying `on_configuration_change` settings

Additionally, if you change the model definition of your materialized view or streaming table, you will need to drop the materialization in your warehouse directly before running dbt again; otherwise, you will get a refresh error.
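
For example, assuming hypothetical relation names, you would run something like the following directly against the warehouse (not through dbt) before the next `dbt run`:

```sql
DROP MATERIALIZED VIEW IF EXISTS analytics.my_materialized_view;
-- streaming tables are dropped with DROP TABLE
DROP TABLE IF EXISTS analytics.my_streaming_table;
```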

We plan to address these limitations during the 1.7.x timeframe.
</VersionBlock>
