-
Notifications
You must be signed in to change notification settings - Fork 3
Hubs
tta-scalefree edited this page Sep 29, 2023
·
10 revisions
The "Datavault by Scalefree: Hub" node creates a Hub entity based on one or more stage nodes.
In the "Config" section on the right hand-side, expand section "Data Vault" to modify the following configurations for the node:
Config Name | Config Type | Explanation |
---|---|---|
Hub Hashkey Column | columnDropdownSelector | Select the hashkey column of this hub from the drop-down menu. |
No Sys Columns are added inside a Hub Node.
In this example we got a multi source Hub. The source tables are 'STG_CUSTOMER_WIKI' and 'STG_ORDERS_WIKI'.
Create the node:
- Select the 'STG_CUSTOMER_WIKI' and create the node 'HUB_CUSTOMER_WIKI' with the node type 'Datavault by Scalefree: Hub'
- Rename the Hub using your naming conventions. In this case rename it to 'CUSTOMER_WIKI_H'
- Open the node and delete all unnecessary columns, so that you only keep the Hashkey, Business Key, LoadDate Timestamp and Record Source
- Expand the 'Options' tab on the right side and toggle 'Multi Source' to set it to 'true'
- On the left side you can choose between 'Nodes', 'Sub Graphs' and 'Jobs', choose 'Nodes'
- Click on the '+' (plus) below 'CUSTOMER_WIKI_H' to add a second source
- Click on 'NEW_SOURCE' and rename it to 'STG_ORDERS_WIKI', in this case, as we add this table a second source
- Now expand the node type 'Datavault by Scalefree: Stage' and drag and drop the second source table 'STG_ORDERS_WIKI' to the right side 'Drop a table here to map to columns in the source below'
- This will map the existing columns with matching names from the new source to the original source
- The Business Key 'C_CUSTKEY' isn't mapped automatically, as the names differ between these two sources
- Map them manually by clicking on the empty 'Source' field and select the 'STG_ORDERS.O_CUSTKEY', as this is the euqivilent to the 'STG_CUSTOMER_WIKI.C_CUSTKEY'
- Switch from the 'Mapping' tab to the 'Join' tab in the source 'STG_ORDERS'
- As you can see, this field is empty, which is not allowed as it can't deliver data that way
- On the right side click 'Generate Join' and choose 'Copy to Editor' to directly write that expression to the editor
- In the Config section on the right there is a tab called 'Data Vault'. Expand this and you get a column selector list 'Hub Hashkey Column'. Select the hashkey of this hub and bring it to the right side.
- After creating and running the Hub node, the lineage on the graph will update
CREATE OR REPLACE TABLE
"COALESCE_WORKSHOP"."CORE"."CUSTOMER_WIKI_H" (
"HK_CUSTOMER_H" STRING,
"C_CUSTKEY" NUMBER (38, 0) NOT NULL,
"LDTS" TIMESTAMP
COMMENT 'The Load Date Timestamp (LDTS) describes when this data first arrived in the Data Warehouse.',
"RSRC" STRING
COMMENT 'The Record Source (RSRC) describes the source of this data.'
)
COMMENT = 'Customer data as defined by TPC-H'
INSERT INTO
"COALESCE_WORKSHOP"."CORE"."CUSTOMER_WIKI_H"
WITH
incoming AS (
SELECT DISTINCT
"STG_CUSTOMER_WIKI"."HK_CUSTOMER_H" AS "HK_CUSTOMER_H",
"STG_CUSTOMER_WIKI"."C_CUSTKEY" AS "C_CUSTKEY",
"STG_CUSTOMER_WIKI"."LDTS" AS "LDTS",Hi
"STG_CUSTOMER_WIKI"."RSRC" AS "RSRC"
FROM
"COALESCE_WORKSHOP"."CORE"."STG_CUSTOMER_WIKI" "STG_CUSTOMER_WIKI"
),
new_records AS (
SELECT
"SRC".*
FROM
incoming "SRC"
WHERE
NOT EXISTS (
SELECT
1
FROM
"COALESCE_WORKSHOP"."CORE"."CUSTOMER_WIKI_H" "TGT"
WHERE
"SRC"."HK_CUSTOMER_H" = "TGT"."HK_CUSTOMER_H"
) QUALIFY ROW_NUMBER() OVER (
PARTITION BY
"HK_CUSTOMER_H"
ORDER BY
"LDTS"
) = 1
)
SELECT
*
FROM
new_records
INSERT INTO
"COALESCE_WORKSHOP"."CORE"."CUSTOMER_WIKI_H"
WITH
incoming AS (
SELECT DISTINCT
"STG_ORDERS_WIKI"."HK_CUSTOMER_H" AS "HK_CUSTOMER_H",
"STG_ORDERS_WIKI"."O_CUSTKEY" AS "C_CUSTKEY",
"STG_ORDERS_WIKI"."LDTS" AS "LDTS",
"STG_ORDERS_WIKI"."RSRC" AS "RSRC"
FROM
"COALESCE_WORKSHOP"."CORE"."STG_ORDERS_WIKI" "STG_ORDERS_WIKI"
),
new_records AS (
SELECT
"SRC".*
FROM
incoming "SRC"
WHERE
NOT EXISTS (
SELECT
1
FROM
"COALESCE_WORKSHOP"."CORE"."CUSTOMER_WIKI_H" "TGT"
WHERE
"SRC"."HK_CUSTOMER_H" = "TGT"."HK_CUSTOMER_H"
) QUALIFY ROW_NUMBER() OVER (
PARTITION BY
"HK_CUSTOMER_H"
ORDER BY
"LDTS"
) = 1
)
SELECT
*
FROM
new_records
Table of Content
- Staging
- DV-Entities
- Hubs
- Links
- Satellites
- Standard Satellite
- Multi-Active Satellite
- Non-Historized Satellite
- Record-Tracking Satellite
- Business Vault
- PIT
- Snapshot Control
- Global Variables