Skip to content
tta-scalefree edited this page Sep 29, 2023 · 10 revisions

The "Datavault by Scalefree: Hub" node creates a Hub entity based on one or more stage nodes.


User Configuration

In the "Config" section on the right hand-side, expand section "Data Vault" to modify the following configurations for the node:

Config Name Config Type Explanation
Hub Hashkey Column columnDropdownSelector Select the hashkey column of this hub from the drop-down menu.

System Columns

No Sys Columns are added inside a Hub Node.


Example 1

Description

In this example we got a multi source Hub. The source tables are 'STG_CUSTOMER_WIKI' and 'STG_ORDERS_WIKI'.

Create the node:

HUB Customer Graph

  • Select the 'STG_CUSTOMER_WIKI' and create the node 'HUB_CUSTOMER_WIKI' with the node type 'Datavault by Scalefree: Hub'
  • Rename the Hub using your naming conventions. In this case rename it to 'CUSTOMER_WIKI_H'

HUB Customer Multi Source

  • Open the node and delete all unnecessary columns, so that you only keep the Hashkey, Business Key, LoadDate Timestamp and Record Source
  • Expand the 'Options' tab on the right side and toggle 'Multi Source' to set it to 'true'

HUB Customer Add Source HUB Customer Order BK

  • On the left side you can choose between 'Nodes', 'Sub Graphs' and 'Jobs', choose 'Nodes'
    • Click on the '+' (plus) below 'CUSTOMER_WIKI_H' to add a second source
    • Click on 'NEW_SOURCE' and rename it to 'STG_ORDERS_WIKI', in this case, as we add this table a second source
    • Now expand the node type 'Datavault by Scalefree: Stage' and drag and drop the second source table 'STG_ORDERS_WIKI' to the right side 'Drop a table here to map to columns in the source below'
    • This will map the existing columns with matching names from the new source to the original source
    • The Business Key 'C_CUSTKEY' isn't mapped automatically, as the names differ between these two sources
    • Map them manually by clicking on the empty 'Source' field and select the 'STG_ORDERS.O_CUSTKEY', as this is the euqivilent to the 'STG_CUSTOMER_WIKI.C_CUSTKEY'

HUB Customer Order Join

  • Switch from the 'Mapping' tab to the 'Join' tab in the source 'STG_ORDERS'
    • As you can see, this field is empty, which is not allowed as it can't deliver data that way
    • On the right side click 'Generate Join' and choose 'Copy to Editor' to directly write that expression to the editor

image

  • In the Config section on the right there is a tab called 'Data Vault'. Expand this and you get a column selector list 'Hub Hashkey Column'. Select the hashkey of this hub and bring it to the right side.

HUB Customer Graph Lineage

  • After creating and running the Hub node, the lineage on the graph will update


Generated SQL Code of Example 2

Create script

  CREATE OR REPLACE TABLE
    "COALESCE_WORKSHOP"."CORE"."CUSTOMER_WIKI_H" (
      "HK_CUSTOMER_H" STRING,
      "C_CUSTKEY" NUMBER (38, 0) NOT NULL,
      "LDTS" TIMESTAMP
      COMMENT 'The Load Date Timestamp (LDTS) describes when this data first arrived in the Data Warehouse.',
      "RSRC" STRING
      COMMENT 'The Record Source (RSRC) describes the source of this data.'
    )
  COMMENT = 'Customer data as defined by TPC-H'

Run script

  INSERT INTO
    "COALESCE_WORKSHOP"."CORE"."CUSTOMER_WIKI_H"
  WITH
    incoming AS (
      SELECT DISTINCT
        "STG_CUSTOMER_WIKI"."HK_CUSTOMER_H" AS "HK_CUSTOMER_H",
        "STG_CUSTOMER_WIKI"."C_CUSTKEY" AS "C_CUSTKEY",
        "STG_CUSTOMER_WIKI"."LDTS" AS "LDTS",Hi
        "STG_CUSTOMER_WIKI"."RSRC" AS "RSRC"
      FROM
        "COALESCE_WORKSHOP"."CORE"."STG_CUSTOMER_WIKI" "STG_CUSTOMER_WIKI"
    ),
    new_records AS (
      SELECT
        "SRC".*
      FROM
        incoming "SRC"
      WHERE
        NOT EXISTS (
          SELECT
            1
          FROM
            "COALESCE_WORKSHOP"."CORE"."CUSTOMER_WIKI_H" "TGT"
          WHERE
            "SRC"."HK_CUSTOMER_H" = "TGT"."HK_CUSTOMER_H" 
        ) QUALIFY ROW_NUMBER() OVER (
          PARTITION BY
            "HK_CUSTOMER_H"
          ORDER BY
            "LDTS"
        ) = 1
    )
  SELECT
    *
  FROM
    new_records

  INSERT INTO
    "COALESCE_WORKSHOP"."CORE"."CUSTOMER_WIKI_H"
  WITH
    incoming AS (
      SELECT DISTINCT
        "STG_ORDERS_WIKI"."HK_CUSTOMER_H" AS "HK_CUSTOMER_H",
        "STG_ORDERS_WIKI"."O_CUSTKEY" AS "C_CUSTKEY",
        "STG_ORDERS_WIKI"."LDTS" AS "LDTS",
        "STG_ORDERS_WIKI"."RSRC" AS "RSRC"
      FROM
        "COALESCE_WORKSHOP"."CORE"."STG_ORDERS_WIKI" "STG_ORDERS_WIKI"
    ),
    new_records AS (
      SELECT
        "SRC".*
      FROM
        incoming "SRC"
      WHERE
        NOT EXISTS (
          SELECT
            1
          FROM
            "COALESCE_WORKSHOP"."CORE"."CUSTOMER_WIKI_H" "TGT"
          WHERE
            "SRC"."HK_CUSTOMER_H" = "TGT"."HK_CUSTOMER_H"
        ) QUALIFY ROW_NUMBER() OVER (
          PARTITION BY
            "HK_CUSTOMER_H"
          ORDER BY
            "LDTS"
        ) = 1
    )
  SELECT
    *
  FROM
    new_records