Skip to content

Staging

JPLangner edited this page Oct 4, 2023 · 11 revisions

The 'Datavault by Scalefree: Stage' node creates the staging layer for the Data Vault model. This layer is mainly for hashing, and additionally gives the option to create derived columns, conduct prejoins, ghostrecords and add NULL values for missing columns. In addition it creates the columns LDTS and RSRC for the Load Date Timestamp and Record Source. Always create one stage per source table that you want to add to the Data Vault model. The staging layer is not to harmonize data. That will be done in the later layers.


User Configuration

Config Name Config Type Explanation
Hash Key Columns Multi-Column Selector In the Config section on the right there is a tab called 'Data Vault'. Extend this and you get a Multi-Column Selector 'Hash Key Columns'. Select each column that is a Hash Key for a specific Data Vault 2.0 entity. Do not select Hash Difference Columns. This section is only needed when multi-active data is processed. This is because in that case the calculation of the hash keys differ.
Hash Difference Columns Multi-Column Selector Extend the Config tab 'Data Vault' and you get a Multi-Column Selector 'Hash Difference Columns'. Select each column that is a Hash Difference for a specific Data Vault 2.0 entity. Do not select regular Hash Key Columns. This section is only needed when multi-active data is processed. This is because in that case the calculation of the hash keys differ.
Ghost Records Toggle In the Config section on the right there is a tab called 'Data Vault'. Extend this and you get a toggle button 'Generate Ghost Records'. If this is enabled (default = true) the Ghost Records will be created whith the node.
Contains Multi-active Data Toggle In the Config section on the right there is a tab called 'Data Vault'. Extend this and you get a toggle button 'Contains Multi-active Data'. If this is enabled the tab for handling multi-active data will show up. Additionally the config for 'Hash Key Columns' and 'Hash Difference Columns' will appear.
Multi-active Key column Multi-Column Selector In the Config section on the right there is a tab called 'Multi-Activity Config'. It is only enabled, when when the Toggle 'Contains Mult-active Data' is set to 'True'. Extend this and you get a toggle button 'Contains Multi-active Data'. Select each column that is a Multi-active Key.
Main Hash Key column columnDropdownSelector In the Config section on the right there is a tab called 'Multi-Activity Config'. It is only enabled, when when the Toggle 'Contains Mult-active Data' is set to 'True'. Extend this and you get a toggle button 'Contains Multi-active Data'. Select the single Main Hash Key Column.

System Columns

Sys Column Name Default Value Explanation
LDTS SYSDATE() The Load Date Timestamp (LDTS) describes when this data first arrived in the Data Warehouse. This column doesn't contain the value yet, as you have to define it by yourself. Best would be a column already available within the source data that has a technical arrival timestamp. Only in case that nothing is available, functions like GETDATE() should be used.
RSRC {{ record_source() }} The Record Source (RSRC) describes the source of this data. Inserts the location of the underlying source data to be used as a record source column. Should be replaced with more detailed information if available, for example the storage location within a data lake.

Example 1

Node Graph Node STG_NATION_WIKI Multi-active Config

Description:

  • Source: The source table that you want to use for this stage is 'ORDERS'.

  • LDTS: The LDTS is a system genereted column. The value is set in the 'Transform' field, in this example with the SQL expression 'SYSDATE()'.

  • RSRC: The RSRC is a system genereted column. The value is set in the 'Transform' field, in this example with the Jinja Expression '{{recordsource()}}' which refers to the source table.

  • hashed_columns:

    • HK_ORDERS_H: A hashkey called 'HK_ORDERS_H' is defined, that is calculated out of the business key 'O_ORDERKEY'.

    • HK_CUSTOMER_H: A hashkey called 'HK_CUSTOMER_H' is defined, that is calculated out of the business key 'O_CUSTKEY'.

    • HD_ORDERS_S: A hashdiff called 'HD_ORDERS_S' is calculated out of the descriptive attributes of the source table 'ORDERS'.

    • HK_ORDERS_CUSTOMER_L: A hashkey of a Link called 'HK_ORDERS_CUSTOMER_L' is defined, that is calculated out of the business keys 'O_ORDERKEY' and 'O_CUSTKEY'.


Generated SQL Code of Example 1

EXPLAIN USING text
CREATE OR REPLACE VIEW
"COALESCE_WORKSHOP"."CORE"."DVSTG_ORDERS" (
    "O_ORDERKEY",
    "O_CUSTKEY",
    "O_ORDERSTATUS",
    "O_TOTALPRICE",
    "O_ORDERDATE",
    "O_ORDERPRIORITY",
    "O_CLERK",
    "O_SHIPPRIORITY",
    "O_COMMENT",
    "LDTS"
    COMMENT 'The Load Date Timestamp (LDTS) describes when this data first arrived in the Data Warehouse.',
    "RSRC"
    COMMENT 'The Record Source (RSRC) describes the source of this data.',
    "HK_ORDERS_H",
    "HK_CUSTOMER_H",
    "HK_ORDERS_CUSTOMER_L",
    "HD_ORDERS_S"
)
COMMENT = 'Orders data as defined by TPC-H' AS
SELECT
"ORDERS"."O_ORDERKEY" AS "O_ORDERKEY",
"ORDERS"."O_CUSTKEY" AS "O_CUSTKEY",
"ORDERS"."O_ORDERSTATUS" AS "O_ORDERSTATUS",
"ORDERS"."O_TOTALPRICE" AS "O_TOTALPRICE",
"ORDERS"."O_ORDERDATE" AS "O_ORDERDATE",
"ORDERS"."O_ORDERPRIORITY" AS "O_ORDERPRIORITY",
"ORDERS"."O_CLERK" AS "O_CLERK",
"ORDERS"."O_SHIPPRIORITY" AS "O_SHIPPRIORITY",
"ORDERS"."O_COMMENT" AS "O_COMMENT",
SYSDATE() AS "LDTS",
'"SNOWFLAKE_SAMPLE_DATA"."TPCH_SF1"."ORDERS"' AS "RSRC",
IFNULL(
    SHA1(
    NULLIF(
        CAST(
        REGEXP_REPLACE(
            REGEXP_REPLACE(
            REGEXP_REPLACE(
                REGEXP_REPLACE(
                UPPER(
                    CONCAT(
                    COALESCE(
                        (
                        CONCAT(
                            '\"',
                            REPLACE(
                            REPLACE(
                                REPLACE(
                                TRIM(CAST("ORDERS"."O_ORDERKEY" AS STRING)),
                                '\\',
                                '\\\\'
                                ),
                                '"',
                                '\"'
                            ),
                            '^^',
                            '--'
                            ),
                            '\"'
                        )
                        ),
                        '^^'
                    )
                    )
                ),
                '\n',
                ''
                ),
                '\t',
                ''
            ),
            '\v',
            ''
            ),
            '\r',
            ''
        ) AS STRING
        ),
        '^^'
    )
    ),
    '0000000000000000000000000000000000000000'
) AS "HK_ORDERS_H",
IFNULL(
    SHA1(
    NULLIF(
        CAST(
        REGEXP_REPLACE(
            REGEXP_REPLACE(
            REGEXP_REPLACE(
                REGEXP_REPLACE(
                UPPER(
                    CONCAT(
                    COALESCE(
                        (
                        CONCAT(
                            '\"',
                            REPLACE(
                            REPLACE(
                                REPLACE(
                                TRIM(CAST("ORDERS"."O_CUSTKEY" AS STRING)),
                                '\\',
                                '\\\\'
                                ),
                                '"',
                                '\"'
                            ),
                            '^^',
                            '--'
                            ),
                            '\"'
                        )
                        ),
                        '^^'
                    )
                    )
                ),
                '\n',
                ''
                ),
                '\t',
                ''
            ),
            '\v',
            ''
            ),
            '\r',
            ''
        ) AS STRING
        ),
        '^^'
    )
    ),
    '0000000000000000000000000000000000000000'
) AS "HK_CUSTOMER_H",
IFNULL(
    SHA1(
    NULLIF(
        CAST(
        REGEXP_REPLACE(
            REGEXP_REPLACE(
            REGEXP_REPLACE(
                REGEXP_REPLACE(
                UPPER(
                    CONCAT(
                    COALESCE(
                        (
                        CONCAT(
                            '\"',
                            REPLACE(
                            REPLACE(
                                REPLACE(
                                TRIM(CAST("ORDERS"."O_ORDERKEY" AS STRING)),
                                '\\',
                                '\\\\'
                                ),
                                '"',
                                '\"'
                            ),
                            '^^',
                            '--'
                            ),
                            '\"'
                        )
                        ),
                        '^^'
                    ),
                    '||',
                    COALESCE(
                        (
                        CONCAT(
                            '\"',
                            REPLACE(
                            REPLACE(
                                REPLACE(
                                TRIM(CAST("ORDERS"."O_CUSTKEY" AS STRING)),
                                '\\',
                                '\\\\'
                                ),
                                '"',
                                '\"'
                            ),
                            '^^',
                            '--'
                            ),
                            '\"'
                        )
                        ),
                        '^^'
                    )
                    )
                ),
                '\n',
                ''
                ),
                '\t',
                ''
            ),
            '\v',
            ''
            ),
            '\r',
            ''
        ) AS STRING
        ),
        '^^||^^'
    )
    ),
    '0000000000000000000000000000000000000000'
) AS "HK_ORDERS_CUSTOMER_L",
IFNULL(
    SHA1(
    NULLIF(
        CAST(
        REGEXP_REPLACE(
            REGEXP_REPLACE(
            REGEXP_REPLACE(
                REGEXP_REPLACE(
                UPPER(
                    CONCAT(
                    COALESCE(
                        (
                        CONCAT(
                            '\"',
                            REPLACE(
                            REPLACE(
                                REPLACE(
                                TRIM(CAST("ORDERS"."O_ORDERSTATUS" AS STRING)),
                                '\\',
                                '\\\\'
                                ),
                                '"',
                                '\"'
                            ),
                            '^^',
                            '--'
                            ),
                            '\"'
                        )
                        ),
                        '^^'
                    ),
                    '||',
                    COALESCE(
                        (
                        CONCAT(
                            '\"',
                            REPLACE(
                            REPLACE(
                                REPLACE(
                                TRIM(CAST("ORDERS"."O_TOTALPRICE" AS STRING)),
                                '\\',
                                '\\\\'
                                ),
                                '"',
                                '\"'
                            ),
                            '^^',
                            '--'
                            ),
                            '\"'
                        )
                        ),
                        '^^'
                    ),
                    '||',
                    COALESCE(
                        (
                        CONCAT(
                            '\"',
                            REPLACE(
                            REPLACE(
                                REPLACE(
                                TRIM(CAST("ORDERS"."O_ORDERDATE" AS STRING)),
                                '\\',
                                '\\\\'
                                ),
                                '"',
                                '\"'
                            ),
                            '^^',
                            '--'
                            ),
                            '\"'
                        )
                        ),
                        '^^'
                    ),
                    '||',
                    COALESCE(
                        (
                        CONCAT(
                            '\"',
                            REPLACE(
                            REPLACE(
                                REPLACE(
                                TRIM(CAST("ORDERS"."O_ORDERPRIORITY" AS STRING)),
                                '\\',
                                '\\\\'
                                ),
                                '"',
                                '\"'
                            ),
                            '^^',
                            '--'
                            ),
                            '\"'
                        )
                        ),
                        '^^'
                    ),
                    '||',
                    COALESCE(
                        (
                        CONCAT(
                            '\"',
                            REPLACE(
                            REPLACE(
                                REPLACE(
                                TRIM(CAST("ORDERS"."O_CLERK" AS STRING)),
                                '\\',
                                '\\\\'
                                ),
                                '"',
                                '\"'
                            ),
                            '^^',
                            '--'
                            ),
                            '\"'
                        )
                        ),
                        '^^'
                    ),
                    '||',
                    COALESCE(
                        (
                        CONCAT(
                            '\"',
                            REPLACE(
                            REPLACE(
                                REPLACE(
                                TRIM(CAST("ORDERS"."O_SHIPPRIORITY" AS STRING)),
                                '\\',
                                '\\\\'
                                ),
                                '"',
                                '\"'
                            ),
                            '^^',
                            '--'
                            ),
                            '\"'
                        )
                        ),
                        '^^'
                    ),
                    '||',
                    COALESCE(
                        (
                        CONCAT(
                            '\"',
                            REPLACE(
                            REPLACE(
                                REPLACE(
                                TRIM(CAST("ORDERS"."O_COMMENT" AS STRING)),
                                '\\',
                                '\\\\'
                                ),
                                '"',
                                '\"'
                            ),
                            '^^',
                            '--'
                            ),
                            '\"'
                        )
                        ),
                        '^^'
                    )
                    )
                ),
                '\n',
                ''
                ),
                '\t',
                ''
            ),
            '\v',
            ''
            ),
            '\r',
            ''
        ) AS STRING
        ),
        '^^||^^||^^||^^||^^||^^||^^'
    )
    ),
    '0000000000000000000000000000000000000000'
) AS "HD_ORDERS_S"
FROM
"SNOWFLAKE_SAMPLE_DATA"."TPCH_SF1"."ORDERS" "ORDERS"
UNION ALL
SELECT
0 AS O_ORDERKEY,
0 AS O_CUSTKEY,
'u' AS O_ORDERSTATUS,
0 AS O_TOTALPRICE,
TO_TIMESTAMP('0001-01-01T00:00:01', 'YYYY-MM-DDTHH24:MI:SS') AS O_ORDERDATE,
'(unknown)' AS O_ORDERPRIORITY,
'(unknown)' AS O_CLERK,
0 AS O_SHIPPRIORITY,
'(unknown)' AS O_COMMENT,
TO_TIMESTAMP('0001-01-01T00:00:01', 'YYYY-MM-DDTHH24:MI:SS') AS LDTS,
'(unknown)' AS RSRC,
'0000000000000000000000000000000000000000' as HK_ORDERS_H,
'0000000000000000000000000000000000000000' as HK_CUSTOMER_H,
'0000000000000000000000000000000000000000' as HK_ORDERS_CUSTOMER_L,
'0000000000000000000000000000000000000000' as HD_ORDERS_S
UNION ALL
SELECT
-1 AS O_ORDERKEY,
-1 AS O_CUSTKEY,
'e' AS O_ORDERSTATUS,
-1 AS O_TOTALPRICE,
TO_TIMESTAMP('8888-12-31T23:59:59', 'YYYY-MM-DDTHH24:MI:SS') AS O_ORDERDATE,
'(error)' AS O_ORDERPRIORITY,
'(error)' AS O_CLERK,
-1 AS O_SHIPPRIORITY,
'(error)' AS O_COMMENT,
TO_TIMESTAMP('8888-12-31T23:59:59', 'YYYY-MM-DDTHH24:MI:SS') AS LDTS,
'(error)' AS RSRC,
'ffffffffffffffffffffffffffffffffffffffff' as HK_ORDERS_H,
'ffffffffffffffffffffffffffffffffffffffff' as HK_CUSTOMER_H,
'ffffffffffffffffffffffffffffffffffffffff' as HK_ORDERS_CUSTOMER_L,
'ffffffffffffffffffffffffffffffffffffffff' as HD_ORDERS_S