-
Notifications
You must be signed in to change notification settings - Fork 3
Staging
The 'Datavault by Scalefree: Stage' node creates the staging layer for the Data Vault model. This layer is mainly for hashing, and additionally gives the option to create derived columns, conduct prejoins, ghostrecords and add NULL values for missing columns. In addition it creates the columns LDTS and RSRC for the Load Date Timestamp and Record Source. Always create one stage per source table that you want to add to the Data Vault model. The staging layer is not to harmonize data. That will be done in the later layers.
Config Name | Config Type | Explanation |
---|---|---|
Hash Key Columns | Multi-Column Selector | In the Config section on the right there is a tab called 'Data Vault'. Extend this and you get a Multi-Column Selector 'Hash Key Columns'. Select each column that is a Hash Key for a specific Data Vault 2.0 entity. Do not select Hash Difference Columns. This section is only needed when multi-active data is processed. This is because in that case the calculation of the hash keys differ. |
Hash Difference Columns | Multi-Column Selector | Extend the Config tab 'Data Vault' and you get a Multi-Column Selector 'Hash Difference Columns'. Select each column that is a Hash Difference for a specific Data Vault 2.0 entity. Do not select regular Hash Key Columns. This section is only needed when multi-active data is processed. This is because in that case the calculation of the hash keys differ. |
Ghost Records | Toggle | In the Config section on the right there is a tab called 'Data Vault'. Extend this and you get a toggle button 'Generate Ghost Records'. If this is enabled (default = true) the Ghost Records will be created whith the node. |
Contains Multi-active Data | Toggle | In the Config section on the right there is a tab called 'Data Vault'. Extend this and you get a toggle button 'Contains Multi-active Data'. If this is enabled the tab for handling multi-active data will show up. Additionally the config for 'Hash Key Columns' and 'Hash Difference Columns' will appear. |
Multi-active Key column | Multi-Column Selector | In the Config section on the right there is a tab called 'Multi-Activity Config'. It is only enabled, when when the Toggle 'Contains Mult-active Data' is set to 'True'. Extend this and you get a toggle button 'Contains Multi-active Data'. Select each column that is a Multi-active Key. |
Main Hash Key column | columnDropdownSelector | In the Config section on the right there is a tab called 'Multi-Activity Config'. It is only enabled, when when the Toggle 'Contains Mult-active Data' is set to 'True'. Extend this and you get a toggle button 'Contains Multi-active Data'. Select the single Main Hash Key Column. |
Sys Column Name | Default Value | Explanation |
---|---|---|
LDTS | SYSDATE() | The Load Date Timestamp (LDTS) describes when this data first arrived in the Data Warehouse. This column doesn't contain the value yet, as you have to define it by yourself. Best would be a column already available within the source data that has a technical arrival timestamp. Only in case that nothing is available, functions like GETDATE() should be used. |
RSRC | {{ record_source() }} | The Record Source (RSRC) describes the source of this data. Inserts the location of the underlying source data to be used as a record source column. Should be replaced with more detailed information if available, for example the storage location within a data lake. |
-
Source: The source table that you want to use for this stage is 'ORDERS'.
-
LDTS: The LDTS is a system genereted column. The value is set in the 'Transform' field, in this example with the SQL expression 'SYSDATE()'.
-
RSRC: The RSRC is a system genereted column. The value is set in the 'Transform' field, in this example with the Jinja Expression '{{recordsource()}}' which refers to the source table.
-
hashed_columns:
-
HK_ORDERS_H: A hashkey called 'HK_ORDERS_H' is defined, that is calculated out of the business key 'O_ORDERKEY'.
-
HK_CUSTOMER_H: A hashkey called 'HK_CUSTOMER_H' is defined, that is calculated out of the business key 'O_CUSTKEY'.
-
HD_ORDERS_S: A hashdiff called 'HD_ORDERS_S' is calculated out of the descriptive attributes of the source table 'ORDERS'.
-
HK_ORDERS_CUSTOMER_L: A hashkey of a Link called 'HK_ORDERS_CUSTOMER_L' is defined, that is calculated out of the business keys 'O_ORDERKEY' and 'O_CUSTKEY'.
-
EXPLAIN USING text
CREATE OR REPLACE VIEW
"COALESCE_WORKSHOP"."CORE"."DVSTG_ORDERS" (
"O_ORDERKEY",
"O_CUSTKEY",
"O_ORDERSTATUS",
"O_TOTALPRICE",
"O_ORDERDATE",
"O_ORDERPRIORITY",
"O_CLERK",
"O_SHIPPRIORITY",
"O_COMMENT",
"LDTS"
COMMENT 'The Load Date Timestamp (LDTS) describes when this data first arrived in the Data Warehouse.',
"RSRC"
COMMENT 'The Record Source (RSRC) describes the source of this data.',
"HK_ORDERS_H",
"HK_CUSTOMER_H",
"HK_ORDERS_CUSTOMER_L",
"HD_ORDERS_S"
)
COMMENT = 'Orders data as defined by TPC-H' AS
SELECT
"ORDERS"."O_ORDERKEY" AS "O_ORDERKEY",
"ORDERS"."O_CUSTKEY" AS "O_CUSTKEY",
"ORDERS"."O_ORDERSTATUS" AS "O_ORDERSTATUS",
"ORDERS"."O_TOTALPRICE" AS "O_TOTALPRICE",
"ORDERS"."O_ORDERDATE" AS "O_ORDERDATE",
"ORDERS"."O_ORDERPRIORITY" AS "O_ORDERPRIORITY",
"ORDERS"."O_CLERK" AS "O_CLERK",
"ORDERS"."O_SHIPPRIORITY" AS "O_SHIPPRIORITY",
"ORDERS"."O_COMMENT" AS "O_COMMENT",
SYSDATE() AS "LDTS",
'"SNOWFLAKE_SAMPLE_DATA"."TPCH_SF1"."ORDERS"' AS "RSRC",
IFNULL(
SHA1(
NULLIF(
CAST(
REGEXP_REPLACE(
REGEXP_REPLACE(
REGEXP_REPLACE(
REGEXP_REPLACE(
UPPER(
CONCAT(
COALESCE(
(
CONCAT(
'\"',
REPLACE(
REPLACE(
REPLACE(
TRIM(CAST("ORDERS"."O_ORDERKEY" AS STRING)),
'\\',
'\\\\'
),
'"',
'\"'
),
'^^',
'--'
),
'\"'
)
),
'^^'
)
)
),
'\n',
''
),
'\t',
''
),
'\v',
''
),
'\r',
''
) AS STRING
),
'^^'
)
),
'0000000000000000000000000000000000000000'
) AS "HK_ORDERS_H",
IFNULL(
SHA1(
NULLIF(
CAST(
REGEXP_REPLACE(
REGEXP_REPLACE(
REGEXP_REPLACE(
REGEXP_REPLACE(
UPPER(
CONCAT(
COALESCE(
(
CONCAT(
'\"',
REPLACE(
REPLACE(
REPLACE(
TRIM(CAST("ORDERS"."O_CUSTKEY" AS STRING)),
'\\',
'\\\\'
),
'"',
'\"'
),
'^^',
'--'
),
'\"'
)
),
'^^'
)
)
),
'\n',
''
),
'\t',
''
),
'\v',
''
),
'\r',
''
) AS STRING
),
'^^'
)
),
'0000000000000000000000000000000000000000'
) AS "HK_CUSTOMER_H",
IFNULL(
SHA1(
NULLIF(
CAST(
REGEXP_REPLACE(
REGEXP_REPLACE(
REGEXP_REPLACE(
REGEXP_REPLACE(
UPPER(
CONCAT(
COALESCE(
(
CONCAT(
'\"',
REPLACE(
REPLACE(
REPLACE(
TRIM(CAST("ORDERS"."O_ORDERKEY" AS STRING)),
'\\',
'\\\\'
),
'"',
'\"'
),
'^^',
'--'
),
'\"'
)
),
'^^'
),
'||',
COALESCE(
(
CONCAT(
'\"',
REPLACE(
REPLACE(
REPLACE(
TRIM(CAST("ORDERS"."O_CUSTKEY" AS STRING)),
'\\',
'\\\\'
),
'"',
'\"'
),
'^^',
'--'
),
'\"'
)
),
'^^'
)
)
),
'\n',
''
),
'\t',
''
),
'\v',
''
),
'\r',
''
) AS STRING
),
'^^||^^'
)
),
'0000000000000000000000000000000000000000'
) AS "HK_ORDERS_CUSTOMER_L",
IFNULL(
SHA1(
NULLIF(
CAST(
REGEXP_REPLACE(
REGEXP_REPLACE(
REGEXP_REPLACE(
REGEXP_REPLACE(
UPPER(
CONCAT(
COALESCE(
(
CONCAT(
'\"',
REPLACE(
REPLACE(
REPLACE(
TRIM(CAST("ORDERS"."O_ORDERSTATUS" AS STRING)),
'\\',
'\\\\'
),
'"',
'\"'
),
'^^',
'--'
),
'\"'
)
),
'^^'
),
'||',
COALESCE(
(
CONCAT(
'\"',
REPLACE(
REPLACE(
REPLACE(
TRIM(CAST("ORDERS"."O_TOTALPRICE" AS STRING)),
'\\',
'\\\\'
),
'"',
'\"'
),
'^^',
'--'
),
'\"'
)
),
'^^'
),
'||',
COALESCE(
(
CONCAT(
'\"',
REPLACE(
REPLACE(
REPLACE(
TRIM(CAST("ORDERS"."O_ORDERDATE" AS STRING)),
'\\',
'\\\\'
),
'"',
'\"'
),
'^^',
'--'
),
'\"'
)
),
'^^'
),
'||',
COALESCE(
(
CONCAT(
'\"',
REPLACE(
REPLACE(
REPLACE(
TRIM(CAST("ORDERS"."O_ORDERPRIORITY" AS STRING)),
'\\',
'\\\\'
),
'"',
'\"'
),
'^^',
'--'
),
'\"'
)
),
'^^'
),
'||',
COALESCE(
(
CONCAT(
'\"',
REPLACE(
REPLACE(
REPLACE(
TRIM(CAST("ORDERS"."O_CLERK" AS STRING)),
'\\',
'\\\\'
),
'"',
'\"'
),
'^^',
'--'
),
'\"'
)
),
'^^'
),
'||',
COALESCE(
(
CONCAT(
'\"',
REPLACE(
REPLACE(
REPLACE(
TRIM(CAST("ORDERS"."O_SHIPPRIORITY" AS STRING)),
'\\',
'\\\\'
),
'"',
'\"'
),
'^^',
'--'
),
'\"'
)
),
'^^'
),
'||',
COALESCE(
(
CONCAT(
'\"',
REPLACE(
REPLACE(
REPLACE(
TRIM(CAST("ORDERS"."O_COMMENT" AS STRING)),
'\\',
'\\\\'
),
'"',
'\"'
),
'^^',
'--'
),
'\"'
)
),
'^^'
)
)
),
'\n',
''
),
'\t',
''
),
'\v',
''
),
'\r',
''
) AS STRING
),
'^^||^^||^^||^^||^^||^^||^^'
)
),
'0000000000000000000000000000000000000000'
) AS "HD_ORDERS_S"
FROM
"SNOWFLAKE_SAMPLE_DATA"."TPCH_SF1"."ORDERS" "ORDERS"
UNION ALL
SELECT
0 AS O_ORDERKEY,
0 AS O_CUSTKEY,
'u' AS O_ORDERSTATUS,
0 AS O_TOTALPRICE,
TO_TIMESTAMP('0001-01-01T00:00:01', 'YYYY-MM-DDTHH24:MI:SS') AS O_ORDERDATE,
'(unknown)' AS O_ORDERPRIORITY,
'(unknown)' AS O_CLERK,
0 AS O_SHIPPRIORITY,
'(unknown)' AS O_COMMENT,
TO_TIMESTAMP('0001-01-01T00:00:01', 'YYYY-MM-DDTHH24:MI:SS') AS LDTS,
'(unknown)' AS RSRC,
'0000000000000000000000000000000000000000' as HK_ORDERS_H,
'0000000000000000000000000000000000000000' as HK_CUSTOMER_H,
'0000000000000000000000000000000000000000' as HK_ORDERS_CUSTOMER_L,
'0000000000000000000000000000000000000000' as HD_ORDERS_S
UNION ALL
SELECT
-1 AS O_ORDERKEY,
-1 AS O_CUSTKEY,
'e' AS O_ORDERSTATUS,
-1 AS O_TOTALPRICE,
TO_TIMESTAMP('8888-12-31T23:59:59', 'YYYY-MM-DDTHH24:MI:SS') AS O_ORDERDATE,
'(error)' AS O_ORDERPRIORITY,
'(error)' AS O_CLERK,
-1 AS O_SHIPPRIORITY,
'(error)' AS O_COMMENT,
TO_TIMESTAMP('8888-12-31T23:59:59', 'YYYY-MM-DDTHH24:MI:SS') AS LDTS,
'(error)' AS RSRC,
'ffffffffffffffffffffffffffffffffffffffff' as HK_ORDERS_H,
'ffffffffffffffffffffffffffffffffffffffff' as HK_CUSTOMER_H,
'ffffffffffffffffffffffffffffffffffffffff' as HK_ORDERS_CUSTOMER_L,
'ffffffffffffffffffffffffffffffffffffffff' as HD_ORDERS_S
Table of Content
- Staging
- DV-Entities
- Hubs
- Links
- Satellites
- Standard Satellite
- Multi-Active Satellite
- Non-Historized Satellite
- Record-Tracking Satellite
- Business Vault
- PIT
- Snapshot Control
- Global Variables