Staging Activities
Page 1 of 1
Staging Activities
What are the common staging activities that the Kimball method recognizes? Traditionally I have used staging databases to
1) Decouple from the source-to-Data Mart for restartability
2) Consolidate multiple sources and source formats into a single homogeneous environment
3) Data cleansing, such as populate missing data, scrubbing fields, etc.
But some articles I've seen talk about surrogate key handling and conforming within the staging database. Most ETL tools I've used will read data from a staging table that contains only business or natural keys, do a lookup on the dimension table in the data mart, and populate the dimension table in the data mart. In this scenario, both a connection to staging and the data mart are in play, but no surrogate key handling is done "within staging" but only within memory. When would you ever store dimensional surrogate keys in staging?
As for the conforming location, this seems subjective on every project I've been on. Some will change the column names, data types, merge columns or change their meaning, etc. within Staging. Then when they load the data marts, most of the transformation logic has already been done. Other projects will pull the data over "as-is" into staging, to maintain a clean audit trail of which columns, tables, etc. are absolutely required to load the DW (e.g. only 5 tables from the original 20 tables in the source, and only a subset of fields within each table, only the ones that are required in the DW). Because the field names and table names stay the same, it is easy to trace the source-to-target lineage.
What I want to know is what activities does Kimball Group advocate within Staging? Changing the DB schema or mirror the source(s)? Store surrogate keys for mapping?
1) Decouple from the source-to-Data Mart for restartability
2) Consolidate multiple sources and source formats into a single homogeneous environment
3) Data cleansing, such as populate missing data, scrubbing fields, etc.
But some articles I've seen talk about surrogate key handling and conforming within the staging database. Most ETL tools I've used will read data from a staging table that contains only business or natural keys, do a lookup on the dimension table in the data mart, and populate the dimension table in the data mart. In this scenario, both a connection to staging and the data mart are in play, but no surrogate key handling is done "within staging" but only within memory. When would you ever store dimensional surrogate keys in staging?
As for the conforming location, this seems subjective on every project I've been on. Some will change the column names, data types, merge columns or change their meaning, etc. within Staging. Then when they load the data marts, most of the transformation logic has already been done. Other projects will pull the data over "as-is" into staging, to maintain a clean audit trail of which columns, tables, etc. are absolutely required to load the DW (e.g. only 5 tables from the original 20 tables in the source, and only a subset of fields within each table, only the ones that are required in the DW). Because the field names and table names stay the same, it is easy to trace the source-to-target lineage.
What I want to know is what activities does Kimball Group advocate within Staging? Changing the DB schema or mirror the source(s)? Store surrogate keys for mapping?
kskistad- Posts : 11
Join date : 2009-02-03
Similar topics
» Design of Staging Tables
» Staging area design
» Start Schema vs Snow flake schema
» Where do you put your staging Tables? DWH DB or as a separate DB
» Extract to staging: use constraints or not?
» Staging area design
» Start Schema vs Snow flake schema
» Where do you put your staging Tables? DWH DB or as a separate DB
» Extract to staging: use constraints or not?
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum
|
|