Staging area design
5 posters
Page 1 of 1
Staging area design
Hi,
I need to design a staging area (doing it first time). SO I have few questions related to staging area design.
My data source is OLTP model which highly normalized. So as a part of ETL design process its better to have staging area so that we trouble the performance of OLTP to get data in DW.
Now questions is what should be the design of staging are.
1. Should we create same table structure in staging are as like in OLTP model along with few more date columns for extracting and storing the data in staging.
e.g. If I have customer data is in 3 tables in OLTP source, should i create same 3 table in staging area so that we can just extract the data from source and keep it in staging so that we dont need to hit source again and again.
Or is it good to have one table per dimension in staging area and extract data from source, de-normalize it in memory using ETL tool and store in staging are
What's your thought on this would like to know if any other approach with advantage on other approach.
I need to design a staging area (doing it first time). SO I have few questions related to staging area design.
My data source is OLTP model which highly normalized. So as a part of ETL design process its better to have staging area so that we trouble the performance of OLTP to get data in DW.
Now questions is what should be the design of staging are.
1. Should we create same table structure in staging are as like in OLTP model along with few more date columns for extracting and storing the data in staging.
e.g. If I have customer data is in 3 tables in OLTP source, should i create same 3 table in staging area so that we can just extract the data from source and keep it in staging so that we dont need to hit source again and again.
Or is it good to have one table per dimension in staging area and extract data from source, de-normalize it in memory using ETL tool and store in staging are
What's your thought on this would like to know if any other approach with advantage on other approach.
Re: Staging area design
I think a staging area in 3NF is a good thing. It might have additional items than the source system in case you need hierarchies that don't exist in transacation system.
But, I wouldn't go overboard with the 3NF in the staging area.
But, I wouldn't go overboard with the 3NF in the staging area.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Staging and 3NF
I assume this is a staging area that will be used to source a data warehouse. For a DW project I was recently on, we built our staging area to look almost exactly like the source system's database. But we dropped the foreign key constraints. This allowed us to load tables in parallel, truncate individual tables and re-load, etc.
Why keep staging like the 3NF source system design? Why not? If you have a dimensional design planned for the DW, then why have the staging area be some OTHER model of the data from source or DW designs? The more different models of the data, the harder it will be to communicate within the team. Also, if stage is different from the source, then it will make communication with the source system owners difficult.
Keep it simple!
Why keep staging like the 3NF source system design? Why not? If you have a dimensional design planned for the DW, then why have the staging area be some OTHER model of the data from source or DW designs? The more different models of the data, the harder it will be to communicate within the team. Also, if stage is different from the source, then it will make communication with the source system owners difficult.
Keep it simple!
owvi4- Posts : 3
Join date : 2009-04-16
Location : Indianapolis, IN
Re: Staging area design
Personally, I do the opposite. I tend to design staging tables to better suit the target rather than the source.
Reasons:
1. ETL is usually a two-step process. Stage then load. if the staging does mild transformations to better suit the target, I need only create one set of load processes. If the DW gets similar data from multiple sources. all I need to do is create new source specific staging processes and let the existing load processes handle the new source.
2. Sources change. I don't want to rewrite ETL processes from end-to-end because of a change in the source.
3. Most of the heavy transformation logic occurs on the load side. With the staging tables closer in structure to the target, the load process code tends to be simpler.
Reasons:
1. ETL is usually a two-step process. Stage then load. if the staging does mild transformations to better suit the target, I need only create one set of load processes. If the DW gets similar data from multiple sources. all I need to do is create new source specific staging processes and let the existing load processes handle the new source.
2. Sources change. I don't want to rewrite ETL processes from end-to-end because of a change in the source.
3. Most of the heavy transformation logic occurs on the load side. With the staging tables closer in structure to the target, the load process code tends to be simpler.
Re: Staging area design
I think it's easier to create dimensions from 3nf. With fact tables, it's whatever gets it done.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Similar topics
» Start Schema vs Snow flake schema
» Staging area.. is it a must?
» Design of Staging Tables
» Data redundancy in the staging area
» Staging Area, copy tables
» Staging area.. is it a must?
» Design of Staging Tables
» Data redundancy in the staging area
» Staging Area, copy tables
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum