Poorly sturctured data at source system
4 posters
Page 1 of 1
Poorly sturctured data at source system
If the data in the source system is poorly structured and not normalized in any way, what is the most typical way to handle this? Is it common to try and normalize the data as it moves through the staging area before pushing it out into a dimensional data mart? We're pulling data from one of our own websites that is stored in SQL server, but it isn't really structured in any way. Would the right approach be to try and normalize it first?
JSchroeder- Posts : 12
Join date : 2012-03-29
Reverse the telescope?
Hi JSchroeder,
I wouldn't make any effort to normalize data for a Data Mart application - I prefer the Star Schema design which is heavily de-normalized.
I also prefer the "aggresive load" strategy, where source system data is accepted as valid and correct until proven otherwise. In reality its typically the most heavily scrutinized data available, flawed or not.
I'd see examples of "poor structure" as an opportunity for the Data Mart application to add value. You can clean data, add hierarchies, banding etc.
Good luck!
Mike
I wouldn't make any effort to normalize data for a Data Mart application - I prefer the Star Schema design which is heavily de-normalized.
I also prefer the "aggresive load" strategy, where source system data is accepted as valid and correct until proven otherwise. In reality its typically the most heavily scrutinized data available, flawed or not.
I'd see examples of "poor structure" as an opportunity for the Data Mart application to add value. You can clean data, add hierarchies, banding etc.
Good luck!
Mike
Re: Poorly sturctured data at source system
I wouldn't normalize it either. Normalizing requires two extensive sets of ETL logic. One to extract and normalize to ETL staging area (ODS). Another set of ETL logic to extract and load to the dimensional warehouse. I try to minimize the amount of transformation logic that must be done to stage the data, usually just adding dates to store a few weeks of data in case I need to reload anything to the warehouse.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Poorly sturctured data at source system
I wouldn't normalize it either. Normalizing requires two extensive sets of ETL logic. One to extract and normalize to ETL staging area (ODS). Another set of ETL logic to extract and load to the dimensional warehouse. I try to minimize the amount of transformation logic that must be done to stage the data, usually just adding dates to store a few weeks of data in case I need to reload anything to the warehouse.
harisrana- Posts : 1
Join date : 2015-02-12
Similar topics
» Is it a best practice that Data warehouse follows the source system data type?
» Can Data Warehouse be used as source for operational system
» Extracting data when there is no timestamp in the source system
» How to handle situations where the data is deleted from the source system?
» Data Architecture for Single Source System (Normalised).
» Can Data Warehouse be used as source for operational system
» Extracting data when there is no timestamp in the source system
» How to handle situations where the data is deleted from the source system?
» Data Architecture for Single Source System (Normalised).
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum