Data redundancy in the staging area
3 posters
Page 1 of 1
Data redundancy in the staging area
I have to design the ETL process and staging area for a new data warehouse and I am not sure whether it makes sense to keep a copy of all dimension and fact tables in the staging area.
The ETL process will be fed by a daily extract from SAP. Master data like material, customer etc. come as a complete download whereas transactional data come in daily buckets.
It seems like a waste of storage to keep the tables in the staging area and then copy them into the final data marts. Staging area and presentation area are within the same data base so I can easily access the tables in the presentation area if necessary.
Regards,
Fabio
The ETL process will be fed by a daily extract from SAP. Master data like material, customer etc. come as a complete download whereas transactional data come in daily buckets.
It seems like a waste of storage to keep the tables in the staging area and then copy them into the final data marts. Staging area and presentation area are within the same data base so I can easily access the tables in the presentation area if necessary.
Regards,
Fabio
figareg- Posts : 2
Join date : 2011-09-26
Location : Germany
Re: Data redundancy in the staging area
I would design the process to load the facts and dimensions directly.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Data redundancy in the staging area
All you need to do is compare the incoming table with the dimension and update/insert if necessary. If needed you may have to reformat or transform the source before you compare.
An easy way to compare is to calculate a CRC code on the source, and when you insert or update, store the new CRC in the dimension. To check for change all you need to do is compare the new CRC with the CRC on the dimension row.
An easy way to compare is to calculate a CRC code on the source, and when you insert or update, store the new CRC in the dimension. To check for change all you need to do is compare the new CRC with the CRC on the dimension row.
Re: Data redundancy in the staging area
Thanks,
I thought it would make sense to compare directly with the final tables but was not sure.
Fabio
I thought it would make sense to compare directly with the final tables but was not sure.
Fabio
figareg- Posts : 2
Join date : 2011-09-26
Location : Germany
Similar topics
» Data in Staging area
» Data types for staging area db tables
» Staging area.. is it a must?
» Start Schema vs Snow flake schema
» Staging area design
» Data types for staging area db tables
» Staging area.. is it a must?
» Start Schema vs Snow flake schema
» Staging area design
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum