Surrogate key regeneration
3 posters
Page 1 of 1
Surrogate key regeneration
I am currently engaged with an organization as a technical lead with their BI deployment. I have become involved with the information delivery team in architectural discussions on "best practice" design of a data mart. What I have uncovered in terms of conformed dimensions and fact data is based on a concept I have never come across before. Let me give you an example:
1. The data mart is implementing the use of surrogate keys. Great, so for cost centre 100 it is assigned a surrogate key of 1.
2. There is no insert/update strategy in place. So, whenever data is loaded the entire mart including dimensions and facts is truncated. For example, if I load July actuals I need to reload all history.
3. What this means is all surrogate keys are regenerating! So, cost centre 100 now has a surrogate key of 2 and all facts are reloaded to reflect this change!
Based on my experience this is a flawed approach in terms of design and implementation. Any thoughts would be greatly appreciated.
Thanks, Paul
1. The data mart is implementing the use of surrogate keys. Great, so for cost centre 100 it is assigned a surrogate key of 1.
2. There is no insert/update strategy in place. So, whenever data is loaded the entire mart including dimensions and facts is truncated. For example, if I load July actuals I need to reload all history.
3. What this means is all surrogate keys are regenerating! So, cost centre 100 now has a surrogate key of 2 and all facts are reloaded to reflect this change!
Based on my experience this is a flawed approach in terms of design and implementation. Any thoughts would be greatly appreciated.
Thanks, Paul
pschmidt- Posts : 1
Join date : 2010-07-28
Re: Surrogate key regeneration
If you reload the entire data warehouse then the approach is not flawed. SK's are purely for joining. The thing that maybe flawed is the approach to reload everything. Over time, the tables will grow incrementally larger causing longer and longer load times.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Surrogate key regeneration
In your case, every incremental ETL load is treated as initial load. Normally you should design your ETL to cater for incremental load on SCD dimension and fact tables, and the initial load is just part of it with unlimited time span.
It's a a lot easier to treat everything as initial load and forget about finding any changes in the source, and it will self correct the data in DW if there are a lot of on-going corrections in source data. It might be OK for small DW as it may only take couple of hours to complete the entire rebuild overnight. But for most DW project, this approach is flawed for the following reasons.
1. You may not have the complete source history to build your DW.
2. Your load window does not allow the time for the entire rebuild.
3. If the load failed in the night, then the whole DW becomes unavailable unless there is an automated backup/restore and database switching mechanism.
In my experience I do frequently rebuild the DW during development to get all SK's aligned, but once in production, it's very risky process that requires quite a few days contingency planning and preparation.
It's a a lot easier to treat everything as initial load and forget about finding any changes in the source, and it will self correct the data in DW if there are a lot of on-going corrections in source data. It might be OK for small DW as it may only take couple of hours to complete the entire rebuild overnight. But for most DW project, this approach is flawed for the following reasons.
1. You may not have the complete source history to build your DW.
2. Your load window does not allow the time for the entire rebuild.
3. If the load failed in the night, then the whole DW becomes unavailable unless there is an automated backup/restore and database switching mechanism.
In my experience I do frequently rebuild the DW during development to get all SK's aligned, but once in production, it's very risky process that requires quite a few days contingency planning and preparation.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Similar topics
» Should I use the surrogate key?
» Surrogate Key Disadvantages??
» Fact table with non-numeric measure
» Fact Indexing -SQL Server 2008
» No Surrogate keys
» Surrogate Key Disadvantages??
» Fact table with non-numeric measure
» Fact Indexing -SQL Server 2008
» No Surrogate keys
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum
|
|