Problem with Single Surrogate Keys in the AS-IS/Point in Time Dimensional Modelling
2 posters
Page 1 of 1
Problem with Single Surrogate Keys in the AS-IS/Point in Time Dimensional Modelling
I am trying to model a Slowly Changing Dimension - 2 with a timestamped accumulating Snapshot Fact.
Kimball Recommends that one uses Single Surrogate Key while joining the Dimensions with the Fact.
I am in situation where i see there could be a situation that a late arriving fact could render the dimension and fact relationship out of sync because our ETL guys oppose the idea of adding another fact row if there is a new attribute change in one of the SCD-2s.
lets see an example.
CATEGORY DIM
------------------
TYPE -2 KEY | TYPE -1 KEY | NATURAL KEY | NAME |EFF DATE| EXPIRY DATE | CURR
1 | 101 | N1100 | SLIPPING | 2/1/2013| 1/1/9999 | N
2 | 101 | N1100 | SLIPPERY | 4/1/2013| 1/1/9999 | Y
IMPACT FACT
-------------------
TYPE -2 KEY | TYPE -1 KEY | NATURAL KEY | CATEGORY DIM KEY (Type 2) |IPC EFF DATE| IPC EXPIRY DATE | CURR
I1 | 1011 | IPC1100 | 1 | 2/1/2013| 1/1/9999 | N
I2 | 1011 | IPC1100 | 1 | 3/1/2013| 1/1/9999 | N
I3 | 1011 | IPC1100 | 1 | 3/2/2013| 1/1/9999 | Y
So if you notice I3 record still uses the Type 2 key -- 1 and we dont have any Impact fact data coming in from the source with this new change..If i want to report IMPACT FACT as of 4/2 with the Category dimensionality then I3 would be shown linked to "SLIPPING" which is not correct.
I can resolve this using the Type 1 Surrogate Key from Category Dim on the Impact Fact,but that maynot be the right approach as everytime we need to use the key current indicator ='Y' or between clause for effective start and end dates on both the fact and dimension table.
How do you tackle this?Any quick suggestion would be valuable
Cheers!!!
Kimball Recommends that one uses Single Surrogate Key while joining the Dimensions with the Fact.
I am in situation where i see there could be a situation that a late arriving fact could render the dimension and fact relationship out of sync because our ETL guys oppose the idea of adding another fact row if there is a new attribute change in one of the SCD-2s.
lets see an example.
CATEGORY DIM
------------------
TYPE -2 KEY | TYPE -1 KEY | NATURAL KEY | NAME |EFF DATE| EXPIRY DATE | CURR
1 | 101 | N1100 | SLIPPING | 2/1/2013| 1/1/9999 | N
2 | 101 | N1100 | SLIPPERY | 4/1/2013| 1/1/9999 | Y
IMPACT FACT
-------------------
TYPE -2 KEY | TYPE -1 KEY | NATURAL KEY | CATEGORY DIM KEY (Type 2) |IPC EFF DATE| IPC EXPIRY DATE | CURR
I1 | 1011 | IPC1100 | 1 | 2/1/2013| 1/1/9999 | N
I2 | 1011 | IPC1100 | 1 | 3/1/2013| 1/1/9999 | N
I3 | 1011 | IPC1100 | 1 | 3/2/2013| 1/1/9999 | Y
So if you notice I3 record still uses the Type 2 key -- 1 and we dont have any Impact fact data coming in from the source with this new change..If i want to report IMPACT FACT as of 4/2 with the Category dimensionality then I3 would be shown linked to "SLIPPING" which is not correct.
I can resolve this using the Type 1 Surrogate Key from Category Dim on the Impact Fact,but that maynot be the right approach as everytime we need to use the key current indicator ='Y' or between clause for effective start and end dates on both the fact and dimension table.
How do you tackle this?Any quick suggestion would be valuable
Cheers!!!
TheDarkKnight- Posts : 10
Join date : 2013-03-22
Age : 42
Location : NJ
Re: Problem with Single Surrogate Keys in the AS-IS/Point in Time Dimensional Modelling
Its a good thing your ETL guys pushed back. Why would you want to change the fact or add a new row?
If you want current, and you have the type 1 key in the fact, use it. If you don't like having to scan the type 2 table for the current row, build a type 1 table. Where is it written you cannot have a type 1 and type 2 version of the same dimension? It is far simpler and more efficient than messing with the fact table.
If you want current, and you have the type 1 key in the fact, use it. If you don't like having to scan the type 2 table for the current row, build a type 1 table. Where is it written you cannot have a type 1 and type 2 version of the same dimension? It is far simpler and more efficient than messing with the fact table.
Re: Problem with Single Surrogate Keys in the AS-IS/Point in Time Dimensional Modelling
Thanks for your response!!!
The challenge is not the difficulty of having them on the dimension but deciding on how to connect these to the fact.
In the above example i have clearly underlined the issue that will come up.How would you represent the relationship between the SCD-2 Dimension and the timestamped accumulating snapshot Fact.Due to the late arriving Dimension ,how can we make sure the fact points to the latest data using the type 2 key,Since we dont want to use the composite Type 1 Key and Effective date*?Even Effective Date doesnt represent the current State of the SCD-2 Dimension.Please advice
cheers!!!
The challenge is not the difficulty of having them on the dimension but deciding on how to connect these to the fact.
In the above example i have clearly underlined the issue that will come up.How would you represent the relationship between the SCD-2 Dimension and the timestamped accumulating snapshot Fact.Due to the late arriving Dimension ,how can we make sure the fact points to the latest data using the type 2 key,Since we dont want to use the composite Type 1 Key and Effective date*?Even Effective Date doesnt represent the current State of the SCD-2 Dimension.Please advice
cheers!!!
TheDarkKnight- Posts : 10
Join date : 2013-03-22
Age : 42
Location : NJ
Re: Problem with Single Surrogate Keys in the AS-IS/Point in Time Dimensional Modelling
A type 2 key will NEVER point to the latest data. It's not supposed to. It points to the data at the time the fact was inserted.
If you want it to always point to the latest data, use a type 1 dimension.
If you are using a type 2 dimension and you want to use current data, you need to self-join (or use the type 1 key as you have set it up) and filter for the current row. That is how it is done.
If you want it to always point to the latest data, use a type 1 dimension.
If you are using a type 2 dimension and you want to use current data, you need to self-join (or use the type 1 key as you have set it up) and filter for the current row. That is how it is done.
Similar topics
» How to model number of subscribers for a certain point in time
» Using Hash keys instead of Numeric sequence keys in Dimensional Model?
» Constraints on Surrogate Keys?
» Point in time reporting using ad hoc functions
» No Surrogate keys
» Using Hash keys instead of Numeric sequence keys in Dimensional Model?
» Constraints on Surrogate Keys?
» Point in time reporting using ad hoc functions
» No Surrogate keys
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum