Replicate Fact record because Dimension has changed
5 posters
Page 1 of 1
Replicate Fact record because Dimension has changed
I am Trying to Dimensionally represent the source data. Data is extracted incrementally. Fact measures can change with no changes to any Dimension and vice versa!!
I am not sure how to deal with fact record if there is a change to the Dimension (SCD2) but no change to the fact measure!
1: Do I add a new record to the fact with exactly the same data as previous fact record but with different Dimension sk. In which case possible issue with aggregations across time (Even though no immediate reporting requirement need such aggregation)
2: Or Hold the stable/durable key in the Fact table instead of Surrogate key. But in this case am struggling to relate the historical records i.e. which fact applies to which dimension records and correlation between Dimensions.
Would appreciate any suggestions or best practice!
I am not sure how to deal with fact record if there is a change to the Dimension (SCD2) but no change to the fact measure!
1: Do I add a new record to the fact with exactly the same data as previous fact record but with different Dimension sk. In which case possible issue with aggregations across time (Even though no immediate reporting requirement need such aggregation)
2: Or Hold the stable/durable key in the Fact table instead of Surrogate key. But in this case am struggling to relate the historical records i.e. which fact applies to which dimension records and correlation between Dimensions.
Would appreciate any suggestions or best practice!
rohanf- Posts : 13
Join date : 2014-04-02
Re: Replicate Fact record because Dimension has changed
Normally a fact represents a measure at a point in time. Therefore an individual measure cannot change - though a fact that contains multiple measures can change if the measures are not recorded at the same time (e.g. accumulating snapshot).
Similarly, an SCD Dimension records the state of the dimension at the time of the fact.
Therefore, if a dimension changes you just update your dimension table. It doesn't affect your fact as the fact is related to the dimension record at the time of the fact - which doesn't change.
If you have an accumulating snapshot fact then you need to decide which version of each SCD dimension you should assign to the fact - state when the fact is first created, state at each update, etc. - and then apply these business rules to your ETL.
If this doesn't answer your question please could you give an example of the sorts of dimensions/facts/measures you are dealing with?
Similarly, an SCD Dimension records the state of the dimension at the time of the fact.
Therefore, if a dimension changes you just update your dimension table. It doesn't affect your fact as the fact is related to the dimension record at the time of the fact - which doesn't change.
If you have an accumulating snapshot fact then you need to decide which version of each SCD dimension you should assign to the fact - state when the fact is first created, state at each update, etc. - and then apply these business rules to your ETL.
If this doesn't answer your question please could you give an example of the sorts of dimensions/facts/measures you are dealing with?
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: Replicate Fact record because Dimension has changed
Thanks Nick!
Firstly its an Insurance project and there is no clear Transaction. Many Events can occur in the system:-
1: Change in Policy status but no change in the premium.
2: Change in Client details but no change to any fact measure.
Our Model in question is as below:-
Dimension:-
Application Dim
Policy Dim
Benefit Dim
Client Dim
Fact Holds the Premium and Sum Assured together with SK to all of the above Dim. Now the Measures may not change and the Dimenssion can change e.g. Policy Status Changes etc.
Now we are evaluating two options:-
1: Do I add a new record to the fact with exactly the same data as previous fact record but with different Dimension sk. In which case possible issue with aggregations across time (Even though no immediate reporting requirement need such aggregation)
2: Or Hold the stable/durable key in the Fact table instead of Surrogate key. But in this case am struggling to relate the historical records i.e. which fact applies to which dimension records and correlation between Dimensions.
Appreciate your help!
Regards
Rohan
Firstly its an Insurance project and there is no clear Transaction. Many Events can occur in the system:-
1: Change in Policy status but no change in the premium.
2: Change in Client details but no change to any fact measure.
Our Model in question is as below:-
Dimension:-
Application Dim
Policy Dim
Benefit Dim
Client Dim
Fact Holds the Premium and Sum Assured together with SK to all of the above Dim. Now the Measures may not change and the Dimenssion can change e.g. Policy Status Changes etc.
Now we are evaluating two options:-
1: Do I add a new record to the fact with exactly the same data as previous fact record but with different Dimension sk. In which case possible issue with aggregations across time (Even though no immediate reporting requirement need such aggregation)
2: Or Hold the stable/durable key in the Fact table instead of Surrogate key. But in this case am struggling to relate the historical records i.e. which fact applies to which dimension records and correlation between Dimensions.
Appreciate your help!
Regards
Rohan
rohanf- Posts : 13
Join date : 2014-04-02
Re: Replicate Fact record because Dimension has changed
OK one thing to bear mind: your suggestion of putting the durable key into the fact table instead of the surrogate key breaks the fundamentals underlying dimension modelling so is not a solution. If you find yourself having to do something like this to get your model to work then there is a 99% chance that your model is wrong.
So if I have understood what you are trying to do correctly then, keeping things simple initially, there are two options:
1. You want to know the values of the dimensional attributes at the time of the event - in which case you implement the Dims as Type 2 SCD
2. You want to know the current values of the dimensional attributes for historic facts -in which case you implement the Dims as Type 1 SCD
Remember, there is nothing wrong with keeping both SCD1 and 2 versions of a Dim - if 2 different fact tables require different types of history for the same Dim
If you really need to know both the values of the dim attributes at the time of the fact and also the current values (and I would strongly recommend that you confirm this is a must-have requirement before you try and implement it) then you are probably looking at having to implement Type 6 SCDs.
However, I am still not entirely clear on what you are trying to report on. Remember, a dimensional model is there to answer business questions - it is not a reformatted copy of your transactional system. So that fact that, for example, Policy Status changes after a fact record has been created is, in itself, of no interest. It only matters if it affects what you are trying to report on - so if you go back to your business requirements you should have the information that tells you how you must handle these changes in source system data
So if I have understood what you are trying to do correctly then, keeping things simple initially, there are two options:
1. You want to know the values of the dimensional attributes at the time of the event - in which case you implement the Dims as Type 2 SCD
2. You want to know the current values of the dimensional attributes for historic facts -in which case you implement the Dims as Type 1 SCD
Remember, there is nothing wrong with keeping both SCD1 and 2 versions of a Dim - if 2 different fact tables require different types of history for the same Dim
If you really need to know both the values of the dim attributes at the time of the fact and also the current values (and I would strongly recommend that you confirm this is a must-have requirement before you try and implement it) then you are probably looking at having to implement Type 6 SCDs.
However, I am still not entirely clear on what you are trying to report on. Remember, a dimensional model is there to answer business questions - it is not a reformatted copy of your transactional system. So that fact that, for example, Policy Status changes after a fact record has been created is, in itself, of no interest. It only matters if it affects what you are trying to report on - so if you go back to your business requirements you should have the information that tells you how you must handle these changes in source system data
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: Replicate Fact record because Dimension has changed
You can also do a self join to the dimension to get the current row. This has a small performance hit but if the volume isn't too bad it saves you from having to muck up the model.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Replicate Fact record because Dimension has changed
Thanks, this did help! I have ruled out the option of Stable key in fact. I understand that the Fact only needs to capture the dim key when the fact actually happened. However in our case the fact holds the premium information and there is no event associated (there can be changes to the premium though). When we are looking back in history we want that information (in the fact) to be associated to all dim keys at that point in time. So I am confused about how to represent.
I have come up with below approach!
"Frequently Changing Attributes of Policy" (many)--------> (one) Policy Dim (SCD1 - only attributes set once in a lifetime of a policy) (one)-------------> (Many) The Policy Fact
Does this look reasonable!
Thanks & Regards
Rohan
I have come up with below approach!
"Frequently Changing Attributes of Policy" (many)--------> (one) Policy Dim (SCD1 - only attributes set once in a lifetime of a policy) (one)-------------> (Many) The Policy Fact
Does this look reasonable!
Thanks & Regards
Rohan
rohanf- Posts : 13
Join date : 2014-04-02
Re: Replicate Fact record because Dimension has changed
I'd tackle this with the concept of time-stamping a snapshot fact table. You end up with a new fact record when context changes (e.g. change in policy status) or a measurement changes (e.g. change in premium).
THe timestamps ensure that at any point in time, only one iteration of the measurement is "active". This allows for both current and historical perspectives.
http://www.kimballgroup.com/2012/05/01/design-tip-145-time-stamping-accumulating-snapshot-fact-tables/
LAndrews- Posts : 132
Join date : 2010-05-13
Location : British Columbia, Canada
Re: Replicate Fact record because Dimension has changed
When you said there is no clear transaction, then the fact tables should be of some sort of snapshot, very likely accumulating snapshot. The simplest and effective approach is to update the SK values in accumulating snapshot table to reflect the current dimension status for those SCD2 dimensions. It sounds like breaking rules, but it is the right way to deal with this type of fact with SCD2 involved.
DK (Durable key) + SK (SCD7) approach is more suitable to transaction fact table as you should not update this type of fact table but just insert. If you want to track the the historical relationship of the dimension keys hosted by the accumulating fact table, Andrew's approach is Kimball's extension for accumulating snapshot which is also the best practice. Otherwise, don't be afraid of updating accumulating snapshot table frequently. Effectively you could even truncate and rebuilt the table if it performs far better than up-sert. But remember, accumulating snapshot is the only type of fact table that you can update periodically. People say don't update fact tables because we mostly deal with transaction or periodical snapshot fact tables.
DK (Durable key) + SK (SCD7) approach is more suitable to transaction fact table as you should not update this type of fact table but just insert. If you want to track the the historical relationship of the dimension keys hosted by the accumulating fact table, Andrew's approach is Kimball's extension for accumulating snapshot which is also the best practice. Otherwise, don't be afraid of updating accumulating snapshot table frequently. Effectively you could even truncate and rebuilt the table if it performs far better than up-sert. But remember, accumulating snapshot is the only type of fact table that you can update periodically. People say don't update fact tables because we mostly deal with transaction or periodical snapshot fact tables.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Similar topics
» Calculated attributes in Customer Dimension?
» Fact slowly changed issue
» How to model a FACT where its relationship to multiple dimensions has changed from M:1 to M:M
» Dimension Record Comes and Gos
» Dimensions that don't apply to every fact record
» Fact slowly changed issue
» How to model a FACT where its relationship to multiple dimensions has changed from M:1 to M:M
» Dimension Record Comes and Gos
» Dimensions that don't apply to every fact record
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum