Analyzing measures from two fact tables where records do not necessarily match due to SCD2 changes.
+2
ngalemmo
ian.coetzer
6 posters
Page 1 of 1
Analyzing measures from two fact tables where records do not necessarily match due to SCD2 changes.
Hi,
We have for example two fact tables, these fact tables are at the same grain.
However one of the dimensions (conformed) has been designed to implement simple SCD type 2 change tracking.
The problem is that records are not necessarily inserted at exactly the same time, since the 2nd fact table's business event takes longer to start.
So we do not yet have measures for the 2nd fact table in the source table at the time that the first fact table is populated.
This is fine, records come in the following day and everything matches up again for that time period and totals on reports balance etc.
However some of the attributes on one of the common dimensions have been implemented as SCD type 2, this means the dimension record is expired and a new one inserted.
Now again we have unbalanced fact tables due to the fact that a record in the first fact table is linked to an expirement record in the dimension, and then when the second fact table is populated it of course links to the new dimension record which is active and was inserted due to a SCD type 2 change.
We have for example two fact tables, these fact tables are at the same grain.
However one of the dimensions (conformed) has been designed to implement simple SCD type 2 change tracking.
The problem is that records are not necessarily inserted at exactly the same time, since the 2nd fact table's business event takes longer to start.
So we do not yet have measures for the 2nd fact table in the source table at the time that the first fact table is populated.
This is fine, records come in the following day and everything matches up again for that time period and totals on reports balance etc.
However some of the attributes on one of the common dimensions have been implemented as SCD type 2, this means the dimension record is expired and a new one inserted.
Now again we have unbalanced fact tables due to the fact that a record in the first fact table is linked to an expirement record in the dimension, and then when the second fact table is populated it of course links to the new dimension record which is active and was inserted due to a SCD type 2 change.
ian.coetzer- Posts : 57
Join date : 2010-05-31
Age : 44
Location : South Africa
Re: Analyzing measures from two fact tables where records do not necessarily match due to SCD2 changes.
What is it you want to do? If you want the attributes to be the same, which 'same' do you want? The same as the first fact, second fact or current regardless of fact?
RE: Analyzing measures from two fact tables where records do not necessarily match due to SCD2 changes.
Is expired dim record tied to first fact table correct or the second fact table tied to a new dim record is correct because of SCD2 ? Look at the detail data came in to load first fact and the dimention and second fact and the same dim. If dim data changed from day one (loading first fact) to day second (loading second fact) then it is behaving correctly. This is not a data modeling issue, but timing issue to process the data.
zoom- Posts : 97
Join date : 2010-08-23
Location : Chicago
Re: Analyzing measures from two fact tables where records do not necessarily match due to SCD2 changes.
Hi,
It is not timing because the second fact table is dependant on a completely different business process that is suppose to only come through at a later stage.
What I think must happen is that all the fact records that are linked to the original (now expired) dimension record must be updated with the new surrogate key of the new active dimension record.
However this would then change this from a transactional fact table to an accumulating fact table (in a way)...
And this is not right from a dimensional modelling perspective, and it would potentially have to scan through 35 million + records to update the ones that changed? this would produce the results we want from a reporting perspective but I can't get myself to do this because it seems wrong to me on so many levels? we then also lose what the dimension record looked like at the time of the fact record insertion which is also important to keep intact! I cannot seem to find a solution that makes sense.
It is not timing because the second fact table is dependant on a completely different business process that is suppose to only come through at a later stage.
What I think must happen is that all the fact records that are linked to the original (now expired) dimension record must be updated with the new surrogate key of the new active dimension record.
However this would then change this from a transactional fact table to an accumulating fact table (in a way)...
And this is not right from a dimensional modelling perspective, and it would potentially have to scan through 35 million + records to update the ones that changed? this would produce the results we want from a reporting perspective but I can't get myself to do this because it seems wrong to me on so many levels? we then also lose what the dimension record looked like at the time of the fact record insertion which is also important to keep intact! I cannot seem to find a solution that makes sense.
ian.coetzer- Posts : 57
Join date : 2010-05-31
Age : 44
Location : South Africa
Re: Analyzing measures from two fact tables where records do not necessarily match due to SCD2 changes.
If this is something you do often, why not just store the natural key in both fact tables for that one dimension?
Re: Analyzing measures from two fact tables where records do not necessarily match due to SCD2 changes.
With SCD-2, the dimension key is dependent upon the transaction date the fact represents (which may be different than the date the fact was loaded).
It sounds like one of your facts can be considered "late-arriving", which means you'll need to modify your ETL process to look up the surrogate keys based on the transaction date, rather than just using the "current" record.
If the 2 facts represent a different transaction date, then having different surrogate keys is legitimate SCD2 behavior.
LAndrews- Posts : 132
Join date : 2010-05-13
Location : British Columbia, Canada
Re: Analyzing measures from two fact tables where records do not necessarily match due to SCD2 changes.
ian.coetzer wrote:
What I think must happen is that all the fact records that are linked to the original (now expired) dimension record must be updated with the new surrogate key of the new active dimension record.
Sounds like you would rather have a type 1 dimension. Which is why I asked the original question.
When you combine facts, you combine them on attributes, not keys (unless you are creating an aggregate table, in which case it really doesn't matter that the keys are different, you just wind up with more rows). If you want to combine the facts on current dimensional values, you need to self join on the type 2 dimension to locate the current row. You do not change keys or in any way alter what you have.
Re: Analyzing measures from two fact tables where records do not necessarily match due to SCD2 changes.
I think Andrews is right on the point. Your ETL should always be able to load fact table with proper dimension keys for transactions that take place a few days ago. In your case the second fact table load will contain transactions with early dates. For ETL, always use SCD dates to get the correct dimension keys, so that there is only one version of ETL for both historical rebuild and daily incremental load.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Similar topics
» Factless fact tables and SCD2
» Fact tables at different grains with measures in each
» Loading a Fact Table with SCD2
» Modelling question - Multiple fact tables with one to many relationships and querying from many fact tables
» How to create transactions SCD2 from many SCD tables
» Fact tables at different grains with measures in each
» Loading a Fact Table with SCD2
» Modelling question - Multiple fact tables with one to many relationships and querying from many fact tables
» How to create transactions SCD2 from many SCD tables
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum