SCD Type II Design Decision
3 posters
Page 1 of 1
SCD Type II Design Decision
Hi,
I am fairly new to dimensional modelling and have been struggling with a problem for a couple of days now. I currently have a fact table which stores incidents/problems with orders that are sent to customers. The system from which this incident data is being extracted from provides full extracts as they have no effective CDC method.
So in effect every month I am getting duplicate data, for which I am updating the fact table (we want to prevent duplicate data being entered in to the fact table). The updates are being done based on a natural key comprising of 3 dimension keys. The problem I have is that one of these dimension keys is a slowly changing type II dimension.
This is fine, expect that my reporting analyst wants to be able to retrieve only one version of a given incident (i.e the latest). So for example, we could have the following:
Incident_Key Incident valid_from valid_to active_flag
1 A 01/02/2009 23/02/2009 N
2 B 23/02/2009 31/12/2050 Y
Surrogate_Key Client_key Department_key Incident_Key
1 1 1 1
1 1 1 2
In the above example the rows are in theory identical and define the same incident.
Unfortunately there is know way of breaking this down by date either. The only date being captured is the date the incident was raised and this remains the same throughout the life of an incident.
The key questions I have are:
1) Is it advisable to update rows in the fact table and setting their active_flags = 'N' ?
2) In the above scenario should Incident be a SCD II? If it was a SCD type I, I wouldn't have this problem.
Hopefully the question should be clear.
I appreciate your help.
Regards
Azeem
I am fairly new to dimensional modelling and have been struggling with a problem for a couple of days now. I currently have a fact table which stores incidents/problems with orders that are sent to customers. The system from which this incident data is being extracted from provides full extracts as they have no effective CDC method.
So in effect every month I am getting duplicate data, for which I am updating the fact table (we want to prevent duplicate data being entered in to the fact table). The updates are being done based on a natural key comprising of 3 dimension keys. The problem I have is that one of these dimension keys is a slowly changing type II dimension.
This is fine, expect that my reporting analyst wants to be able to retrieve only one version of a given incident (i.e the latest). So for example, we could have the following:
Incident_Key Incident valid_from valid_to active_flag
1 A 01/02/2009 23/02/2009 N
2 B 23/02/2009 31/12/2050 Y
Surrogate_Key Client_key Department_key Incident_Key
1 1 1 1
1 1 1 2
In the above example the rows are in theory identical and define the same incident.
Unfortunately there is know way of breaking this down by date either. The only date being captured is the date the incident was raised and this remains the same throughout the life of an incident.
The key questions I have are:
1) Is it advisable to update rows in the fact table and setting their active_flags = 'N' ?
2) In the above scenario should Incident be a SCD II? If it was a SCD type I, I wouldn't have this problem.
Hopefully the question should be clear.
I appreciate your help.
Regards
Azeem
AzeemFarooqui- Posts : 6
Join date : 2009-02-23
Re: SCD Type II Design Decision
The natural key for your incident has to remain constant between fact updates; otherwise you'll end up with a new fact record (as you should by design). For a type 2 surrogate key lookup you'll lookup surrogate key based on the natural key plus something like date or active flag.
Also your active flag should be in the dimension rather than the fact. It's a dimensional attribute. The valid from and valid to dates should also reside in your dimension.
But the bigger question to me is (as you've mentioned) should this be a type two? If the incident attributes are not changing between updates then it's not. If they are, and no one cares, then it could be a candidate for a type 1. I'd make it a type 2 if the dimensional attributes are actually changing and your users care about the history or you have some reason to believe they'll care about tracking history.
The Data Warehouse Toolkit explains SCDs in much more detail than my couple of paragraphs.
Hope this helps. If not, let me know.
Also your active flag should be in the dimension rather than the fact. It's a dimensional attribute. The valid from and valid to dates should also reside in your dimension.
But the bigger question to me is (as you've mentioned) should this be a type two? If the incident attributes are not changing between updates then it's not. If they are, and no one cares, then it could be a candidate for a type 1. I'd make it a type 2 if the dimensional attributes are actually changing and your users care about the history or you have some reason to believe they'll care about tracking history.
The Data Warehouse Toolkit explains SCDs in much more detail than my couple of paragraphs.
Hope this helps. If not, let me know.
BrianJarrett- Posts : 61
Join date : 2009-02-03
Age : 50
Location : St. Louis, MO
Re: SCD Type II Design Decision
If you want to capture both "as is" and "most current" view of the dimension, you will need to keys on your fact table. One key is your typical dimension FK. The other is a key that will never change it's value in the dimension. When you want the current view, you join on the "most current" key where current_row_fl = 'Y'. This will always point you to the most current row in the dimension.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: SCD Type II Design Decision
Thanks for the response guys. I think I'm going to have to drive this forward with the business and understand wheter history is important to them.
AzeemFarooqui- Posts : 6
Join date : 2009-02-23
Similar topics
» EDW Design Decision
» Dimension Design Decision
» Fact Table design Decision
» Design Alternatives for Late Arriving/Updated Type 2 Dimension Data
» Optimal SCD type 2 dimension design
» Dimension Design Decision
» Fact Table design Decision
» Design Alternatives for Late Arriving/Updated Type 2 Dimension Data
» Optimal SCD type 2 dimension design
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum