Updating a Fact Table
4 posters
Page 1 of 1
Updating a Fact Table
I currently working on an Accounting Datamart which the dims and facts are loaded daily. I have recently had a request to update the data after its been loaded. The best way for me to explain this is to give an example.
The data is loaded daily, so lets say the data is loaded from 11/30 thru 12/4. On 12/5 a change has occured for one of the customer's, the customer had become 'Inactive'. Of course that change would be loaded into the type 2 dimcustomer and the new effective_dt is now 12/5. The fact for 12/5 has been loaded and the fact tbl now reflects the changes made to the customer. This is fine and works correctly.
But, now the user's want me to make the chg of 'Inactive' for that customer effective 11/30 instead of the actually chg date which would be 12/5. This means I would have to go back and modify the dimcustomer from 11/30 thru 12/4 and also make the changes to the fact table for these dates.
I do not like it and feel uncomfortable updating any dims or facts, but I do not have a choice here. I'm not sure which way to approach it any suggestions are appreciated.
The data is loaded daily, so lets say the data is loaded from 11/30 thru 12/4. On 12/5 a change has occured for one of the customer's, the customer had become 'Inactive'. Of course that change would be loaded into the type 2 dimcustomer and the new effective_dt is now 12/5. The fact for 12/5 has been loaded and the fact tbl now reflects the changes made to the customer. This is fine and works correctly.
But, now the user's want me to make the chg of 'Inactive' for that customer effective 11/30 instead of the actually chg date which would be 12/5. This means I would have to go back and modify the dimcustomer from 11/30 thru 12/4 and also make the changes to the fact table for these dates.
I do not like it and feel uncomfortable updating any dims or facts, but I do not have a choice here. I'm not sure which way to approach it any suggestions are appreciated.
Lisa86- Posts : 4
Join date : 2010-12-15
Refine SCD requirements rather than update fact
Not sure from your example what the general requirement is -- do users want to see all past facts with the updated dimension data, or only recent ones?
If they want to see the latest dimension updates with all past facts, this means SCD Type 1 behavior. It is very common to have many users that want SCD type 1 for customer and product data and a few that need SCD type 2 data. You can find descriptions (including at least one thread in this forum) for implementation alternatives for providing both SCD 1 and 2 data for the same dimension.
If the requirement is to report only recent facts using the updated dimension data, such as 11/30/2010 facts link to dimension rows updated through 12/31/2010, but older facts link to dimension data as it was at the time of the fact, then this is another form of the "late arriving fact" problem, only with dimension data also involved. In that case you need to find out what the time scope of the problem is and get users to sign off on a firm rule -- dimension updates made today can be retroactive for one week, one month, one year, etc. but there must be a rule.
The way I've seen this handled is to recreate the fact rows for whatever the "late arriving" time scope is, unless a long time range and/or large number of records make this impractical, in which case I would provide both SCD type 1 and 2 versions of the customer dimension to give them as close as practival to what they want.
When you recreate fact rows to incorporate the late arriving information, you change the normal SCD type 2 key resolution logic to make the late arriving dimension updates get associated with the fact rows as required.
In any case, I agree with you that updating the fact table to link it to the updated dimension row is not a good solution. It would be a band-aid when what is really needed is a deeper understanding of he requirements for SCD behavior and possibly for dealing with late arriving information.
If they want to see the latest dimension updates with all past facts, this means SCD Type 1 behavior. It is very common to have many users that want SCD type 1 for customer and product data and a few that need SCD type 2 data. You can find descriptions (including at least one thread in this forum) for implementation alternatives for providing both SCD 1 and 2 data for the same dimension.
If the requirement is to report only recent facts using the updated dimension data, such as 11/30/2010 facts link to dimension rows updated through 12/31/2010, but older facts link to dimension data as it was at the time of the fact, then this is another form of the "late arriving fact" problem, only with dimension data also involved. In that case you need to find out what the time scope of the problem is and get users to sign off on a firm rule -- dimension updates made today can be retroactive for one week, one month, one year, etc. but there must be a rule.
The way I've seen this handled is to recreate the fact rows for whatever the "late arriving" time scope is, unless a long time range and/or large number of records make this impractical, in which case I would provide both SCD type 1 and 2 versions of the customer dimension to give them as close as practival to what they want.
When you recreate fact rows to incorporate the late arriving information, you change the normal SCD type 2 key resolution logic to make the late arriving dimension updates get associated with the fact rows as required.
In any case, I agree with you that updating the fact table to link it to the updated dimension row is not a good solution. It would be a band-aid when what is really needed is a deeper understanding of he requirements for SCD behavior and possibly for dealing with late arriving information.
tim_huck- Posts : 8
Join date : 2009-04-09
Location : Evanston, Illinois
Re: Updating a Fact Table
Add a second set of dates to track the business effective and end dates. That way you can track how that record was or how it should've been by using either set of dates.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Updating a Fact Table
example of current dimcustomer and new dimcustomer
DimCustomer | |||||||
cust_pkey | customer_no | status | type | rating | eff_dt | end_dt | |
2 | 345 | Active | Gold | 8 | 8/31/2010 | 9/25/2010 | |
3 | 345 | Active | Silver | 8 | 9/26/2010 | 11/30/2010 | chg type |
4 | 345 | Active | Silver | 2 | 12/1/2010 | 12/2/2010 | chg rating |
5 | 345 | Active | Bronze | 2 | 12/3/2010 | 12/4/2010 | chg type |
6 | 345 | InActive | Bronze | 2 | 12/5/2010 | 12/31/9999 | chg status |
New DimCustomer | |||||||
cust_pkey | customer_no | status | type | rating | eff_dt | end_dt | |
2 | 345 | Active | Gold | 8 | 8/31/2010 | 11/29/2010 | |
3 | 345 | InActive | Silver | 8 | 11/30/2010 | 12/1/2010 | |
4 | 345 | InActive | Silver | 2 | 12/2/2010 | 12/3/2010 | |
5 | 345 | InActive | Bronze | 2 | 12/4/2010 | 12/31/9999 |
Lisa86- Posts : 4
Join date : 2010-12-15
Re: Updating a Fact Table
examples of current factamt and new factamt and dimdate
FactAmt | New FactAmt | |||||||
cust_pkey | date_pkey | balance amt | outstanding_amt | cust_pkey | date_pkey | balance amt | outstanding_amt | |
3 | 43 | 5658.00 | 6589.00 | 2 | 43 | 5658.00 | 6589.00 | |
3 | 44 | 5658.00 | 6589.00 | 2 | 44 | 5658.00 | 6589.00 | |
3 | 45 | 5658.00 | 6589.00 | 2 | 45 | 5658.00 | 6589.00 | |
3 | 46 | 5658.00 | 6589.00 | 3 | 46 | 0.00 | 0.00 | |
4 | 47 | 5658.00 | 6589.00 | 3 | 47 | 0.00 | 0.00 | |
4 | 48 | 5658.00 | 6589.00 | 4 | 48 | 0.00 | 0.00 | |
5 | 49 | 5658.00 | 6589.00 | 4 | 49 | 0.00 | 0.00 | |
5 | 50 | 5658.00 | 6589.00 | 5 | 50 | 0.00 | 0.00 | |
6 | 51 | 0.00 | 0.00 | 5 | 51 | 0.00 | 0.00 |
DimDate | |
date_pkey | day_dt |
43 | 11/27/2010 |
44 | 11/28/2010 |
45 | 11/29/2010 |
46 | 11/30/2010 |
47 | 12/1/2010 |
48 | 12/2/2010 |
49 | 12/3/2010 |
50 | 12/4/2010 |
51 | 12/5/2010 |
Lisa86- Posts : 4
Join date : 2010-12-15
Re: Updating a Fact Table
Thanks so much for the replies, this has been spinning in my head. The phase 'Late Arriving Data' is a perfect description. They only want to update the facts since the last day of the prior month. So if the chg 'Inactive status' comes in on the 12/5 only the fact records from 11/30 forward need to be updated not the entire history. Below I have given examples of how the data currently works on the dimcustomer and factamt. The changes they want me to make are reflected on the new dimcustomer and new factamt.
what did you mean by 'get users to sign off on a firm rule -- dimension updates made today can be retroactive for one week, one month, one year, etc. but there must be a rule'?
I do know ahead of time what customers this may happen to so I was thinking about making a temp dimcustomer and temp factamt so when this is activated on 12/5 I would just overwrite the records in the original fact. Shouldn't I also create some kind of audit trail for the records in the dim and fact I changed? Any suggestions are appreciated.
I had to put the examples on different posts below because I kept getting an error 'Message is too Big'. See last two posts for examples.
what did you mean by 'get users to sign off on a firm rule -- dimension updates made today can be retroactive for one week, one month, one year, etc. but there must be a rule'?
I do know ahead of time what customers this may happen to so I was thinking about making a temp dimcustomer and temp factamt so when this is activated on 12/5 I would just overwrite the records in the original fact. Shouldn't I also create some kind of audit trail for the records in the dim and fact I changed? Any suggestions are appreciated.
I had to put the examples on different posts below because I kept getting an error 'Message is too Big'. See last two posts for examples.
Lisa86- Posts : 4
Join date : 2010-12-15
Re: Updating a Fact Table
Using business start/end date in the source to reflect SCD start/end date is tricky. Sometimes it's almost impossible to change business process to get these two pair of dates in synch.
Having business start/end date directly in the dimension as normal SCD attributes, part from SCD start/end date, may create confusion and produce inaccurate dimension context for the fact.
I suggest to have special treatment to your SCD start/end date to reflect the business dates. So just have one pair of dates, SCD start/end date, no other start/end date attribute. The business start/end date is not an SCD attribute at all but used to find the previous record and update SCD start/end date accordingly. The process could be a little complicated in ETL but would work if implemented properly.
So basically, you retrospectively fit in the changed records instead of comparing with current record and add new record if there is an SCD change. To do that, you need to find and compare with the previous record that covers the business start date, ie. BusStartDate between SCDStartDate and SCDEndDate. There would be only one record for each business key to meet that criteria. You then need to update and insert records as follows:
1. If there is a match and any SCD change, set the previous SCDEndDate to BusStartDate, SCDStatus to inactive if it's active.
2. Add any new record or changed record from the source with BusStartDate as SCDStartDate.
3. With each added record, if it's for any SCD change, set SCDEndDate to previous SCDEndDate, and SCDStatus to previous SCDStatus. If it's new record, set SCDEndDate to '12/31/9999' and SCDStatus to active.
Having business start/end date directly in the dimension as normal SCD attributes, part from SCD start/end date, may create confusion and produce inaccurate dimension context for the fact.
I suggest to have special treatment to your SCD start/end date to reflect the business dates. So just have one pair of dates, SCD start/end date, no other start/end date attribute. The business start/end date is not an SCD attribute at all but used to find the previous record and update SCD start/end date accordingly. The process could be a little complicated in ETL but would work if implemented properly.
So basically, you retrospectively fit in the changed records instead of comparing with current record and add new record if there is an SCD change. To do that, you need to find and compare with the previous record that covers the business start date, ie. BusStartDate between SCDStartDate and SCDEndDate. There would be only one record for each business key to meet that criteria. You then need to update and insert records as follows:
1. If there is a match and any SCD change, set the previous SCDEndDate to BusStartDate, SCDStatus to inactive if it's active.
2. Add any new record or changed record from the source with BusStartDate as SCDStartDate.
3. With each added record, if it's for any SCD change, set SCDEndDate to previous SCDEndDate, and SCDStatus to previous SCDStatus. If it's new record, set SCDEndDate to '12/31/9999' and SCDStatus to active.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Similar topics
» Updating a fact table ?
» Updating records in a fact table
» Updating Measures In Fact Table
» Updating Fact Table : Best practice
» 'Routing' the grain of the fact table to multpile members of multiple dimensions causes the fact table to 'explode'
» Updating records in a fact table
» Updating Measures In Fact Table
» Updating Fact Table : Best practice
» 'Routing' the grain of the fact table to multpile members of multiple dimensions causes the fact table to 'explode'
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum