Late arriving dim *change* - how to update the fact?
3 posters
Page 1 of 1
Late arriving dim *change* - how to update the fact?
Unlike the widely discussed late arriving dimension case, I have late arriving CHANGE to previously perfectly legit dimensional record.
Back in time, the fact data was assigned to the latest version of dim member, but later, it turns out some attribute for that member has changed a while ago and needs to be tracked as type2 even if retroactively.
I can re-read the dim source data and capture the type2 change in the dim table, but how do I re-assign my affected facts to the new surrogate key(s)? I realize it is not a common practice to update the fact, but given the scenario above, is there anything fundamentally wrong with deleting a period- or product-related section of the fact and re-populating it based on the updated dimension?
Back in time, the fact data was assigned to the latest version of dim member, but later, it turns out some attribute for that member has changed a while ago and needs to be tracked as type2 even if retroactively.
I can re-read the dim source data and capture the type2 change in the dim table, but how do I re-assign my affected facts to the new surrogate key(s)? I realize it is not a common practice to update the fact, but given the scenario above, is there anything fundamentally wrong with deleting a period- or product-related section of the fact and re-populating it based on the updated dimension?
antonkov- Posts : 5
Join date : 2012-06-18
Re: Late arriving dim *change* - how to update the fact?
Are you deleting fact rows and repopulating them?
I've got a situation where we have to drop and reload historical membership info. Clients retroactively change Effective and Termination dates for subscribers all the time. We have to account for the changes in aggregate tables. We have to remove 2 years worth of data and repopulate. The aggregate tables are fairly small so we rename the table to Table_Name_Old, recreate Table_Name, load upto the last 2 years of aggregated data from Table_Name_Old, and load the last 2 years of data from the most recent source.
I also have set it up so that I have 2 tables - Current and History with a Union view. I would keep 2 years of data in the current table. When I load data, I move the oldest month of data from Current to History, and then truncate the current verision and reload. If the table is really big, I'm sure you could use partitioning and moving partitions to make it smoother.
I've got a situation where we have to drop and reload historical membership info. Clients retroactively change Effective and Termination dates for subscribers all the time. We have to account for the changes in aggregate tables. We have to remove 2 years worth of data and repopulate. The aggregate tables are fairly small so we rename the table to Table_Name_Old, recreate Table_Name, load upto the last 2 years of aggregated data from Table_Name_Old, and load the last 2 years of data from the most recent source.
I also have set it up so that I have 2 tables - Current and History with a Union view. I would keep 2 years of data in the current table. When I load data, I move the oldest month of data from Current to History, and then truncate the current verision and reload. If the table is really big, I'm sure you could use partitioning and moving partitions to make it smoother.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Late arriving dim *change* - how to update the fact?
Right, what you described, Jeff, is very similar to what I had in mind. The options I considered were A) to partially re-load my fact table or B) load my fact as type 2 dim and then put a view or another table on top materializing the latest state of the facts and expose that one for reporting.
Option A breaks the main modeling dogma about never updating the facts, many would't like it
Option B looks as over-engineered, besides leaves me with useless versions of the facts, which I know are wrong anyways.
Option A breaks the main modeling dogma about never updating the facts, many would't like it
Option B looks as over-engineered, besides leaves me with useless versions of the facts, which I know are wrong anyways.
antonkov- Posts : 5
Join date : 2012-06-18
Re: Late arriving dim *change* - how to update the fact?
To review, you have a type 2 dimension and discovered an attribute change that occurred in the past that was not captured and you need to restate history.
I am going to assume worst case, that multiple members had such changes and that they occurred at different points in time and some members had multiple changes to the attribute over time.
Any way to resolve this is going to be messy. The cleanest method would be to rebuild the dimension first, with correct timestamps for the changes. Keep the same PK for existing rows and assign new PKs to new rows added due to splitting rows when new changes occur. For those dimension rows that were split (including the original one) build a cross reference of two rows, old pk-old pk & old pk - new pk. I am assuming an existing row would only be split in two, and the changed attribute value will propagate to future rows without rekeying those rows (until the attribute changes again).
For those facts whose FK matches the old fk in the cross reference, you would check the dates of the old and new row and decide which is the correct one and reassign the FK if necessary.
I am going to assume worst case, that multiple members had such changes and that they occurred at different points in time and some members had multiple changes to the attribute over time.
Any way to resolve this is going to be messy. The cleanest method would be to rebuild the dimension first, with correct timestamps for the changes. Keep the same PK for existing rows and assign new PKs to new rows added due to splitting rows when new changes occur. For those dimension rows that were split (including the original one) build a cross reference of two rows, old pk-old pk & old pk - new pk. I am assuming an existing row would only be split in two, and the changed attribute value will propagate to future rows without rekeying those rows (until the attribute changes again).
For those facts whose FK matches the old fk in the cross reference, you would check the dates of the old and new row and decide which is the correct one and reassign the FK if necessary.
Re: Late arriving dim *change* - how to update the fact?
Essentially, it is a performance friendly technique of updating the fact table.For those facts whose FK matches the old fk in the cross reference, you would check the dates of the old and new row and decide which is the correct one and reassign the FK if necessary
Thank you for the suggestions!
antonkov- Posts : 5
Join date : 2012-06-18
Similar topics
» What to do when facts change? Update the fact table with update and deletes? or use SCD type 2?
» Part of fact information arives later
» Late Arriving Facts
» Preserving prior history for late arriving dimensions
» Not quite late arriving dimensions but similar case....
» Part of fact information arives later
» Late Arriving Facts
» Preserving prior history for late arriving dimensions
» Not quite late arriving dimensions but similar case....
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum