Updating historic transactions and snapshots
Hi,
I have a project where we will be building Transactional Fact Tables along with daily Periodic Snapshot Fact Tables based on the same data.
Now, I'll try to be as brief as possible, but this needs some explaining. Here is the dilemma:
The task seems straightforward, but the business is able to go back in time and update some of the transactional records. They want this reflected in the DW while keeping the old record alongside the update as a new record, plus an audit log of when it was changed, and so on.
If I update the Transaction Fact Table with this new data, I would just use an active flag so the business can distinguish between the records (I have done this in other parts of the DW), but how do I handle the Periodic Snapshots?
I would assume I would need to go through all snapshots after this update and reflect the changes, otherwise they would not match up. Each snapshot can be calculated from all transactions prior to the snapshot date, so it can be rebuilt from the Transactional Fact Table, but we are talking about processing 150,000,000+ rows for every snapshot we need to update, resulting in about 2,500,000+ records per snapshot. With these transactions able to be changed at any point in time (e.g. 6 months ago, 1 year ago), it could be a very long and messy process.
So I'm not sure whether I should be rebuilding the entire snapshot or updating only the records that have changed within it. Rebuilding the entire snapshot should be 100% accurate no matter how many records have been updated, but would (I would imagine) take a very long time. Updating only the changed records could be faster when only a few records have changed, but risks missing records.
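To make the full-rebuild option above concrete, here is a minimal sketch in pandas of recomputing one daily snapshot by scanning every transaction on or before the snapshot date. The table and column names (`account_key`, `txn_date`, `amount`) are illustrative assumptions, not from the original post; the point is that this approach is accurate but costs a scan of all prior history per snapshot.

```python
# Hypothetical sketch: full recompute of one daily periodic snapshot
# from the transaction fact table. All names are illustrative.
import pandas as pd

transactions = pd.DataFrame({
    "account_key": [1, 1, 2],
    "txn_date": pd.to_datetime(["2009-05-01", "2009-05-03", "2009-05-02"]),
    "amount": [100.0, -30.0, 50.0],
})

def full_snapshot(txns: pd.DataFrame, as_of: str) -> pd.DataFrame:
    """Rebuild the snapshot for `as_of` by aggregating every transaction
    on or before that date -- accurate, but O(all history) per snapshot."""
    cutoff = pd.Timestamp(as_of)
    eligible = txns[txns["txn_date"] <= cutoff]
    snap = eligible.groupby("account_key", as_index=False)["amount"].sum()
    snap["snapshot_date"] = cutoff
    return snap

snap = full_snapshot(transactions, "2009-05-02")
# account 1 -> 100.0, account 2 -> 50.0 (the -30.0 on 05-03 is excluded)
```

In SQL terms this is simply a `GROUP BY` over all rows with a transaction date at or before the snapshot date, which is why the 150M-row scan per snapshot is the concern.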
I know that a DW is meant to be accurate as of the point in time the snapshot was taken, but this is what the business is asking for. Are they being unrealistic?
What are people's opinions and advice on this matter? This is just an example of one business process within the DW, but most have the same requirements; so far about 15 of them do.
PS. Another problem I face due to this process: reports that have already been generated in the past and published or used in some way would now be inaccurate compared with the updated DW. Can this be handled, and if so, how?
cal.sneds- Posts : 4
Join date : 2009-06-02
Location : Melbourne
Re: Updating historic transactions and snapshots
The easiest way to handle this is to just apply changes. When you update the fact table, create a negative version of the previous fact and a positive version of the new fact, collect them in a table, aggregate them to the same grain as the snapshot, and update the snapshot with the result.
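A minimal pandas sketch of the approach described above: a correction produces a reversing (negative) row for the old values and a positive row for the new ones, the pair is aggregated to the snapshot grain, and the net delta is applied to the affected snapshot. The names (`account_key`, `balance`, `amount`) are assumptions for illustration only.

```python
# Hedged sketch, assuming the snapshot grain is one row per account.
import pandas as pd

# Existing snapshot at the account grain (balance as of the snapshot date).
snapshot = pd.DataFrame({"account_key": [1, 2], "balance": [100.0, 50.0]})

# A correction: account 1's 100.0 transaction should have been 120.0.
deltas = pd.DataFrame({
    "account_key": [1, 1],
    "amount": [-100.0, 120.0],   # reversing row + corrected row
})

# Aggregate the deltas to the snapshot grain and apply the net change.
net = deltas.groupby("account_key")["amount"].sum()          # {1: 20.0}
snapshot = snapshot.set_index("account_key")
snapshot["balance"] = snapshot["balance"].add(net, fill_value=0.0)
snapshot = snapshot.reset_index()
# account 1 -> 120.0, account 2 -> 50.0
```

The key property is that only the accounts appearing in the delta table are touched, so the cost scales with the number of corrections, not with the 150M-row history.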
Re: Updating historic transactions and snapshots
By adding a negative fact for the old row, won't that present issues if users run reports on, say, how many transactions occurred on a particular day? It would show an extra two transactions per update. I know the values will tally up correctly, but the counts will not.
Also, just so that I understand the snapshot update: are you saying that when I update the Transactional Fact Table, I should keep a record of all the updates in another table (really only the keys and date), then run a script at the end that aggregates only the changes recorded in this new table and uses those keys to update every snapshot from the date the update applies to?
Re: Updating historic transactions and snapshots
You get around the count problem by having a count measure in the fact table. It would contain either 1 or -1 and would be summed to get the count. You would also add an 'as of date' dimension to the fact.
The advantage of applying updates this way is that it provides an audit of every change and a way of recreating a report at any point in time by filtering on the as-of date.
And yes, you would collect the changes (negative and positive rows) in another table (you need to include measures as well) and use an aggregate of that to update your aggregates.
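Putting the count measure and the as-of-date dimension together, here is a small sketch: every fact row carries a `txn_count` of +1 or -1, so summing it yields a correct transaction count even after reversing rows are added, and filtering on `as_of_date` recreates what a report would have shown at any earlier time. All column names here are illustrative assumptions.

```python
# Sketch of the count measure (+1/-1) and as-of-date technique.
import pandas as pd

facts = pd.DataFrame({
    "account_key": [1, 1, 1],
    "amount":      [100.0, -100.0, 120.0],   # original, reversal, correction
    "txn_count":   [1, -1, 1],               # reversing row carries -1
    "as_of_date":  pd.to_datetime(["2009-05-01", "2009-06-15", "2009-06-15"]),
})

def report(facts: pd.DataFrame, as_of: str) -> pd.Series:
    """Recreate the report as it stood on `as_of` by ignoring any
    change rows applied after that date."""
    view = facts[facts["as_of_date"] <= pd.Timestamp(as_of)]
    return view[["amount", "txn_count"]].sum()

before = report(facts, "2009-05-31")   # amount 100.0, count 1
after = report(facts, "2009-06-30")    # amount 120.0, count 1
```

Note how `txn_count` sums to 1 both before and after the correction, which addresses the earlier objection about counts, and how the May report reproduces exactly what was originally published.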