Preserving prior history for late arriving dimensions
4 posters
Page 1 of 1
Preserving prior history for late arriving dimensions
Can't find much about this topic anywhere else so hopefully someone can help me.
Our warehouse has late arriving dimensions for an Employee. Their department changes, for example, can be backdated in the systems and so we have to account for this and adjust the effective dates in the SCD and the relevant surrogate keys in the affected sales fact table.
This is covered nicely in the "Data Warehouse Toolkit" under Health Care, so that bit's fine.
Now what I need to do is allow users to report on the historic version of the truth so that any reports that are rerun for a prior month end will reflect the totals reported at the end of that month (and so will not take into account the backdating).
What I intend to have is an aggregate table built on top of the sales fact which is refreshed daily. At the end of the month I will siphon this off into a monthly snapshot table and add a month dimension surrogate key. This will give me the correct totals by employee at the end of each month.
The problem I will have is that after the snapshot is taken, an employee's department move gets backdated (and the relevant dimension rows) and the corresponding rows in the snapshot table will then point to the incorrect department.
Do I :-
a) add additional "Prior history department" columns to the employee dimension to the record
b) adjust the surrogate keys in the snapshot table in the same way I would with the fact tables
c) forget this idea as a lost cause
Sorry for the long winded explanation, hopefully my issue is clear.
Thanks,
B
Our warehouse has late arriving dimensions for an Employee. Their department changes, for example, can be backdated in the systems and so we have to account for this and adjust the effective dates in the SCD and the relevant surrogate keys in the affected sales fact table.
This is covered nicely in the "Data Warehouse Toolkit" under Health Care, so that bit's fine.
Now what I need to do is allow users to report on the historic version of the truth so that any reports that are rerun for a prior month end will reflect the totals reported at the end of that month (and so will not take into account the backdating).
What I intend to have is an aggregate table built on top of the sales fact which is refreshed daily. At the end of the month I will siphon this off into a monthly snapshot table and add a month dimension surrogate key. This will give me the correct totals by employee at the end of each month.
The problem I will have is that after the snapshot is taken, an employee's department move gets backdated (and the relevant dimension rows) and the corresponding rows in the snapshot table will then point to the incorrect department.
Do I :-
a) add additional "Prior history department" columns to the employee dimension to the record
b) adjust the surrogate keys in the snapshot table in the same way I would with the fact tables
c) forget this idea as a lost cause
Sorry for the long winded explanation, hopefully my issue is clear.
Thanks,
B
Bisquite- Posts : 5
Join date : 2010-02-07
Re: Preserving prior history for late arriving dimensions
If you must carry both versions of the fact, then add a current flag and effective/expiration date columns to the fact table. When you need to restate a fact, expire the old one and insert a new row. The current flag is not absolutely necessary (you could look for unexpired rows) but it does make it easier and more efficient to filter current state in queries, particularly if you DBMS supports bitmap indexes.
Re: Preserving prior history for late arriving dimensions
I have the same situation; backdated data to a dimension SCD type 2 (not the fact table) and want to be able to rerun queries to reproduce results as data was before any corrections/additions to the dimension as well as run queries taking the corrected dimension into account. I am not 100% sure I understand the proposed solution.
Do you
* keep a version of the row in the dimension with the original from and to dates
* add a new row to the dimension with the backdated data - that is corrected from and to dates
* add rows to the fact table with surrogate key for the "new" version of the dimension and keep rows in the fact with the "old" version of the dimension
* separate the facts with the old surrogate dimension key from the new by using effective from and to date in the fact table
Does it make sense?
Thanks, Trond Bjerkan
Do you
* keep a version of the row in the dimension with the original from and to dates
* add a new row to the dimension with the backdated data - that is corrected from and to dates
* add rows to the fact table with surrogate key for the "new" version of the dimension and keep rows in the fact with the "old" version of the dimension
* separate the facts with the old surrogate dimension key from the new by using effective from and to date in the fact table
Does it make sense?
Thanks, Trond Bjerkan
TrondBjerkan- Posts : 3
Join date : 2010-03-24
Re: Preserving prior history for late arriving dimensions
* keep a version of the row in the dimension with the original from and to dates
* add a new row to the dimension with the backdated data - that is corrected from and to dates
No. The dates in the dimension table should represent when, from a business perspective, the particular version of a dimension is in effect. This is not to say there would also be internal timestamps to record when the row was inserted or updated. In other words, it is type 2 as usual. Also, a late arriving scenario may have nothing to do with a change in a dimension row, but rather a retroactive change to a fact/dimension relationship. Such as someone changing jobs or transferring to another department.
* add rows to the fact table with surrogate key for the "new" version of the dimension and keep rows in the fact with the "old" version of the dimension
* separate the facts with the old surrogate dimension key from the new by using effective from and to date in the fact table
Yes. The effective/expiration dates on the fact would represent when you knew about it.
So it gives you two perspectives to the fact/dimension relationship: what it should be and when did you find out.
Re: Preserving prior history for late arriving dimensions
My situation is not related to sales and department but I can think of fictive case.
I am thinking of a scenario where sales bonus to sales persons is settled.
A query I could think of is:
“Give me the sales for all persons in department A on a specific date e.g. 1st of February”
I could use the employee dimension to locate the persons on the given date, that is rows in the dimension that fall between from and to date and then join the sales facts for those rows.
If the to and from dates in the dimension are changed due to backdating for a person in department A (he/she moved to department B on the 20th of January) I am not able to get the same result as before (or ?).
I want the same result (even though the person was not in department A at the time) due to auditing, e.g. so I can explain why a sales bonus is paid to a sales person (or why an action has been taken). You could argue that the sales bonus should not have been paid in the first place since the person was not in the department, but in this fictive case I am not able to get already paid bonus back and have to keep the track due to auditing.
I also want be able to give the real sales per department, that is taking the backdating into account.
So I want is, the data as it was and the data as it is (at the moment).
I am still in the process of understanding the use of effective to and from dates on the fact table and can not yet determine if the above case is solved by using the dates on the fact table.
I can see the use of the dates on the facts when there are backdated corrections to measures of the fact so you can keep several versions of the fact delimited by the date span. But when it comes to versions of dimensions I am still learning.
Any pointers to urls, examples or books are very welcome
Thanks again, Trond Bjerkan
I am thinking of a scenario where sales bonus to sales persons is settled.
A query I could think of is:
“Give me the sales for all persons in department A on a specific date e.g. 1st of February”
I could use the employee dimension to locate the persons on the given date, that is rows in the dimension that fall between from and to date and then join the sales facts for those rows.
If the to and from dates in the dimension are changed due to backdating for a person in department A (he/she moved to department B on the 20th of January) I am not able to get the same result as before (or ?).
I want the same result (even though the person was not in department A at the time) due to auditing, e.g. so I can explain why a sales bonus is paid to a sales person (or why an action has been taken). You could argue that the sales bonus should not have been paid in the first place since the person was not in the department, but in this fictive case I am not able to get already paid bonus back and have to keep the track due to auditing.
I also want be able to give the real sales per department, that is taking the backdating into account.
So I want is, the data as it was and the data as it is (at the moment).
I am still in the process of understanding the use of effective to and from dates on the fact table and can not yet determine if the above case is solved by using the dates on the fact table.
I can see the use of the dates on the facts when there are backdated corrections to measures of the fact so you can keep several versions of the fact delimited by the date span. But when it comes to versions of dimensions I am still learning.
Any pointers to urls, examples or books are very welcome
Thanks again, Trond Bjerkan
TrondBjerkan- Posts : 3
Join date : 2010-03-24
Re: Preserving prior history for late arriving dimensions
There is no reason why you cannot maintain a pair of 'when did I know this' dates in the dimensions as well. Because you are talking about two different sets of dates: the primary set are the dates, from a business perspective, the particular dimension row reflects reality. The second set of dates are, from a technical perspective, is when the system knew about it.
If you tried to do it with only the primary set, you would wind up with overlaps that would cause all sorts of problems. In your particular scenario, you would filter on the when did I know it dates.
If you tried to do it with only the primary set, you would wind up with overlaps that would cause all sorts of problems. In your particular scenario, you would filter on the when did I know it dates.
Re: Preserving prior history for late arriving dimensions
You are right. The overlaping dates did troubled me. I guess I was blinded by the typical SCD type 2 design using only one set of from and to dates. You pointed and push me in other direction, thanks.
I will explore the use of two sets of dates set.
I will explore the use of two sets of dates set.
TrondBjerkan- Posts : 3
Join date : 2010-03-24
Re: Preserving prior history for late arriving dimensions
Go read Tom Johnston and Randall Weis' articles on managing time in databases. It will describe every possible situation you may want to cover. I say may because more often than not, the business doesn't care.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Similar topics
» Not quite late arriving dimensions but similar case....
» Late Arriving Facts
» Late Arriving Dimension Data
» Design Alternatives for Late Arriving/Updated Type 2 Dimension Data
» Late arriving dim *change* - how to update the fact?
» Late Arriving Facts
» Late Arriving Dimension Data
» Design Alternatives for Late Arriving/Updated Type 2 Dimension Data
» Late arriving dim *change* - how to update the fact?
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum