Adjusting historic data in a type II SCD
3 posters
Page 1 of 1
Adjusting historic data in a type II SCD
Forum,
In some circumstances we are required to adjust historic data in our SCD tables. For example one of the upstream systems sends through corrected data for the most recent month end. We can't see a straightforward way to do this - seems that SCDs just aren't meant to cope with historic data that occasionally needs correction. We have similar problems whenever we need to add a new attribute to a dimension - and we also want to load accurate history into that attribute (assuming of course that we have accurate history in staging). These two problems are creating a perception that SCDs result in a fragile data model that can't cope with changes to history.
Any thoughts most welcome.
Jim
In some circumstances we are required to adjust historic data in our SCD tables. For example one of the upstream systems sends through corrected data for the most recent month end. We can't see a straightforward way to do this - seems that SCDs just aren't meant to cope with historic data that occasionally needs correction. We have similar problems whenever we need to add a new attribute to a dimension - and we also want to load accurate history into that attribute (assuming of course that we have accurate history in staging). These two problems are creating a perception that SCDs result in a fragile data model that can't cope with changes to history.
Any thoughts most welcome.
Jim
jimbo- Posts : 1
Join date : 2009-04-19
Re: Adjusting historic data in a type II SCD
I agree. Restating history is problematic in nearly all databases I have encountered, not just the dimensional ones. The complexity involved in managing this history is usually enough to encourage most folks to simply overwrite it. If you truly want to manage history correctly, Tom Johnston and Randall Weiss wrote a series of articles that address the topic as thoroughly as anyone I've seen. You can find a listing of the articles here, http://mindfuldata.com/Publications/pubs-pdf/The%20Time%20and%20Time%20Again%20Series.pdf. Set aside some time for reading since the topic tends to make your head hurt.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Adjusting historic data in a type II SCD
Consider the implications regarding SCD design if you can revise history in a type 2 SCD without raising a storm of complaints from users. There is a high degree of probability that a type 1 or 3 SCD is what the users really need for the revisable columns, no matter what was said during requirements gathering.
I think that the biggest challenges around SCDs relate to getting the requirements right, and to cases where different sets of users truly have different requirements. The best way that I've found to get the requirements right is to discuss real-life cases with users using their data (and reports if the application is live). Questions like "what do you want to see in this column of this report if the product description was changed in March?" tend to yield much better answers than "for which product attributes do you need the BI system to track history?" (which almost always gets one of 2 answers: "all" or "none", neither of which turns out to be entirely correct).
I agree that the Time and Time Again series of articles gives an excellent analysis of temporal issues and provides good techniques for dealing with them; I would caution you to re-examine assumptions and requirements that were used to create the type 2 design before implementing bi- or tri-temporal SCDs (see the articles), as you may be adding complexity in cases where simplifying (e.g. type 1) may satisfy the requirements.
Since you already have the type 2 implemented, and presumably some users, you have an approach available that was not possible during requirements gathering and design: you can notify users every time there is a revision made to history (this is one of the tasks in Mr. Kimball's more recent descriptions of the process of publishing an updated version of dimension data). The notification should point out that reports for past time periods may return different results today than they did yesterday, due to revising of history. If the users appreciate the notification and can use it to do their jobs better, you can probably get them to fund a project to implement a design change to make it more robust (such as bi-temporal SCD). If the users ask you to please stop bothering them with the notifications you have a pretty good indication that type 2 was used on some columns that don't really need their history tracked.
I think that the biggest challenges around SCDs relate to getting the requirements right, and to cases where different sets of users truly have different requirements. The best way that I've found to get the requirements right is to discuss real-life cases with users using their data (and reports if the application is live). Questions like "what do you want to see in this column of this report if the product description was changed in March?" tend to yield much better answers than "for which product attributes do you need the BI system to track history?" (which almost always gets one of 2 answers: "all" or "none", neither of which turns out to be entirely correct).
I agree that the Time and Time Again series of articles gives an excellent analysis of temporal issues and provides good techniques for dealing with them; I would caution you to re-examine assumptions and requirements that were used to create the type 2 design before implementing bi- or tri-temporal SCDs (see the articles), as you may be adding complexity in cases where simplifying (e.g. type 1) may satisfy the requirements.
Since you already have the type 2 implemented, and presumably some users, you have an approach available that was not possible during requirements gathering and design: you can notify users every time there is a revision made to history (this is one of the tasks in Mr. Kimball's more recent descriptions of the process of publishing an updated version of dimension data). The notification should point out that reports for past time periods may return different results today than they did yesterday, due to revising of history. If the users appreciate the notification and can use it to do their jobs better, you can probably get them to fund a project to implement a design change to make it more robust (such as bi-temporal SCD). If the users ask you to please stop bothering them with the notifications you have a pretty good indication that type 2 was used on some columns that don't really need their history tracked.
tim_huck- Posts : 8
Join date : 2009-04-09
Location : Evanston, Illinois
Similar topics
» Type-2 Dates as Date Data Type ?
» Is it a best practice that Data warehouse follows the source system data type?
» "Noisy" data
» Data type matching between OLTP & DW
» where to keep sql server data type text
» Is it a best practice that Data warehouse follows the source system data type?
» "Noisy" data
» Data type matching between OLTP & DW
» where to keep sql server data type text
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum