Tracking of historical data using SCD2 in a non-dimensional data model
2 posters
Page 1 of 1
Tracking of historical data using SCD2 in a non-dimensional data model
Hi all. In my current project we use a standard model for a financial warehouse build by SAS Institute. In this model the tracking of historical changes in data is done using SCD2 Type in a non-dimensional model. Now this model was implemented a while ago with a team which wasn't that experienced in planning or implementing DWH. They didn't grab the tracking of changes and invented a new method to track changes. I want to challange this method. Maybe someone worked with something like that before?
The initial point is: SAS uses the fields VALID_FROM and VALID_TO to track changes. Apart from that they suggest to use a pair of fields called EFFECTIVE_FROM and EFFECTIVE_TO to cover special reporting needs driven by business. These fields are to be thought as attributes rather than timely dimensions.
Now here we have the situation that the fields VALID_FROM and VALID_TO are filled with the actual load data (VALID_FROM) and 31.12.9999 (VALID_TO) and the changes as we know them from SCD2 will be tracked in EFFECTIVE_FROM and EFFECTIVE_TO. VALID_FROM and VALID_TO will be updated only, when they found some error in the data and re-deliver the same data. So to get the current data one has to select the VALID fields with the current date first and then find the correct timestamp using the EFFECTIVE pair. A typical data situation could be:
ID|VALID_FROM|VALID_TO|EFFECTIVE_FROM|EFFECTIVE_TO|ATTRIBUTE
1|01.01.2012|31.12.9999|15.09.2011|31.01.2012|X
1|01.01.2012|31.12.9999|01.02.2012|31.12.9999|U
Now if the data will be corrected on March, 5th 2012 by business (why so ever) it could look lilke this:
ID|VALID_FROM|VALID_TO|EFFECTIVE_FROM|EFFECTIVE_TO|ATTRIBUTE
1|01.01.2012|31.12.9999|15.09.2011|31.01.2012|X
1|01.01.2012|04.03.2012|01.02.2012|31.12.9999|U
1|05.03.2012|31.12.9999|15.01.2012|31.12.9999|U
This looks nice to me but it is unusual. Anyone encountered something like this before?
Thanks for comments.
D
The initial point is: SAS uses the fields VALID_FROM and VALID_TO to track changes. Apart from that they suggest to use a pair of fields called EFFECTIVE_FROM and EFFECTIVE_TO to cover special reporting needs driven by business. These fields are to be thought as attributes rather than timely dimensions.
Now here we have the situation that the fields VALID_FROM and VALID_TO are filled with the actual load data (VALID_FROM) and 31.12.9999 (VALID_TO) and the changes as we know them from SCD2 will be tracked in EFFECTIVE_FROM and EFFECTIVE_TO. VALID_FROM and VALID_TO will be updated only, when they found some error in the data and re-deliver the same data. So to get the current data one has to select the VALID fields with the current date first and then find the correct timestamp using the EFFECTIVE pair. A typical data situation could be:
ID|VALID_FROM|VALID_TO|EFFECTIVE_FROM|EFFECTIVE_TO|ATTRIBUTE
1|01.01.2012|31.12.9999|15.09.2011|31.01.2012|X
1|01.01.2012|31.12.9999|01.02.2012|31.12.9999|U
Now if the data will be corrected on March, 5th 2012 by business (why so ever) it could look lilke this:
ID|VALID_FROM|VALID_TO|EFFECTIVE_FROM|EFFECTIVE_TO|ATTRIBUTE
1|01.01.2012|31.12.9999|15.09.2011|31.01.2012|X
1|01.01.2012|04.03.2012|01.02.2012|31.12.9999|U
1|05.03.2012|31.12.9999|15.01.2012|31.12.9999|U
This looks nice to me but it is unusual. Anyone encountered something like this before?
Thanks for comments.
D
dolic- Posts : 2
Join date : 2012-05-04
Re: Tracking of historical data using SCD2 in a non-dimensional data model
If there is an error in the data why do they deliver the same data again?
What do they track with these two rows?
1|01.01.2012|04.03.2012|01.02.2012|31.12.9999|U
1|05.03.2012|31.12.9999|15.01.2012|31.12.9999|U
What do they track with these two rows?
1|01.01.2012|04.03.2012|01.02.2012|31.12.9999|U
1|05.03.2012|31.12.9999|15.01.2012|31.12.9999|U
umutiscan- Posts : 51
Join date : 2010-11-26
Age : 44
Location : Istanbul, Turkey
Re: Tracking of historical data using SCD2 in a non-dimensional data model
Hi.
Yes, you're right. There the sample shows the handling of an error: As you can see, the EFFECTIVE_FROM date was changed in the new data line:
ID|VALID_FROM|VALID_TO|EFFECTIVE_FROM|EFFECTIVE_TO|ATTRIBUTE
1|01.01.2012|31.12.9999|15.09.2011|31.01.2012|X
1|01.01.2012|04.03.2012|01.02.2012|31.12.9999|U
1|05.03.2012|31.12.9999|15.01.2012|31.12.9999|U
On March, 5th they discovered that ATTRIBUTE U was valid from 15.01.2012 and not from 01.02.2012. So they introduced a new data line, closed the VALID_FROM|VALID_TO pair from the old line. Thats the was to be able to re construct the old (wrong) data reporting.
Yes, you're right. There the sample shows the handling of an error: As you can see, the EFFECTIVE_FROM date was changed in the new data line:
ID|VALID_FROM|VALID_TO|EFFECTIVE_FROM|EFFECTIVE_TO|ATTRIBUTE
1|01.01.2012|31.12.9999|15.09.2011|31.01.2012|X
1|01.01.2012|04.03.2012|01.02.2012|31.12.9999|U
1|05.03.2012|31.12.9999|15.01.2012|31.12.9999|U
On March, 5th they discovered that ATTRIBUTE U was valid from 15.01.2012 and not from 01.02.2012. So they introduced a new data line, closed the VALID_FROM|VALID_TO pair from the old line. Thats the was to be able to re construct the old (wrong) data reporting.
dolic- Posts : 2
Join date : 2012-05-04
Re: Tracking of historical data using SCD2 in a non-dimensional data model
We are not talking about a dimensional model and it's difficult to say that your example is a slowly changing dimension. So if it is working, that's ok for me..
But I think a data model has to be easy to understand and use. I don't know the requirement but if i were you i would implement a simpler model. Using special columns only for data updates is not so feasible for me. But as i said before, if it is working no problem for me.
umutiscan- Posts : 51
Join date : 2010-11-26
Age : 44
Location : Istanbul, Turkey
Similar topics
» Tracking the removed dimensional records in source data
» Data Vault v's Dimensional Model
» creating dimensional model of log data
» Dimensional Model from a Hierarchical Data Source
» Is An Historical Database Always a Data Warehouse?
» Data Vault v's Dimensional Model
» creating dimensional model of log data
» Dimensional Model from a Hierarchical Data Source
» Is An Historical Database Always a Data Warehouse?
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum