Handling a correction vs a change in SCD type II dimension.
2 posters
Page 1 of 1
Handling a correction vs a change in SCD type II dimension.
What approach to architect the data architecture and ETL to handle a correction instead of a change in SCD type II dimension, e.g. customer realizes that his address is not correct when he receives his first bill. This update to the customer data is not a change but a correction. Should it be handled as a special adhoc ETL job especially when a source system does not indicate the reason of update and would it requires to add audit tables to capture the correction ...
M. Khan- Posts : 11
Join date : 2012-07-24
Re: Handling a correction vs a change in SCD type II dimension.
Other than adding a 'reason for change' kind of attribute, you shouldn't do anything different.
The whole point of a type 2 is to store states of a dimension as it appeared in the DW. The fact that you will know is was one thing one day, then corrected to something else at some point in the future is the whole point of having a type 2. The fact that is was a correction versus a change is immaterial. As you state you don't have a reason of change attribute, I would put my efforts into changing the source system so you know why the row was updated and storing that information in the warehouse. It is time better spent than developing a 'special' process to handle this type of change. If you need to report current attribute values, you adjust your query against the dimension to retrieve the most current row. There are various techniques to do this.
The whole point of a type 2 is to store states of a dimension as it appeared in the DW. The fact that you will know is was one thing one day, then corrected to something else at some point in the future is the whole point of having a type 2. The fact that is was a correction versus a change is immaterial. As you state you don't have a reason of change attribute, I would put my efforts into changing the source system so you know why the row was updated and storing that information in the warehouse. It is time better spent than developing a 'special' process to handle this type of change. If you need to report current attribute values, you adjust your query against the dimension to retrieve the most current row. There are various techniques to do this.
Re: Handling a correction vs a change in SCD type II dimension.
"The fact that is was a correction versus a change is immaterial."
I see your point of capturing the reason of change in dimensional model from the source system, assuming the source system provides this info. If we leave bad record in the dimension, does it not violates major rule of data quality that provides the foundation to the any data warehouse.
I see your point of capturing the reason of change in dimensional model from the source system, assuming the source system provides this info. If we leave bad record in the dimension, does it not violates major rule of data quality that provides the foundation to the any data warehouse.
M. Khan- Posts : 11
Join date : 2012-07-24
Re: Handling a correction vs a change in SCD type II dimension.
No. Data quality is an operational issue, not a data warehouse issue. Sure, a data warehouse may get bad data, but the data warehouse is not where it should be fixed. The data warehouse needs to be accurate, not correct.
A type 2 reflects the state of the dimension at the time of the transaction. It is important that this not be 'corrected' otherwise you compromise accuracy. If the user wants to see the current state of the dimension, (i.e. after it has been corrected) then you do a self join on the type 2 dimension to locate the current row (that is what the 'current flag' attribute is all about).
If the business is only concerned about the current state of the dimension, don't implement it as a type 2.
A type 2 reflects the state of the dimension at the time of the transaction. It is important that this not be 'corrected' otherwise you compromise accuracy. If the user wants to see the current state of the dimension, (i.e. after it has been corrected) then you do a self join on the type 2 dimension to locate the current row (that is what the 'current flag' attribute is all about).
If the business is only concerned about the current state of the dimension, don't implement it as a type 2.
Similar topics
» Self referencing dimension - How to store Parent Key/Id as type II change
» Correction of Type-2 SCD Fields' Historical Values Results in Effective Date Range Splits with no Changes
» SCD Type 2 Change Reasons
» Type 2 Change Handling
» rationale behind dimension with Type 0 and missing Type 5
» Correction of Type-2 SCD Fields' Historical Values Results in Effective Date Range Splits with no Changes
» SCD Type 2 Change Reasons
» Type 2 Change Handling
» rationale behind dimension with Type 0 and missing Type 5
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum