How to handle situations where the data is deleted from the source system?
3 posters
Page 1 of 1
How to handle situations where the data is deleted from the source system?
How to handle situations where the data is deleted from the source system? How can you check for this deleted data? Should this data also be deleted (or marked as deleted) from the DW?
larus- Posts : 5
Join date : 2011-03-01
Re: How to handle situations where the data is deleted from the source system?
Detecting deletes in a source is a challenge. Unless there is an explicit transaction to work from (or DB logs), the only way to detect them would be to match/merge the source population against the DW population.
On the DW side, you normally flag them as 'deleted'. Physically deleting a row, particularly a dimension, will mess up relational integrity with the facts.
Usually, this type of process need only be run periodically, such as weekly or monthly, depending on the need to know. You don't need much source information other than the natural keys of current members. If the need to know is more immediate, you may consider implementing triggers on the source tables of interest to write key information to a log table when a delete occurs, then drive the DW update process using the log.
On the DW side, you normally flag them as 'deleted'. Physically deleting a row, particularly a dimension, will mess up relational integrity with the facts.
Usually, this type of process need only be run periodically, such as weekly or monthly, depending on the need to know. You don't need much source information other than the natural keys of current members. If the need to know is more immediate, you may consider implementing triggers on the source tables of interest to write key information to a log table when a delete occurs, then drive the DW update process using the log.
Re: How to handle situations where the data is deleted from the source system?
Hi larus,
I assume you are trying to maintain a Dimension table? I usually handle this requirement by a kind of "reverse lookup" step in the ETL.
In the course of delivering the current data from the source system, I cache the set of source system business keys (e.g. in an SSIS Lookup cache). Then I add a post-step that reads all the Dimension table "current" business keys and compares them to the cache. Any rows that dont match are candidates to be "deleted". If you don't use SSIS I guess you could use a Staging table for this.
The "delete" would actually be to turn off a "Row Is Current" flag and close an Effective Date range.
HTH
Mike
I assume you are trying to maintain a Dimension table? I usually handle this requirement by a kind of "reverse lookup" step in the ETL.
In the course of delivering the current data from the source system, I cache the set of source system business keys (e.g. in an SSIS Lookup cache). Then I add a post-step that reads all the Dimension table "current" business keys and compares them to the cache. Any rows that dont match are candidates to be "deleted". If you don't use SSIS I guess you could use a Staging table for this.
The "delete" would actually be to turn off a "Row Is Current" flag and close an Effective Date range.
HTH
Mike
Similar topics
» How to handle situations when there is a combined primary key in the source system?
» create a new record in a SCD2 for a subject that was deleted in the source system
» Is it a best practice that Data warehouse follows the source system data type?
» Poorly sturctured data at source system
» Source System Analysis
» create a new record in a SCD2 for a subject that was deleted in the source system
» Is it a best practice that Data warehouse follows the source system data type?
» Poorly sturctured data at source system
» Source System Analysis
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum