Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

How to handle situations where the data is deleted from the source system?

3 posters

Go down

How to handle situations where the data is deleted from the source system? Empty How to handle situations where the data is deleted from the source system?

Post  larus Tue Nov 13, 2012 2:27 pm

How to handle situations where the data is deleted from the source system? How can you check for this deleted data? Should this data also be deleted (or marked as deleted) from the DW?

larus

Posts : 5
Join date : 2011-03-01

Back to top Go down

How to handle situations where the data is deleted from the source system? Empty Re: How to handle situations where the data is deleted from the source system?

Post  ngalemmo Tue Nov 13, 2012 2:40 pm

Detecting deletes in a source is a challenge. Unless there is an explicit transaction to work from (or DB logs), the only way to detect them would be to match/merge the source population against the DW population.

On the DW side, you normally flag them as 'deleted'. Physically deleting a row, particularly a dimension, will mess up relational integrity with the facts.

Usually, this type of process need only be run periodically, such as weekly or monthly, depending on the need to know. You don't need much source information other than the natural keys of current members. If the need to know is more immediate, you may consider implementing triggers on the source tables of interest to write key information to a log table when a delete occurs, then drive the DW update process using the log.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

How to handle situations where the data is deleted from the source system? Empty Re: How to handle situations where the data is deleted from the source system?

Post  Mike Honey Wed Nov 14, 2012 3:33 am

Hi larus,
I assume you are trying to maintain a Dimension table? I usually handle this requirement by a kind of "reverse lookup" step in the ETL.

In the course of delivering the current data from the source system, I cache the set of source system business keys (e.g. in an SSIS Lookup cache). Then I add a post-step that reads all the Dimension table "current" business keys and compares them to the cache. Any rows that dont match are candidates to be "deleted". If you don't use SSIS I guess you could use a Staging table for this.

The "delete" would actually be to turn off a "Row Is Current" flag and close an Effective Date range.

HTH
Mike
Mike Honey
Mike Honey

Posts : 185
Join date : 2010-08-04
Location : Melbourne, Australia

http://www.mangasolutions.com

Back to top Go down

How to handle situations where the data is deleted from the source system? Empty Re: How to handle situations where the data is deleted from the source system?

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum