Physical Deletes in Source Systems?
2 posters
Page 1 of 1
Physical Deletes in Source Systems?
Hello,
We have an interesting situation in my company's environment. We're experiencing physical deletes in the source OLTP systems and some of the facts/dimensions are never seen by the data warehouse environment as the deletes happen prior to the DW getting updated.
What's the best way to handle this restriction if we would like to implement incremental updates (as in more than once/day) in our DW environment?
Thanks!
We have an interesting situation in my company's environment. We're experiencing physical deletes in the source OLTP systems and some of the facts/dimensions are never seen by the data warehouse environment as the deletes happen prior to the DW getting updated.
What's the best way to handle this restriction if we would like to implement incremental updates (as in more than once/day) in our DW environment?
Thanks!
jchernev- Posts : 14
Join date : 2011-12-08
Re: Physical Deletes in Source Systems?
Depending on how much access you have to the source system and what DBMS it's running on, some Change Data Capture implementations - the one that I know of being SQL Server 2008 - will use the transaction logs to determine data changes, and will therefore detect and log deletions.
If that's not an option (and assuming you don't have enough influence to get the source system fixed!) then I think you're going to need a full load as the first step, but that doesn't mean you can't process incrementally once you've got it in your staging database.
The approach I've adopted where physical deletes are possible in the source data is to merge the full data set (or at least the columns I'm interested in) from the source system into a replica of the table in our staging database, applying a changed_date to each row we insert or update. If a row exists in our replica but not the source data then we give it a deleted_flag of 1 and again set the changed_date, then all processing thereafter can work off that date to load the rest of the warehouse incrementally. We use a MERGE statement so can take advantage of the WHEN NOT MATCHED BY SOURCE, but WHERE NOT EXISTS would also do the job.
If that's not an option (and assuming you don't have enough influence to get the source system fixed!) then I think you're going to need a full load as the first step, but that doesn't mean you can't process incrementally once you've got it in your staging database.
The approach I've adopted where physical deletes are possible in the source data is to merge the full data set (or at least the columns I'm interested in) from the source system into a replica of the table in our staging database, applying a changed_date to each row we insert or update. If a row exists in our replica but not the source data then we give it a deleted_flag of 1 and again set the changed_date, then all processing thereafter can work off that date to load the rest of the warehouse incrementally. We use a MERGE statement so can take advantage of the WHEN NOT MATCHED BY SOURCE, but WHERE NOT EXISTS would also do the job.
PeteGrace- Posts : 7
Join date : 2011-09-01
Similar topics
» Deletes in the source system for Type-2...
» Update and deletes in incremental loading of the fact table
» Is it possible to design a DW in parallel with its source systems?
» Linking or Merging Similar dimensions from Different Source Systems
» Multiple source systems, some with many Business Key namespaces, and little organisation engagement
» Update and deletes in incremental loading of the fact table
» Is it possible to design a DW in parallel with its source systems?
» Linking or Merging Similar dimensions from Different Source Systems
» Multiple source systems, some with many Business Key namespaces, and little organisation engagement
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum