Change Data Capture
3 posters
Page 1 of 1
Change Data Capture
Hi,
We are trying to chose a way to capture change from the source system (we have to support MSSQL and Oracle).
- i have problem understanding the difference between the two method : Using Audit Column and Timed Extract.
Plus, if we choose this technique, we need a way to capture the deleted row.
Relying on trigger is not a very good option for us, because we don't want to add trigger in every table.
Maybe, having a staging table with only the source id for a given table could do the trick ? (we can then compare with the source table to know which id have been deleted)
- i have problem understanding the difference between the two method : Process of Elimination and Initial and Incremental Loads. For both method, you seemed to compare two table, the current and the previous.
And how do you techniqualy implemet these method
We are trying to chose a way to capture change from the source system (we have to support MSSQL and Oracle).
- i have problem understanding the difference between the two method : Using Audit Column and Timed Extract.
Plus, if we choose this technique, we need a way to capture the deleted row.
Relying on trigger is not a very good option for us, because we don't want to add trigger in every table.
Maybe, having a staging table with only the source id for a given table could do the trick ? (we can then compare with the source table to know which id have been deleted)
- i have problem understanding the difference between the two method : Process of Elimination and Initial and Incremental Loads. For both method, you seemed to compare two table, the current and the previous.
And how do you techniqualy implemet these method
ATran- Posts : 11
Join date : 2010-03-25
Re: Change Data Capture
One approach is to use the database's own change logs. Some ETL tools support this type of interface. The challenge is the nature of these logs vary by database system, so you need different code for each database.
The most common generic method is to extract based on an update timestamp stored on the source row. But not all systems provide such a column. If available, the extract process would keep track of the last time it extracted and pull all rows changed since then. However, this approach will not capture hard deletes.
The brute force approach would be to pull everything (within reason) from the source and compare it to what is in the DW and apply changes.
But what you need to do for a particular situation varies depending on the nature of the system and the data you are working with. There is no one method that suits everything and a lot of variations to the methods I mentioned.
The most common generic method is to extract based on an update timestamp stored on the source row. But not all systems provide such a column. If available, the extract process would keep track of the last time it extracted and pull all rows changed since then. However, this approach will not capture hard deletes.
The brute force approach would be to pull everything (within reason) from the source and compare it to what is in the DW and apply changes.
But what you need to do for a particular situation varies depending on the nature of the system and the data you are working with. There is no one method that suits everything and a lot of variations to the methods I mentioned.
Re: Change Data Capture
Reading the log is too specific, and required two big developement when having 2 differents database system.
I think the simplest and better solution for us will be to use the update timestamp column in the source system.
To track hard delete row i may use this solution if i don't find any other :
To track delete row from table EMPLOYEES, i will have a staging table STG_EMPLOYEES with only the existing id from the previous ETL process.
And i will compare the id from the current table EMPLOYEES and STG_EMPLOYEES to know which id was deleted
ATran- Posts : 11
Join date : 2010-03-25
Re: Change Data Capture
I realize this is an old post, but I just came across it. I don't believe ATran's first question was answered, at least I still have the same question. In Ralph Kimball's Design Tip #63 he describes 4 ways to detect changes.
#1 is by looking at "Audit Columns", columns stored on the source record that indicate when the record was created or last updated.
#3 is "Timed extracts. With a timed extract you typically select all of the rows where the date in the Create or Modified date fields equal" ... to some relative date like TODAY -1.
What's the difference between these two options for detecting changed records? It appears to me #3 is just #1 looking at yesterday's records. Am I missing something?
#1 is by looking at "Audit Columns", columns stored on the source record that indicate when the record was created or last updated.
#3 is "Timed extracts. With a timed extract you typically select all of the rows where the date in the Create or Modified date fields equal" ... to some relative date like TODAY -1.
What's the difference between these two options for detecting changed records? It appears to me #3 is just #1 looking at yesterday's records. Am I missing something?
Steveo250k- Posts : 6
Join date : 2012-08-10
Similar topics
» ETL from Oracle to SQL Server 2008 Data Warehouse
» Using checksums for change data capture...
» Change Data Capture(CDC) when running Batch jobs
» Configuring SQL Server 2008 R2 Change Data Capture with SSAS
» Much ado about overlapping type 2 change
» Using checksums for change data capture...
» Change Data Capture(CDC) when running Batch jobs
» Configuring SQL Server 2008 R2 Change Data Capture with SSAS
» Much ado about overlapping type 2 change
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum