SCD Type2 - ETL Design
3 posters
Page 1 of 1
SCD Type2 - ETL Design
Hi,
We are trying to build a new data warehouse. Planning to capture the data as SCD Type2 in the Data Warehouse. However, the source system doesn't has any date fields for extraction.
Let me explain the scenario:
Table Name : DWS_CUST
Scenario 1
Columns : Customer No (PK) Cust_Name Cust_address1 Cust_address2 Cust_address3 Cust_address4 Start_date End_date
Data 1 XYZ 34 04-11-2008 01-01-1999
After few months say on 08-12-2008 the customer changes address: Incremental Load - Ovenight process in Data Warehouse.
Scenario 2
Columns : Customer No (PK) Cust_Name Cust_address1 Cust_address2 Cust_address3 Cust_address4 Start_date End_date
Data 1 XYZ 34 04-11-2008 07-12-2008
1 XYZ 89 08-12-2008 01-01-1999
So the data load which takes place overnight which starts at 08-12-2009 and carries forward to 09-12-2009 should make sure that the Start_date is still 08-12-2009.
To achive this so far upon research and thinking i felt the solution to this is to create a Load audit table that would identify the load date which is the Start_date and supply the information to all the records that are potential for incremental load.
If the record is new then it would Insert as in table whose Customer address 34 Scenario 1 .
If the record is for update then it would be as in customer address 89 Scenario 2.
Am I doing correct? If yes, how exactly can this be achieved practically?
Is there anything I'm missing? If I'm doing anything wrong please correct me. If there is any other way please suggest.
Regards,
KK
We are trying to build a new data warehouse. Planning to capture the data as SCD Type2 in the Data Warehouse. However, the source system doesn't has any date fields for extraction.
Let me explain the scenario:
Table Name : DWS_CUST
Scenario 1
Columns : Customer No (PK) Cust_Name Cust_address1 Cust_address2 Cust_address3 Cust_address4 Start_date End_date
Data 1 XYZ 34 04-11-2008 01-01-1999
After few months say on 08-12-2008 the customer changes address: Incremental Load - Ovenight process in Data Warehouse.
Scenario 2
Columns : Customer No (PK) Cust_Name Cust_address1 Cust_address2 Cust_address3 Cust_address4 Start_date End_date
Data 1 XYZ 34 04-11-2008 07-12-2008
1 XYZ 89 08-12-2008 01-01-1999
So the data load which takes place overnight which starts at 08-12-2009 and carries forward to 09-12-2009 should make sure that the Start_date is still 08-12-2009.
To achive this so far upon research and thinking i felt the solution to this is to create a Load audit table that would identify the load date which is the Start_date and supply the information to all the records that are potential for incremental load.
If the record is new then it would Insert as in table whose Customer address 34 Scenario 1 .
If the record is for update then it would be as in customer address 89 Scenario 2.
Am I doing correct? If yes, how exactly can this be achieved practically?
Is there anything I'm missing? If I'm doing anything wrong please correct me. If there is any other way please suggest.
Regards,
KK
KK_ETL- Posts : 3
Join date : 2009-11-05
Re: SCD Type2 - ETL Design
As far as I understand, your goal is to save the date from the previous day as a cut-off date, is that right?KK_ETL wrote:
So the data load which takes place overnight which starts at 08-12-2009 and carries forward to 09-12-2009 should make sure that the Start_date is still 08-12-2009.
To achive this so far upon research and thinking i felt the solution to this is to create a Load audit table that would identify the load date which is the Start_date and supply the information to all the records that are potential for incremental load.
If so, you can (actually you should for many other reasons, e.g. for monitoring) build the log table containing info about your load jobs.
The other solution, which we sometimes use, is an agreement, that a processing day is not the same as the calendar day (similarly to the difference between calendar year and financial year). You could for example agree that the processing day D starts at 21:00 of calendar day D and lasts till 20:59 of calendar day D+1. So in Oracle SQL scripts, instead of
trunc(SYSDATE)you would then write
trunc(SYSDATE - INTERVAL '21' HOUR)
Jacek Adamowicz
JacekA- Posts : 3
Join date : 2009-10-22
Re: SCD Type2 - ETL Design
Thank you JacekA.
What I did here was to create a Parameter File/Function on a Server that holds the business date and is only changed when the triggering element instructs to change the business date to new one. This also helps because the data can be processed for any historical date just by controlling the business date in the system.
But I have an issue here. Suppose if the data load fails on a given date. The next nightly loads should be able to pick up the data for different business dates. Is there something that anyone can suggest?
What I did here was to create a Parameter File/Function on a Server that holds the business date and is only changed when the triggering element instructs to change the business date to new one. This also helps because the data can be processed for any historical date just by controlling the business date in the system.
But I have an issue here. Suppose if the data load fails on a given date. The next nightly loads should be able to pick up the data for different business dates. Is there something that anyone can suggest?
KK_ETL- Posts : 3
Join date : 2009-11-05
Re: SCD Type2 - ETL Design
I have the potential solution but before that I would like to know what is the incremental mechanism you are following,i.e. how do you pull the incremental data each day,it all depends on that
kapoor_dh- Posts : 24
Join date : 2009-12-08
Re: SCD Type2 - ETL Design
Yes,We are currently using the journalizing method for incremental extraction. We have Journal tables in the source tables where whenever there is a change in the record in a table the respective journal table picks up this change. Using these tables we load the incremental data in the ODI.
KK_ETL- Posts : 3
Join date : 2009-11-05
Similar topics
» Data Vault v's Dimensional Model
» Tracking Type2 SCD when using a dimension outrigger
» Require comprehensive test cases for SCD Type2
» Dealing with multiple many to many related type2 SCDs
» Reasons to use a 3NF design over a Dimensional Model design for an EDW - Discussion
» Tracking Type2 SCD when using a dimension outrigger
» Require comprehensive test cases for SCD Type2
» Dealing with multiple many to many related type2 SCDs
» Reasons to use a 3NF design over a Dimensional Model design for an EDW - Discussion
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum