ETL incremental load strategies
3 posters
Page 1 of 1
ETL incremental load strategies
Hi,
I would like to know which is the best strategy in a ETL incremental load.
I have a CDC system that capture changes that I read, validate, transform and load in the target DB.
These changes belong to a period, all the rows that I treat correspond to a month (for example in Dec I treat November rows, and in January I will treat Decemeber rows).
For example in December, I am treating November rows and for the loading I have to optines:
1) In every ETL incremental load delete all rows of Nov in the target DB and insert the new rows.
- delete from targerdwb where sales_date >= 01/11/2015 and sales_date <= 30/11/2015
- insert CDC rows
2) In every incremental load I detect the rows that are news and the rows have to update.
In order to detect inserts or updates I have to make a join between rows to load and rows in the target DB.
Once detected inserts or updates (flag) I wil execute the SQL statements
I suppose the strategy depends in the volumen of the target DB and the volumen of rows in the CDC capture.
More or less the rows in the CDC are 20% at rhe beggining of the period and 5% at the end.
I will do performance tests to know the faster way of do the load, but I would like to know if there is any other thing
that will be usefull to take in consideration or if there are any other way to do the incremenrtal load.
Any advices about ETL incremental load will be usefull
Thanks in advance
I would like to know which is the best strategy in a ETL incremental load.
I have a CDC system that capture changes that I read, validate, transform and load in the target DB.
These changes belong to a period, all the rows that I treat correspond to a month (for example in Dec I treat November rows, and in January I will treat Decemeber rows).
For example in December, I am treating November rows and for the loading I have to optines:
1) In every ETL incremental load delete all rows of Nov in the target DB and insert the new rows.
- delete from targerdwb where sales_date >= 01/11/2015 and sales_date <= 30/11/2015
- insert CDC rows
2) In every incremental load I detect the rows that are news and the rows have to update.
In order to detect inserts or updates I have to make a join between rows to load and rows in the target DB.
Once detected inserts or updates (flag) I wil execute the SQL statements
I suppose the strategy depends in the volumen of the target DB and the volumen of rows in the CDC capture.
More or less the rows in the CDC are 20% at rhe beggining of the period and 5% at the end.
I will do performance tests to know the faster way of do the load, but I would like to know if there is any other thing
that will be usefull to take in consideration or if there are any other way to do the incremenrtal load.
Any advices about ETL incremental load will be usefull
Thanks in advance
juanvg1972- Posts : 25
Join date : 2015-05-05
Re: ETL incremental load strategies
Not sure I understand option #1. If your source is truly incremental, you will only have new or changed data, not the complete month. If it is incremental, #2 is the usual approach.
Assuming the source does contain the complete month, which performs better depends on your DBMS. In most cases option 2 performs better if the source is incremental.
Assuming the source does contain the complete month, which performs better depends on your DBMS. In most cases option 2 performs better if the source is incremental.
ETL incremental load strategies
Hi ngalemmo, thank for your answer,
My option 1 is an error, that was my older strategy, when I did'nt have a CDC system.
One more question:
If I have DW detail fact table (transactions) and DW aggregatted fact table, I am updating data of one month (November f. e.), that is my aproach:
CDC records (transactions) -> join with DW detail records of 1 month, and I flag updates and inserts in CDC records -> execute inserts and updates in DW detail table-> from DW detail table I make an aggregated of the month records and I load this records in DW aggregated table (previosly I delete the records of this month in aggregated table)
is this the usual aproach??, any advice to better performance?, best practices??
Thanks,
My option 1 is an error, that was my older strategy, when I did'nt have a CDC system.
One more question:
If I have DW detail fact table (transactions) and DW aggregatted fact table, I am updating data of one month (November f. e.), that is my aproach:
CDC records (transactions) -> join with DW detail records of 1 month, and I flag updates and inserts in CDC records -> execute inserts and updates in DW detail table-> from DW detail table I make an aggregated of the month records and I load this records in DW aggregated table (previosly I delete the records of this month in aggregated table)
is this the usual aproach??, any advice to better performance?, best practices??
Thanks,
juanvg1972- Posts : 25
Join date : 2015-05-05
Re: ETL incremental load strategies
Unless you have accumulating snapshot facts then you shouldn't normally be updating facts as they are a record of what happened at a point in time - so if you are just processing November's data then you shouldn't have any updates as this would be the 1st time you've processed November's facts and there shouldn't be changes to facts from previous months.
How you deal with aggregations is probably best judged on the performance of various options. If most of your existing aggregate records will change then it's probably quicker/easier to just drop them all and re-create them; if only a few aggregates will change (and you can easily identify them) then it's probably quicker just to either drop those and re-insert them or update them
How you deal with aggregations is probably best judged on the performance of various options. If most of your existing aggregate records will change then it's probably quicker/easier to just drop them all and re-create them; if only a few aggregates will change (and you can easily identify them) then it's probably quicker just to either drop those and re-insert them or update them
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Similar topics
» Question asked about ETL (little explanation needed)
» ETL for Fact load
» Loading Fact table
» Load Dimensions
» ETL Fact Load in SSIS
» ETL for Fact load
» Loading Fact table
» Load Dimensions
» ETL Fact Load in SSIS
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum
|
|