Incremental Load for fact table help
2 posters
Page 1 of 1
Incremental Load for fact table help
Hi All,
I have a fact table which has 8 million rows now and currently gets updated on drop and rebuild basis.
I will briefly describe the ETL process from our source to this fact table first.
1. I firstly incrementally load data from 8 source table into the 8 data storage tables in the data warehouse. (This data storage table contains all the historical data.)
Each source table contains date created and modified column so I retrieve the data that is created or modified later than the MAX date created or modified field on our each data storage table in the data warehouse.
2. Then I load the every data from the 8 data storage tables into the staging tables for transformation.
3. Once transformation is done, those data from 8 tables goes into the fact table. (Fact table is firstly truncated and stores those 8 million rows from the staging table)
I tried to incrementally load data from data storage tables to the fact table. However, there is one concern I can't figure out.
Lets assume we have 3 dimensions and one measure in the fact table called (product, provider, country as dimensions and sales_count as a fact) .
On the first day, one sales transaction is made and inserted into the fact table. Lets called the row ("iPhone", "Apple", "USA", 5)
And on the next day, manager found that the data entered incorrectly and fixed it to ("iPhone","Apple","UK", 5)
My approach for the incremental load was , firstly delete every records from the fact table that has the same dimension keys as in the last staging table, and insert those records.
But I realized that this approach can only handle when dimension key hasn't been updated. If any of dimension key gets updated in the source system, it will insert as a new record and leave the old record in the fact table.
In the above case, Dimension value has been updated so there is no way I can compare updated data with the row already inserted into the fact table on the first day.
Can someone please help me how I should handle this incremental load case?
I have a fact table which has 8 million rows now and currently gets updated on drop and rebuild basis.
I will briefly describe the ETL process from our source to this fact table first.
1. I firstly incrementally load data from 8 source table into the 8 data storage tables in the data warehouse. (This data storage table contains all the historical data.)
Each source table contains date created and modified column so I retrieve the data that is created or modified later than the MAX date created or modified field on our each data storage table in the data warehouse.
2. Then I load the every data from the 8 data storage tables into the staging tables for transformation.
3. Once transformation is done, those data from 8 tables goes into the fact table. (Fact table is firstly truncated and stores those 8 million rows from the staging table)
I tried to incrementally load data from data storage tables to the fact table. However, there is one concern I can't figure out.
Lets assume we have 3 dimensions and one measure in the fact table called (product, provider, country as dimensions and sales_count as a fact) .
On the first day, one sales transaction is made and inserted into the fact table. Lets called the row ("iPhone", "Apple", "USA", 5)
And on the next day, manager found that the data entered incorrectly and fixed it to ("iPhone","Apple","UK", 5)
My approach for the incremental load was , firstly delete every records from the fact table that has the same dimension keys as in the last staging table, and insert those records.
But I realized that this approach can only handle when dimension key hasn't been updated. If any of dimension key gets updated in the source system, it will insert as a new record and leave the old record in the fact table.
In the above case, Dimension value has been updated so there is no way I can compare updated data with the row already inserted into the fact table on the first day.
Can someone please help me how I should handle this incremental load case?
bonosungho- Posts : 2
Join date : 2012-02-03
Re: Incremental Load for fact table help
Hi,
how much new data you will be loading on each run and how many will be a update ?
Do you have a Order number field in the fact which helps in identifying the transaction, if yes then may be you can use that to identify
any updates to your transaction.
thanks
how much new data you will be loading on each run and how many will be a update ?
Do you have a Order number field in the fact which helps in identifying the transaction, if yes then may be you can use that to identify
any updates to your transaction.
thanks
hkandpal- Posts : 113
Join date : 2010-08-16
Re: Incremental Load for fact table help
hkandpal wrote:Hi,
how much new data you will be loading on each run and how many will be a update ?
Do you have a Order number field in the fact which helps in identifying the transaction, if yes then may be you can use that to identify
any updates to your transaction.
thanks
I believe the volume of the rows doesn't need be concerned at this stage.
I want to find out the right methodology which I can apply to this case.
Is there no way to update the fact if one of business key changes? or should the business key never gets updated with the best practice?
Also, I don't have a order number field in the fact.
bonosungho- Posts : 2
Join date : 2012-02-03
Re: Incremental Load for fact table help
Hi,
it depends upon what you want to capture in your DW, do you want to store the history of changes to your data or no it the requirement is to store the data than you should store it.
Now the question arises how to capture changes in your dimension, how are you identifying the data is it a order number / invoice number or PO number, as you may need something which you will use to idnetify the transaction.
thanks
it depends upon what you want to capture in your DW, do you want to store the history of changes to your data or no it the requirement is to store the data than you should store it.
Now the question arises how to capture changes in your dimension, how are you identifying the data is it a order number / invoice number or PO number, as you may need something which you will use to idnetify the transaction.
thanks
hkandpal- Posts : 113
Join date : 2010-08-16
Similar topics
» FACT table Incremental load!
» Healthcare Hospital Stay fact table - transaction fact table - a row for In and a row for Out?
» Inventory Fact Table: How to load it?
» 'Routing' the grain of the fact table to multpile members of multiple dimensions causes the fact table to 'explode'
» Integrating new fact table which has one to many relationship with the main fact table in existing star schema
» Healthcare Hospital Stay fact table - transaction fact table - a row for In and a row for Out?
» Inventory Fact Table: How to load it?
» 'Routing' the grain of the fact table to multpile members of multiple dimensions causes the fact table to 'explode'
» Integrating new fact table which has one to many relationship with the main fact table in existing star schema
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum