Historical and Incremental Loading of Dimension and Fact tables
3 posters
Page 1 of 1
Historical and Incremental Loading of Dimension and Fact tables
Hello
I have been reading three of the Kimball books and playing with the MDWT_AdventureWorks_SSIS sample. I have learnt a lot but I have also become overwhelmed as I was new to DW/BI up until a few weeks ago. Also, the books are very detailed and there is a huge amount of information to digest.
I am now trying to start the ETL part of our project (we've completed the Dimensional Modelling). I have already hit a problem on my first dimension table. Doing the historical load was very simple using SSIS. But, we are now trying to find a SSIS sample that shows how to do "incremental load" into an already populated dimension table. Some questions:
- Is there a SSIS sample package that someone point me at
- Is "incremental load" the right term to use for this
- Is this covered in the Kimball books - I don't have the ETL book by the way, but I have the others
We haven't even considered our fact tables yet as we are focusing on the dimension table first. However, I am sure the same issues apply for fact tables so any advice or sample packages for fact tables would be greatly appreciated.
For your information, our DW will initially be loaded with 1 year of historical data and never be required to maintain more that 5 years of data.
Thanks in advance!
I have been reading three of the Kimball books and playing with the MDWT_AdventureWorks_SSIS sample. I have learnt a lot but I have also become overwhelmed as I was new to DW/BI up until a few weeks ago. Also, the books are very detailed and there is a huge amount of information to digest.
I am now trying to start the ETL part of our project (we've completed the Dimensional Modelling). I have already hit a problem on my first dimension table. Doing the historical load was very simple using SSIS. But, we are now trying to find a SSIS sample that shows how to do "incremental load" into an already populated dimension table. Some questions:
- Is there a SSIS sample package that someone point me at
- Is "incremental load" the right term to use for this
- Is this covered in the Kimball books - I don't have the ETL book by the way, but I have the others
We haven't even considered our fact tables yet as we are focusing on the dimension table first. However, I am sure the same issues apply for fact tables so any advice or sample packages for fact tables would be greatly appreciated.
For your information, our DW will initially be loaded with 1 year of historical data and never be required to maintain more that 5 years of data.
Thanks in advance!
amir2- Posts : 29
Join date : 2010-07-29
Re: Historical and Incremental Loading of Dimension and Fact tables
This article seems to do the trick
http://www.ssistalk.com/2007/03/09/ssis-using-a-checksum-to-determine-if-a-row-has-changed/
http://www.ssistalk.com/2007/03/09/ssis-using-a-checksum-to-determine-if-a-row-has-changed/
amir2- Posts : 29
Join date : 2010-07-29
Great article. Is this then current best practice?
amir2 wrote:This article seems to do the trick
http://www.ssistalk.com/2007/03/09/ssis-using-a-checksum-to-determine-if-a-row-has-changed/
Great blog post - thanks for the link, Amir. Is this approach (the use of checksums - or even better hashsums) to determine record changes then considered to be the current best practice for running incremental ETL loads?
Peace.
- Savage
NbleSavage- Posts : 24
Join date : 2010-11-28
Any examples / citations?
ttt for any practical incremental ETL examples along the lines of the fabulous Kimball SQL 2005 AdvWorks example depicting the historical load.
NbleSavage- Posts : 24
Join date : 2010-11-28
Re: Historical and Incremental Loading of Dimension and Fact tables
Be wary of checksum algorithms. Not all are appropriate for reliable change detection.
I like to use a 32 bit CRC. It is an established algorithm that is very sensitive to minor changes and transpositions in a string and produces a 32 bit hash, which as a false positive risk of 1 in 4 billion... well within acceptable limits. A 16 bit algorithm has 1 in 65K chance of a false positive, which is way too high.
Some suggest using an MD5 hash, which is 128 or 160? bits. I think it is overkill for simple change detection.
I like to use a 32 bit CRC. It is an established algorithm that is very sensitive to minor changes and transpositions in a string and produces a 32 bit hash, which as a false positive risk of 1 in 4 billion... well within acceptable limits. A 16 bit algorithm has 1 in 65K chance of a false positive, which is way too high.
Some suggest using an MD5 hash, which is 128 or 160? bits. I think it is overkill for simple change detection.
Re: Historical and Incremental Loading of Dimension and Fact tables
^^Great post^^
Thanks!!
Thanks!!
NbleSavage- Posts : 24
Join date : 2010-11-28
Similar topics
» Looking for advise on loading keys into factless fact tables
» Loading fact table and dimension indexing.
» Storing Date Keys in dimension tables versus fact tables
» Number of Columns in Fact Tables vs. Dimension Tables
» Automating dimension and fact loading ETL at the database level - Good or Bad?
» Loading fact table and dimension indexing.
» Storing Date Keys in dimension tables versus fact tables
» Number of Columns in Fact Tables vs. Dimension Tables
» Automating dimension and fact loading ETL at the database level - Good or Bad?
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum