Using Hash keys instead of Numeric sequence keys in Dimensional Model?
3 posters
Page 1 of 1
Using Hash keys instead of Numeric sequence keys in Dimensional Model?
Greetings,
I have been a proponent of the standard Dimensional Models for years and understand very well the reason for using surrogate keys that are implemented via numeric sequences. However, lately I have been getting into the Data Vault methodology which uses Hash Keys instead of numeric sequences and was wondering if anyone has used these in Dimensional Models? The benefits are apparent:
1. The hash keys are computed using the natural key value, so you can derive the values instead of looking them up.
2. They automatically provide an even distribution of values across partitions, which helps with parallel processing.
3. They will never "get out of synch" the way that Numeric sequences do.
The only disadvantage that I can think of is that these Hash Keys are 32-byte values instead of a 4 or 8 byte integer which supposedly will slow down joins dramatically.
Does anyone out there have any real world experience with this, or any experiments/performance tests that have been run on this?
Thanks,
SB
I have been a proponent of the standard Dimensional Models for years and understand very well the reason for using surrogate keys that are implemented via numeric sequences. However, lately I have been getting into the Data Vault methodology which uses Hash Keys instead of numeric sequences and was wondering if anyone has used these in Dimensional Models? The benefits are apparent:
1. The hash keys are computed using the natural key value, so you can derive the values instead of looking them up.
2. They automatically provide an even distribution of values across partitions, which helps with parallel processing.
3. They will never "get out of synch" the way that Numeric sequences do.
The only disadvantage that I can think of is that these Hash Keys are 32-byte values instead of a 4 or 8 byte integer which supposedly will slow down joins dramatically.
Does anyone out there have any real world experience with this, or any experiments/performance tests that have been run on this?
Thanks,
SB
sbendayan- Posts : 6
Join date : 2013-12-20
Re:Using Hash keys instead of Numeric sequence keys in Dimensional Model?
Hi ,
you may have duplicate since the hash function will return the deterministic value, which means the same natural key you will get the same hash value. This will crete problem when you are doing SCD 2 for a dimension.
thanks
you may have duplicate since the hash function will return the deterministic value, which means the same natural key you will get the same hash value. This will crete problem when you are doing SCD 2 for a dimension.
thanks
hkandpal- Posts : 113
Join date : 2010-08-16
Re:Using Hash keys instead of Numeric sequence keys in Dimensional Model?
For the Type 2 SCD problem, you could just hash the Load Date along with the Natural key of the dimension, I think.
sbendayan- Posts : 6
Join date : 2013-12-20
Re: Using Hash keys instead of Numeric sequence keys in Dimensional Model?
A 32 byte hash is a huge key. You won't get collisions, but performance will suffer. It really depends on where and when you want to take a performance hit: when you load the data or when users query the data.
The idea behind assigning surrogate keys using integer or bigint sequences is to take the hit one time when loading the data, rather than every time users query the data. 'Out of sync' sequences is a non-issue with proper backup and restore processes.
It's your choice.
The idea behind assigning surrogate keys using integer or bigint sequences is to take the hit one time when loading the data, rather than every time users query the data. 'Out of sync' sequences is a non-issue with proper backup and restore processes.
It's your choice.
Re: Using Hash keys instead of Numeric sequence keys in Dimensional Model?
Yes, I would tend to agree....the size would be the one huge downside to using these keys.....what I was wondering is if anyone has tried it and how much does it really slow down the queries?
sbendayan- Posts : 6
Join date : 2013-12-20
Similar topics
» Rule based algorithm to convert an ER model to a dimensional model
» Complexities of Relational Model and Simplicities of Dimensional Model
» Meta-model of Kimball dimensional model
» Why we use Dimensional Model over De-normalized relational Model ?
» Dimensional keys in both parent and child fact tables
» Complexities of Relational Model and Simplicities of Dimensional Model
» Meta-model of Kimball dimensional model
» Why we use Dimensional Model over De-normalized relational Model ?
» Dimensional keys in both parent and child fact tables
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum