best way for dimension table lookups, if natural key is multi-column (string)
+2
ngalemmo
JimBeam
6 posters
Page 1 of 1
best way for dimension table lookups, if natural key is multi-column (string)
I have a Property table in a Real Estate Database with a compound primary key { company code (varchar), PropertyID (int), Business Entity (int) }. When I am populating the Property dimension table in the DWH, I don't want to have to lookup every single field to decide if the Record already exists, so I thought of mapping (hashing) the 3-column-natural key to a single value field but I am afraid, that collissions may sometime occur and that the data will get invalidated.
A second thought of mine was to not hash but string-concatenate the 3-column key to a single field (varchar). That would guarantee the uniqueness but I wonder how efficient that would be.
So, how would you solve this problem? Any smarter suggestions?
A second thought of mine was to not hash but string-concatenate the 3-column key to a single field (varchar). That would guarantee the uniqueness but I wonder how efficient that would be.
So, how would you solve this problem? Any smarter suggestions?
JimBeam- Posts : 2
Join date : 2011-09-20
Re: best way for dimension table lookups, if natural key is multi-column (string)
Concatenated strings work well. A hash value as a unique key is never a good idea.
Re: best way for dimension table lookups, if natural key is multi-column (string)
I don't know why you wouldn't want to join on each field individually. If you concatinate the fields into one, you have to convert the integers to varchar's and what if you have companycode 001, propertyid 12, and BusinessEntity 1 as well as companycode 001, propertyid 1, and businessentity 21? The concatinated string is the same for both.
Better to join on each field. There's no good reason not to.
Better to join on each field. There's no good reason not to.
buckleyc- Posts : 7
Join date : 2011-09-19
Re: best way for dimension table lookups, if natural key is multi-column (string)
The reason to do so is to make the natural key source independent.
One extreme example in integrating two disparate order systems: In one case the items were identified by a SKU, while in the other system there were 7 different attributes that identified an item. By abstracting the natural key into a string in the staging table, you can use the same process to update the dimension and load facts regardless of the data source.
One extreme example in integrating two disparate order systems: In one case the items were identified by a SKU, while in the other system there were 7 different attributes that identified an item. By abstracting the natural key into a string in the staging table, you can use the same process to update the dimension and load facts regardless of the data source.
Re: best way for dimension table lookups, if natural key is multi-column (string)
Then how do you know "001121" is the same as "0011121" in my example below?
If your dimenional table has column1, column2, and column3 in it, what's the harm in joining on the three keys? Why concatinate? I agree if system 1 stores the SKU in one field and position 1-4 means x and position 5-8 means y in system 2, then concatination makes sense. I disagree with making a blanket assumption that concatinating multiple field business keys is the best way to go.
If your dimenional table has column1, column2, and column3 in it, what's the harm in joining on the three keys? Why concatinate? I agree if system 1 stores the SKU in one field and position 1-4 means x and position 5-8 means y in system 2, then concatination makes sense. I disagree with making a blanket assumption that concatinating multiple field business keys is the best way to go.
buckleyc- Posts : 7
Join date : 2011-09-19
Re: best way for dimension table lookups, if natural key is multi-column (string)
You should always use a delimiter when building a concatinated key from variable-length fields.
Using the example of companycode 001, propertyid 12, and BusinessEntity 1 with a pipe character for the delimiter, the concatinated key would be 001|12|1. Gives a single unique string key with no ambiguity.
Using the example of companycode 001, propertyid 12, and BusinessEntity 1 with a pipe character for the delimiter, the concatinated key would be 001|12|1. Gives a single unique string key with no ambiguity.
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
Re: best way for dimension table lookups, if natural key is multi-column (string)
I'm not sure you actually have a problem. Like others have said, just do a 3 column lookup. More than likely, you'll want this data in the warehouse as distinct columns. A properly created index will address any performance issues.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: best way for dimension table lookups, if natural key is multi-column (string)
Either approach is fine, and neither one breaks any Kimball rules! :-) My guess is that the lookup on one string field plus two integer fields would be slightly slower than a lookup on a single (slightly longer) string field, but I wouldn't expect it to be a huge difference, and becasue this is happening during ETL and not during user queries you shouldn't be too concerned. Use whichever approach you like better.
Are you planning to index the natural key fields? As B&L indicated, an index including the natural key field(s) should you give you swift performance for both loading the dimension and resolving the SK when loading facts.
(With some database engines you may need to explicetly add the SK to your index as well to make it a "covering index" for resolving the SK while loading facts; others will do this automatically if the SK is the PK of the dimension (as it should be!) and there is a clustered primary key on it (which there should be!) Look at before-and-after execution plans as you tune you database. What you want is a single Index Seek whenever possible.)
Are you planning to index the natural key fields? As B&L indicated, an index including the natural key field(s) should you give you swift performance for both loading the dimension and resolving the SK when loading facts.
(With some database engines you may need to explicetly add the SK to your index as well to make it a "covering index" for resolving the SK while loading facts; others will do this automatically if the SK is the PK of the dimension (as it should be!) and there is a clustered primary key on it (which there should be!) Look at before-and-after execution plans as you tune you database. What you want is a single Index Seek whenever possible.)
Last edited by VHF on Thu Sep 22, 2011 10:27 am; edited 2 times in total (Reason for editing : clarification)
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
Re: best way for dimension table lookups, if natural key is multi-column (string)
Like B&L and buckleyc, generally I would just take the simple approach leaving the NK as is in the dimension no matter how many columns the key is made of. Converting the NK to other format may unnecessarily complicate the ETL process which is already ridden with other heavy lifting tasks. If the dimension size is far less than 100000, it may not be even worth considering indexing at all.
I know SK can make the dimension resilient to the change of NK definition in the source system in terms of its RI with fact tables, but I am not quite sure about how transforming NK can make it independent of source system.
I know SK can make the dimension resilient to the change of NK definition in the source system in terms of its RI with fact tables, but I am not quite sure about how transforming NK can make it independent of source system.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Similar topics
» Column count on dimension table
» Advice on a single Fact Table Column which could link to more than one different dimension
» Multi-value data column in OLTP modeled in datamart
» Table column analysis/profiling tool?
» Fact Table Natural ID
» Advice on a single Fact Table Column which could link to more than one different dimension
» Multi-value data column in OLTP modeled in datamart
» Table column analysis/profiling tool?
» Fact Table Natural ID
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum