Business keys or Natural keys in the Fact table
+7
LAndrews
pitbull mix
John Simon
ngalemmo
KS_EDW
BoxesAndLines
ParuD
11 posters
Page 1 of 1
Business keys or Natural keys in the Fact table
As I have read from Kimball methodology the fact table always contains surrogate keys of the dimensions and the measures. My take away on this is when the reporting is done the entry point to the facts is always through dimensions that joins on the surrogate keys in the facts. I was under the impression that we don't need the business keys at all in the fact table.
But very recently I have seen few architects who say there is advantage in keeping the business keys in the fact table as it sometimes avoid joins to the dimensions in the reports. I am not very clear under which scenarios would this happen.
Can you please help me to understand in which scenarios are the business keys included in the fact table and the advantages of doing so?
But very recently I have seen few architects who say there is advantage in keeping the business keys in the fact table as it sometimes avoid joins to the dimensions in the reports. I am not very clear under which scenarios would this happen.
Can you please help me to understand in which scenarios are the business keys included in the fact table and the advantages of doing so?
ParuD- Posts : 2
Join date : 2011-09-13
Re: Business keys or Natural keys in the Fact table
These people are leading you astray. Joins are not bad. Wide fact tables are bad. First they add the natural key columns, then it will be another column and pretty soon you have 300 columns in your fact table of which 50% is dimension data.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Business keys or Natural keys in the Fact table
Couple of things…
We maintain some keys for quick reference during deletes. Many times the production system will delete a row for legal reasons (HR, Law Enforcement, courts, etc.). Those need changed immediately in EDW so, having the keys helps us find and delete the fact row within minutes of notification.
Sometimes the attributes alone are not enough to maintain the grain of the fact table, thus the natural key is maintained. (This is extremely very really rare but does happen.)
Our EDW supports large extracts and db links for external systems (organization oversight systems (regulatory), contractors, etc.). In many cases, we need to pass them our business keys. Having those in the fact facilitates that requirement.
Most BI tool semantic/metadata layers will allow a fact to fact connection – using the natural keys is tempting. I would NOT recommend doing this because it creates exceptions in an otherwise standard methodology, i.e. “all these hundreds of facts use a dimension to communicate but these two are the exception”.
Honestly, I would not recommend carrying the key in the fact table as a means to connect the fact tables… that method has “potential regret” written all over it. BUT… with that said, in your situation… it may work. (clear as mud… right? Add to that… BoxesAndLines is correct… they will keep pushing.)
In our shop we avoid it where we can but… it is most important that we meet our user’s requirements… if the users say, “key” (with good reason) we say “yes”.
We maintain some keys for quick reference during deletes. Many times the production system will delete a row for legal reasons (HR, Law Enforcement, courts, etc.). Those need changed immediately in EDW so, having the keys helps us find and delete the fact row within minutes of notification.
Sometimes the attributes alone are not enough to maintain the grain of the fact table, thus the natural key is maintained. (This is extremely very really rare but does happen.)
Our EDW supports large extracts and db links for external systems (organization oversight systems (regulatory), contractors, etc.). In many cases, we need to pass them our business keys. Having those in the fact facilitates that requirement.
Most BI tool semantic/metadata layers will allow a fact to fact connection – using the natural keys is tempting. I would NOT recommend doing this because it creates exceptions in an otherwise standard methodology, i.e. “all these hundreds of facts use a dimension to communicate but these two are the exception”.
Honestly, I would not recommend carrying the key in the fact table as a means to connect the fact tables… that method has “potential regret” written all over it. BUT… with that said, in your situation… it may work. (clear as mud… right? Add to that… BoxesAndLines is correct… they will keep pushing.)
In our shop we avoid it where we can but… it is most important that we meet our user’s requirements… if the users say, “key” (with good reason) we say “yes”.
KS_EDW- Posts : 20
Join date : 2011-09-07
Age : 49
Location : Kansas
Re: Business keys or Natural keys in the Fact table
The only acceptable reason to put a natural key into a fact table is the case of a degenerate dimension.
Storing the natural key in the fact table, which also appears in a dimension table with the appropriate surrogate FK from the fact, does not improve one's ability to identify a fact row and only serves to hamper performance for no real gain.
Storing the natural key in the fact table, which also appears in a dimension table with the appropriate surrogate FK from the fact, does not improve one's ability to identify a fact row and only serves to hamper performance for no real gain.
Re: Business keys or Natural keys in the Fact table
The only reason you may want to store a natural business key in the fact table is if you need to see the latest version for that particular dimension. This should be very rare - see here for examples of performance using natural keys vs surrogate keys:
http://jsimonbi.wordpress.com/2011/09/24/tracking-history-with-slowly-changing-dimensions/
Otherwise you are simply affecting performance.
Here's a link to an post on my blog that shows the performance impact of storing the natural key and other non-foreign keys in the fact table:
http://jsimonbi.wordpress.com/2011/05/16/dimensional-modeling-worst-practices/
http://jsimonbi.wordpress.com/2011/09/24/tracking-history-with-slowly-changing-dimensions/
Otherwise you are simply affecting performance.
Here's a link to an post on my blog that shows the performance impact of storing the natural key and other non-foreign keys in the fact table:
http://jsimonbi.wordpress.com/2011/05/16/dimensional-modeling-worst-practices/
Business / Natural Key in Fact Table
ngalemmo wrote:The only acceptable reason to put a natural key into a fact table is the case of a degenerate dimension.
Storing the natural key in the fact table, which also appears in a dimension table with the appropriate surrogate FK from the fact, does not improve one's ability to identify a fact row and only serves to hamper performance for no real gain.
Hello,
You say not to put a natural or business key in the fact table. However, after I created my model I was faced with the task of keeping my fact table updated as related to the source system. If you don't have a way to know what changes happened related to facts how do you keep your fact table up to date? I could put all of the events in a dimension (yuck!), but that creates a one to one relationship that I don't want to maintain and is considered poor design.
In my scenario I record absence events for students for various source systems. Changes occur when a student transfers from one school to another meaning the absence event is now related to where the student is currently. I don't want to delete the event/fact, but do want to acknowledge that this record or the event has been removed or moved in the source system. I just mark the record to 'N' for not current.
All of this relates to keeping the fact table up to date. I'm finding it very useful to place a natural key in the fact table, but am open ideas or another approach to staying in synch with the source systems. Also, this business key is not used in a dimension that connects to the fact table.
Thanks!
pitbull mix- Posts : 8
Join date : 2012-06-11
Re: Business keys or Natural keys in the Fact table
In my scenario I record absence events for students for various source systems. Changes occur when a student transfers from one school to another meaning the absence event is now related to where the student is currently. I don't want to delete the event/fact, but do want to acknowledge that this record or the event has been removed or moved in the source system. I just mark the record to 'N' for not current.
What you are describing here is restating the fact .... dangerous road.
The absence event occured in school A. That is the fact.
When the student transfers to school B, any new absences occur in school B. That is the fact.
If you want to report all the students absences as if they occured in school B, there are many reporting techniques to do this (i.e. reporting using the current school).
LAndrews- Posts : 132
Join date : 2010-05-13
Location : British Columbia, Canada
Re: Business keys or Natural keys in the Fact table
LAndrews wrote:In my scenario I record absence events for students for various source systems. Changes occur when a student transfers from one school to another meaning the absence event is now related to where the student is currently. I don't want to delete the event/fact, but do want to acknowledge that this record or the event has been removed or moved in the source system. I just mark the record to 'N' for not current.
What you are describing here is restating the fact .... dangerous road.
The absence event occured in school A. That is the fact.
When the student transfers to school B, any new absences occur in school B. That is the fact.
If you want to report all the students absences as if they occured in school B, there are many reporting techniques to do this (i.e. reporting using the current school).
Ah yes, this is the adjustment that I need to make in the ETL when loading into the fact table. This will help clear up mixing of events between locations. I agree with you that where the absence that occurs is the fact. I need to revise my ETL. I can report on the current school by using the student dimension which tracks the current school. Thanks!
pitbull mix- Posts : 8
Join date : 2012-06-11
Re: Business keys or Natural keys in the Fact table
cdowney wrote:You say not to put a natural or business key in the fact table. However, after I created my model I was faced with the task of keeping my fact table updated as related to the source system. If you don't have a way to know what changes happened related to facts how do you keep your fact table up to date? I could put all of the events in a dimension (yuck!), but that creates a one to one relationship that I don't want to maintain and is considered poor design.
Your dimension tables contain the surrogate key and the natural key. Your fact tables only contain the surrogate key. How would you not know the natural key value based on facts? Relational databases allow you to join tables.
There is a 1:1 relationship between natural keys and surrogate keys. If you know one it shouldn't be hard to figure out the other.
Business Key in Fact Table
ngalemmo wrote:cdowney wrote:You say not to put a natural or business key in the fact table. However, after I created my model I was faced with the task of keeping my fact table updated as related to the source system. If you don't have a way to know what changes happened related to facts how do you keep your fact table up to date? I could put all of the events in a dimension (yuck!), but that creates a one to one relationship that I don't want to maintain and is considered poor design.
Your dimension tables contain the surrogate key and the natural key. Your fact tables only contain the surrogate key. How would you not know the natural key value based on facts? Relational databases allow you to join tables.
There is a 1:1 relationship between natural keys and surrogate keys. If you know one it shouldn't be hard to figure out the other.
Are you saying that I do need to have a dimension that has every absence event? The surrogate and natural key would be in this dimension and related to the fact table, but then it becomes a one to one. So for every absence fact that I enter I will enter a dimension record. Is this poor design and what's the alternative to finding the fact if you want to update , remove? Thanks!
pitbull mix- Posts : 8
Join date : 2012-06-11
Re: Business keys or Natural keys in the Fact table
Depending on the business key (natural key) data type and the likelihood of it changing over time, or it being recycled, you may decide to use it, or create a surrogate key for it. I always have a unique surrogate key on my dimensions, and that will be a FK in the fact. But in the event the dimension has SCD 2 attributes then I am also considering adding the the business key (if appropriate data type) to the fact table, or create another surrogate key that is one to one to the business key in the dimension (it doesn't change when the dimension record versions) and add that additional surrogate key to the fact table. This allows for 3 types of reporting:
1. As Was (fact joins to the dimension on the unique surrogate key).
2. Current (fact joins to the dimension on the business key or additional surrogate key and a current record indicator being true)
3. As Of (fact joins to the dimension on the business key or additional surrogate key and As Of Date being between the effective dates of the dimension). This makes sense when the As Of date is different than the fact date.
Employee Dimension
Fact table
So, sales by "As Was" region :
Select Region, SUM (Dollar Amount)
FROM Fact
Join Employee ON Emp surr Key = Surrogate Key
group by REGION
sales by "Current" region :
Select Region, SUM (Dollar Amount)
FROM Fact
Join Employee ON Emp Add Key = Additional Key and Current = TRUE
group by REGION
sales by "As Of 01/01/2012" region :
Select Region, SUM (Dollar Amount)
FROM Fact
Join Employee ON Emp Add Key = Additional Key and 01/01/2012 BETWEEN eff st dt and eff end dt
group by REGION
There may be instances where there is no reason to create that additional surrogate key and just use the Business Key or Natural Key instead: e.g. when data type doesn't impact joining performance and it is not recycled by the business and it is not re-cast, or when none of the drawbacks of using a business key instead of a surrogate key are present. In those instance by just using the business key or natural key will save you some etl processing time, some maintenance of an additional set of surrogate key, and it's readily available in the fact to create metrics such as counts distinct, as count distinct would not work on a true unique surrogate key for SCD 2 dimensions. Also joining to a dimension and getting a measure from there might not work well with some BI tools. E.g. counting distinct business key from the dimension instead of having the business key on the fact table itself.
1. As Was (fact joins to the dimension on the unique surrogate key).
2. Current (fact joins to the dimension on the business key or additional surrogate key and a current record indicator being true)
3. As Of (fact joins to the dimension on the business key or additional surrogate key and As Of Date being between the effective dates of the dimension). This makes sense when the As Of date is different than the fact date.
Employee Dimension
Surrogate Key | Business Key/ Additional Surrogate Key | Employee Name | Region | Eff StartDt | Eff End Dt | Current |
1 | 107 | John Smith | East | 01/01/2011 | 07/01/2011 | FALSE |
2 | 107 | John Smith | West | 07/02/2011 | 12/31/2099 | TRUE |
3 | 243 | Jane Doe | East | 01/01/2011 | 12/31/2099 | TRUE |
Fact table
SaleDate | Emp SurrKey | Emp Add Key | Dollar Amount |
02/15/2011 | 1 | 107 | $50 |
03/30/2011 | 3 | 243 | $25 |
06/30/2011 | 1 | 107 | $5 |
08/24/2012 | 2 | 107 | $10 |
So, sales by "As Was" region :
Select Region, SUM (Dollar Amount)
FROM Fact
Join Employee ON Emp surr Key = Surrogate Key
group by REGION
REGION | SUM |
EAST | $50 + $25 + $5 |
WEST | $10 |
sales by "Current" region :
Select Region, SUM (Dollar Amount)
FROM Fact
Join Employee ON Emp Add Key = Additional Key and Current = TRUE
group by REGION
REGION | SUM |
EAST | $25 |
WEST | $50 + $5 + 10 |
sales by "As Of 01/01/2012" region :
Select Region, SUM (Dollar Amount)
FROM Fact
Join Employee ON Emp Add Key = Additional Key and 01/01/2012 BETWEEN eff st dt and eff end dt
group by REGION
REGION | SUM |
EAST | $25 |
WEST | $50 + $5 + 10 |
There may be instances where there is no reason to create that additional surrogate key and just use the Business Key or Natural Key instead: e.g. when data type doesn't impact joining performance and it is not recycled by the business and it is not re-cast, or when none of the drawbacks of using a business key instead of a surrogate key are present. In those instance by just using the business key or natural key will save you some etl processing time, some maintenance of an additional set of surrogate key, and it's readily available in the fact to create metrics such as counts distinct, as count distinct would not work on a true unique surrogate key for SCD 2 dimensions. Also joining to a dimension and getting a measure from there might not work well with some BI tools. E.g. counting distinct business key from the dimension instead of having the business key on the fact table itself.
adypoko- Posts : 5
Join date : 2012-10-24
Re: Business keys or Natural keys in the Fact table
I think both SK and NK or DK (Durable Key) in fact table make sense based on Kimball's recent article:
http://www.kimballgroup.com/2010/06/21/extreme-status-tracking-for-real-time-customer-analysis/
Before reading this article, I used to rely on self join on dimension to achieve the same result, ie. applying current dimension version for the fact in the past. Self join would work quite well if the dimension is not big. However in case of customer dimension, an extra join would be quite significant for performance, while adding a NK in the fact table is a good tradeoff.
I guess it comes down to educating user to understand what these different types keys in fact table are meant to be used. On the same token, self join approach also involves some education.
http://www.kimballgroup.com/2010/06/21/extreme-status-tracking-for-real-time-customer-analysis/
Before reading this article, I used to rely on self join on dimension to achieve the same result, ie. applying current dimension version for the fact in the past. Self join would work quite well if the dimension is not big. However in case of customer dimension, an extra join would be quite significant for performance, while adding a NK in the fact table is a good tradeoff.
I guess it comes down to educating user to understand what these different types keys in fact table are meant to be used. On the same token, self join approach also involves some education.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Business keys or Natural keys in the Fact table
We have put the NK into one fact table. The fact is a lifetime summary of a customer's activity and as such has only one row per customer. Since our customer dimension is SCD2, either have to use the NK or go and update the SK in the fact every time there is a new SCD2 change. The downside is remembering that you have to return a specific customer row (usually the current one) when querying the lifetime fact.
Dave Jermy- Posts : 33
Join date : 2011-03-24
Location : London, UK
Re: Business keys or Natural keys in the Fact table
That is similar to what I suggested above. What Kimball calls a durable key, I was calling it an additional surrogate key and suggesting that depending on the natural key, the natural key may be used instead of the additional surrogate key. I am happy to see that Kimball is on the same page with me. Also, I may sound a bit of a wise guy, but as a subscriber to Kimball's desing tips, I wouldn't call a June 2010 article recent.hang wrote:I think both SK and NK or DK (Durable Key) in fact table make sense based on Kimball's recent article:
http://www.kimballgroup.com/2010/06/21/extreme-status-tracking-for-real-time-customer-analysis/
Before reading this article, I used to rely on self join on dimension to achieve the same result, ie. applying current dimension version for the fact in the past. Self join would work quite well if the dimension is not big. However in case of customer dimension, an extra join would be quite significant for performance, while adding a NK in the fact table is a good tradeoff.
I guess it comes down to educating user to understand what these different types keys in fact table are meant to be used. On the same token, self join approach also involves some education.
adypoko- Posts : 5
Join date : 2012-10-24
Re: Business keys or Natural keys in the Fact table
hang wrote:I think both SK and NK or DK (Durable Key) in fact table make sense based on Kimball's recent article:
http://www.kimballgroup.com/2010/06/21/extreme-status-tracking-for-real-time-customer-analysis/
Before reading this article, I used to rely on self join on dimension to achieve the same result, ie. applying current dimension version for the fact in the past. Self join would work quite well if the dimension is not big. However in case of customer dimension, an extra join would be quite significant for performance, while adding a NK in the fact table is a good tradeoff.
I guess it comes down to educating user to understand what these different types keys in fact table are meant to be used. On the same token, self join approach also involves some education.
This has been around for years... the durable key is not the natural key however (NK's are not very durable), it is a type 1 surrogate key. Works fine and eliminates the need for a self-join to get the current row. Alternately, you field two dimension tables, a type 1 version and a type 2 version.
Re: Business keys or Natural keys in the Fact table
No. Events do not go in dimension tables. If you mean, should you have an event type dimension? Probably. This would be very small and the fact would reference that to get information describing the kind of event.... such as: absence - no notice, absence - sick, etc...cdowney wrote:
Are you saying that I do need to have a dimension that has every absence event? The surrogate and natural key would be in this dimension and related to the fact table, but then it becomes a one to one. So for every absence fact that I enter I will enter a dimension record. Is this poor design and what's the alternative to finding the fact if you want to update , remove? Thanks!
Kimball Group Design tip on Durable Key
There is a July 2012 article on Durable Keys. Also I do remember reading a posting where ngalemmo recommended this approach for historical lookups.
http://www.kimballgroup.com/2012/07/10/design-tip-147-durable-super-natural-keys/
http://www.kimballgroup.com/2012/07/10/design-tip-147-durable-super-natural-keys/
jmagana- Posts : 5
Join date : 2010-11-29
Location : California
Similar topics
» Can a FACT table contains Natural Primary keys and text columns
» Composite Dimension Natural Keys - Attribute Used In Multiple Dimensions as well as Fact Tables
» Business keys in Fact. Is this accepted?
» Two level keys from Hierarchhy Dimension into Fact table
» Factless fact table with null foreign keys
» Composite Dimension Natural Keys - Attribute Used In Multiple Dimensions as well as Fact Tables
» Business keys in Fact. Is this accepted?
» Two level keys from Hierarchhy Dimension into Fact table
» Factless fact table with null foreign keys
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum