Primary Key of Fact Table
+6
juz_b
warrent
slaborda
BrianJarrett
DilMustafa
kurtv
10 posters
Page 1 of 1
Primary Key of Fact Table
When designing a fact table, is it best practice to make a composite primary key out of all the foreign keys?
Thanks,
Kurt V
Thanks,
Kurt V
kurtv- Posts : 2
Join date : 2009-02-03
Re: Primary Key of Fact Table
Yes it is. Readers please, also comment on the idea of including timestamps as part of the PK along with these surrogate FK's.
DilMustafa- Posts : 49
Join date : 2009-02-03
Location : Calgary, Canada
Re: Primary Key of Fact Table
In general I'd say yes. However, in most of my designs I do it a bit differently. In almost every model I've built I have at least one many-to-many relationship between my fact and a dimension. In this case I use a bridge table between the fact and dimension table to address this relationship. What I do is create a unique constraint on the foreign key fields and allow that to be an "alternate key", acting as a composite primary key. I then generate a normal surrogate primary key for the fact table. That surrogate key goes into a bridge table, along with a dimension surrogate key and a divisor, for any of the many-to-many dimensions.
Using this primary key along with the alternate key on the fact table allows me to represent the entire fact record with a single key in the bridge table. If I didn't do this then I'd have to insert every foreign key of my composite primary key into the bridge table for every record joining to the many-to-many dimension. If I had 10 composite keys in my fact table I'd have to have 10 records for a single dimension record join. That turns into a modeling nightmare very quickly and the code to retrieve data from the many to many dimension is pretty brutal.
There might be a different way to address this but I haven't found it. This method works nicely, is very clean, and is very easy to implement.
As far as including timestamps as part of the primary key; I'm not sure what benefit that provides. I usually include some housekeeping fields (record date, who modified) but that timestamp (record date) isn't needed to make the composite key unique. Can you provide some more detail on why this field would be necessary? Maybe I'm missing something.
Using this primary key along with the alternate key on the fact table allows me to represent the entire fact record with a single key in the bridge table. If I didn't do this then I'd have to insert every foreign key of my composite primary key into the bridge table for every record joining to the many-to-many dimension. If I had 10 composite keys in my fact table I'd have to have 10 records for a single dimension record join. That turns into a modeling nightmare very quickly and the code to retrieve data from the many to many dimension is pretty brutal.
There might be a different way to address this but I haven't found it. This method works nicely, is very clean, and is very easy to implement.
As far as including timestamps as part of the primary key; I'm not sure what benefit that provides. I usually include some housekeeping fields (record date, who modified) but that timestamp (record date) isn't needed to make the composite key unique. Can you provide some more detail on why this field would be necessary? Maybe I'm missing something.
BrianJarrett- Posts : 61
Join date : 2009-02-03
Age : 50
Location : St. Louis, MO
Primary key in a fact table?
Hi all,
My Database teacher always said... for all questions in IT the answer is "It depends".
The main idea that i want transmit you is.. There are not absolute ideas, it depends of each concret situation.
In most cases i use surrogate keys as PK in fact tables, in this cases i built an "alternate key" with the set of foreing keys.
Why we use Pk's/AK in fact tables? who access the fact table using the associated pk/ak index?
We use PK/AK just in order to be sure that there are not data incoherencies?
Why this control must be in the data structure? Always the DWH must be a semantical model?
In fact, we are designers and we can try for each problem our customized solution. You can use PK's in fact tables, but you must have arguments for do this. If you haven't don't use it.
Best regards,
Sergi
My Database teacher always said... for all questions in IT the answer is "It depends".
The main idea that i want transmit you is.. There are not absolute ideas, it depends of each concret situation.
In most cases i use surrogate keys as PK in fact tables, in this cases i built an "alternate key" with the set of foreing keys.
Why we use Pk's/AK in fact tables? who access the fact table using the associated pk/ak index?
We use PK/AK just in order to be sure that there are not data incoherencies?
Why this control must be in the data structure? Always the DWH must be a semantical model?
In fact, we are designers and we can try for each problem our customized solution. You can use PK's in fact tables, but you must have arguments for do this. If you haven't don't use it.
Best regards,
Sergi
slaborda- Posts : 3
Join date : 2009-02-18
Age : 47
Location : Barcelona, Via lactea, earth, Europe, Spain, Barcelona
Re: Primary Key of Fact Table
I'm glad to see someone else doing this too. I normally do it for all fact tables, just in case I have a many-to-many dimension relationship. It also provides consistency across the model.
BrianJarrett- Posts : 61
Join date : 2009-02-03
Age : 50
Location : St. Louis, MO
Re: Primary Key of Fact Table
There are also some other reasons to assign a surrogate key to the fact table, like updates, indexes, and debugging. See the following design tips for more info:
- Design tip 81
- Design tip 84
warrent- Posts : 41
Join date : 2008-08-18
Re: Primary Key of Fact Table
Hi Brian,
For my curosity, why dont you use PK of the bridge table as part of the key in fact table.
http://www.intelligententerprise.com/010810/412warehouse1_1.jhtml
For my curosity, why dont you use PK of the bridge table as part of the key in fact table.
http://www.intelligententerprise.com/010810/412warehouse1_1.jhtml
DilMustafa- Posts : 49
Join date : 2009-02-03
Location : Calgary, Canada
Re: Primary Key of Fact Table
I read the article but I think the bridge table Ralph talks about is a special kind of bridge; a "helper" bridge table. This one acts more like a fact table between two different dimensions with a many-to-many relationship between each other.dmustafa wrote:Hi Brian,
For my curosity, why dont you use PK of the bridge table as part of the key in fact table.
http://www.intelligententerprise.com/010810/412warehouse1_1.jhtml
What I've been talking about is a single dimension that has a many-to-many relationship directly with the fact table. For example; I used to work for a portrait photography company. On a given order line item we could have a single photo product that contained pictures from many different photo sessions. Each session could also be attached to multiple order line items as well.
You mention using the primary key of the bridge table as part of the key in the fact table. The type of bridge I'm talking about doesn't have a single primary key; instead the primary key of this bridge table is a composite key, consisting of the surrogate key from the fact table and the surrogate key from the dimension table. Creating a surrogate key on the fact table simplifies the bridge table, allowing us to represent the fact record with a single key rather than the composite key (made up of all the foreign keys that make the record unique). Uniqueness on the fact table is maintained by an "alternate key", a unique constraint built on the foreign keys that make the record unique.
In the article, as I understood it, Ralph was using the surrogate from one dimension in the fact table, then joining it to another dimension using a helper table. So this fact table still maintained a one-to-many relationship with a single dimension but that dimension then took on a many-to-many relationship with another dimension (via the helper table). I think these are two different kinds of modeling requirements.
Let me know if I've misunderstood something and thanks for the feedback.
BrianJarrett- Posts : 61
Join date : 2009-02-03
Age : 50
Location : St. Louis, MO
Re: Primary Key of Fact Table
Thanks for the reply. I appreciate your help. You are the most active member on this forum for sure.
Now getting back to M:M between dimension and fact. Kimball recommends creating a group tables. He quotes an example where many salesrep can be assigned to an order. Kimball creates a salesrepgroup dimension with salesrep key and salesrep group key and uses the salesrep group key in the fact table. ETL process will do lookups and create a group if does not exist based on salesrep combos. I will appreciate your opinion on this. Once again thanks for the effort and time.
Now getting back to M:M between dimension and fact. Kimball recommends creating a group tables. He quotes an example where many salesrep can be assigned to an order. Kimball creates a salesrepgroup dimension with salesrep key and salesrep group key and uses the salesrep group key in the fact table. ETL process will do lookups and create a group if does not exist based on salesrep combos. I will appreciate your opinion on this. Once again thanks for the effort and time.
DilMustafa- Posts : 49
Join date : 2009-02-03
Location : Calgary, Canada
Re: Primary Key of Fact Table
I've read about that approach in the Toolkit but haven't needed to use it in any of my models yet. I'm sure the group comes in handy in specific situations but for a simple many-to-many I've never found it necessary. I guess I'd say that if the group itself doesn't need to be named and identified then it's probably not necessary to design for it. That may be oversimplifying it though.
BrianJarrett- Posts : 61
Join date : 2009-02-03
Age : 50
Location : St. Louis, MO
Re: Primary Key of Fact Table
Thanks Brian... It was a great discussion. I do like your approach for modeling M:M relationships.
Thanks once again,
Dil Mustafa
Thanks once again,
Dil Mustafa
DilMustafa- Posts : 49
Join date : 2009-02-03
Location : Calgary, Canada
Re: Primary Key of Fact Table
No problem; I love talking about this stuff. Although there is a solid and consistent approach to dimensional modeling there will always be some exceptions that will require us to be creative. This board is a great place to learn from each other. I know I already have.
BrianJarrett- Posts : 61
Join date : 2009-02-03
Age : 50
Location : St. Louis, MO
Re: Primary Key of Fact Table
Perhaps the answer to when to use groups lies in how often the group would be reused.
Kurt
Kurt
kurtv- Posts : 2
Join date : 2009-02-03
Re: Primary Key of Fact Table
We always assign a generic surrogate key to all our tables. This is a pretty clean and efficient approach to help facilitate UPDATES for the ETL process (as Warren had suggested). And like Brian had mentioned, we also use it a lot for M:M references in a bridge table. Having to deal with composite keys in these situations can get very inefficient and cumbersome.
The other point I'd like to share is that we define alternate keys strictly for enforcing record uniqueness. This is typically defined by the grain of the Fact table, which may or may not include all of your FKs.
Hope this helps.
The other point I'd like to share is that we define alternate keys strictly for enforcing record uniqueness. This is typically defined by the grain of the Fact table, which may or may not include all of your FKs.
Hope this helps.
juz_b- Posts : 17
Join date : 2009-02-07
Re: Primary Key of Fact Table
I have found, in some cases, I have more than 16 FK's in my fact table, which is the largest number of columns allows in a composite index or key.
In cases like this, I need to do careful thinking about my fk indexing strategy.
In these cases, I always adding a surrogate PK for no other reason than it makes a row individually identifiable. I have a J Celko view of that, a table should always contain uniquely identifiable rows. If not, deleting and updating becomes impossible (before everyone leaps on me about updating fact rows, I know it is not recommended)
In cases like this, I need to do careful thinking about my fk indexing strategy.
In these cases, I always adding a surrogate PK for no other reason than it makes a row individually identifiable. I have a J Celko view of that, a table should always contain uniquely identifiable rows. If not, deleting and updating becomes impossible (before everyone leaps on me about updating fact rows, I know it is not recommended)
monsieur_arrie- Posts : 2
Join date : 2009-04-21
Re: Primary Key of Fact Table
I update rows in accumulating snapshot fact tables all the time.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Primary Key and Clustered Index on the Fact Table
http://dwbi1.wordpress.com/2010/02/24/primary-key-and-clustered-index-on-the-fact-table/
just a thought
just a thought
Re: Primary Key of Fact Table
Personally, I don’t have any primary keys in my fact tables as I see no advantages and plenty of disadvantages to the approach.
Having a surrogate primary key makes the table wider, giving me less rows per page, thus impacting performance. This is especially the case if I have billions of records in my fact table. I want as few columns as possible to improve query performance. If want to do a delete or update I can use ROW_NUMBER() OVER (PARTITION BY ...
Having the primary key as a composite key of all my foreign keys will impact my ETL process because it needs to test for uniqueness, thus slowing down the load. If I design my ETL process correctly then I shouldn’t need to test for uniqueness using referential integrity.
Having a surrogate primary key makes the table wider, giving me less rows per page, thus impacting performance. This is especially the case if I have billions of records in my fact table. I want as few columns as possible to improve query performance. If want to do a delete or update I can use ROW_NUMBER() OVER (PARTITION BY ...
Having the primary key as a composite key of all my foreign keys will impact my ETL process because it needs to test for uniqueness, thus slowing down the load. If I design my ETL process correctly then I shouldn’t need to test for uniqueness using referential integrity.
Similar topics
» Primary key in fact table
» Primary Key on a fact table
» fact table primary key/identity choice - not sure which is correct
» Can a FACT table contains Natural Primary keys and text columns
» 'Routing' the grain of the fact table to multpile members of multiple dimensions causes the fact table to 'explode'
» Primary Key on a fact table
» fact table primary key/identity choice - not sure which is correct
» Can a FACT table contains Natural Primary keys and text columns
» 'Routing' the grain of the fact table to multpile members of multiple dimensions causes the fact table to 'explode'
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum