Primary key in fact table
+2
hkandpal
gbaren
6 posters
Page 1 of 1
Primary key in fact table
Does the primary key in a fact table consist of all dimension keys or do I always need to create a fact surrogate key for that purpose?
Re:Primary key in fact table
Hi,
it depends upon the requirement you may need a primary key in the fact table. If you need then one approach is to use the surrogate key, using the dimension keys may not work in some cases where the rows can not be uniquely identified.
You can also control uniqueness in the Fact table from the ETL code but having the constraint on the database is better as chances of having a defect in the index to maintain uniqueness is less than to the ETL code.
thanks
Himanshu
it depends upon the requirement you may need a primary key in the fact table. If you need then one approach is to use the surrogate key, using the dimension keys may not work in some cases where the rows can not be uniquely identified.
You can also control uniqueness in the Fact table from the ETL code but having the constraint on the database is better as chances of having a defect in the index to maintain uniqueness is less than to the ETL code.
thanks
Himanshu
hkandpal- Posts : 113
Join date : 2010-08-16
Re: Primary key in fact table
Hi - the dimension keys on your fact table will always uniquely identify each record; in most circumstances you would only need a subset of these keys to uniquely identify each record
Almost the only reason you might need to explicitly define a single column PK on a fact table is where you need to update the fact table and having a multi-column PK would be unwieldy. Updating fact tables should normally be avoided (unless they are accumulating snapshot tables, in which case you have to update them).
Regards,
Almost the only reason you might need to explicitly define a single column PK on a fact table is where you need to update the fact table and having a multi-column PK would be unwieldy. Updating fact tables should normally be avoided (unless they are accumulating snapshot tables, in which case you have to update them).
Regards,
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: Primary key in fact table
Hi,
If you are storing atomic transactions in fact table ( which is preferrable) , the dimension keys don’t always uniquely identify a fact row: the same customer, on the same day may buy the same product twice in the same shop!
Use the surrogate key!
If you are storing atomic transactions in fact table ( which is preferrable) , the dimension keys don’t always uniquely identify a fact row: the same customer, on the same day may buy the same product twice in the same shop!
Use the surrogate key!
gvarga- Posts : 43
Join date : 2010-12-15
Re: Primary key in fact table
If you can't differentiate between two rows in a fact table then you should be aggregating them into a single row - having them as separate rows just increases your storage volumes and adversely impacts query performance without providing any benefit.
In the example given, you should have a single row with a quantity of 2 rather than 2 identical rows with a quantity of 1
Don't use surrogate keys!
In the example given, you should have a single row with a quantity of 2 rather than 2 identical rows with a quantity of 1
Don't use surrogate keys!
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: Primary key in fact table
I love having a single column unique column on fact tables in addition to the combination of dimensions that define the grain. It's very useful.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Primary key in fact table
You should read Kimball’s Design tip from 2007: Keep to the grain in dimensional modeling: When developing fact tables, aggregated data is NOT the place to start. To avoid “mixed granularity” woes including bad and overlapping data, stick to rich, expressive, atomic-level data that’s closely connected to the original source and collection process.
Two other aspects:
In several cases in a BI tool you have to drill down to atomic level
When we have to modify the fact row then surrogate key is needed again. Modification can be made not only in accumulating snapshot tables. A very common modification: there is a late arriving dimension instance, so we have to connect the fact row to the “Undefined” dimension row. Later the dimension row arrives and the business wants to correct the fact row but wants to store the original row as well. In this case we have to create timespan rows with validity data (row effective date, row expiration date, and current row indicator).
gvarga- Posts : 43
Join date : 2010-12-15
Re: Primary key in fact table
You should read Kimball’s Design tip from 2007: Keep to the grain in dimensional modeling: When developing fact tables, aggregated data is NOT the place to start. To avoid “mixed granularity” woes including bad and overlapping data, stick to rich, expressive, atomic-level data that’s closely connected to the original source and collection process.
Two other aspects:
In several cases in a BI tool you have to drill down to atomic level
When we have to modify the fact row then surrogate key is needed again. Modification can be made not only in accumulating snapshot tables. A very common modification: there is a late arriving dimension instance, so we have to connect the fact row to the “Undefined” dimension row. Later the dimension row arrives and the business wants to correct the fact row but wants to store the original row as well. In this case we have to create timespan rows with validity data (row effective date, row expiration date, and current row indicator).
Two other aspects:
In several cases in a BI tool you have to drill down to atomic level
When we have to modify the fact row then surrogate key is needed again. Modification can be made not only in accumulating snapshot tables. A very common modification: there is a late arriving dimension instance, so we have to connect the fact row to the “Undefined” dimension row. Later the dimension row arrives and the business wants to correct the fact row but wants to store the original row as well. In this case we have to create timespan rows with validity data (row effective date, row expiration date, and current row indicator).
gvarga- Posts : 43
Join date : 2010-12-15
Re: Primary key in fact table
You only need to worry about primary keys in a fact table if you intend to update rows in the table. A primary key is not necessarily all the dimensions, just the columns necessary to identify a row.
If you need a primary key, a surrogate can be useful but not absolutely necessary. One advantage of a surrogate is it allows you to update dimension values (keys or degenerates) that might otherwise be part of a compound primary key. A downside is it can complicate the update process if the surrogate key is not based on a natural key.
If you need a primary key, a surrogate can be useful but not absolutely necessary. One advantage of a surrogate is it allows you to update dimension values (keys or degenerates) that might otherwise be part of a compound primary key. A downside is it can complicate the update process if the surrogate key is not based on a natural key.
Re: Primary key in fact table
Quite agree that if you need to update a fact table then having a surrogate PK is useful (and sometimes essential).
Fact grain: the grain of a fact is set by the dimensional keys it has. If you have more than one record in a fact table with the same Dim keys then you have broken the grain of the fact. Effectively you have records that are more granular than the Fact tables allows for.
If you want to hold, for example, individual transactions in your fact table then you need to define the grain of your fact table in such a way that it uniquely identifies each transaction (for example, by including a time Dim or a transaction ID degenerate dim)
Fact grain: the grain of a fact is set by the dimensional keys it has. If you have more than one record in a fact table with the same Dim keys then you have broken the grain of the fact. Effectively you have records that are more granular than the Fact tables allows for.
If you want to hold, for example, individual transactions in your fact table then you need to define the grain of your fact table in such a way that it uniquely identifies each transaction (for example, by including a time Dim or a transaction ID degenerate dim)
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Similar topics
» Primary Key on a fact table
» Primary Key of Fact Table
» fact table primary key/identity choice - not sure which is correct
» Can a FACT table contains Natural Primary keys and text columns
» 'Routing' the grain of the fact table to multpile members of multiple dimensions causes the fact table to 'explode'
» Primary Key of Fact Table
» fact table primary key/identity choice - not sure which is correct
» Can a FACT table contains Natural Primary keys and text columns
» 'Routing' the grain of the fact table to multpile members of multiple dimensions causes the fact table to 'explode'
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum