Primary key in fact table

View previous topic View next topic Go down

Primary key in fact table

Post  gbaren on Tue Apr 14, 2015 8:10 am

Does the primary key in a fact table consist of all dimension keys or do I always need to create a fact surrogate key for that purpose?

gbaren

Posts : 15
Join date : 2014-01-30
Location : 02

View user profile http://equalsql.wordpress.com

Back to top Go down

Re:Primary key in fact table

Post  hkandpal on Tue Apr 14, 2015 8:22 am

Hi,

it depends upon the requirement you may need a primary key in the fact table. If you need then one approach is to use the surrogate key, using the dimension keys may not work in some cases where the rows can not be uniquely identified.
You can also control uniqueness in the Fact table from the ETL code but having the constraint on the database is better as chances of having a defect in the index to maintain uniqueness is less than to the ETL code.


thanks

Himanshu

hkandpal

Posts : 113
Join date : 2010-08-16

View user profile

Back to top Go down

Re: Primary key in fact table

Post  nick_white on Tue Apr 14, 2015 8:42 am

Hi - the dimension keys on your fact table will always uniquely identify each record; in most circumstances you would only need a subset of these keys to uniquely identify each record

Almost the only reason you might need to explicitly define a single column PK on a fact table is where you need to update the fact table and having a multi-column PK would be unwieldy. Updating fact tables should normally be avoided (unless they are accumulating snapshot tables, in which case you have to update them).

Regards,

nick_white

Posts : 366
Join date : 2014-01-06
Location : London

View user profile

Back to top Go down

Re: Primary key in fact table

Post  gvarga on Tue Apr 14, 2015 9:25 am

Hi,
If you are storing atomic transactions in fact table ( which is preferrable) , the dimension keys don’t always uniquely identify a fact row: the same customer, on the same day may buy the same product twice in the same shop!
Use the surrogate key!

gvarga

Posts : 43
Join date : 2010-12-15

View user profile

Back to top Go down

Re: Primary key in fact table

Post  nick_white on Tue Apr 14, 2015 9:46 am

If you can't differentiate between two rows in a fact table then you should be aggregating them into a single row - having them as separate rows just increases your storage volumes and adversely impacts query performance without providing any benefit.
In the example given, you should have a single row with a quantity of 2 rather than 2 identical rows with a quantity of 1

Don't use surrogate keys!

nick_white

Posts : 366
Join date : 2014-01-06
Location : London

View user profile

Back to top Go down

Re: Primary key in fact table

Post  BoxesAndLines on Tue Apr 14, 2015 10:32 am

I love having a single column unique column on fact tables in addition to the combination of dimensions that define the grain. It's very useful.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Primary key in fact table

Post  gvarga on Tue Apr 14, 2015 3:23 pm

You should read Kimball’s Design tip from 2007: Keep to the grain in dimensional modeling: When developing fact tables, aggregated data is NOT the place to start. To avoid “mixed granularity” woes including bad and overlapping data, stick to rich, expressive, atomic-level data that’s closely connected to the original source and collection process.

Two other aspects:

In several cases in a BI tool you have to drill down to atomic level

When we have to modify the fact row then surrogate key is needed again.  Modification can be made not only in accumulating snapshot tables. A very common modification: there is a late arriving dimension instance, so we have to connect the fact row to the “Undefined” dimension row. Later the dimension row arrives and the business wants to correct the fact row but wants to store the original row as well. In this case we have to create timespan rows with validity data (row effective date, row expiration date, and current row indicator).


gvarga

Posts : 43
Join date : 2010-12-15

View user profile

Back to top Go down

Re: Primary key in fact table

Post  gvarga on Tue Apr 14, 2015 3:35 pm

You should read Kimball’s Design tip from 2007: Keep to the grain in dimensional modeling: When developing fact tables, aggregated data is NOT the place to start. To avoid “mixed granularity” woes including bad and overlapping data, stick to rich, expressive, atomic-level data that’s closely connected to the original source and collection process.
Two other aspects:
In several cases in a BI tool you have to drill down to atomic level
When we have to modify the fact row then surrogate key is needed again. Modification can be made not only in accumulating snapshot tables. A very common modification: there is a late arriving dimension instance, so we have to connect the fact row to the “Undefined” dimension row. Later the dimension row arrives and the business wants to correct the fact row but wants to store the original row as well. In this case we have to create timespan rows with validity data (row effective date, row expiration date, and current row indicator).

gvarga

Posts : 43
Join date : 2010-12-15

View user profile

Back to top Go down

Re: Primary key in fact table

Post  ngalemmo on Tue Apr 14, 2015 7:12 pm

You only need to worry about primary keys in a fact table if you intend to update rows in the table. A primary key is not necessarily all the dimensions, just the columns necessary to identify a row.
If you need a primary key, a surrogate can be useful but not absolutely necessary. One advantage of a surrogate is it allows you to update dimension values (keys or degenerates) that might otherwise be part of a compound primary key. A downside is it can complicate the update process if the surrogate key is not based on a natural key.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Primary key in fact table

Post  nick_white on Wed Apr 15, 2015 2:44 am

Quite agree that if you need to update a fact table then having a surrogate PK is useful (and sometimes essential).

Fact grain: the grain of a fact is set by the dimensional keys it has. If you have more than one record in a fact table with the same Dim keys then you have broken the grain of the fact. Effectively you have records that are more granular than the Fact tables allows for.
If you want to hold, for example, individual transactions in your fact table then you need to define the grain of your fact table in such a way that it uniquely identifies each transaction (for example, by including a time Dim or a transaction ID degenerate dim)

nick_white

Posts : 366
Join date : 2014-01-06
Location : London

View user profile

Back to top Go down

Re: Primary key in fact table

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top


 
Permissions in this forum:
You cannot reply to topics in this forum