Degenerate Dimension Implementation
4 posters
Page 1 of 1
Degenerate Dimension Implementation
I am looking for some guidance around the proper scenario to use a degenerate dimension.
I currently have a fact table defined as such:
Date_Key
Distributor_Key
Item_Key
Customer_Key
Invoice_Number
Invoice_TXN_Date
Purchase_QTY
Purchase_AMT
The Date_Key is the date that the data was provided (its provided once a month) and is what all reporting is based from. The Invoice_TXN_Date is the date of the transaction for the item. It is rare that there are multiple Invoice_TXN_Date values for a given Invoice_Number, but it does happen.
When I originally created this my thought was to make the Invoice_Number a Degenerate Dimension. I do not have any Invoice Line item numbers and since this table is at the Item level, the Invoice Number and Invoice Date are repeated for each item related to a specific invoice. This fact table has 8 million records. The distinct Invoice_Number, Invoice_TXN_Date count is roughly 500,000.
The only reporting requirements given around the Invoice_Number and Invoice_Txn_Date are the following:
- Be able to display items and totals by Invoice Number
- Be able to display items and totals by Invoice_Txn_Date
My first question is if it makes sense to make the Invoice_Number a degenerate dimension given the above data scenario or should it be a separate dimension?
Second, if its best to go the degenerate route and considering that I am putting the data into Analysis Services 2008, would I need to add a surrogate key to my fact table for the Invoice_Number to have a unique key to create the Degenerate dimension on?
Thanks!
I currently have a fact table defined as such:
Date_Key
Distributor_Key
Item_Key
Customer_Key
Invoice_Number
Invoice_TXN_Date
Purchase_QTY
Purchase_AMT
The Date_Key is the date that the data was provided (its provided once a month) and is what all reporting is based from. The Invoice_TXN_Date is the date of the transaction for the item. It is rare that there are multiple Invoice_TXN_Date values for a given Invoice_Number, but it does happen.
When I originally created this my thought was to make the Invoice_Number a Degenerate Dimension. I do not have any Invoice Line item numbers and since this table is at the Item level, the Invoice Number and Invoice Date are repeated for each item related to a specific invoice. This fact table has 8 million records. The distinct Invoice_Number, Invoice_TXN_Date count is roughly 500,000.
The only reporting requirements given around the Invoice_Number and Invoice_Txn_Date are the following:
- Be able to display items and totals by Invoice Number
- Be able to display items and totals by Invoice_Txn_Date
My first question is if it makes sense to make the Invoice_Number a degenerate dimension given the above data scenario or should it be a separate dimension?
Second, if its best to go the degenerate route and considering that I am putting the data into Analysis Services 2008, would I need to add a surrogate key to my fact table for the Invoice_Number to have a unique key to create the Degenerate dimension on?
Thanks!
ChrisJames- Posts : 2
Join date : 2011-04-30
Re: Degenerate Dimension Implementation
would I need to add a surrogate key to my fact table for the Invoice_Number to have a unique key to create the Degenerate dimension on?
A degenerate dimension is a dimension with no attributes. You simply place the invoice number in the fact table. No surrogate key. End of story.
Its an appropriate thing to do for things like invoice numbers.
Invoice transaction date should be an FK to the date dimension. It is not clear that is the case from your column name.
Re: Degenerate Dimension Implementation
Thanks. I understand about not needing a surrogate key if using the Invoice Number as a degenerate dimension. But in regards to my first question, since the Invoice Number is not the same grain as my fact table and is repeated for each Item that makes up the complete invoice, is it still appropriate to leave it in the Fact table? Fact table is over 8 million records. Distinct Invoice number is under 500,000. I know things like Invoice Number are usually left in the fact table but I am curious as to when it is not appropriate and if my given situation is not an appropriate scenario.
ChrisJames- Posts : 2
Join date : 2011-04-30
Re: Degenerate Dimension Implementation
Sure. Each row is an item on the invoice, so invoice number is part of the grain.
Re: Degenerate Dimension Implementation
That wil also provide an easy method for counting the number of invoices.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Degenerate Dimension Implementation
Another comment... repetition of a dimensional key or a degenerate dimension value in a fact table is normal and expected. Which is why fact table queries are almost always aggregate queries.
The cardinality, in relation to the fact table, is only an issue with an actual dimension table, never with a degenerate dimension. Usually, when there is a high number of dimension rows relative to the number of fact rows, it is an indication that the dimension needs to be revisited and possibly broken down. That is why, like in your case, you don't usually see Invoice or Order dimensions, because the attributes relating to invoices and orders are broken down into simpler (and much smaller) dimension tables. This leaves just the identity (invoice number, order number, etc...) which winds up as a degenerate value in the fact.
The cardinality, in relation to the fact table, is only an issue with an actual dimension table, never with a degenerate dimension. Usually, when there is a high number of dimension rows relative to the number of fact rows, it is an indication that the dimension needs to be revisited and possibly broken down. That is why, like in your case, you don't usually see Invoice or Order dimensions, because the attributes relating to invoices and orders are broken down into simpler (and much smaller) dimension tables. This leaves just the identity (invoice number, order number, etc...) which winds up as a degenerate value in the fact.
Re: Degenerate Dimension Implementation
There is another dimension to your query. From the description it appears that the grain of the fact is at 'Invoice Item level'. If in future, there is a plan to have another fact table with the granularity of 'Invoice Number' and roll over/denormalise ceratin item line attributes to the fact directly, you would also prefer to have the drill down capability from the Invoice Fact to Invoice Item Fact. In such circumstance, I guess keeping 'Invoice Number' as dimension and linking both the fact tables will be beneficial.
datamodeller- Posts : 9
Join date : 2010-07-25
Similar topics
» Question on breaking out Degenerate Dimension to separate dimension
» Should I use a degenerate dimension or create a junk dimension?
» Junk Dimension Implementation
» Should this be a degenerate dimension or a junk dimension?
» Normal dimension or degenerate dimension?
» Should I use a degenerate dimension or create a junk dimension?
» Junk Dimension Implementation
» Should this be a degenerate dimension or a junk dimension?
» Normal dimension or degenerate dimension?
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum