Specify acceptable Snowflaking scenarios
5 posters
Page 1 of 1
Specify acceptable Snowflaking scenarios
Can someone please specify acceptable snowflaking scenarios.
- Thanks in advance
bi_at_nj
- Thanks in advance
bi_at_nj
bi_at_nj- Posts : 14
Join date : 2009-10-31
Re: Specify acceptable Snowflaking scenarios
Snowflaking is advisable when there exists a many to one relationship between the key attributes of the dimension which are readily accessed along with the Fact table and the attributes which are not so often used along with Fact but are related to the key attributres in Dimension resulting in lot of rows in the Dimension table (due to many to one relationship)
Example : Customer Dim in Sales Star Schema,A customer can have multiple addresses,order_address,billing_address,ship_to_address etc.Now a particular sales record can have one ot all of these address aplicable,meaning all of the addresses can be same for a single customer or they can differ,here it is advisable to keep only the key attributes of the Customer (like name,etc) in cusomer dimension and take out the address attributes in the table name customer_address and join the customer dimension (1:m) to this.
Another candidate could be product dimension.Snoflaking makes the join between the Fact and dimension faster avoiding the scan of too many rows.
Example : Customer Dim in Sales Star Schema,A customer can have multiple addresses,order_address,billing_address,ship_to_address etc.Now a particular sales record can have one ot all of these address aplicable,meaning all of the addresses can be same for a single customer or they can differ,here it is advisable to keep only the key attributes of the Customer (like name,etc) in cusomer dimension and take out the address attributes in the table name customer_address and join the customer dimension (1:m) to this.
Another candidate could be product dimension.Snoflaking makes the join between the Fact and dimension faster avoiding the scan of too many rows.
kapoor_dh- Posts : 24
Join date : 2009-12-08
Re: Specify acceptable Snowflaking scenarios
Addresses are not an acceptable example of snowflaking. If you implement a separate customer dimension and customer address dimension, you may carry the customer dimension key in the customer address dimension, but this is technically not a snowflake because that foreign key would not (and should not) be used when reporting facts. It would only be used when generating a list of customers.
A fact table should contain foreign keys to all related dimensions. So there would be a customer FK as well as multiple role based FKs to the related addresses as necessary.
Snowflaking, which involves joining from a fact to a dimension, then joining again from the dimension to another dimension, doesn't make queries run faster, particularly on database systems (such as Oracle) which have features specifically to support star joins.
A fact table should contain foreign keys to all related dimensions. So there would be a customer FK as well as multiple role based FKs to the related addresses as necessary.
Snowflaking, which involves joining from a fact to a dimension, then joining again from the dimension to another dimension, doesn't make queries run faster, particularly on database systems (such as Oracle) which have features specifically to support star joins.
Re: Specify acceptable Snowflaking scenarios
You are correct,but the scenario which I have mentioned,the addressed are not often used for reporting,hence they are unnessesary increasing the size of the customer dimension.Only in very few Adhoc reportings we do need the addresses as well,I agree in such cases there will be an extra join,but the cases are few.
Also this is what I understand of Snowflaking a table that is coming out of dimension and not linked to the fact table,correct me if I am wrong and you have any other definition for the same,explaing with an example.
Also this is what I understand of Snowflaking a table that is coming out of dimension and not linked to the fact table,correct me if I am wrong and you have any other definition for the same,explaing with an example.
kapoor_dh- Posts : 24
Join date : 2009-12-08
Re: Specify acceptable Snowflaking scenarios
ngalemmo wrote:...If you implement a separate customer dimension and customer address dimension, you may carry the customer dimension key in the customer address dimension, but this is technically not a snowflake because that foreign key would not (and should not) be used when reporting facts. ...
Just checking here, you would put the customer FK in the address dim or vice versa? I would put the address FK('s) in the customer dim.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Specify acceptable Snowflaking scenarios
Yes, a snowflake is a FK from a dimension to another dimension where the latter dimension is not referenced by the fact.
But as far as an application of snowflaking goes, there is little gained by snowflaking shipping and billing information in a order-to-cash subject area. Just add the FK's to the facts.
But as far as an application of snowflaking goes, there is little gained by snowflaking shipping and billing information in a order-to-cash subject area. Just add the FK's to the facts.
Re: Specify acceptable Snowflaking scenarios
BoxesAndLines wrote:Just checking here, you would put the customer FK in the address dim or vice versa? I would put the address FK('s) in the customer dim.
It is typical that a customer may have multiple shipping addresses, and if the company involved is a manufacturer, many billing addresses as well. It would be impractical to place the FK's on the customer. Which is also why you need to put the FK's in the fact table. The shipping and/or billing destination may change from one order to the next.
Re: Specify acceptable Snowflaking scenarios
On the flip side of the coin, an address plays in umpteen (very precise term ;^)) different roles within the organization. Burdening an address dimension with all of these roles (i.e. FK's to other dimensions would be unwieldy). I've recently consolidated half of the addresses within my current client's organization and I already have 20M distinct addresses. The only snowflakes I've designed are to demographic dimensions. Customer relationships are temporary relationships to an address and would cause serious performance issues maintaining the current customer in a type 1 or actual history on a type 2 on the address dimension.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Specify acceptable Snowflaking scenarios
We can a case of a revenue fact table. The fact table had a huge number of rows and 1 single fact - revenue. The dimensions include product, customer, and dates for when the revenue was generated. There were 6 or 7 dates associate with the fact table, only 1 of which was used with any frequency. The database had a date dimension and a decision was made to create snow flake design that involved a table that had 5 columns containing the the date deminsion keys for each of the dates in the revenue data plus a "Revenue Date Key" that was put on the fact table. This greatly reduced the size of the fact table and improved the performance of the vast majority of queries against the fact table since few of them used the Dates in the Revenue Date Dimension.
I would only use such a design for very detailed data. Imagine a large bank with tens of millions of accounts and the interest rate spread changing daily plus transaction fees.
Of course, this design was from several years ago when a terabyte of disk actually took up space in a data center. Now a days, the actual physical space to store 10 columns of integers is less than the old disk used to store 1 column of integers.
I would only use such a design for very detailed data. Imagine a large bank with tens of millions of accounts and the interest rate spread changing daily plus transaction fees.
Of course, this design was from several years ago when a terabyte of disk actually took up space in a data center. Now a days, the actual physical space to store 10 columns of integers is less than the old disk used to store 1 column of integers.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Similar topics
» How many measures in a fact table is acceptable?
» Snowflake scenarios
» Is it acceptable to update fact row with latest status?
» master-detail scenarios
» SNOWFLAKING
» Snowflake scenarios
» Is it acceptable to update fact row with latest status?
» master-detail scenarios
» SNOWFLAKING
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum