Additional customer attributes or new dimensions - when is a dimension too wide?
2 posters
Page 1 of 1
Additional customer attributes or new dimensions - when is a dimension too wide?
I'm having a dilemma as to which direction I should go in when it comes to modeling our Customer dimension.
I want to keep all the customer attributes in a single dimension and avoid the need for additional joins when retrieving the data from the DW but the dimension is getting quite wide - around 70 columns at this point.
After reading some of the articles online a lot of time this is the way to go although some people do recommend breaking the table into separate dimensions. I've read an article that recommends breaking a single dimension into one or more (snowflake design) but only if 80% of the time the main dimension will suffice and 20% of the time we may need to grab additional data from the snowflaked dimension (visitors vs reliable/regular customers). In my case, however, 98% of the time I need all the information that I have so if I split the data into few dimensions, 98% of the time I would need to join all those dimensions and pull the data. It sounds like while I will have a cleaner design and narrower dimensions the performance will inevitably suffer. Btw, I'm talking about attributes such as customer's billing address, customer's shipping address, and a bunch of other unrelated customer's characteristics. The addresses are the only attributes that to me it could make sense to keep in a separate dimension but even with those I'd rather keep them in the customer dimension unless this is not something that is usually done.
What is your opinion and a common approach in this situation and are very wide dimensions (if say 100 column wide dimension is even considered very wide) an issue?
Thanks,
I want to keep all the customer attributes in a single dimension and avoid the need for additional joins when retrieving the data from the DW but the dimension is getting quite wide - around 70 columns at this point.
After reading some of the articles online a lot of time this is the way to go although some people do recommend breaking the table into separate dimensions. I've read an article that recommends breaking a single dimension into one or more (snowflake design) but only if 80% of the time the main dimension will suffice and 20% of the time we may need to grab additional data from the snowflaked dimension (visitors vs reliable/regular customers). In my case, however, 98% of the time I need all the information that I have so if I split the data into few dimensions, 98% of the time I would need to join all those dimensions and pull the data. It sounds like while I will have a cleaner design and narrower dimensions the performance will inevitably suffer. Btw, I'm talking about attributes such as customer's billing address, customer's shipping address, and a bunch of other unrelated customer's characteristics. The addresses are the only attributes that to me it could make sense to keep in a separate dimension but even with those I'd rather keep them in the customer dimension unless this is not something that is usually done.
What is your opinion and a common approach in this situation and are very wide dimensions (if say 100 column wide dimension is even considered very wide) an issue?
Thanks,
dk2014- Posts : 15
Join date : 2014-11-10
Re: Additional customer attributes or new dimensions - when is a dimension too wide?
Given that customer will (hopefully) be your largest dimension, you don't want to go too wide to avoid performance issues.
I would put address off into another table. It takes up a lot of space and most of the information is useless for analysis. You may want to leave a handful of important address attributes on the customer dim itself (zip, state, etc…) that are commonly used in queries.
If you have to deal with segmentation attributes, particularly if marketing keeps adding new ones, you should consider going vertical. Have a segmentation attribute dimension and a bridge between customer and attribute.
I would put address off into another table. It takes up a lot of space and most of the information is useless for analysis. You may want to leave a handful of important address attributes on the customer dim itself (zip, state, etc…) that are commonly used in queries.
If you have to deal with segmentation attributes, particularly if marketing keeps adding new ones, you should consider going vertical. Have a segmentation attribute dimension and a bridge between customer and attribute.
Re: Additional customer attributes or new dimensions - when is a dimension too wide?
Thanks for your reply ngalemmo! I'll definitely consider doing it the way you suggested.
DimCustomer is/will be the largest dimension but is still relatively small when initially loaded and before SCDs start inserting new rows - around 550k records.
Would I combine all the addresses (shipping & billing) into one dimension, sayDim_CustomerAddress, or would you have one for shipping and another one for billing?
The whole address part is only 5-6 attributes and if I leave State, Zip, City in the main Dim_Customer as you suggested then I would only be storing 2-3 fields, per address type, in this new address dimension.
Makes me wonder if it's worth it?
DimCustomer is/will be the largest dimension but is still relatively small when initially loaded and before SCDs start inserting new rows - around 550k records.
Would I combine all the addresses (shipping & billing) into one dimension, sayDim_CustomerAddress, or would you have one for shipping and another one for billing?
The whole address part is only 5-6 attributes and if I leave State, Zip, City in the main Dim_Customer as you suggested then I would only be storing 2-3 fields, per address type, in this new address dimension.
Makes me wonder if it's worth it?
dk2014- Posts : 15
Join date : 2014-11-10
Re: Additional customer attributes or new dimensions - when is a dimension too wide?
There would be one address dimension with all addresses. I would include city, state and zip in both the address dimension (for completeness) and customer. I would probably not include versions of city, state and zip in customer for all address types (billing, shipping) as it may have little or infrequent use. Also, these address could vary by order. If someone wanted to do analysis by shipping address, for example, they could join to the address from the fact.
Re: Additional customer attributes or new dimensions - when is a dimension too wide?
Thank you ngalemmo!
dk2014- Posts : 15
Join date : 2014-11-10
Re: Additional customer attributes or new dimensions - when is a dimension too wide?
One more thing, when you say 'to do analysis by shipping address, for example, they could join to the address from the fact' do you mean that I should include the CustomerAddressKey in the Fact table rather than in the Dim_Customer table or that maybe I should include it in both!?
My thought was that if I have the OrderSalesFact table with the granularity of an order item, the fact table would include OrderNumber degenerate dimension, CustomerKey, StoreKey, OrderDateKey, etc. and then DimCustomer table would have a FK CustomerAddressKey (from DimCustomerAddress) in which case I don't see how one could join the DimCustomerAddress directly from the Fact table.
My thought was that if I have the OrderSalesFact table with the granularity of an order item, the fact table would include OrderNumber degenerate dimension, CustomerKey, StoreKey, OrderDateKey, etc. and then DimCustomer table would have a FK CustomerAddressKey (from DimCustomerAddress) in which case I don't see how one could join the DimCustomerAddress directly from the Fact table.
dk2014- Posts : 15
Join date : 2014-11-10
Re: Additional customer attributes or new dimensions - when is a dimension too wide?
The fact table would contain foreign keys to the address dimension. There would be separate keys for each role: customer address, shipping address, etc… Using a FK off customer is snowflaking, which is unnesscessary and can hamper query performance.
dk2014- Posts : 15
Join date : 2014-11-10
Similar topics
» Add customer attributes to fact or new dimension
» Calculated attributes in Customer Dimension?
» attributes depending on more than 1 dimension (2 dimensions)
» Attributes from a dimension appearing in other dimensions; multiple-inheritance in hierarchies.
» De-normalizing Customer Information to create a Customer Dimension
» Calculated attributes in Customer Dimension?
» attributes depending on more than 1 dimension (2 dimensions)
» Attributes from a dimension appearing in other dimensions; multiple-inheritance in hierarchies.
» De-normalizing Customer Information to create a Customer Dimension
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum