Large Mini dimension alternate ?
3 posters
Page 1 of 1
Large Mini dimension alternate ?
We are planning to create a Mini dimension for our customers with following fields as these fields would make the Cust Dim(Type-2) huge.
Cust State, Cust 3 digit zip(929 unique values), Age(120 unique values), Rating Age(15 unique values), Gender(3 unique values) and this is exploding as 929*120*15*3 would come around 5 millions which is very large for Mini dimension. Should we outrigg the 3 digit zip from Cust dim or cust mini dim ? What other options do we have ?
Thanks for your time...
Cust State, Cust 3 digit zip(929 unique values), Age(120 unique values), Rating Age(15 unique values), Gender(3 unique values) and this is exploding as 929*120*15*3 would come around 5 millions which is very large for Mini dimension. Should we outrigg the 3 digit zip from Cust dim or cust mini dim ? What other options do we have ?
Thanks for your time...
VTK- Posts : 50
Join date : 2011-07-15
Re: Large Mini dimension alternate ?
When designing mini (aka junk) dimensions you need to consider correlation and cardinality. State and zip code have very high correlation, so it makes sense to put them together. Age and sex have very low correlation, but also have low cardinality. Creating two tables will give you two fairly small manageable tables. However, when you put them together into one table, you have five attributes with no correlation and, in combination, high cardinality... not a good way to do things.
As far as zip code goes, I would not truncate it to 3 digits if you have all five. Why lose information that may be useful later (i.e. geographic analysis)?
As far as zip code goes, I would not truncate it to 3 digits if you have all five. Why lose information that may be useful later (i.e. geographic analysis)?
Re: Large Mini dimension alternate ?
Thanks for your reply. If we create two mini dims now we have totally 3 dims for Customer and we have to use all of them wherever we need customer info(assuming they will be using these fields). Correct ? I am just wondering if that's a good practice. Is this how others are handling this ? I guess it would not be just me who faced this scenario and others would have faced similar issue when dealing with Customers.
VTK- Posts : 50
Join date : 2011-07-15
Re: Large Mini dimension alternate ?
Why is a mini-dimension considered a junk dimension? Mini-dimensions can be created for any point of a hierarchy. If your Geographic dimension starts at Census Block Group and rolls to Census tract, County, State, you can add a 2nd surrogate key at say County, roll fact data up to the county_skey and then create a view from the geo dimension that included county and state.
I would not recommend doing this on very long dimension such as a customer or member dimension unless you can create a materialized view of the mini-dimension otherwise performance could be really slow.
Zip Code information can be tricky. You can purchase data that rolls Zip Code up to City, County, and State (it's not 100% accurate, but many people use it). If this is the case, I would put Zip Code in it's own dimension along with City, County, and state as it can be used with more than just the customer.
Age is another element that I tend to put in it's own dimension table. I usually want to know the age of the customer at the time of a transaction. If people aren't careful, they can match the customers current age with activity from years ago. It's OK to put gender on the Customer as it doesn't change very often, but if you need to aggregate data to Gender, consider putting it in a junk dimension. You could put Gender and Age together in the same dimension - giving you a dimension with 360 rows.
Again, I tend not to create mini-dimensions off really big dimension tables. There are times when it can't be avoided. We had a member dimension in which the relationship code of the member to subscriber was part of the unique identifier of the member. The challenge was that many reports had data at the relationship level. Creating a mini-dimension on the member dimension wasn't practical because the member dimension had millions of rows. Instead, I created a Relationship Dimension and denormalized it within the member dimension.
I would not recommend doing this on very long dimension such as a customer or member dimension unless you can create a materialized view of the mini-dimension otherwise performance could be really slow.
Zip Code information can be tricky. You can purchase data that rolls Zip Code up to City, County, and State (it's not 100% accurate, but many people use it). If this is the case, I would put Zip Code in it's own dimension along with City, County, and state as it can be used with more than just the customer.
Age is another element that I tend to put in it's own dimension table. I usually want to know the age of the customer at the time of a transaction. If people aren't careful, they can match the customers current age with activity from years ago. It's OK to put gender on the Customer as it doesn't change very often, but if you need to aggregate data to Gender, consider putting it in a junk dimension. You could put Gender and Age together in the same dimension - giving you a dimension with 360 rows.
Again, I tend not to create mini-dimensions off really big dimension tables. There are times when it can't be avoided. We had a member dimension in which the relationship code of the member to subscriber was part of the unique identifier of the member. The challenge was that many reports had data at the relationship level. Creating a mini-dimension on the member dimension wasn't practical because the member dimension had millions of rows. Instead, I created a Relationship Dimension and denormalized it within the member dimension.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Large Mini dimension alternate ?
They were referred to as 'junk' dimensions in the first Toolkit book. The second version of the book uses the term 'mini'. Both are essentially the same thing.
Re: Large Mini dimension alternate ?
kaps wrote:Thanks for your reply. If we create two mini dims now we have totally 3 dims for Customer and we have to use all of them wherever we need customer info(assuming they will be using these fields). Correct ? I am just wondering if that's a good practice. Is this how others are handling this ? I guess it would not be just me who faced this scenario and others would have faced similar issue when dealing with Customers.
Yes, you would include FK's to those dimensions when applicable. It's not a big deal, as smaller dimensions help performance in most cases.
Similar topics
» alternate approaches for late arriving dimension attributes
» Tip #136 Adding a Mini-Dimension to a Bridge Table: What if you can't split the dimension?
» modelling Product dimension for Pizza outlet
» Mini Dimension Needed?
» When to create mini-dimension
» Tip #136 Adding a Mini-Dimension to a Bridge Table: What if you can't split the dimension?
» modelling Product dimension for Pizza outlet
» Mini Dimension Needed?
» When to create mini-dimension
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum