dimension model for many indicators field
4 posters
Page 1 of 1
dimension model for many indicators field
I have 30+ fileds that has either 1 or 0 ( true or false)value in a fact table
Patid -- -- --- PTInd_1 Ptind_2 Ptind_3........ PTind_30 -- etc
1 0 1 1
I need to be able to create a dimension using thse attributes( patind1 ~ 30)..what is the best way to design the dimensional model for these fileds? there could be new indcator filed in the future...
so it has to be easy to add new attribute ...
junk dimension can have too many rows 2^30... and it will be hard to maintain if there are new fields need to be added right?
any suggestions would be appreciated
Patid -- -- --- PTInd_1 Ptind_2 Ptind_3........ PTind_30 -- etc
1 0 1 1
I need to be able to create a dimension using thse attributes( patind1 ~ 30)..what is the best way to design the dimensional model for these fileds? there could be new indcator filed in the future...
so it has to be easy to add new attribute ...
junk dimension can have too many rows 2^30... and it will be hard to maintain if there are new fields need to be added right?
any suggestions would be appreciated
j- Posts : 6
Join date : 2010-05-15
Re: dimension model for many indicators field
How many combinations do you actually have in the data? The junk dimension works since multiple facts all have the same combination of values.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: dimension model for many indicators field
thansk for the reply..
The problem is if we add a new indicator filed,.let's say patind_31.. we have to updte all the fact records.. or not.. how will the junk dimension work in case new columns got added to a fact table and need to add them to a junk dimension ?
right now.. it has 30 fileds.. so it can be multi millions combination.. what is the best way to do this?
The problem is if we add a new indicator filed,.let's say patind_31.. we have to updte all the fact records.. or not.. how will the junk dimension work in case new columns got added to a fact table and need to add them to a junk dimension ?
right now.. it has 30 fileds.. so it can be multi millions combination.. what is the best way to do this?
j- Posts : 6
Join date : 2010-05-15
Re: dimension model for many indicators field
I'll ask again, how many actual combinations do you have in the source data? Potential and reality are what make junk dimensions a viable solution.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: dimension model for many indicators field
Storing 31 attributes in a single junk dimension can reduce 31 fields to only one surrogate key in your fact table assuming most of the combinations are highly repeated. You can use SELECT DISTINCT field1, ..., field31 from the fact to figure out the number. If it is a manageable number (say thousands, definitely not millions), and the trend for future facts is clear, then you can incrementally build your junk dimension from the fact. You cannot pre-build the junk dimension by cartesian join in this case as it would exceed 1 billion records which is absolutely not a go for a dimension. Don't forget you might have to cater for null value as well.
Junk dimension is a very effective and elegant dimensional modelling technique when dealing with small number (say less than 10) of low cardinality (say less than 5) attributes in the fact. It avoid having duplicated character fields in fact table if the fields are not classified as degenerate dimensions, so that the fact table becomes normalised.
Junk dimension is a very effective and elegant dimensional modelling technique when dealing with small number (say less than 10) of low cardinality (say less than 5) attributes in the fact. It avoid having duplicated character fields in fact table if the fields are not classified as degenerate dimensions, so that the fact table becomes normalised.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: dimension model for many indicators field
If you are really worried about the number of combinations, do some correlation analysis on the data.
As B&L pointed out, most of the time, the number of actual combinations are far less than all possible combinations. But, 30+ attributes may be pushing it (even if they are only yes/no fields). The way around it is to identify attributes that correlate well and group them into smaller junk dimensions.
As B&L pointed out, most of the time, the number of actual combinations are far less than all possible combinations. But, 30+ attributes may be pushing it (even if they are only yes/no fields). The way around it is to identify attributes that correlate well and group them into smaller junk dimensions.
Re: dimension model for many indicators field
"The way around it is to identify attributes that correlate well and group them into smaller junk dimensions"
thanks.. can you give me some data example?
Alos, If a new column get added in junk dimension, do I have to update all the fact record? what will happened? and how can I handle if there are new columns got added in junk dimension?
thanks.. can you give me some data example?
Alos, If a new column get added in junk dimension, do I have to update all the fact record? what will happened? and how can I handle if there are new columns got added in junk dimension?
j- Posts : 6
Join date : 2010-05-15
Re: dimension model for many indicators field
For example, you have a demographics junk dimension containing level of education and salary. These two would have a strong correlation because, most of the time, those with higher education levels have higher salaries. While something like zip code and level of education would have a low correlation, since for most zip codes one could expect a mix of education levels. Whereas, zip code and city and state have a very high correlation. So, you would take these correlations into account when combining attributes into a junk dimension. You may put city, state and zip in one junk dimension and education, salary level into another.
As far as new columns go, junk dimensions are essentially point-in-time. If you need to retroactivly assign values historically, you would need to rebuild the dimesion and rebuild the facts. Not a task one should take lightly. However, it is usually the case that such new attributes do not have historical significance. In which case it is simply a matter of adding the column and assign appropriate keys moving forward. Adding such an attribute to a junk dimension, however, will impact the overall size of the dimension table.
As far as new columns go, junk dimensions are essentially point-in-time. If you need to retroactivly assign values historically, you would need to rebuild the dimesion and rebuild the facts. Not a task one should take lightly. However, it is usually the case that such new attributes do not have historical significance. In which case it is simply a matter of adding the column and assign appropriate keys moving forward. Adding such an attribute to a junk dimension, however, will impact the overall size of the dimension table.
Similar topics
» Date Dimension or DateTime Field
» Dimension Version Field Variations you've seen.
» How to track the change of a specific field in a dimension table?
» How do you model a dimension that behaves both as a dimension and a fact??
» How to model IP Dimension
» Dimension Version Field Variations you've seen.
» How to track the change of a specific field in a dimension table?
» How do you model a dimension that behaves both as a dimension and a fact??
» How to model IP Dimension
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum