dimension model for many indicators field

j Sat May 15, 2010 1:58 am

I have 30+ fileds that has either 1 or 0 ( true or false)value in a fact table

Patid -- -- --- PTInd_1 Ptind_2 Ptind_3........ PTind_30 -- etc

1 0 1 1

I need to be able to create a dimension using thse attributes( patind1 ~ 30)..what is the best way to design the dimensional model for these fileds? there could be new indcator filed in the future...
so it has to be easy to add new attribute ...

junk dimension can have too many rows 2^30... and it will be hard to maintain if there are new fields need to be added right?

any suggestions would be appreciated

BoxesAndLines Sat May 15, 2010 10:40 am

How many combinations do you actually have in the data? The junk dimension works since multiple facts all have the same combination of values.

j Sat May 15, 2010 2:44 pm

thansk for the reply..
The problem is if we add a new indicator filed,.let's say patind_31.. we have to updte all the fact records.. or not.. how will the junk dimension work in case new columns got added to a fact table and need to add them to a junk dimension ?

right now.. it has 30 fileds.. so it can be multi millions combination.. what is the best way to do this?

BoxesAndLines Sat May 15, 2010 7:14 pm

I'll ask again, how many actual combinations do you have in the source data? Potential and reality are what make junk dimensions a viable solution.

hang Sat May 15, 2010 9:00 pm

Storing 31 attributes in a single junk dimension can reduce 31 fields to only one surrogate key in your fact table assuming most of the combinations are highly repeated. You can use SELECT DISTINCT field1, ..., field31 from the fact to figure out the number. If it is a manageable number (say thousands, definitely not millions), and the trend for future facts is clear, then you can incrementally build your junk dimension from the fact. You cannot pre-build the junk dimension by cartesian join in this case as it would exceed 1 billion records which is absolutely not a go for a dimension. Don't forget you might have to cater for null value as well.

Junk dimension is a very effective and elegant dimensional modelling technique when dealing with small number (say less than 10) of low cardinality (say less than 5) attributes in the fact. It avoid having duplicated character fields in fact table if the fields are not classified as degenerate dimensions, so that the fact table becomes normalised.

ngalemmo Mon May 17, 2010 11:50 am

If you are really worried about the number of combinations, do some correlation analysis on the data.

As B&L pointed out, most of the time, the number of actual combinations are far less than all possible combinations. But, 30+ attributes may be pushing it (even if they are only yes/no fields). The way around it is to identify attributes that correlate well and group them into smaller junk dimensions.

j Mon May 17, 2010 12:37 pm

"The way around it is to identify attributes that correlate well and group them into smaller junk dimensions"
thanks.. can you give me some data example?

Alos, If a new column get added in junk dimension, do I have to update all the fact record? what will happened? and how can I handle if there are new columns got added in junk dimension?

ngalemmo Mon May 17, 2010 2:01 pm

For example, you have a demographics junk dimension containing level of education and salary. These two would have a strong correlation because, most of the time, those with higher education levels have higher salaries. While something like zip code and level of education would have a low correlation, since for most zip codes one could expect a mix of education levels. Whereas, zip code and city and state have a very high correlation. So, you would take these correlations into account when combining attributes into a junk dimension. You may put city, state and zip in one junk dimension and education, salary level into another.

As far as new columns go, junk dimensions are essentially point-in-time. If you need to retroactivly assign values historically, you would need to rebuild the dimesion and rebuild the facts. Not a task one should take lightly. However, it is usually the case that such new attributes do not have historical significance. In which case it is simply a matter of adding the column and assign appropriate keys moving forward. Adding such an attribute to a junk dimension, however, will impact the overall size of the dimension table.

Sponsored content

dimension model for many indicators field

dimension model for many indicators field

Re: dimension model for many indicators field

Re: dimension model for many indicators field

Re: dimension model for many indicators field

Re: dimension model for many indicators field

Re: dimension model for many indicators field

Re: dimension model for many indicators field

Re: dimension model for many indicators field

Re: dimension model for many indicators field