Modeling question from DW amateur
3 posters
Page 1 of 1
Modeling question from DW amateur
Was hoping someone might be able to provide some guidance on this question.
Working on setting up a starschema to be cubed via SSAS. Purpose of the cube is to show how student achievement of 'indicators' (like "Completed College Math", "Utilized Writing Center") affect 'milestones' (like "Reenrolled for second year", "Graduated Within 4 Years", etc.)
The endusers will want to view the cube (probably via PowerPivot) as Milestones across columns, Indicators across rows, and counts/%s in the cells.
My initial impulse is to model as follows:
Milestones_dim:
Key Milestone Value
1 Reenrolled for second year Y
2 Reenrolled for second year N
3 Graduated Within 4 Years Y
4 Graduated Within 4 Years N
indicators_dim:
Key Indicators Value
1 Completed College Math Y
2 Completed College Math N
3 Utilized Writing Center Y
4 Utilized Writing Center N
student_success_fact:
student_key milestone1_key milestone2_key indicator1_key indicator2_key
1 1 4 1 3
2 2 4 2 3
.
.
.
In this model I would have one column in the fact table for each milestone (and one for each indicator)  am i totally offbase?
Thanks for any help
Working on setting up a starschema to be cubed via SSAS. Purpose of the cube is to show how student achievement of 'indicators' (like "Completed College Math", "Utilized Writing Center") affect 'milestones' (like "Reenrolled for second year", "Graduated Within 4 Years", etc.)
The endusers will want to view the cube (probably via PowerPivot) as Milestones across columns, Indicators across rows, and counts/%s in the cells.
My initial impulse is to model as follows:
Milestones_dim:
Key Milestone Value
1 Reenrolled for second year Y
2 Reenrolled for second year N
3 Graduated Within 4 Years Y
4 Graduated Within 4 Years N
indicators_dim:
Key Indicators Value
1 Completed College Math Y
2 Completed College Math N
3 Utilized Writing Center Y
4 Utilized Writing Center N
student_success_fact:
student_key milestone1_key milestone2_key indicator1_key indicator2_key
1 1 4 1 3
2 2 4 2 3
.
.
.
In this model I would have one column in the fact table for each milestone (and one for each indicator)  am i totally offbase?
Thanks for any help
engil Posts : 4
Join date : 20110715
Re: Modeling question from DW amateur
You should use junk dimension instead. Basically crosstab your two dimension records into columns in a single dimension, and each record represents the combination of the possible Y and N in all the columns. In this model, you only have a single indicator key in the fact table which supports all the possible indicator filtering. There are a lot of posts on junk dimension, please search the topic for how to construct such a dimension.
hang Posts : 528
Join date : 20100507
Location : Brisbane, Australia
More
Actually that's what I was considering initially (a single dimension with every possible combination)  I eventually discarded this for several reasons:
 I only showed two milestones and two indicators, but ultimately there will be 10 at least of each. while I realize that 10 Y/N items is only 1024 rows, I worried a little about scalability and maintainability
 more importantly given my newness to SSAS I couldn't figure out with the junk dimension method whether it would lend itself to my hoped for results, ie:
I figured that with the junk dimension I would end up with
 I only showed two milestones and two indicators, but ultimately there will be 10 at least of each. while I realize that 10 Y/N items is only 1024 rows, I worried a little about scalability and maintainability
 more importantly given my newness to SSAS I couldn't figure out with the junk dimension method whether it would lend itself to my hoped for results, ie:
Reenrolled second semester  Graduated within 4 years  
Yes  No  Yes  No  
Completed College Math  Yes  67%  55%  45%  53%  47% 
No  33%  45%  55%  24%  76%  
Completed College English  Yes  36%  24%  76%  
No  64%  
Milestone 3  Yes  
No 
Completed College Math  Didnt complete college english  Didnt complete milestone 3  7%  
Completed College Math  Didnt complete college english  Did complete milestone 3  12%  
Completed College Math  Did complete college english  Didnt complete milestone 3  15%  
Completed College Math  Did complete college english  Did complete milestone 3  2% 
engil Posts : 4
Join date : 20110715
Re: Modeling question from DW amateur
So it's four attribute junk dimension. I guess it's perfect candidate for junk dimension. Trust me, junk dimension may be a little misleading by its name, but it is a very elegant and effective approach to your problem and SSAS loves junk dimensions from my own experience. I think the problem lies in whether you truly understand the concept which is so unique and different from traditional relational modeling. There are many expert advices on the forum which should give you solid ground, but you need to do your own search on it to fully understand it.engil wrote:I only showed two milestones and two indicators
hang Posts : 528
Join date : 20100507
Location : Brisbane, Australia
Re: Modeling question from DW amateur
I appreciate your reply, but how are you getting that its a 4 attribute junk dimension?
I have been doing a bunch of searching on this topic, but haven't come up with a solution that I understand which is why I posted on this forums which has (as you said) many experts.
I appreciate your taking the time to look at this, but I still am kind of in the dark. I think somewhere in one of the Kimball DW Toolkit books it says it helps to think of dimensions in terms of "I want to look at a BY b BY c" where b and c define your dimensions. I want to look at students BY milestones BY indicators, which is why I thought to model the way I've proposed.
I have been doing a bunch of searching on this topic, but haven't come up with a solution that I understand which is why I posted on this forums which has (as you said) many experts.
I appreciate your taking the time to look at this, but I still am kind of in the dark. I think somewhere in one of the Kimball DW Toolkit books it says it helps to think of dimensions in terms of "I want to look at a BY b BY c" where b and c define your dimensions. I want to look at students BY milestones BY indicators, which is why I thought to model the way I've proposed.
engil Posts : 4
Join date : 20110715
Re: Modeling question from DW amateur
OK, Here's a good article by Warrent, http://www.kimballgroup.com/html/09dt/DT113CreatingUsingMaintainingJunkDimensions.pdf
Basically, you don't have to put attributes of the same kind into a single junk dimension. You may combine unrelated attributes into a junk dimension as long as they are of low cardinality, and y/n flags are always perfect candidates for junk dimension.
So in your example, Milestones_dim contains two flags, Enrolled for second year: y/n, graduated within 4 years: y/n. Indicators_dim, Completed College Math: y/n, Utilized Writing Center: y/n. Therefore you end up with one junk dimension as follows:
EducationProfile_dim
 EducationProfileKey SK
 EnrolledFor2ndYear
 GradatedIn4Year
 CompletedCollegeMath
 UtilisedWritingCenter
With so few attributes and low cardinality y/n, you may prebuild the dimension with 2x2x2x2=16 records. Think of this dimension as a denormalised dimension with four attributes and put single FK in your fact table by matching four attributes. So when you use it, through a single join all four attributes are at your disposal for filtering, dicing and slicing. With SSAS, just treat it like normal denormalized dimension. Hope this would help.
Basically, you don't have to put attributes of the same kind into a single junk dimension. You may combine unrelated attributes into a junk dimension as long as they are of low cardinality, and y/n flags are always perfect candidates for junk dimension.
So in your example, Milestones_dim contains two flags, Enrolled for second year: y/n, graduated within 4 years: y/n. Indicators_dim, Completed College Math: y/n, Utilized Writing Center: y/n. Therefore you end up with one junk dimension as follows:
EducationProfile_dim
 EducationProfileKey SK
 EnrolledFor2ndYear
 GradatedIn4Year
 CompletedCollegeMath
 UtilisedWritingCenter
With so few attributes and low cardinality y/n, you may prebuild the dimension with 2x2x2x2=16 records. Think of this dimension as a denormalised dimension with four attributes and put single FK in your fact table by matching four attributes. So when you use it, through a single join all four attributes are at your disposal for filtering, dicing and slicing. With SSAS, just treat it like normal denormalized dimension. Hope this would help.
hang Posts : 528
Join date : 20100507
Location : Brisbane, Australia
Re: Modeling question from DW amateur
Hang  thanks again for taking so much time helping me out.
I checked out the article you linked to, and it was very informative. I think I actually have something of a grasp on junk dimensions (I've used them before, albeit sparingly)  my problem right now is I think related to my lack of experience with SSAS. Ultimately the issue is that I can't figure out how to take this junk dimension and build it in SSAS in such a way as to allow an end user to run the report I showed below via Excel. Would it require a large amount of work on my part in terms of custom calculations during the cube build process?
Note also that ultimately I will have probably 15 milestones and 15 indicators, which leads to 2^30 rows (1B+) if I were to put them all in the same junk dimension (although of course this is only if I populate every possible combination, if I were to only populate values that actually occur it might be less.
So basically I guess the question is: if I use a single junk dimension to store all of my indicator and milestone combinations, how difficult will it be to build the cube such that I can achieve the report shown in my earlier post (all milestones on x axis, all indicators on y axis, %s in cells).
I checked out the article you linked to, and it was very informative. I think I actually have something of a grasp on junk dimensions (I've used them before, albeit sparingly)  my problem right now is I think related to my lack of experience with SSAS. Ultimately the issue is that I can't figure out how to take this junk dimension and build it in SSAS in such a way as to allow an end user to run the report I showed below via Excel. Would it require a large amount of work on my part in terms of custom calculations during the cube build process?
Note also that ultimately I will have probably 15 milestones and 15 indicators, which leads to 2^30 rows (1B+) if I were to put them all in the same junk dimension (although of course this is only if I populate every possible combination, if I were to only populate values that actually occur it might be less.
So basically I guess the question is: if I use a single junk dimension to store all of my indicator and milestone combinations, how difficult will it be to build the cube such that I can achieve the report shown in my earlier post (all milestones on x axis, all indicators on y axis, %s in cells).
engil Posts : 4
Join date : 20110715
Re: Modeling question from DW amateur
That's the very reason why you need to leverage junk dimension, otherwise you would end up with a centipede fact table coined by Kimball as a bad example for fact tables. Imagine what pain it is to join 30 times in order to reach those two value attributes instead of just once or twice when using junk dimensions.engil wrote:Note also that ultimately I will have probably 15 milestones and 15 indicators, which leads to 2^30 rows (1B+)
However since you could potentially have many more attributes for both milestones and indicators, it might be a good idea to split the junk dimension into two early on, so that they would never go beyond acceptable size. For the purpose of ease, you may start with prebuild dimension, and if the number of attributes grow in future, you may change to incrementally build the junk dimensions so that they only contain combinations that have existed in the fact.
Now it comes the tricky part as in your last requirement for the cube. I think the base model is set and can suffice most reporting requirements, but your cube presentation requires a bit of twist on both base fact and dimension tables. For the cube you may need to produce or configure another fact table for the cell and two dimensions, x and y, using the the base model as the source.
From what I can see, you want to make milestone junk dimension column names to be values of one attribute in one dimension x, and the same arrangement for indicator dimension y. A fact view with query of a few union and case statements on the base fact and dimension tables can cater for that. Briefly, each Y in the junk dimension will add to the view a record with the attribute (column) name as the value. If the performance is not an issue, than the new fact view and dimensions can be achieved at logical level, but you need to know how to configure the fact (degenerate) dimension in SSAS without creating physical dimension tables. Otherwise just materialise the data in the physical tables.
hang Posts : 528
Join date : 20100507
Location : Brisbane, Australia
Re: Modeling question from DW amateur
You want to strive for a practical balance. A single junk dimension with 2^30 rows is a bit extreme, but so is having 30 separate dimensions and thus 30 FKs in the fact table.
Just having two junk dimensions (say one for milestones and one for indicators) gives a more reasonable 2^15 for each with only two FKs in the fact table. This would probably be a good approach for you. If you feel that the junk dimensions still have too many rows, you might explore a "middle of the road" design with 4 junk dimensions and 4 FKs.
As far as SSAS is concerned, it doesn't really matter whether you have a single dimension or 30 or somewhere in between as far as sliceanddice analysis is concerned. The user can use any dimension attribute as a row or column header.
Just having two junk dimensions (say one for milestones and one for indicators) gives a more reasonable 2^15 for each with only two FKs in the fact table. This would probably be a good approach for you. If you feel that the junk dimensions still have too many rows, you might explore a "middle of the road" design with 4 junk dimensions and 4 FKs.
As far as SSAS is concerned, it doesn't really matter whether you have a single dimension or 30 or somewhere in between as far as sliceanddice analysis is concerned. The user can use any dimension attribute as a row or column header.
VHF Posts : 236
Join date : 20090428
Location : Wisconsin, US
Re: Modeling question from DW amateur
I’ve been giving this some further thought and took another look at your desired output example. The previous discussions have identified some sound design principles, and would give you a cube suitable for “slice and dice” drilldown analysis, and is also excellent for asking “how many” type questions, but I don’t think it will allow generating the type of output you want in a single pivot table query.
In fact, with any of the designs discussed it would take 6 different queries rather one. If you had 15 milestones * 15 indicators this would 225 queries!
When users drag multiple dimension attributes into a row or column header, they naturally cascade. This is usually desirable because “slice and dice” analysis is normally splitting a “whole” into pieces (thus the “slice”!) However it appears you are looking at each “cross section” of milestone and indicator individually, not as part of a whole (which is why your percentages add up to much more than 100% going across or down.)
So in taking another look at your original design, your dimensions would give the rows and columns you want (i.e. milestone as the toplevel row header with milestone value as the secondlevel row header, same thing for columns with indicators):
Milestones_dim:
Key Milestone Value
1 Reenrolled for second year Y
2 Reenrolled for second year N
3 Graduated Within 4 Years Y
4 Graduated Within 4 Years N
indicators_dim:
Key Indicators Value
1 Completed College Math Y
2 Completed College Math N
3 Utilized Writing Center Y
4 Utilized Writing Center N
However, I think we need to look at a different fact table design. Let’s try a couple of factless fact tables (the sample data represents the same information as in your original example):
student_milestones_factless_fact:
student_key milestone_key
1 1
1 4
2 2
2 4
student_indicators_factless_fact:
student_key indicator_key
1 1
1 3
2 2
2 3
These two factless fact tables represent all the data, but don’t give us an easy way to reconcile milestones and indicators in a single crosstab query. So we create a third factless fact table which is essentially a product of the first two fact tables:
student_success_factless_fact:
student_key milestone_key indicator_key
1 1 1
1 1 3
1 4 1
1 4 3
2 2 2
2 2 3
2 4 2
2 4 3
You would need to populate this fact table with one record for each student for each combination of milestone and indicator. If you had 15 milestones and 15 indicators that would be 225 records per student. With this design you should be able to generate the output you want in a single pivot table, although you’ll need to be careful that the percentages behave correctly.
This design serves a specific purpose and does not lend itself as well to adhoc sliceanddice analysis as the designs discussed previously. There are some dangers here because a simple COUNT will add up to far more than unity—for example, each student that completed college math will have multiple fact records representing that they completed college math—one for each indicator. Given the dangers of doublecounting, I’m not sure I would want to turn this model over to users for adhoc use.
You might consider implementing multiple models in your DW (with separate SSAS cubes) to meet various objectives.
In fact, with any of the designs discussed it would take 6 different queries rather one. If you had 15 milestones * 15 indicators this would 225 queries!
When users drag multiple dimension attributes into a row or column header, they naturally cascade. This is usually desirable because “slice and dice” analysis is normally splitting a “whole” into pieces (thus the “slice”!) However it appears you are looking at each “cross section” of milestone and indicator individually, not as part of a whole (which is why your percentages add up to much more than 100% going across or down.)
So in taking another look at your original design, your dimensions would give the rows and columns you want (i.e. milestone as the toplevel row header with milestone value as the secondlevel row header, same thing for columns with indicators):
Milestones_dim:
Key Milestone Value
1 Reenrolled for second year Y
2 Reenrolled for second year N
3 Graduated Within 4 Years Y
4 Graduated Within 4 Years N
indicators_dim:
Key Indicators Value
1 Completed College Math Y
2 Completed College Math N
3 Utilized Writing Center Y
4 Utilized Writing Center N
However, I think we need to look at a different fact table design. Let’s try a couple of factless fact tables (the sample data represents the same information as in your original example):
student_milestones_factless_fact:
student_key milestone_key
1 1
1 4
2 2
2 4
student_indicators_factless_fact:
student_key indicator_key
1 1
1 3
2 2
2 3
These two factless fact tables represent all the data, but don’t give us an easy way to reconcile milestones and indicators in a single crosstab query. So we create a third factless fact table which is essentially a product of the first two fact tables:
student_success_factless_fact:
student_key milestone_key indicator_key
1 1 1
1 1 3
1 4 1
1 4 3
2 2 2
2 2 3
2 4 2
2 4 3
You would need to populate this fact table with one record for each student for each combination of milestone and indicator. If you had 15 milestones and 15 indicators that would be 225 records per student. With this design you should be able to generate the output you want in a single pivot table, although you’ll need to be careful that the percentages behave correctly.
This design serves a specific purpose and does not lend itself as well to adhoc sliceanddice analysis as the designs discussed previously. There are some dangers here because a simple COUNT will add up to far more than unity—for example, each student that completed college math will have multiple fact records representing that they completed college math—one for each indicator. Given the dangers of doublecounting, I’m not sure I would want to turn this model over to users for adhoc use.
You might consider implementing multiple models in your DW (with separate SSAS cubes) to meet various objectives.
VHF Posts : 236
Join date : 20090428
Location : Wisconsin, US
Similar topics
» Modeling Question
» modeling question
» Invoice dimensional modeling question
» Customer Dimension Modeling Question
» Data Modeling question (not really dimensional though)
» modeling question
» Invoice dimensional modeling question
» Customer Dimension Modeling Question
» Data Modeling question (not really dimensional though)
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum

