Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Modeling question from DW amateur

3 posters

Go down

Modeling question from DW amateur Empty Modeling question from DW amateur

Post  engil Fri Jul 15, 2011 12:01 pm

Was hoping someone might be able to provide some guidance on this question.

Working on setting up a star-schema to be cubed via SSAS. Purpose of the cube is to show how student achievement of 'indicators' (like "Completed College Math", "Utilized Writing Center") affect 'milestones' (like "Reenrolled for second year", "Graduated Within 4 Years", etc.)

The end-users will want to view the cube (probably via PowerPivot) as Milestones across columns, Indicators across rows, and counts/%s in the cells.

My initial impulse is to model as follows:
Milestones_dim:
Key Milestone Value
1 Reenrolled for second year Y
2 Reenrolled for second year N
3 Graduated Within 4 Years Y
4 Graduated Within 4 Years N

indicators_dim:
Key Indicators Value
1 Completed College Math Y
2 Completed College Math N
3 Utilized Writing Center Y
4 Utilized Writing Center N

student_success_fact:
student_key milestone1_key milestone2_key indicator1_key indicator2_key
1 1 4 1 3
2 2 4 2 3
.
.
.

In this model I would have one column in the fact table for each milestone (and one for each indicator) - am i totally off-base?

Thanks for any help

engil

Posts : 4
Join date : 2011-07-15

Back to top Go down

Modeling question from DW amateur Empty Re: Modeling question from DW amateur

Post  hang Fri Jul 15, 2011 5:53 pm

You should use junk dimension instead. Basically cross-tab your two dimension records into columns in a single dimension, and each record represents the combination of the possible Y and N in all the columns. In this model, you only have a single indicator key in the fact table which supports all the possible indicator filtering. There are a lot of posts on junk dimension, please search the topic for how to construct such a dimension.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

Back to top Go down

Modeling question from DW amateur Empty More

Post  engil Fri Jul 15, 2011 6:11 pm

Actually that's what I was considering initially (a single dimension with every possible combination) - I eventually discarded this for several reasons:
- I only showed two milestones and two indicators, but ultimately there will be 10 at least of each. while I realize that 10 Y/N items is only 1024 rows, I worried a little about scalability and maintainability
- more importantly given my newness to SSAS I couldn't figure out with the junk dimension method whether it would lend itself to my hoped for results, ie:



Reenrolled second semester

Graduated within 4 years




Yes
No
Yes
No
Completed College Math
Yes
67%
55%
45%
53%
47%

No
33%
45%
55%
24%
76%
Completed College English
Yes
36%
24%
76%



No
64%




Milestone 3
Yes






No





I figured that with the junk dimension I would end up with














Completed College Math - Didnt complete college english - Didnt complete milestone 3
7%





Completed College Math - Didnt complete college english - Did complete milestone 3
12%





Completed College Math - Did complete college english - Didnt complete milestone 3
15%





Completed College Math - Did complete college english - Did complete milestone 3
2%






engil

Posts : 4
Join date : 2011-07-15

Back to top Go down

Modeling question from DW amateur Empty Re: Modeling question from DW amateur

Post  hang Fri Jul 15, 2011 7:11 pm

engil wrote:I only showed two milestones and two indicators
So it's four attribute junk dimension. I guess it's perfect candidate for junk dimension. Trust me, junk dimension may be a little misleading by its name, but it is a very elegant and effective approach to your problem and SSAS loves junk dimensions from my own experience. I think the problem lies in whether you truly understand the concept which is so unique and different from traditional relational modeling. There are many expert advices on the forum which should give you solid ground, but you need to do your own search on it to fully understand it.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

Back to top Go down

Modeling question from DW amateur Empty Re: Modeling question from DW amateur

Post  engil Fri Jul 15, 2011 7:17 pm

I appreciate your reply, but how are you getting that its a 4 attribute junk dimension?

I have been doing a bunch of searching on this topic, but haven't come up with a solution that I understand which is why I posted on this forums which has (as you said) many experts.

I appreciate your taking the time to look at this, but I still am kind of in the dark. I think somewhere in one of the Kimball DW Toolkit books it says it helps to think of dimensions in terms of "I want to look at a BY b BY c" where b and c define your dimensions. I want to look at students BY milestones BY indicators, which is why I thought to model the way I've proposed.

engil

Posts : 4
Join date : 2011-07-15

Back to top Go down

Modeling question from DW amateur Empty Re: Modeling question from DW amateur

Post  hang Fri Jul 15, 2011 8:33 pm

OK, Here's a good article by Warrent, http://www.kimballgroup.com/html/09dt/DT113CreatingUsingMaintainingJunkDimensions.pdf

Basically, you don't have to put attributes of the same kind into a single junk dimension. You may combine unrelated attributes into a junk dimension as long as they are of low cardinality, and y/n flags are always perfect candidates for junk dimension.

So in your example, Milestones_dim contains two flags, Enrolled for second year: y/n, graduated within 4 years: y/n. Indicators_dim, Completed College Math: y/n, Utilized Writing Center: y/n. Therefore you end up with one junk dimension as follows:

EducationProfile_dim
- EducationProfileKey SK
- EnrolledFor2ndYear
- GradatedIn4Year
- CompletedCollegeMath
- UtilisedWritingCenter

With so few attributes and low cardinality y/n, you may prebuild the dimension with 2x2x2x2=16 records. Think of this dimension as a denormalised dimension with four attributes and put single FK in your fact table by matching four attributes. So when you use it, through a single join all four attributes are at your disposal for filtering, dicing and slicing. With SSAS, just treat it like normal denormalized dimension. Hope this would help.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

Back to top Go down

Modeling question from DW amateur Empty Re: Modeling question from DW amateur

Post  engil Sat Jul 16, 2011 3:51 pm

Hang - thanks again for taking so much time helping me out.

I checked out the article you linked to, and it was very informative. I think I actually have something of a grasp on junk dimensions (I've used them before, albeit sparingly) - my problem right now is I think related to my lack of experience with SSAS. Ultimately the issue is that I can't figure out how to take this junk dimension and build it in SSAS in such a way as to allow an end user to run the report I showed below via Excel. Would it require a large amount of work on my part in terms of custom calculations during the cube build process?

Note also that ultimately I will have probably 15 milestones and 15 indicators, which leads to 2^30 rows (1B+) if I were to put them all in the same junk dimension (although of course this is only if I populate every possible combination, if I were to only populate values that actually occur it might be less.

So basically I guess the question is: if I use a single junk dimension to store all of my indicator and milestone combinations, how difficult will it be to build the cube such that I can achieve the report shown in my earlier post (all milestones on x axis, all indicators on y axis, %s in cells).

engil

Posts : 4
Join date : 2011-07-15

Back to top Go down

Modeling question from DW amateur Empty Re: Modeling question from DW amateur

Post  hang Sat Jul 16, 2011 8:31 pm

engil wrote:Note also that ultimately I will have probably 15 milestones and 15 indicators, which leads to 2^30 rows (1B+)
That's the very reason why you need to leverage junk dimension, otherwise you would end up with a centipede fact table coined by Kimball as a bad example for fact tables. Imagine what pain it is to join 30 times in order to reach those two value attributes instead of just once or twice when using junk dimensions.

However since you could potentially have many more attributes for both milestones and indicators, it might be a good idea to split the junk dimension into two early on, so that they would never go beyond acceptable size. For the purpose of ease, you may start with pre-build dimension, and if the number of attributes grow in future, you may change to incrementally build the junk dimensions so that they only contain combinations that have existed in the fact.

Now it comes the tricky part as in your last requirement for the cube. I think the base model is set and can suffice most reporting requirements, but your cube presentation requires a bit of twist on both base fact and dimension tables. For the cube you may need to produce or configure another fact table for the cell and two dimensions, x and y, using the the base model as the source.

From what I can see, you want to make milestone junk dimension column names to be values of one attribute in one dimension x, and the same arrangement for indicator dimension y. A fact view with query of a few union and case statements on the base fact and dimension tables can cater for that. Briefly, each Y in the junk dimension will add to the view a record with the attribute (column) name as the value. If the performance is not an issue, than the new fact view and dimensions can be achieved at logical level, but you need to know how to configure the fact (degenerate) dimension in SSAS without creating physical dimension tables. Otherwise just materialise the data in the physical tables.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

Back to top Go down

Modeling question from DW amateur Empty Re: Modeling question from DW amateur

Post  VHF Mon Jul 18, 2011 5:17 pm

You want to strive for a practical balance. A single junk dimension with 2^30 rows is a bit extreme, but so is having 30 separate dimensions and thus 30 FKs in the fact table.

Just having two junk dimensions (say one for milestones and one for indicators) gives a more reasonable 2^15 for each with only two FKs in the fact table. This would probably be a good approach for you. If you feel that the junk dimensions still have too many rows, you might explore a "middle of the road" design with 4 junk dimensions and 4 FKs.

As far as SSAS is concerned, it doesn't really matter whether you have a single dimension or 30 or somewhere in between as far as slice-and-dice analysis is concerned. The user can use any dimension attribute as a row or column header.

VHF

Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US

Back to top Go down

Modeling question from DW amateur Empty Re: Modeling question from DW amateur

Post  VHF Mon Jul 18, 2011 6:42 pm

I’ve been giving this some further thought and took another look at your desired output example. The previous discussions have identified some sound design principles, and would give you a cube suitable for “slice and dice” drill-down analysis, and is also excellent for asking “how many” type questions, but I don’t think it will allow generating the type of output you want in a single pivot table query.

In fact, with any of the designs discussed it would take 6 different queries rather one. If you had 15 milestones * 15 indicators this would 225 queries!

When users drag multiple dimension attributes into a row or column header, they naturally cascade. This is usually desirable because “slice and dice” analysis is normally splitting a “whole” into pieces (thus the “slice”!) However it appears you are looking at each “cross section” of milestone and indicator individually, not as part of a whole (which is why your percentages add up to much more than 100% going across or down.)

So in taking another look at your original design, your dimensions would give the rows and columns you want (i.e. milestone as the top-level row header with milestone value as the second-level row header, same thing for columns with indicators):

Milestones_dim:
Key Milestone Value
1 Reenrolled for second year Y
2 Reenrolled for second year N
3 Graduated Within 4 Years Y
4 Graduated Within 4 Years N

indicators_dim:
Key Indicators Value
1 Completed College Math Y
2 Completed College Math N
3 Utilized Writing Center Y
4 Utilized Writing Center N

However, I think we need to look at a different fact table design. Let’s try a couple of factless fact tables (the sample data represents the same information as in your original example):

student_milestones_factless_fact:
student_key milestone_key
1 1
1 4
2 2
2 4

student_indicators_factless_fact:
student_key indicator_key
1 1
1 3
2 2
2 3

These two factless fact tables represent all the data, but don’t give us an easy way to reconcile milestones and indicators in a single cross-tab query. So we create a third factless fact table which is essentially a product of the first two fact tables:

student_success_factless_fact:
student_key milestone_key indicator_key
1 1 1
1 1 3
1 4 1
1 4 3
2 2 2
2 2 3
2 4 2
2 4 3

You would need to populate this fact table with one record for each student for each combination of milestone and indicator. If you had 15 milestones and 15 indicators that would be 225 records per student. With this design you should be able to generate the output you want in a single pivot table, although you’ll need to be careful that the percentages behave correctly.

This design serves a specific purpose and does not lend itself as well to ad-hoc slice-and-dice analysis as the designs discussed previously. There are some dangers here because a simple COUNT will add up to far more than unity—for example, each student that completed college math will have multiple fact records representing that they completed college math—one for each indicator. Given the dangers of double-counting, I’m not sure I would want to turn this model over to users for ad-hoc use.

You might consider implementing multiple models in your DW (with separate SSAS cubes) to meet various objectives.

VHF

Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US

Back to top Go down

Modeling question from DW amateur Empty Re: Modeling question from DW amateur

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum