Static facts in dimensions?

sman Tue Jan 10, 2012 4:16 am

Hello,
I am somewhat new to dimensional modelling and need a bit of advice for the following business scenario.

Business functions
A university accepts applications for "research grants" in various subject areas.
Accepted applications go through 8 different "processing stages" (eg. New, Validated, Accepted, Incomplete, Complete, Withdrawn, Rejected, Unknown). Withdrawn\Rejected\Accepted are the end stages. Applications can arrive at the same stage multiple times.
The whole journey through processing stages can take days to years.
Each submitted application has a "requested grant value" ($ value).
Each approved application has an "approved grant value", which can be different to the requested value.

Queries
Analyse total processing time by "subject area", submitted date\month\year.
Analyse requested grant amounts by "subject area", submitted date\month\year.
Analyse applications in each processing stage (by subject area)
Analyse approved grants amounts

The questions I have are

1. Is "Requested grant amount" a dimension (because it is static) or a fact (because sum/avg operations are to be done)?
2. Can "Requested grant amount" and "Approved grant amount" be placed in the same fact table? They have the same granularity, but different event-times.
3. Should "processing stages" be a separate dimension or attributes in application dimension?
4. How to record applications changing their processing stages, so that we can analyse time spent in each stage?

Any insight is much appreciated.
sman

BoxesAndLines Tue Jan 10, 2012 9:40 am

1. yes
2. yes (accumulating snapshot fact)
3. see accumulating snapshot. Each stage is simply identified by a date and optionally a binary metric indicating that the stage has occurred.
4. Subtract the time period between any two stage dates

sman Tue Jan 10, 2012 6:51 pm

1. Is "Requested grant amount" a dimension (because it is static) or a fact (because sum/avg operations are to be done)?

2. Accumulating Snapshot fact will look like;

Code:: <Surrogate keys to dims> ... New_Start_Date_Key New_End_Data_Key New_Days .. <repeat for all 8 stages> .. Total_Start_Date_key Total_End_Date_Key Total_Days Requested Amount Approved Amount

Thanks BoxesAndLines for your input.

ngalemmo Tue Jan 10, 2012 10:31 pm

The requested and approved grant values are measures and belong in a fact table.

The issue is you are trying to put two facts into one table. Since you are modelling the process of approving a grant each decision or action during that process is a business event (request, review, denial. etc...). So, at the atomic level you track the events... date, grant requestor, officer/staff involved, decision/action, etc... and value as a measure.

The second fact is an aggregation of these atomic facts. This can be implemented as a view or materialized as a table. It would be derived from the atomic table by selecting the request and approval actions. This could be bound by date should there be a need to restrict to the earliest or latest event of that type.

sman Thu Jan 12, 2012 1:56 am

Thanks Ngalemmo.

How do I link the other dimensions (ie. Geography, Subject Area etc) to the star schema? They don't change when an Application changes stages while being processed. So, should they be modeled as outriggers to Application dimension? Or, should they be attributes in the Application Dimension?

Is it normal to have a dimension that has the same granularity as a fact table?
What I mean is that, if we move out "Requested Amount" and "Approved Amount" to a fact table, then the fact table will have same granularity as the Application dimension.

Regards

ngalemmo Thu Jan 12, 2012 3:04 am

All dimensions are referenced by foreign keys from the fact table. So other dimensions (geography, etc...) would appear as foreign key references. It doesn't matter if they don't change from one action to the next.

It is not normal to have a dimension that has a near 1:1 correspondance to a fact table. You should consider breaking the application dimension down into usable (ie conforming) components, such as applicant, and so forth.

sman Thu Jan 12, 2012 3:13 am

ngalemmo,

It is not normal to have a dimension that has a near 1:1 correspondance to a fact table. You should consider breaking the application dimension down into usable (ie conforming) components, such as applicant, and so forth.

A given Application has one and only one "Requested Grant Amount" and an "Approved Grant Amount", so the relationship is 1:1.

So, if those two values are moved out from Application Dim to a fact table, that fact table will have the same 1:1 grain as the Application Dimension. How to avoid this?

The users don't analyse data "by applicants". They analyse/group "dollar amounts" by geography, subject area, Application etc.
For a given Application, geography and subject area are static.

BoxesAndLines Thu Jan 12, 2012 10:42 am

If you are modeling a workflow process, you don't need both start and end dates. Start date should be sufficient. I also don't normally store the day counts between the stages or total day counts since these are easily calculated on the fly and it avoids performing updates every day just to add 1 to all the active applications. I'm not sure why you are focused on whether an attribute is static or not. This has no implication on dimensional modeling or relational modeling for that matter. What attributes of application do users want to analyze facts by? Those are the attributes you should look to move to other dimensions to avoid 1-1 relationship to the fact table. It may be that you still end up with a 1-1, but the smaller the dimension, the better the performance.

ngalemmo Thu Jan 12, 2012 10:41 pm

The users don't analyse data "by applicants". They analyse/group "dollar amounts" by geography, subject area, Application etc.
For a given Application, geography and subject area are static.

What are the attributes of an application? For example, when modeling sales orders, to it rare to have an order dimension table. Attributes of the order header, such as customer, dates, addresses and so on, are represented by separate dimensions. The only thing left is usually the order number which is stored as a degenerate dimension value on the fact table. This is done to avoid near 1:1 correspondence with facts, because queries using such relationships generally perform very poorly. It also provides more useable dimensions to conform with other facts.

As B&L points out, static or dynamic makes no difference. In fact, most of the time, measures are static, and if they are not, often techniques are used (such as delta based fact tables) to make them so.

sman Thu Jan 12, 2012 11:01 pm

B&L and Ngalemmo,

1. Application dimensions corresponds to the paper-based application for grants.
It has attributes like "Project Title", "Project Description", "Subject Area", "Federal Government Funded Flag", "State Government Funded Flag", etc, and "Requested Amount" & "Approved Amount". A lot of these attributes are textual descriptors rather than strict dimensions. What I mean is that, two applications have the same "Project Title", but they are not the same project. Subject Area is, however, picked from a list of values prepared by the department.

2. If we move "Requested Amount" to the Application fact table (transaction fact), then it will look like

Code:: App Key, Processing Stage Key, Requested Amount,....<date keys> 1, 1, $485.00,... 1, 2, $485.00,... 1, 3, $485.00,... 1, 5, $485.00,... 1, 7, $485.00,... 3, 1, $1473.00,... 3, 3, $1473.00,... 3, 2, $1473.00,... 3, 7, $1473.00,... 3, 4, $1473.00,...

Note that the static "Requested Amount" is appearing on the fact table that records events of an application progressing, because as far as application processing is concerned, "Requested Amount" does not change. So, sum of "Requested Amount" from this fact table is not useful because of the duplicates.

BoxesAndLines Thu Jan 12, 2012 11:06 pm

Aha! You're not doing a transaction fact, you are doing an accumulating snapshot. At least that's what I told you to do. ;-)

Sponsored content

Static facts in dimensions?

Static facts in dimensions?

Re: Static facts in dimensions?

Re: Static facts in dimensions?

Re: Static facts in dimensions?

Re: Static facts in dimensions?

Re: Static facts in dimensions?

Re: Static facts in dimensions?

Re: Static facts in dimensions?

Re: Static facts in dimensions?

Re: Static facts in dimensions?

Re: Static facts in dimensions?

Re: Static facts in dimensions?