Two Grains for Time
3 posters
Page 1 of 1
Two Grains for Time
I am having trouble with designing the date dimension for a financial data fact table.
We have two different planning approaches. An annual plan, which consists of months, and a strategic plan, which consists of the next five years. The problem is that the lowest granular level is *not* months for the strategic plan but by half for the next year and just full year numbers for the subsequent years.
I really don't want to split out the strategic plan data from my fact table because the end users use both in conjunction with each other very frequently. I also want to stay as close to a star schema as possible.
I've read both the DW Toolkit book and the posts on this forum about country-specific calendars, but haven't found a way to adapt that to my situation. Should I simply make December be my full year number in the case of the strategic plan? The problem with that is the annual plan is in current period format and not YTD.
Thus, for a plan of $100k in sales per month, the user would see for December:
Sales Dec AnnualPlan $100,000
Sales Dec StrategicPlan $1,200,000
This would be avoided by pulling the FY attribute, but must rely on the user to know/remember this.
Is an outrigger the solution? How would that work?
Any advice would be much appreciated!
dimDate
DateKey
Month {1, 2, 3, ...}
MonthAbbr {Jan, Feb, Mar, ...}
ByQuarter {Q1, Q2, ...}
ByHalf {H1, H2}
ByYear {FY}
CalendarYear {2009, 2010, ...}
We have two different planning approaches. An annual plan, which consists of months, and a strategic plan, which consists of the next five years. The problem is that the lowest granular level is *not* months for the strategic plan but by half for the next year and just full year numbers for the subsequent years.
I really don't want to split out the strategic plan data from my fact table because the end users use both in conjunction with each other very frequently. I also want to stay as close to a star schema as possible.
I've read both the DW Toolkit book and the posts on this forum about country-specific calendars, but haven't found a way to adapt that to my situation. Should I simply make December be my full year number in the case of the strategic plan? The problem with that is the annual plan is in current period format and not YTD.
Thus, for a plan of $100k in sales per month, the user would see for December:
Sales Dec AnnualPlan $100,000
Sales Dec StrategicPlan $1,200,000
This would be avoided by pulling the FY attribute, but must rely on the user to know/remember this.
Is an outrigger the solution? How would that work?
Any advice would be much appreciated!
dimDate
DateKey
Month {1, 2, 3, ...}
MonthAbbr {Jan, Feb, Mar, ...}
ByQuarter {Q1, Q2, ...}
ByHalf {H1, H2}
ByYear {FY}
CalendarYear {2009, 2010, ...}
robfb- Posts : 11
Join date : 2010-03-18
Re: Two Grains for Time
The strategic plan contains snapshot numbers, correct? That is to say, they are not summable over time. If the strategic plan number is the same in January as it is in March, you may want to consider increasing the grain of the strategic plan data to monthly, and just repeat the same numbers across months during the applicable period, so it is compatible with the actuals.
Re: Two Grains for Time
Maybe a better example would help, just to be sure I am being clear.
If this was a plain old spreadsheet, the user would input sales dollars (looking strictly at entered values and ignoring all time calculations):
YEAR ZERO
Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec
10, 20, 20, 10, 20, 0, 20, 10, 10, 50, 15, 5
YEAR ONE
H1, H2
100, 150
YEAR TWO
FY
400
The numbers would be additive within the year (year one is 100 + 150 = 250), but beyond that year (not FY year one + FY year two). The sub-optimal approach we've done in the past is make Year One H1 = Jun, Year One H2 = Dec, and leave the remaining months zero. This doesn't make it easy for the financial analyst.
Is your suggestion to take Year One H1 and set Jan = Feb = Mar = Apr = May = Jun = 100? If so, I'm afraid that won't work because it would give a calculated H1 to be 600, rather than the correct value of 100.
Or, do you mean take that H1 total and calendarize it evenly across all applicable months (100/6 = 16.67 per month)? I'm concerned with that approach because of the implied precision that does not truly exist in the data.
We've been struggling with this for a long time!
If this was a plain old spreadsheet, the user would input sales dollars (looking strictly at entered values and ignoring all time calculations):
YEAR ZERO
Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec
10, 20, 20, 10, 20, 0, 20, 10, 10, 50, 15, 5
YEAR ONE
H1, H2
100, 150
YEAR TWO
FY
400
The numbers would be additive within the year (year one is 100 + 150 = 250), but beyond that year (not FY year one + FY year two). The sub-optimal approach we've done in the past is make Year One H1 = Jun, Year One H2 = Dec, and leave the remaining months zero. This doesn't make it easy for the financial analyst.
Is your suggestion to take Year One H1 and set Jan = Feb = Mar = Apr = May = Jun = 100? If so, I'm afraid that won't work because it would give a calculated H1 to be 600, rather than the correct value of 100.
Or, do you mean take that H1 total and calendarize it evenly across all applicable months (100/6 = 16.67 per month)? I'm concerned with that approach because of the implied precision that does not truly exist in the data.
We've been struggling with this for a long time!
robfb- Posts : 11
Join date : 2010-03-18
Re: Two Grains for Time
A snapshot would not be summable across time (which is usually the case), so I was suggesting to store 100 in each of the first six months. It would be in a separate fact table... so if you were doing March sales, you would sum your sales facts through March and join with the March row in the plan snapshot to get 100.
The only thing that bothers me is the 6 month vrs 12 month plan. You will probably need another measure - number of months - in the snapshot set to either 6 or 12 so you know the time span for that row. You will probably also need an annualized number which, in your example, would be set to 250 (100 + 150) for all months in the first year to calculate progress against plan. In subsequent years it would be the same value as the unadusted (12 month) plan number.
The only thing that bothers me is the 6 month vrs 12 month plan. You will probably need another measure - number of months - in the snapshot set to either 6 or 12 so you know the time span for that row. You will probably also need an annualized number which, in your example, would be set to 250 (100 + 150) for all months in the first year to calculate progress against plan. In subsequent years it would be the same value as the unadusted (12 month) plan number.
Re: Two Grains for Time
The tricky thing is that your planning period varies in length...
Maybe instead of a standard date dimension you need a Planning Period dimension. A row in the Planning Period dimension could represent a month, a half year, or a full year. The planning fact records would then point to the appropriate Planning Period dimension record. Something like this:
This approach resolves the "full year in December" problem and allow planning numbers to aggregate correctly across any time span, bit it does introduce a couple issues of its own:
(1.) It requires a BETWEEN query on the Start Month and End Month
(2.) It complicates comparisons with actual numbers which presumably would be related to a standard date dimension. Not terribly complex, but once again a BETWEEN query is required, and in this case the Planning Period dimension does act like an outrigger.
Maybe instead of a standard date dimension you need a Planning Period dimension. A row in the Planning Period dimension could represent a month, a half year, or a full year. The planning fact records would then point to the appropriate Planning Period dimension record. Something like this:
SK | Description | Year | StartMonth | EndMonth |
1 | 2009 Jan | 2009 | 1 | 1 |
2 | 2009 Feb | 2009 | 2 | 2 |
3 | 2009 Mar | 2009 | 3 | 3 |
4 | 2010 H1 | 2010 | 1 | 6 |
5 | 2010 H2 | 2010 | 7 | 12 |
6 | 2011 Full Year | 2011 | 1 | 12 |
This approach resolves the "full year in December" problem and allow planning numbers to aggregate correctly across any time span, bit it does introduce a couple issues of its own:
(1.) It requires a BETWEEN query on the Start Month and End Month
(2.) It complicates comparisons with actual numbers which presumably would be related to a standard date dimension. Not terribly complex, but once again a BETWEEN query is required, and in this case the Planning Period dimension does act like an outrigger.
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
Re: Two Grains for Time
Thanks for the suggestion, VHF. I'll have to study this further, since I am a newbie!
robfb- Posts : 11
Join date : 2010-03-18
Re: Two Grains for Time
Sometimes you have to evaluate several designs and explore the impact of each on reporting and analysis.
A more typical budgeting example is where you have sales quotas by month and actual sales by day. In this case, a standard date dimension table where there is one record for each day is used. The actual sales are recorded in the fact table pointing to each day. The sales quotas/budget are recorded in a sales quota fact table once per month, pointing to the same date dimension. Typically the SK (surrogate key) for either the first day of the month or the last day of the month is chosen to represent the whole month and the quota/budget fact record points to that SK for each month.
One can then compare actual sales to quota by month (sum of sales for the month (many records) compared with sum of quotas for the month (in this case, only a single record)). What you cannot do is compare sales to quota at the daily level because the quota isn't that granular. Query results would be misleading at best.
This is similar to your situation. For the year when you store budgets for half a year, you could compare actual-to-budget for a half-year, but not more granularly. Same thing when you budget at the year level. It would be OK to store the budget for the entire year in December, but you can't drill down any more than the year level when comparing actual-to-budget.
Hope that makes sense. Your varying planning periods certainly complicate matters! The design I presented earlier is just one way to handle it.
A more typical budgeting example is where you have sales quotas by month and actual sales by day. In this case, a standard date dimension table where there is one record for each day is used. The actual sales are recorded in the fact table pointing to each day. The sales quotas/budget are recorded in a sales quota fact table once per month, pointing to the same date dimension. Typically the SK (surrogate key) for either the first day of the month or the last day of the month is chosen to represent the whole month and the quota/budget fact record points to that SK for each month.
One can then compare actual sales to quota by month (sum of sales for the month (many records) compared with sum of quotas for the month (in this case, only a single record)). What you cannot do is compare sales to quota at the daily level because the quota isn't that granular. Query results would be misleading at best.
This is similar to your situation. For the year when you store budgets for half a year, you could compare actual-to-budget for a half-year, but not more granularly. Same thing when you budget at the year level. It would be OK to store the budget for the entire year in December, but you can't drill down any more than the year level when comparing actual-to-budget.
Hope that makes sense. Your varying planning periods certainly complicate matters! The design I presented earlier is just one way to handle it.
Last edited by VHF on Wed Apr 28, 2010 11:22 am; edited 2 times in total (Reason for editing : clarity)
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
Re: Two Grains for Time
I had an idea when I woke up this morning, based on your last post, VHF. One of the other dimensions in my data, called DimGeneration, is the budget plan type (e.g., 2008 monthly plan, 2008 strategic plan, 2009 monthly plan, 2009 strategic plan). Would a design that requires these two dimensions to work in concert together be an orthodox approach?
In other words, I can have a flag field in DimGeneration for strategic plan T/F. When DimGeneration.strategic = T, then only allow DimDate displays at the FY level. When DimGeneration.strategic = F, then allow DimDate to show monthly, quarterly, and all other time displays. This technique works well in Cognos Planning to show/hide time periods based on the budget plan version the user selects from a drop down list.
I'm just curious from a vendor-neutral, optimal dimensional modeling perspective whether this is a good idea or not. I'm working in SQL Server Analysis Services for this project.
I really appreciate your & everyeone else's comments on this forum!
- Rob
In other words, I can have a flag field in DimGeneration for strategic plan T/F. When DimGeneration.strategic = T, then only allow DimDate displays at the FY level. When DimGeneration.strategic = F, then allow DimDate to show monthly, quarterly, and all other time displays. This technique works well in Cognos Planning to show/hide time periods based on the budget plan version the user selects from a drop down list.
I'm just curious from a vendor-neutral, optimal dimensional modeling perspective whether this is a good idea or not. I'm working in SQL Server Analysis Services for this project.
I really appreciate your & everyeone else's comments on this forum!
- Rob
robfb- Posts : 11
Join date : 2010-03-18
Re: Two Grains for Time
Sounds like a brilliant flash of inspiration!
From a vendor-neutral, optimal dimensional modeling (DM) perspective it would be best to have the dimensional model eliminate the possibility of erroneous queries, but in actual practice the BI tool often has an important role to play in keeping the user out of trouble. A general-purpose relational database such SQL Server doesn’t provide the controls necessary to ensure the correct use of the dimensional model, so we need to look to configurable DM-aware BI tools to keep the user on safe ground.
Moving into the DM-aware MOLAP world (such as SSAS) there is the ability for configuration of dimension usage, granularity, etc., so (at least in theory) it should be possible to ensure correct use of the cube no matter what the BI tool. When working with SSAS I generally strive to make sure that if I turn a user loose with Excel against the cube they will always get correct results no matter how they slice-and-dice.
Although SSAS can handle facts of differing granularities out-of-the-box (such as the typical monthly budget vs. daily actual), your scenario is more complicated. It might be possible to come up with a model that would correctly express planning periods of three different granularities over multiple generations, but it might involve three different fact tables. Perhaps a calculated measure in SSAS could be used to pull them together when looking at long range planning at the FY level. Or perhaps a bridge table could pull it off.
Any solution will be somewhat advanced and could be a challenge even for an experienced DW/SSAS professional. Rather than getting overly complex with either the DM or cube configuration, I would at least consider a clean straightforward solution (such as you described) even if it relies on BI tool configuration--just don’t let users connect to the cube with Excel!
Do give some thought to things that could go wrong as users work with the cube (for example, are you effectively preventing users from inadvertently aggregating (summing) across generations?)
There is no one “right” answer. DW is still an immature field, and you don’t have to go very far to come up with situations that are not covered in the established best practices (or even in the advanced Kimball DM class!) Remember that the bottom line is delivering business value, and sometimes that involves choosing the a resonable solution to get the job done. Theory may inform, but practice convinces!
Sorry if I got too far removed from the specifics of your design issue with this post. I was trying to answer your "vendor-neutral, optimal dimensional modeling" question and I may have gotten carried away!
From a vendor-neutral, optimal dimensional modeling (DM) perspective it would be best to have the dimensional model eliminate the possibility of erroneous queries, but in actual practice the BI tool often has an important role to play in keeping the user out of trouble. A general-purpose relational database such SQL Server doesn’t provide the controls necessary to ensure the correct use of the dimensional model, so we need to look to configurable DM-aware BI tools to keep the user on safe ground.
Moving into the DM-aware MOLAP world (such as SSAS) there is the ability for configuration of dimension usage, granularity, etc., so (at least in theory) it should be possible to ensure correct use of the cube no matter what the BI tool. When working with SSAS I generally strive to make sure that if I turn a user loose with Excel against the cube they will always get correct results no matter how they slice-and-dice.
Although SSAS can handle facts of differing granularities out-of-the-box (such as the typical monthly budget vs. daily actual), your scenario is more complicated. It might be possible to come up with a model that would correctly express planning periods of three different granularities over multiple generations, but it might involve three different fact tables. Perhaps a calculated measure in SSAS could be used to pull them together when looking at long range planning at the FY level. Or perhaps a bridge table could pull it off.
Any solution will be somewhat advanced and could be a challenge even for an experienced DW/SSAS professional. Rather than getting overly complex with either the DM or cube configuration, I would at least consider a clean straightforward solution (such as you described) even if it relies on BI tool configuration--just don’t let users connect to the cube with Excel!
Do give some thought to things that could go wrong as users work with the cube (for example, are you effectively preventing users from inadvertently aggregating (summing) across generations?)
There is no one “right” answer. DW is still an immature field, and you don’t have to go very far to come up with situations that are not covered in the established best practices (or even in the advanced Kimball DM class!) Remember that the bottom line is delivering business value, and sometimes that involves choosing the a resonable solution to get the job done. Theory may inform, but practice convinces!
Sorry if I got too far removed from the specifics of your design issue with this post. I was trying to answer your "vendor-neutral, optimal dimensional modeling" question and I may have gotten carried away!
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
Similar topics
» Combined Date Time dimension vs. Separate Day and Time dimensions and timezones
» Time dimension design for cross time zone and custom calendar warehouse, 1min granularity
» Date Dimension at Various Grains
» Time Dimension, NULLs and Time datatype
» How to model with a requirement for multiple grains?
» Time dimension design for cross time zone and custom calendar warehouse, 1min granularity
» Date Dimension at Various Grains
» Time Dimension, NULLs and Time datatype
» How to model with a requirement for multiple grains?
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum