Date Dimension

Post new topic   Reply to topic

View previous topic View next topic Go down

Date Dimension

Post  mark.tan on Sat Oct 10, 2009 12:34 am

Hi All,

I understood Kimball way of implementation Date Dimension table. In a way, the lowest granularity date represent each record, with the following month, quarter and year attribute. However, what happen if the lowest grain of certain fact are at date level? They could be at monthly level, so is it right to tag the fact to the date dimension table using the first day of the month (e.g. 01-12-2008 for Dec 2009)?

If not, can anyone tell me what is the best strategy?

Best Regards,
Mark

mark.tan

Posts: 14
Join date: 2009-02-04

View user profile

Back to top Go down

Re: Date Dimension

Post  ngalemmo on Tue Oct 13, 2009 8:37 am

What I do is add separate flags to the date dimension to tag rows as 'month', 'quarter' and 'year' rows, the use the natural key (month/year, quarter/year or year) and flag to locate them rather than hard-coding logic to manipulate the date to set the key. So, there would be one such row each month tagged as the monthly row (I use the last day of the month), one row per quarter as the quarterly row and so forth. These flags would be in addition to such things as 'end-of-month', 'end-of-quarter' and 'end-of-year' flags as they are semantically different, even though they would be set to the same values. This ensures a consistant keying of summaries and snapshots and moves any logic for key selection to the one-of process that builds the dimension.

ngalemmo

Posts: 215
Join date: 2009-05-16
Location: Los Angeles

View user profile http://dimensionaldw.com

Back to top Go down

Re: Date Dimension

Post  Irtaza Hassan on Thu Oct 22, 2009 7:31 am

Can you elaborate on why you would have two separate flags for end-of-month and month? How are they semantically different?

I have a scenario where the Sales Targets are stored at a monthly level. The goal is to show:

1. Month-to-Date Target = (Monthly Target*Working Days Elapsed in Month)/Total Working days in Month

2. Month Daily Target = (Month-to-Date Target)/Working Days Elapsed in Month

Should I add a row per day per month in the Targets fact and store the Monthly Target, Month-to-Date target and Month Daily Targets?

Or

Should I add one row per month in the fact table to store the monthly target and then calculate the other two on the fly?

Irtaza Hassan

Posts: 7
Join date: 2009-10-19

View user profile

Back to top Go down

Re: Date Dimension

Post  ngalemmo on Thu Oct 22, 2009 8:31 am

One flag is used to identify the row that serves as the key for monhtly, quarterly or annual aggregations, it would be used internally by the ETL process to assign surrogate keys. The other is a flag used in queries to identify the business definition of month, quarter or year end. These may be different or may change depending on how the business views the calendar. There could be multiple sets of business use flags for different purposes. For example, accounting may view things according to the standard calendar, while manufacturing may be working on a week based calendar that crosses month, quarter and year boundaries.

In such mixed use situations, you may decide to use the 15th of each month as your surrogate key for aggregations, so that, regardless of calendar interpretation, the aggregate key will fall within the correct period. The actual date/key used is not important, so long as the attributes referenced by the key are appropriate for the aggregated row.

To your other question, I would have a sales target fact table with one row per month. The date dimension would contain attributes such as number of working days in the month and number of working days remaining in the month. As for the rest, either calculate month-to-date on the fly or build an accumulating snapshot (or even a daily snapshot) for the m-t-d data. That would depend on your data volumes and performance goals.

ngalemmo

Posts: 215
Join date: 2009-05-16
Location: Los Angeles

View user profile http://dimensionaldw.com

Back to top Go down

Re: Date Dimension

Post  BrianJarrett on Thu Oct 22, 2009 6:52 pm

I would recommend building a shrunken dimension for month; that's one row per month and is built directly from your date table (remove all your daily level attributes, then group by the monthly and above attributes). You'd join your monthly fact table to this shrunken dimension based on the monthly surrogate key. Any month-to-date facts can be stored in your monthly fact table row, including dates.

Kimball wrote an article about 10 years ago or so on this concept; I don't immediately recall the url but a Google search should return it. If not let me know and I'll try to dig it up.

If you have a daily fact table it would be attached to your normal date dimension (grain of one row per day). Obviously you'll need some sort of aggregate navigation strategy to make this invisible to your end users. Business Objects, for example, has built-in support for shrunken dimensions and aggregate navigation within their universes. I'd be willing to bet other products also support aggregate navigation out of the box.

_________________
Brian Jarrett

Those who can make you believe absurdities can make you commit atrocities. -Voltaire

BrianJarrett

Posts: 44
Join date: 2009-02-03
Age: 35
Location: St. Louis, MO

View user profile

Back to top Go down

Re: Date Dimension

Post  Irtaza Hassan on Sun Oct 25, 2009 9:15 am

Thank You ngalemmo and Brian! I think I'll use the shrunken month dimension.

Irtaza Hassan

Posts: 7
Join date: 2009-10-19

View user profile

Back to top Go down

Re: Date Dimension

Post  ngalemmo on Fri Oct 30, 2009 8:07 am

BrianJarrett wrote: Business Objects, for example, has built-in support for shrunken dimensions and aggregate navigation within their universes. I'd be willing to bet other products also support aggregate navigation out of the box.


Maybe this works for aggregate navigation, but a mini-dimension does not work if you need to combine facts at different time grains. I did a quick test with BOBJ. I copied the date dimension and linked the copy to a table with quarterly headcount budgets. Another table, containing monthly hire counts, remained associated with the original date dimension. The Universe now contains two sets of date attributes coming from two different tables. From BOBJ's point of view, these are different things. When I create a query selecting hire counts and head count bugdet, I need to select both quarter columns from each of the two date tables. The result is two separate reports in WEBI. Whereas, if both fact tables shared the same date dimension, you only select a single quarter object and BOBJ will produce a single report with hire counts summarized by quarter.

ngalemmo

Posts: 215
Join date: 2009-05-16
Location: Los Angeles

View user profile http://dimensionaldw.com

Back to top Go down

Re: Date Dimension

Post  mark.tan on Mon Nov 02, 2009 6:51 pm

Hi ngalemmo and all,

Thanks for the discussion. I think this is a very useful thread...

I notice your small test and I was doing a few myself. The reason why 2 reports was generated when using 2 date dimension tables is because the 2 tables was not link probably? Since headcount budget and headcount hired are in quarterly and monthly grain, I would assume I will have a quarter and month table (relationship is 1 quarter has one-to-many months). If I were to drag quarter attribute and the 2 measures, I think I should have 1 report, right?

However, that would mean my design is trying to snowflake a date dimension. Not sure what will be the implication...

Best Regards,
Mark

mark.tan

Posts: 14
Join date: 2009-02-04

View user profile

Back to top Go down

Re: Date Dimension

Post  mark.tan on Mon Nov 02, 2009 7:43 pm

Hi ngalemmo and all,

After doing some more testing and research, I feel that if the 2 facts are of the different grain, you can never use a single SQL query parse to fetch data. I think this is what Universe term it as fan/chasm trap.

I tried using a single dimension date table (simple table containing month and year) only. This is how it looks.

(Dim_Month)
month_sid year_id monthyear_desc
200201 2002 Jan 2002
200202 2002 Feb 2002
200203 2002 Mar 2002
200204 2002 Apr 2002

I have 2 fact tables contain actual (monthly) and target (yearly) values. This is how they look.

(Fact_Actual)
month_sid actual
200201 10
200202 20
200301 30

(Fact_Target)
year_id target
2002 100
2003 120
2004 140

If we were to bring in the 2 measures from 2 fact tables and group them by yearly level, this is the SQL parse we will get.

SELECT a.year_id, SUM(actual) actual, SUM(target) target
FROM [dim_month] a,
[fact_actual] b,
[fact_target] c
WHERE a.month_sid = b.month_sid
AND a.year_id = c.year_id
GROUP BY a.year_id

And the query result is:
year_id actual target
2002 30 200
2003 30 120

If you notice, the 2002 (target) value is not so correct because there are 2 transactional records at the monthly level.

Hence, no matter whether it is 1 or 2 date dimension tables, the key is that the tools can be configured to ensure those fan/chasm trap will not arised.

Best Regards,
Mark

mark.tan

Posts: 14
Join date: 2009-02-04

View user profile

Back to top Go down

Re: Date Dimension

Post  ngalemmo on Mon Nov 02, 2009 9:19 pm

BOBJ would not do one query as you would have established contexts for each fact table and its immediate dimensions (avoiding loops, chasms, et al). When combining data from two facts, there would be two contexts and, hense, two queries. Each query will be executed and, if each query shares the same dimension objects(each within its context), BOBJ will merge the two result sets into one.

If you create separate dimension tables for different time grains, a similar attribute (say quarter in a month dimension and quarter in a quarter dimension) cannot be the same object... so BOBJ will not merge them.

So, in your example, DIM_MONTH and FACT_ACTUAL would have its own context and DIM_MONTH and FACT_TARGET would be in another. FACT_TARGET would have an appropriate foreign key to a single row in DIM_MONTH (say 200212 for 2002, 200312 for 2003 and so on). From WEBI, when you query the two, BOBJ will create two queries:

SELECT a.year_id, SUM(actual) actual
FROM [dim_month] a,
[fact_actual] b
WHERE a.month_sid = b.month_sid
GROUP BY a.year_id

SELECT a.year_id, SUM(target) target
FROM [dim_month] a,
[fact_target] c
WHERE a.month_sid = c.month_sid
GROUP BY a.year_id

It would then merge the two result sets on year. (By the way, if you happen to be using Crystal Reports through a Universe, this won't work. The Crystal-Universe interface currently has a 'feature' that causes CR to fail if the query generates more than one SELECT statement.)

ngalemmo

Posts: 215
Join date: 2009-05-16
Location: Los Angeles

View user profile http://dimensionaldw.com

Back to top Go down

View previous topic View next topic Back to top


Permissions of this forum:
You cannot reply to topics in this forum