Date Dimension
Page 1 of 1 • Share •
Date Dimension
Hi All,
I understood Kimball way of implementation Date Dimension table. In a way, the lowest granularity date represent each record, with the following month, quarter and year attribute. However, what happen if the lowest grain of certain fact are at date level? They could be at monthly level, so is it right to tag the fact to the date dimension table using the first day of the month (e.g. 01-12-2008 for Dec 2009)?
If not, can anyone tell me what is the best strategy?
Best Regards,
Mark
I understood Kimball way of implementation Date Dimension table. In a way, the lowest granularity date represent each record, with the following month, quarter and year attribute. However, what happen if the lowest grain of certain fact are at date level? They could be at monthly level, so is it right to tag the fact to the date dimension table using the first day of the month (e.g. 01-12-2008 for Dec 2009)?
If not, can anyone tell me what is the best strategy?
Best Regards,
Mark

mark.tan- Posts: 14
Join date: 2009-02-04
Re: Date Dimension
What I do is add separate flags to the date dimension to tag rows as 'month', 'quarter' and 'year' rows, the use the natural key (month/year, quarter/year or year) and flag to locate them rather than hard-coding logic to manipulate the date to set the key. So, there would be one such row each month tagged as the monthly row (I use the last day of the month), one row per quarter as the quarterly row and so forth. These flags would be in addition to such things as 'end-of-month', 'end-of-quarter' and 'end-of-year' flags as they are semantically different, even though they would be set to the same values. This ensures a consistant keying of summaries and snapshots and moves any logic for key selection to the one-of process that builds the dimension.

ngalemmo- Posts: 215
Join date: 2009-05-16
Location: Los Angeles

Re: Date Dimension
Can you elaborate on why you would have two separate flags for end-of-month and month? How are they semantically different?
I have a scenario where the Sales Targets are stored at a monthly level. The goal is to show:
1. Month-to-Date Target = (Monthly Target*Working Days Elapsed in Month)/Total Working days in Month
2. Month Daily Target = (Month-to-Date Target)/Working Days Elapsed in Month
Should I add a row per day per month in the Targets fact and store the Monthly Target, Month-to-Date target and Month Daily Targets?
Or
Should I add one row per month in the fact table to store the monthly target and then calculate the other two on the fly?
I have a scenario where the Sales Targets are stored at a monthly level. The goal is to show:
1. Month-to-Date Target = (Monthly Target*Working Days Elapsed in Month)/Total Working days in Month
2. Month Daily Target = (Month-to-Date Target)/Working Days Elapsed in Month
Should I add a row per day per month in the Targets fact and store the Monthly Target, Month-to-Date target and Month Daily Targets?
Or
Should I add one row per month in the fact table to store the monthly target and then calculate the other two on the fly?
Irtaza Hassan- Posts: 7
Join date: 2009-10-19
Re: Date Dimension
One flag is used to identify the row that serves as the key for monhtly, quarterly or annual aggregations, it would be used internally by the ETL process to assign surrogate keys. The other is a flag used in queries to identify the business definition of month, quarter or year end. These may be different or may change depending on how the business views the calendar. There could be multiple sets of business use flags for different purposes. For example, accounting may view things according to the standard calendar, while manufacturing may be working on a week based calendar that crosses month, quarter and year boundaries.
In such mixed use situations, you may decide to use the 15th of each month as your surrogate key for aggregations, so that, regardless of calendar interpretation, the aggregate key will fall within the correct period. The actual date/key used is not important, so long as the attributes referenced by the key are appropriate for the aggregated row.
To your other question, I would have a sales target fact table with one row per month. The date dimension would contain attributes such as number of working days in the month and number of working days remaining in the month. As for the rest, either calculate month-to-date on the fly or build an accumulating snapshot (or even a daily snapshot) for the m-t-d data. That would depend on your data volumes and performance goals.
In such mixed use situations, you may decide to use the 15th of each month as your surrogate key for aggregations, so that, regardless of calendar interpretation, the aggregate key will fall within the correct period. The actual date/key used is not important, so long as the attributes referenced by the key are appropriate for the aggregated row.
To your other question, I would have a sales target fact table with one row per month. The date dimension would contain attributes such as number of working days in the month and number of working days remaining in the month. As for the rest, either calculate month-to-date on the fly or build an accumulating snapshot (or even a daily snapshot) for the m-t-d data. That would depend on your data volumes and performance goals.

ngalemmo- Posts: 215
Join date: 2009-05-16
Location: Los Angeles

Re: Date Dimension
I would recommend building a shrunken dimension for month; that's one row per month and is built directly from your date table (remove all your daily level attributes, then group by the monthly and above attributes). You'd join your monthly fact table to this shrunken dimension based on the monthly surrogate key. Any month-to-date facts can be stored in your monthly fact table row, including dates.
Kimball wrote an article about 10 years ago or so on this concept; I don't immediately recall the url but a Google search should return it. If not let me know and I'll try to dig it up.
If you have a daily fact table it would be attached to your normal date dimension (grain of one row per day). Obviously you'll need some sort of aggregate navigation strategy to make this invisible to your end users. Business Objects, for example, has built-in support for shrunken dimensions and aggregate navigation within their universes. I'd be willing to bet other products also support aggregate navigation out of the box.
Kimball wrote an article about 10 years ago or so on this concept; I don't immediately recall the url but a Google search should return it. If not let me know and I'll try to dig it up.
If you have a daily fact table it would be attached to your normal date dimension (grain of one row per day). Obviously you'll need some sort of aggregate navigation strategy to make this invisible to your end users. Business Objects, for example, has built-in support for shrunken dimensions and aggregate navigation within their universes. I'd be willing to bet other products also support aggregate navigation out of the box.
_________________
Brian Jarrett
Those who can make you believe absurdities can make you commit atrocities. -Voltaire

BrianJarrett- Posts: 44
Join date: 2009-02-03
Age: 35
Location: St. Louis, MO
Re: Date Dimension
Thank You ngalemmo and Brian! I think I'll use the shrunken month dimension.
Irtaza Hassan- Posts: 7
Join date: 2009-10-19
Re: Date Dimension
BrianJarrett wrote: Business Objects, for example, has built-in support for shrunken dimensions and aggregate navigation within their universes. I'd be willing to bet other products also support aggregate navigation out of the box.
Maybe this works for aggregate navigation, but a mini-dimension does not work if you need to combine facts at different time grains. I did a quick test with BOBJ. I copied the date dimension and linked the copy to a table with quarterly headcount budgets. Another table, containing monthly hire counts, remained associated with the original date dimension. The Universe now contains two sets of date attributes coming from two different tables. From BOBJ's point of view, these are different things. When I create a query selecting hire counts and head count bugdet, I need to select both quarter columns from each of the two date tables. The result is two separate reports in WEBI. Whereas, if both fact tables shared the same date dimension, you only select a single quarter object and BOBJ will produce a single report with hire counts summarized by quarter.

ngalemmo- Posts: 215
Join date: 2009-05-16
Location: Los Angeles

Re: Date Dimension
Hi ngalemmo and all,
Thanks for the discussion. I think this is a very useful thread...
I notice your small test and I was doing a few myself. The reason why 2 reports was generated when using 2 date dimension tables is because the 2 tables was not link probably? Since headcount budget and headcount hired are in quarterly and monthly grain, I would assume I will have a quarter and month table (relationship is 1 quarter has one-to-many months). If I were to drag quarter attribute and the 2 measures, I think I should have 1 report, right?
However, that would mean my design is trying to snowflake a date dimension. Not sure what will be the implication...
Best Regards,
Mark
Thanks for the discussion. I think this is a very useful thread...
I notice your small test and I was doing a few myself. The reason why 2 reports was generated when using 2 date dimension tables is because the 2 tables was not link probably? Since headcount budget and headcount hired are in quarterly and monthly grain, I would assume I will have a quarter and month table (relationship is 1 quarter has one-to-many months). If I were to drag quarter attribute and the 2 measures, I think I should have 1 report, right?
However, that would mean my design is trying to snowflake a date dimension. Not sure what will be the implication...
Best Regards,
Mark

mark.tan- Posts: 14
Join date: 2009-02-04
Re: Date Dimension
Hi ngalemmo and all,
After doing some more testing and research, I feel that if the 2 facts are of the different grain, you can never use a single SQL query parse to fetch data. I think this is what Universe term it as fan/chasm trap.
I tried using a single dimension date table (simple table containing month and year) only. This is how it looks.
(Dim_Month)
month_sid year_id monthyear_desc
200201 2002 Jan 2002
200202 2002 Feb 2002
200203 2002 Mar 2002
200204 2002 Apr 2002
I have 2 fact tables contain actual (monthly) and target (yearly) values. This is how they look.
(Fact_Actual)
month_sid actual
200201 10
200202 20
200301 30
(Fact_Target)
year_id target
2002 100
2003 120
2004 140
If we were to bring in the 2 measures from 2 fact tables and group them by yearly level, this is the SQL parse we will get.
SELECT a.year_id, SUM(actual) actual, SUM(target) target
FROM [dim_month] a,
[fact_actual] b,
[fact_target] c
WHERE a.month_sid = b.month_sid
AND a.year_id = c.year_id
GROUP BY a.year_id
And the query result is:
year_id actual target
2002 30 200
2003 30 120
If you notice, the 2002 (target) value is not so correct because there are 2 transactional records at the monthly level.
Hence, no matter whether it is 1 or 2 date dimension tables, the key is that the tools can be configured to ensure those fan/chasm trap will not arised.
Best Regards,
Mark
After doing some more testing and research, I feel that if the 2 facts are of the different grain, you can never use a single SQL query parse to fetch data. I think this is what Universe term it as fan/chasm trap.
I tried using a single dimension date table (simple table containing month and year) only. This is how it looks.
(Dim_Month)
month_sid year_id monthyear_desc
200201 2002 Jan 2002
200202 2002 Feb 2002
200203 2002 Mar 2002
200204 2002 Apr 2002
I have 2 fact tables contain actual (monthly) and target (yearly) values. This is how they look.
(Fact_Actual)
month_sid actual
200201 10
200202 20
200301 30
(Fact_Target)
year_id target
2002 100
2003 120
2004 140
If we were to bring in the 2 measures from 2 fact tables and group them by yearly level, this is the SQL parse we will get.
SELECT a.year_id, SUM(actual) actual, SUM(target) target
FROM [dim_month] a,
[fact_actual] b,
[fact_target] c
WHERE a.month_sid = b.month_sid
AND a.year_id = c.year_id
GROUP BY a.year_id
And the query result is:
year_id actual target
2002 30 200
2003 30 120
If you notice, the 2002 (target) value is not so correct because there are 2 transactional records at the monthly level.
Hence, no matter whether it is 1 or 2 date dimension tables, the key is that the tools can be configured to ensure those fan/chasm trap will not arised.
Best Regards,
Mark

mark.tan- Posts: 14
Join date: 2009-02-04
Re: Date Dimension
BOBJ would not do one query as you would have established contexts for each fact table and its immediate dimensions (avoiding loops, chasms, et al). When combining data from two facts, there would be two contexts and, hense, two queries. Each query will be executed and, if each query shares the same dimension objects(each within its context), BOBJ will merge the two result sets into one.
If you create separate dimension tables for different time grains, a similar attribute (say quarter in a month dimension and quarter in a quarter dimension) cannot be the same object... so BOBJ will not merge them.
So, in your example, DIM_MONTH and FACT_ACTUAL would have its own context and DIM_MONTH and FACT_TARGET would be in another. FACT_TARGET would have an appropriate foreign key to a single row in DIM_MONTH (say 200212 for 2002, 200312 for 2003 and so on). From WEBI, when you query the two, BOBJ will create two queries:
SELECT a.year_id, SUM(actual) actual
FROM [dim_month] a,
[fact_actual] b
WHERE a.month_sid = b.month_sid
GROUP BY a.year_id
SELECT a.year_id, SUM(target) target
FROM [dim_month] a,
[fact_target] c
WHERE a.month_sid = c.month_sid
GROUP BY a.year_id
It would then merge the two result sets on year. (By the way, if you happen to be using Crystal Reports through a Universe, this won't work. The Crystal-Universe interface currently has a 'feature' that causes CR to fail if the query generates more than one SELECT statement.)
If you create separate dimension tables for different time grains, a similar attribute (say quarter in a month dimension and quarter in a quarter dimension) cannot be the same object... so BOBJ will not merge them.
So, in your example, DIM_MONTH and FACT_ACTUAL would have its own context and DIM_MONTH and FACT_TARGET would be in another. FACT_TARGET would have an appropriate foreign key to a single row in DIM_MONTH (say 200212 for 2002, 200312 for 2003 and so on). From WEBI, when you query the two, BOBJ will create two queries:
SELECT a.year_id, SUM(actual) actual
FROM [dim_month] a,
[fact_actual] b
WHERE a.month_sid = b.month_sid
GROUP BY a.year_id
SELECT a.year_id, SUM(target) target
FROM [dim_month] a,
[fact_target] c
WHERE a.month_sid = c.month_sid
GROUP BY a.year_id
It would then merge the two result sets on year. (By the way, if you happen to be using Crystal Reports through a Universe, this won't work. The Crystal-Universe interface currently has a 'feature' that causes CR to fail if the query generates more than one SELECT statement.)

ngalemmo- Posts: 215
Join date: 2009-05-16
Location: Los Angeles

Permissions of this forum:
You cannot reply to topics in this forum





