Date Dimension
+3
Irtaza Hassan
ngalemmo
mark.tan
7 posters
Page 1 of 1
Date Dimension
Hi All,
I understood Kimball way of implementation Date Dimension table. In a way, the lowest granularity date represent each record, with the following month, quarter and year attribute. However, what happen if the lowest grain of certain fact are at date level? They could be at monthly level, so is it right to tag the fact to the date dimension table using the first day of the month (e.g. 01-12-2008 for Dec 2009)?
If not, can anyone tell me what is the best strategy?
Best Regards,
Mark
I understood Kimball way of implementation Date Dimension table. In a way, the lowest granularity date represent each record, with the following month, quarter and year attribute. However, what happen if the lowest grain of certain fact are at date level? They could be at monthly level, so is it right to tag the fact to the date dimension table using the first day of the month (e.g. 01-12-2008 for Dec 2009)?
If not, can anyone tell me what is the best strategy?
Best Regards,
Mark
mark.tan- Posts : 14
Join date : 2009-02-04
Re: Date Dimension
What I do is add separate flags to the date dimension to tag rows as 'month', 'quarter' and 'year' rows, the use the natural key (month/year, quarter/year or year) and flag to locate them rather than hard-coding logic to manipulate the date to set the key. So, there would be one such row each month tagged as the monthly row (I use the last day of the month), one row per quarter as the quarterly row and so forth. These flags would be in addition to such things as 'end-of-month', 'end-of-quarter' and 'end-of-year' flags as they are semantically different, even though they would be set to the same values. This ensures a consistant keying of summaries and snapshots and moves any logic for key selection to the one-of process that builds the dimension.
Re: Date Dimension
Can you elaborate on why you would have two separate flags for end-of-month and month? How are they semantically different?
I have a scenario where the Sales Targets are stored at a monthly level. The goal is to show:
1. Month-to-Date Target = (Monthly Target*Working Days Elapsed in Month)/Total Working days in Month
2. Month Daily Target = (Month-to-Date Target)/Working Days Elapsed in Month
Should I add a row per day per month in the Targets fact and store the Monthly Target, Month-to-Date target and Month Daily Targets?
Or
Should I add one row per month in the fact table to store the monthly target and then calculate the other two on the fly?
I have a scenario where the Sales Targets are stored at a monthly level. The goal is to show:
1. Month-to-Date Target = (Monthly Target*Working Days Elapsed in Month)/Total Working days in Month
2. Month Daily Target = (Month-to-Date Target)/Working Days Elapsed in Month
Should I add a row per day per month in the Targets fact and store the Monthly Target, Month-to-Date target and Month Daily Targets?
Or
Should I add one row per month in the fact table to store the monthly target and then calculate the other two on the fly?
Irtaza Hassan- Posts : 7
Join date : 2009-10-19
Re: Date Dimension
One flag is used to identify the row that serves as the key for monhtly, quarterly or annual aggregations, it would be used internally by the ETL process to assign surrogate keys. The other is a flag used in queries to identify the business definition of month, quarter or year end. These may be different or may change depending on how the business views the calendar. There could be multiple sets of business use flags for different purposes. For example, accounting may view things according to the standard calendar, while manufacturing may be working on a week based calendar that crosses month, quarter and year boundaries.
In such mixed use situations, you may decide to use the 15th of each month as your surrogate key for aggregations, so that, regardless of calendar interpretation, the aggregate key will fall within the correct period. The actual date/key used is not important, so long as the attributes referenced by the key are appropriate for the aggregated row.
To your other question, I would have a sales target fact table with one row per month. The date dimension would contain attributes such as number of working days in the month and number of working days remaining in the month. As for the rest, either calculate month-to-date on the fly or build an accumulating snapshot (or even a daily snapshot) for the m-t-d data. That would depend on your data volumes and performance goals.
In such mixed use situations, you may decide to use the 15th of each month as your surrogate key for aggregations, so that, regardless of calendar interpretation, the aggregate key will fall within the correct period. The actual date/key used is not important, so long as the attributes referenced by the key are appropriate for the aggregated row.
To your other question, I would have a sales target fact table with one row per month. The date dimension would contain attributes such as number of working days in the month and number of working days remaining in the month. As for the rest, either calculate month-to-date on the fly or build an accumulating snapshot (or even a daily snapshot) for the m-t-d data. That would depend on your data volumes and performance goals.
Re: Date Dimension
I would recommend building a shrunken dimension for month; that's one row per month and is built directly from your date table (remove all your daily level attributes, then group by the monthly and above attributes). You'd join your monthly fact table to this shrunken dimension based on the monthly surrogate key. Any month-to-date facts can be stored in your monthly fact table row, including dates.
Kimball wrote an article about 10 years ago or so on this concept; I don't immediately recall the url but a Google search should return it. If not let me know and I'll try to dig it up.
If you have a daily fact table it would be attached to your normal date dimension (grain of one row per day). Obviously you'll need some sort of aggregate navigation strategy to make this invisible to your end users. Business Objects, for example, has built-in support for shrunken dimensions and aggregate navigation within their universes. I'd be willing to bet other products also support aggregate navigation out of the box.
Kimball wrote an article about 10 years ago or so on this concept; I don't immediately recall the url but a Google search should return it. If not let me know and I'll try to dig it up.
If you have a daily fact table it would be attached to your normal date dimension (grain of one row per day). Obviously you'll need some sort of aggregate navigation strategy to make this invisible to your end users. Business Objects, for example, has built-in support for shrunken dimensions and aggregate navigation within their universes. I'd be willing to bet other products also support aggregate navigation out of the box.
BrianJarrett- Posts : 61
Join date : 2009-02-03
Age : 50
Location : St. Louis, MO
Re: Date Dimension
Thank You ngalemmo and Brian! I think I'll use the shrunken month dimension.
Irtaza Hassan- Posts : 7
Join date : 2009-10-19
Re: Date Dimension
BrianJarrett wrote: Business Objects, for example, has built-in support for shrunken dimensions and aggregate navigation within their universes. I'd be willing to bet other products also support aggregate navigation out of the box.
Maybe this works for aggregate navigation, but a mini-dimension does not work if you need to combine facts at different time grains. I did a quick test with BOBJ. I copied the date dimension and linked the copy to a table with quarterly headcount budgets. Another table, containing monthly hire counts, remained associated with the original date dimension. The Universe now contains two sets of date attributes coming from two different tables. From BOBJ's point of view, these are different things. When I create a query selecting hire counts and head count bugdet, I need to select both quarter columns from each of the two date tables. The result is two separate reports in WEBI. Whereas, if both fact tables shared the same date dimension, you only select a single quarter object and BOBJ will produce a single report with hire counts summarized by quarter.
Re: Date Dimension
Hi ngalemmo and all,
Thanks for the discussion. I think this is a very useful thread...
I notice your small test and I was doing a few myself. The reason why 2 reports was generated when using 2 date dimension tables is because the 2 tables was not link probably? Since headcount budget and headcount hired are in quarterly and monthly grain, I would assume I will have a quarter and month table (relationship is 1 quarter has one-to-many months). If I were to drag quarter attribute and the 2 measures, I think I should have 1 report, right?
However, that would mean my design is trying to snowflake a date dimension. Not sure what will be the implication...
Best Regards,
Mark
Thanks for the discussion. I think this is a very useful thread...
I notice your small test and I was doing a few myself. The reason why 2 reports was generated when using 2 date dimension tables is because the 2 tables was not link probably? Since headcount budget and headcount hired are in quarterly and monthly grain, I would assume I will have a quarter and month table (relationship is 1 quarter has one-to-many months). If I were to drag quarter attribute and the 2 measures, I think I should have 1 report, right?
However, that would mean my design is trying to snowflake a date dimension. Not sure what will be the implication...
Best Regards,
Mark
mark.tan- Posts : 14
Join date : 2009-02-04
Re: Date Dimension
Hi ngalemmo and all,
After doing some more testing and research, I feel that if the 2 facts are of the different grain, you can never use a single SQL query parse to fetch data. I think this is what Universe term it as fan/chasm trap.
I tried using a single dimension date table (simple table containing month and year) only. This is how it looks.
(Dim_Month)
month_sid year_id monthyear_desc
200201 2002 Jan 2002
200202 2002 Feb 2002
200203 2002 Mar 2002
200204 2002 Apr 2002
I have 2 fact tables contain actual (monthly) and target (yearly) values. This is how they look.
(Fact_Actual)
month_sid actual
200201 10
200202 20
200301 30
(Fact_Target)
year_id target
2002 100
2003 120
2004 140
If we were to bring in the 2 measures from 2 fact tables and group them by yearly level, this is the SQL parse we will get.
SELECT a.year_id, SUM(actual) actual, SUM(target) target
FROM [dim_month] a,
[fact_actual] b,
[fact_target] c
WHERE a.month_sid = b.month_sid
AND a.year_id = c.year_id
GROUP BY a.year_id
And the query result is:
year_id actual target
2002 30 200
2003 30 120
If you notice, the 2002 (target) value is not so correct because there are 2 transactional records at the monthly level.
Hence, no matter whether it is 1 or 2 date dimension tables, the key is that the tools can be configured to ensure those fan/chasm trap will not arised.
Best Regards,
Mark
After doing some more testing and research, I feel that if the 2 facts are of the different grain, you can never use a single SQL query parse to fetch data. I think this is what Universe term it as fan/chasm trap.
I tried using a single dimension date table (simple table containing month and year) only. This is how it looks.
(Dim_Month)
month_sid year_id monthyear_desc
200201 2002 Jan 2002
200202 2002 Feb 2002
200203 2002 Mar 2002
200204 2002 Apr 2002
I have 2 fact tables contain actual (monthly) and target (yearly) values. This is how they look.
(Fact_Actual)
month_sid actual
200201 10
200202 20
200301 30
(Fact_Target)
year_id target
2002 100
2003 120
2004 140
If we were to bring in the 2 measures from 2 fact tables and group them by yearly level, this is the SQL parse we will get.
SELECT a.year_id, SUM(actual) actual, SUM(target) target
FROM [dim_month] a,
[fact_actual] b,
[fact_target] c
WHERE a.month_sid = b.month_sid
AND a.year_id = c.year_id
GROUP BY a.year_id
And the query result is:
year_id actual target
2002 30 200
2003 30 120
If you notice, the 2002 (target) value is not so correct because there are 2 transactional records at the monthly level.
Hence, no matter whether it is 1 or 2 date dimension tables, the key is that the tools can be configured to ensure those fan/chasm trap will not arised.
Best Regards,
Mark
mark.tan- Posts : 14
Join date : 2009-02-04
Re: Date Dimension
BOBJ would not do one query as you would have established contexts for each fact table and its immediate dimensions (avoiding loops, chasms, et al). When combining data from two facts, there would be two contexts and, hense, two queries. Each query will be executed and, if each query shares the same dimension objects(each within its context), BOBJ will merge the two result sets into one.
If you create separate dimension tables for different time grains, a similar attribute (say quarter in a month dimension and quarter in a quarter dimension) cannot be the same object... so BOBJ will not merge them.
So, in your example, DIM_MONTH and FACT_ACTUAL would have its own context and DIM_MONTH and FACT_TARGET would be in another. FACT_TARGET would have an appropriate foreign key to a single row in DIM_MONTH (say 200212 for 2002, 200312 for 2003 and so on). From WEBI, when you query the two, BOBJ will create two queries:
SELECT a.year_id, SUM(actual) actual
FROM [dim_month] a,
[fact_actual] b
WHERE a.month_sid = b.month_sid
GROUP BY a.year_id
SELECT a.year_id, SUM(target) target
FROM [dim_month] a,
[fact_target] c
WHERE a.month_sid = c.month_sid
GROUP BY a.year_id
It would then merge the two result sets on year. (By the way, if you happen to be using Crystal Reports through a Universe, this won't work. The Crystal-Universe interface currently has a 'feature' that causes CR to fail if the query generates more than one SELECT statement.)
If you create separate dimension tables for different time grains, a similar attribute (say quarter in a month dimension and quarter in a quarter dimension) cannot be the same object... so BOBJ will not merge them.
So, in your example, DIM_MONTH and FACT_ACTUAL would have its own context and DIM_MONTH and FACT_TARGET would be in another. FACT_TARGET would have an appropriate foreign key to a single row in DIM_MONTH (say 200212 for 2002, 200312 for 2003 and so on). From WEBI, when you query the two, BOBJ will create two queries:
SELECT a.year_id, SUM(actual) actual
FROM [dim_month] a,
[fact_actual] b
WHERE a.month_sid = b.month_sid
GROUP BY a.year_id
SELECT a.year_id, SUM(target) target
FROM [dim_month] a,
[fact_target] c
WHERE a.month_sid = c.month_sid
GROUP BY a.year_id
It would then merge the two result sets on year. (By the way, if you happen to be using Crystal Reports through a Universe, this won't work. The Crystal-Universe interface currently has a 'feature' that causes CR to fail if the query generates more than one SELECT statement.)
General question independant from date
I find the the solution from ngalemmo using "...flags to the date dimension to tag rows as 'month', 'quarter' and 'year' rows,..." very helpful.
I always have the problem having a mix of
- fact tables at different grain (e.g. target data at month level, operational data at daily level),
- aggregates (e.g. aggregation of the same daily operational data at the month level) and
- combining all these fact tables in one report - i.e. showing target and actual data in the same report.
The solution of ngalemmo with the flags makes creating these reports quite easy and straigth forward.
My question, however, to ngalemmo:
Do you use this "flag-method" with the date dimension only?
Or do you use this method consistently in the DW - i.e. you do never create new physical shrunken dimensions?
I am asking because the same problems with day and month in the date-dimension are reoccuring very frequently with other dimensions- e.g. having target data at product brand level but having the operationtal data at the specific product level. So the problems within the BI tools also reoccur again.
Can you share your thoughts with me on that?
I always have the problem having a mix of
- fact tables at different grain (e.g. target data at month level, operational data at daily level),
- aggregates (e.g. aggregation of the same daily operational data at the month level) and
- combining all these fact tables in one report - i.e. showing target and actual data in the same report.
The solution of ngalemmo with the flags makes creating these reports quite easy and straigth forward.
My question, however, to ngalemmo:
Do you use this "flag-method" with the date dimension only?
Or do you use this method consistently in the DW - i.e. you do never create new physical shrunken dimensions?
I am asking because the same problems with day and month in the date-dimension are reoccuring very frequently with other dimensions- e.g. having target data at product brand level but having the operationtal data at the specific product level. So the problems within the BI tools also reoccur again.
Can you share your thoughts with me on that?
z00t- Posts : 2
Join date : 2010-09-17
Re: Date Dimension
ngalemmo, is your date dimension basically a ragged hierarchy? In a cube with data at different "date levels" (single days, entire weeks, entire months, etc), does the data only show up when it hits the approriate level?
Does each level of the hierarchy have a column referencing the appropriate Date Dimension Surrogate Key? For example, is there a Year Month Date Key that has the same Date Surrogate Key all of the Date Level Rows for January 2010, the same for Quarters and Years or do you just use 201001 for January 2010, 20101 for Q1 2010, and 2010 for the year 2010?
Would the Year Month Rows would have nulls in the columns for the Individual days, the Quarter Year Rows would have nulls in the columns for Days and Months, etc.?
Does each level of the hierarchy have a column referencing the appropriate Date Dimension Surrogate Key? For example, is there a Year Month Date Key that has the same Date Surrogate Key all of the Date Level Rows for January 2010, the same for Quarters and Years or do you just use 201001 for January 2010, 20101 for Q1 2010, and 2010 for the year 2010?
Would the Year Month Rows would have nulls in the columns for the Individual days, the Quarter Year Rows would have nulls in the columns for Days and Months, etc.?
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Date Dimension
With the way I do date dimensions, the flags are primarily for ETL purposes to locate the appropriate row to assign FK's to facts at a higher grain. The table itself is simply one row per day... no extra 'month' or 'year' rows. You could flag any row in a month as the 'month row', which one you choose would depend on how the business would like to see the date when doing a monthly report (most of the time they want the last day of the month displayed, so you would flag that row as the month row).
Integration is through attributes. So, if you have a day level fact and a month level fact, you combine the two through attributes in the date table.
If you have a separate month table, you could run into problems depending on the particular BI tool you are using. If you are doing a query that combines daily level data with monthly data, you would, of course, summarize the daily data by month. The problem is, the month attribute for daily data is a different column in a different table than the month attribute in the month dimension table. Some BI tools will not recognize this as the same thing and you would need to explicity combine the data in the report. If they are the same column in the same table, all the BI tools would recognize this as a conformed dimension and join the data automatically.
As for other situations, such as Brand summaries mentioned, the proper solution gets fuzzy. Dates are easy. You could consider adding brand rows to the item dimension if that is the only thing you need to be concerned about, but I would tend to lean more towards a traditional hierarchy approach.
Integration is through attributes. So, if you have a day level fact and a month level fact, you combine the two through attributes in the date table.
If you have a separate month table, you could run into problems depending on the particular BI tool you are using. If you are doing a query that combines daily level data with monthly data, you would, of course, summarize the daily data by month. The problem is, the month attribute for daily data is a different column in a different table than the month attribute in the month dimension table. Some BI tools will not recognize this as the same thing and you would need to explicity combine the data in the report. If they are the same column in the same table, all the BI tools would recognize this as a conformed dimension and join the data automatically.
As for other situations, such as Brand summaries mentioned, the proper solution gets fuzzy. Dates are easy. You could consider adding brand rows to the item dimension if that is the only thing you need to be concerned about, but I would tend to lean more towards a traditional hierarchy approach.
Re: Date Dimension
I would suggest to always stick to ANSI standard as long as it is supported by the DB vendor. So the above script can be rewritten much more meaningfully with desired result as follows:mark.tan wrote:SELECT a.year_id, SUM(actual) actual, SUM(target) target
FROM [dim_month] a,
[fact_actual] b,
[fact_target] c
WHERE a.month_sid = b.month_sid
AND a.year_id = c.year_id
GROUP BY a.year_id
SELECT tgt.year_id, COALESCE(act.actual, 0) as actual, tgt.target
FROM fact_target tgt
LEFT JOIN
(
SELECT m.year_id, SUM(actual) actual
FROM fact_actual a
JOIN dim_month m
ON a.month_sid = m.month_sid
GROUP BY m.year_id
) act
ON act.year_id = tgt.year_id
Notice ANSI standard COALESCE and LEFT JOIN are supported by major RDBMS vendors, eg. Oracle and SQL Server, unless you are still using very old version. I am assuming you have one row target in the fact_target for every year so that you don't miss any year by LEFT JOIN and you don't need to do another GROUP BY subquery on fact_target either.
From conformance stand point, the month dimension must be a perfect subset of attributes from a date dimension. This is also true for any other type of aggregate dimension in relation to its base dimension, where the only possible exception is its surrogate key.ngalemmo wrote:The problem is, the month attribute for daily data is a different column in a different table than the month attribute in the month dimension table.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Date Dimension
hang wrote:From conformance stand point, the month dimension must be a perfect subset of attributes from a date dimension. This is also true for any other type of aggregate dimension in relation to its base dimension, where the only possible exception is its surrogate key.
Well, yes, I presumed that was the case. The 'problem' has to do with how various BI tools interpret conformance when combining facts at different grains. They cannot assume things are the same if the value are the same, but all will assume conformance when both facts reference the same columns in the same dimension tables. This is an issue with relational sources and not cubes.
In a cube a value is as good as a name. So, if you write SQL to combine two facts and use that to load a cube (or if both sets of facts exist as cubes), the notion of tables and columns goes away. Having the same value for the month is all you need to worry about.
Similar topics
» Date Dimension: Representing partial dates/Imputing date values
» Always link date fields to Date Dimension?
» Eliminate Date Dimension Surrogate Key
» Using the Date Dimension for Summary Fact Tables or new specialized Month Dimension?
» Question about using date dimension keys in other dimension tables
» Always link date fields to Date Dimension?
» Eliminate Date Dimension Surrogate Key
» Using the Date Dimension for Summary Fact Tables or new specialized Month Dimension?
» Question about using date dimension keys in other dimension tables
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum