Date Dimension with "Latest Data"
3 posters
Page 1 of 1
Date Dimension with "Latest Data"
Hi - I just want to run this by anyone to see if I'm missing anything, or if there are better ways to do any of this...
I'm collecting KPI information into a dimensional database for reporting purposes. The indicators might be things like:
So some items are positional (i.e. a snapshot at a date) and others are cumulative (added up over a time period). A Positional/Cumulative flag is stored on the Indicator dimension.
The data is also collected at different periods: monthly / quarterly / annually.
I basically need to create a report card for any given year. The report card needs to show the latest data for that year, say 2010. So if for Indicator A we have monthly data from February only, and Indicator B we have data for Q1, Q2 & Q3, the report needs to show this correctly: i.e. if Indicator A is positional it must show Feb's data, and if it's cumulative it must show Feb's data too; if Indicator B is positional it must show Q3 and if it's cumulative it must show SUM(Q1,Q2,Q3).
Another requirement is the report card must show the latest date for each indicator. So it should show for Indicator A the value "February 2010", and for Indicator B it must show "Q3 2010".
My understanding of proper date dimension modelling would be to create a table with fields like so:
So let's say 28 Feb 2010 would have the bit flag MonthFlag = 1, and the value for Indicator A would be loaded here. 31 March 2010 would have the QuarterFlag bit = 1, and Q1 data loaded here.
The ETL would then have to load the appropriate data (depending on Cumulative/Positional) into 31 Dec 2010 (YearFlag = 1).
Now the report would only ever be interested in FACTs where YearFlag = 1, but the historic data is stored for other purposes.
Lastly, in the FACT table, I would have a "SourceDate" column, and the ETL would load the appropriate value here, and then the report would have it available to display...
Is that ok? Any gaping holes or shortcuts or potential pitfalls you can see?
Thanks in advance...
Sean
I'm collecting KPI information into a dimensional database for reporting purposes. The indicators might be things like:
- % Budget Spent
- % Employees with Disabilities
- Approved Organogram
- Number of Council Meetings Held
- Level of Programme Management
So some items are positional (i.e. a snapshot at a date) and others are cumulative (added up over a time period). A Positional/Cumulative flag is stored on the Indicator dimension.
The data is also collected at different periods: monthly / quarterly / annually.
I basically need to create a report card for any given year. The report card needs to show the latest data for that year, say 2010. So if for Indicator A we have monthly data from February only, and Indicator B we have data for Q1, Q2 & Q3, the report needs to show this correctly: i.e. if Indicator A is positional it must show Feb's data, and if it's cumulative it must show Feb's data too; if Indicator B is positional it must show Q3 and if it's cumulative it must show SUM(Q1,Q2,Q3).
Another requirement is the report card must show the latest date for each indicator. So it should show for Indicator A the value "February 2010", and for Indicator B it must show "Q3 2010".
My understanding of proper date dimension modelling would be to create a table with fields like so:
- DateID
- Date
- Month
- Quarter
- Year
- MonthFlag
- QuarterFlag
- YearFlag
So let's say 28 Feb 2010 would have the bit flag MonthFlag = 1, and the value for Indicator A would be loaded here. 31 March 2010 would have the QuarterFlag bit = 1, and Q1 data loaded here.
The ETL would then have to load the appropriate data (depending on Cumulative/Positional) into 31 Dec 2010 (YearFlag = 1).
Now the report would only ever be interested in FACTs where YearFlag = 1, but the historic data is stored for other purposes.
Lastly, in the FACT table, I would have a "SourceDate" column, and the ETL would load the appropriate value here, and then the report would have it available to display...
Is that ok? Any gaping holes or shortcuts or potential pitfalls you can see?
Thanks in advance...
Sean
seanw- Posts : 3
Join date : 2011-04-12
Re: Date Dimension with "Latest Data"
Sorry, but can you be a little clearer?
Are you thinking of loading measure data in your Date Dimension? If so, why would you do that?
Are you thinking of loading measure data in your Date Dimension? If so, why would you do that?
Re: Date Dimension with "Latest Data"
Hi John
No, I'm not going to load measures into the date dimension.
I guess there are 2 things I'm struggling with:
1. I have these indicators, and I might have values for one in January, another in Jan+Feb+Mar or just Feb+Mar, another for Q1, another for the Year, etc. When I run a report for a year, I want the latest picture for all indicators. So if the indicator is cumulative (eg: a sales figure), the report must sums all the values for the year. If it's positional (eg: a stock figure), it must take only the latest one in the year. Perhaps this isn't a table design issue, but a query issue: my report query needs to do a nested select to get the latest (Top 1 by Date DESC) for the year - if positional; or a SUM() on all facts in the year - if cumulative.
2. When I run this query, I also need to show which is the latest fact date for each indicator. If I'm storing Quarter- and Month- and Year-level facts together and linking to the Date dimension at the appropriate day (using boolean flags on the date records to indicate Last Day of Month, Last Day of Quarter, Last Day of Year) - then there is no way of telling date granularity (if the fact is originally from a month/quarter/year) unless I store a value in the fact table specifying which date type it is (i.e. Quarter/Month/Year). I currently won't receive facts at different granularities for the same indicator, so this shouldn't be a problem.
But maybe a better solution for (1) is to store the facts in 3 different fact tables: factMonth, factQuarter & factYear. Then when the ETL loads a record into say the factMonth table, it also needs to update the factYear table (if it's the latest positional fact for a year, or if it's a cumulative fact), so that the factYear table always has the latest picture. Then I don't need to have a nested query to get the latest data. I would still need an extra column on the factYear table to store the "latest date type" value for (2).
Does that make more sense? Just wondering which option is better. (2) seems more correct as I'm not mixing granularities but (1) seems simpler as there are less tables/updating.
No, I'm not going to load measures into the date dimension.
I guess there are 2 things I'm struggling with:
1. I have these indicators, and I might have values for one in January, another in Jan+Feb+Mar or just Feb+Mar, another for Q1, another for the Year, etc. When I run a report for a year, I want the latest picture for all indicators. So if the indicator is cumulative (eg: a sales figure), the report must sums all the values for the year. If it's positional (eg: a stock figure), it must take only the latest one in the year. Perhaps this isn't a table design issue, but a query issue: my report query needs to do a nested select to get the latest (Top 1 by Date DESC) for the year - if positional; or a SUM() on all facts in the year - if cumulative.
2. When I run this query, I also need to show which is the latest fact date for each indicator. If I'm storing Quarter- and Month- and Year-level facts together and linking to the Date dimension at the appropriate day (using boolean flags on the date records to indicate Last Day of Month, Last Day of Quarter, Last Day of Year) - then there is no way of telling date granularity (if the fact is originally from a month/quarter/year) unless I store a value in the fact table specifying which date type it is (i.e. Quarter/Month/Year). I currently won't receive facts at different granularities for the same indicator, so this shouldn't be a problem.
But maybe a better solution for (1) is to store the facts in 3 different fact tables: factMonth, factQuarter & factYear. Then when the ETL loads a record into say the factMonth table, it also needs to update the factYear table (if it's the latest positional fact for a year, or if it's a cumulative fact), so that the factYear table always has the latest picture. Then I don't need to have a nested query to get the latest data. I would still need an extra column on the factYear table to store the "latest date type" value for (2).
Does that make more sense? Just wondering which option is better. (2) seems more correct as I'm not mixing granularities but (1) seems simpler as there are less tables/updating.
seanw- Posts : 3
Join date : 2011-04-12
Re: Date Dimension with "Latest Data"
It’s important that you don’t mix grains in a single fact table, but it’s critical that you don’t mix different types of facts in a single fact table. It seems to me you try to mix up transaction grain fact and periodic snapshot fact in a single fact table. You should split them up into different fact tables. The transaction grain fact is additive along the time series while the periodic snapshot fact is not along time series but additive along other hierarchies.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Date Dimension with "Latest Data"
Thanks hang. That sounds like good advice. It just hurts because it's a small project with a small budget, and now I'll have at least 6 fact tables (2 types x 3 grains) instead of one. I'm not using SSRS/SSAS/BObjects over this so I'm handcoding all queries & ETLs; 6 tables is going to hurt!
What other "types" of facts are there - just the 2 (additive vs snapshot) or others I should be aware of?
What other "types" of facts are there - just the 2 (additive vs snapshot) or others I should be aware of?
seanw- Posts : 3
Join date : 2011-04-12
Re: Date Dimension with "Latest Data"
If you need 6 fact tables, you don't have options to reduce them into 2 or 3 and I don't think the amount of work will be reduced if you try to force them into fewer tables by mixing up grains or types. On the other hand it would only lead to flawed design if it goes against best practice.
However, it looks like your base fact grain is monthly, unless there is a daily transaction fact that you have not mentioned. In theory, the other fact tables can be built based on the monthly fact table, and you have them at physical level only for performance reason. I would recommand to use view to dynamically produce higher level aggregates and see if there is any performance issue, you may find out the performance is still quite reasonable, and therefore you may not need so many fact tables, saving the work on ETL process. you would have no options but create aggregate fact tables if the performance is not acceptable. Anyway, loading aggregate tables is the simplest part in ETL, just inserting those "group by" views into physical tables.
However, it looks like your base fact grain is monthly, unless there is a daily transaction fact that you have not mentioned. In theory, the other fact tables can be built based on the monthly fact table, and you have them at physical level only for performance reason. I would recommand to use view to dynamically produce higher level aggregates and see if there is any performance issue, you may find out the performance is still quite reasonable, and therefore you may not need so many fact tables, saving the work on ETL process. you would have no options but create aggregate fact tables if the performance is not acceptable. Anyway, loading aggregate tables is the simplest part in ETL, just inserting those "group by" views into physical tables.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Similar topics
» Eliminate Date Dimension Surrogate Key
» Date Dimension refresh for all data marts nightly ??
» scd2 effective date, end date data type
» Date Dimension: Representing partial dates/Imputing date values
» Always link date fields to Date Dimension?
» Date Dimension refresh for all data marts nightly ??
» scd2 effective date, end date data type
» Date Dimension: Representing partial dates/Imputing date values
» Always link date fields to Date Dimension?
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum