Date Dimension: Representing partial dates/Imputing date values
4 posters
Page 1 of 1
Date Dimension: Representing partial dates/Imputing date values
Hi,
Suppose your use case is to allow for both complete and partial dates, where dates are intentionally obfuscated for privacy reasons.
What is considered best-practice for handling this scenario?
Should a separate dimension be created to handle partial dates where the grain is limited to month or year
or should one date dimension be kept, with partial dates being handled by imputing them to obtain the granularity of a day(for instance APR-2014 would get translated into APR-01-2014)?
Any advice/suggestions are appreciated. Thank you
Suppose your use case is to allow for both complete and partial dates, where dates are intentionally obfuscated for privacy reasons.
What is considered best-practice for handling this scenario?
Should a separate dimension be created to handle partial dates where the grain is limited to month or year
or should one date dimension be kept, with partial dates being handled by imputing them to obtain the granularity of a day(for instance APR-2014 would get translated into APR-01-2014)?
datekey | string_received | imputed_date | quarter |
20140101 | '2014-01-01' | 2014-01-01 | Q1 |
20140100 | '2014-01' | 2014-01-01 | Q1 |
Any advice/suggestions are appreciated. Thank you
tjmaglio- Posts : 2
Join date : 2014-04-10
Re: Date Dimension: Representing partial dates/Imputing date values
If the obfuscated date is supposed to represent a date then a month dimension isn't going to help, particularly if the same fact date has both proper and obfuscated values.
Create a row in the date dimension and populate the date related columns based on how such dates are to be presented to the business.
Create a row in the date dimension and populate the date related columns based on how such dates are to be presented to the business.
Re: Date Dimension: Representing partial dates/Imputing date values
If your facts are at the monthly grain, then I would create a month dimension.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Date Dimension: Representing partial dates/Imputing date values
Thanks for your help.
To be more explicit-the heart of the problem is that there's a mixture of grain unless a partial date is imputed to be at the grain of a day.
There is a real business need to compare data where the date is known with data where only a piece of a date is known.
Ideally, the fact that the date has been imputed and is not what was received should be represented.
The date dimension I have in mind would look like so:
The fact table would be linked to the date most suitable - whether it is the actual date from source data or it is an imputed date.
For practical purposes, there would have to be imputed dates handling the different combinations of unknown date parts up to all three being unknown in which case you're going to be pointing to the unknown date or invalid date key.
Thanks again for any other input- much appreciated.
To be more explicit-the heart of the problem is that there's a mixture of grain unless a partial date is imputed to be at the grain of a day.
There is a real business need to compare data where the date is known with data where only a piece of a date is known.
Ideally, the fact that the date has been imputed and is not what was received should be represented.
The date dimension I have in mind would look like so:
DateKey | FullDateKey | Imputedfulldatekey | ImputedMonthKey | ImputedDayKey | CalendarYear | CalendarQuarter |
20140411 | 4/11/2014 | 20140411 | 4 | 11 | 2014 | Q2 |
20140011 | 20140111 | 1 | 11 | 2014 | Q2 | |
20140400 | 20140401 | 4 | 1 | 2014 | Q2 |
The fact table would be linked to the date most suitable - whether it is the actual date from source data or it is an imputed date.
For practical purposes, there would have to be imputed dates handling the different combinations of unknown date parts up to all three being unknown in which case you're going to be pointing to the unknown date or invalid date key.
Thanks again for any other input- much appreciated.
tjmaglio- Posts : 2
Join date : 2014-04-10
Re: Date Dimension: Representing partial dates/Imputing date values
It is almost Rule 1 of Dimensional modelling that you define the grain of your fact table and then ensure that all measures in that fact have that grain. If you break this rule you will almost certainly get into a bit of a mess.
Am I right in thinking that by "partial date" you do know the month but not the actual day? If this is the case then I can think of two approaches:
1. If you can assign a day to these partial date records that does make business sense and won't skew other reports then do that - e.g. assign them to the first of the month
2. If you can't follow the first approach then create a fact table which holds all your day facts and then create an aggregate table that rolls this up the month level. For your partial date records either include them in this aggregate fact record or put them in their own "monthly aggregate" table and then put a View across both tables to combine them for querying purposes
Am I right in thinking that by "partial date" you do know the month but not the actual day? If this is the case then I can think of two approaches:
1. If you can assign a day to these partial date records that does make business sense and won't skew other reports then do that - e.g. assign them to the first of the month
2. If you can't follow the first approach then create a fact table which holds all your day facts and then create an aggregate table that rolls this up the month level. For your partial date records either include them in this aggregate fact record or put them in their own "monthly aggregate" table and then put a View across both tables to combine them for querying purposes
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Similar topics
» Date dimension multiple dates
» To store dates or a reference to the date dimension?
» Fact dates before begin date of Dimension
» Too many dates on fact - Is there such a thing as Junk Date dimension
» Eliminate Date Dimension Surrogate Key
» To store dates or a reference to the date dimension?
» Fact dates before begin date of Dimension
» Too many dates on fact - Is there such a thing as Junk Date dimension
» Eliminate Date Dimension Surrogate Key
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum