Null value for dates
+2
gvarga
Jeff Smith
6 posters
Page 1 of 1
Null value for dates
All of my dimension tables have a value -1 that means there was no match to then dimension table. I try to put in some value such as "Unknown" or "Not available".
Any suggestions for the -1 row in a Date dimension? Should I use some future date such as 12/31/9999? And if I use 12/31/9999, should I continue to use -1 as the default value? I tend to use the Date Dimension Key when caluculating the number of days between 2 dates on a fact, such as the number of days between the Date of Service and the Paid Date.
Any suggestions for the -1 row in a Date dimension? Should I use some future date such as 12/31/9999? And if I use 12/31/9999, should I continue to use -1 as the default value? I tend to use the Date Dimension Key when caluculating the number of days between 2 dates on a fact, such as the number of days between the Date of Service and the Paid Date.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Null value for dates
I would suggest also the „Unknown” dimension row instead of the future dummy date!
Try to conciliate with your business partners what to do: if 1 or 2 dates are unknown, how to calculate the days between the Service date and the Paid date ( For instance it could be 0 ?)
Try to conciliate with your business partners what to do: if 1 or 2 dates are unknown, how to calculate the days between the Service date and the Paid date ( For instance it could be 0 ?)
gvarga- Posts : 43
Join date : 2010-12-15
Re: Null value for dates
Another concept that can help is to have a pair of dates, a default low date 1-Jan-1000 and a default high date 31-Dec-9999.
If an "open date" was missing you can use the default low date, if a "close date" is missing you can use the default high date.
This should of course be paired with an audit dimension to record the fact that you have used default dates in place of "real" dates.
Ross
If an "open date" was missing you can use the default low date, if a "close date" is missing you can use the default high date.
This should of course be paired with an audit dimension to record the fact that you have used default dates in place of "real" dates.
Ross
If the date is unknown...
I normally use an integer representation of dates in the format yyyymmdd (earliest date for me is 19000101 = Jan 1, 1900 and 99991231 = Dec 31, 9999).So for a business transaction say "Service End Date", I insert 99991231 as the date_key while an event that happened in the past that I dont know will have a date_key of 19000101. Calculating intervals still works although I would question why we expect to calcuate say "duration of contract" if the contract itself is still active..., just saying. I would also be interested in what your users think from the user experience point of view.
Ross' suggestion of inserting an audit record for "unreal dates" is very sensible too.
rademola
Ross' suggestion of inserting an audit record for "unreal dates" is very sensible too.
rademola
rademola- Posts : 9
Join date : 2010-12-21
Re: Null value for dates
Why would you calculate intervals using the key?
While it is common to use a known key for a date dimension (such as YYYYMMDD) it should always be treated as a surrogate (i.e. meaningless) from the point of view of an application. Intervals should always be based on attributes of the date. This avoids problems when it is necessary to generate unknown or error keys when dealing with unknown or invalid dates from source data.
The only 'smart' use of a known date key value is for defining partitions in the physical database. Such use is transparent to the end-user application.
While it is common to use a known key for a date dimension (such as YYYYMMDD) it should always be treated as a surrogate (i.e. meaningless) from the point of view of an application. Intervals should always be based on attributes of the date. This avoids problems when it is necessary to generate unknown or error keys when dealing with unknown or invalid dates from source data.
The only 'smart' use of a known date key value is for defining partitions in the physical database. Such use is transparent to the end-user application.
Re: Null value for dates
Yes date intervals should only be calculated from date fields. Thanks for pointing that out ngalemmo...
And in a telco I worked at in the past, our call detail records table was partitioned as you explained.
And in a telco I worked at in the past, our call detail records table was partitioned as you explained.
rademola- Posts : 9
Join date : 2010-12-21
Re: Null value for dates
Thanks for the advice about how to define the surrogates for the date dimension. I'm fine with the way it's done. If you want to use YYYYMMDD, then by all means use it. I prefer to do it a different way. It works perfectly. Yeah , it might be a little easier when creating partitions but not much - not really.
In any case, getting back to the original question - if you use yyyymmmdd for the dimension key, what would you use for an unknown date?
In any case, getting back to the original question - if you use yyyymmmdd for the dimension key, what would you use for an unknown date?
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Null value for dates
On our dummy row for the calendar I have a default date of 1/1/1900. It could equally have been some far future date - the aim was to have something visible to users that would be immediately apparent as not right. However, I have been considering changing this to be a null date. My rationale for this is that a user running a query and filtering on date (e.g. < 31.12.2009) but not displaying the date on the report would be picking this rows up although they may not belong in that time frame. By using a null default date this situation will not occur, and it also better represents the state of the data. For some snapshots, the event date will not yet be known. I work for a mail order firm, so there is a delay between order capture and parcel despatch. Having a despatch date of null (rather than 1900) means undespatched parcels are not erroneously picked up on a query - it also means that the users would be able to actively select undespatched parcels by specifying a null despatch date (although for reference we do have a parcel despatch status attribute!).
I need to run this past a sample of business users first to make sure they understand the changes, but for me a null date seems the most appropriate option.
I need to run this past a sample of business users first to make sure they understand the changes, but for me a null date seems the most appropriate option.
Guest- Guest
Re: Null value for dates
Jeff Smith wrote:In any case, getting back to the original question - if you use yyyymmmdd for the dimension key, what would you use for an unknown date?
I use standard practice of assigning a surrogate value. So unknown and invalid date instances are assigned low integer values. To handle invalid dates, I usually have multiple natural keys in the date dimension defined as VARCHAR and containing strings of the date in the various formats being received from different sources. Date key assignment from a source does not require actually interpreting the source data... improper date values do not cause a problem. The simply wind up as inferred rows in the date dimension with a low key value for cleanup at a later time.
Re: Null value for dates
ngalemmo, The source system only lists Termination Dates for members that have terminated. For members that have not terminated, there is no date. In systems that you've designed, what does the user see in the Termination Date when they have not yet termed? Would leave it as a NULL or would you plug in some distant date such as 12/31/2999?
I'm not referring to bad dates. I'm referring to dates that are purposely left NULL in the source system.
I'm not referring to bad dates. I'm referring to dates that are purposely left NULL in the source system.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Null value for dates
Jeff Smith wrote:Yeah , it might be a little easier when creating partitions but not much - not really.
Table partitioning is important feature to data warehousing which speeds up the performance on very large table dramatically.
An effective technique, namely Sliding Window Partitioning, is commonly used in ETL process when you need to drop portion of the fact data based on a time series and load fresh source. However automating partition management is not a trivial task and having purely meaningless date key can only make the process far more complicated.
You use whatever integer value that you would use for meaningless surrogate date key, e.g. 0, -1 or -999, as it is just an integer anyway. Some people call yyyymmdd integer date key as smart surrogate key, which I think makes perfect sense. But remember, date dimension is the ONLY exception that allows to use smart surrogate key.Jeff Smith wrote:Yeah In any case, getting back to the original question - if you use yyyymmmdd for the dimension key, what would you use for an unknown date?
12/31/2999 or 12/31/9999, it does not matter, as long as it is in far future and not a normal business date. As ngalemmo said, it is other attributes or natural keys that give the meaning for termination, not the date key. If you like, you may even have NULL in some descriptive attributes in your date dimension, although not recommended. But you must not have NULL date key value in the fact table. as it is RI violation which is not allowed in both dimensional and relational modeling. In SQL server SSAS, you can't even process your cube if you violate this fundamental principle.Jeff Smith wrote:Would leave it as a NULL or would you plug in some distant date such as 12/31/2999
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Similar topics
» NULL DATES IN FACTS
» Null values in facts, yes or no?
» Fact with eff / exp dates referencing dimension with eff / exp dates
» the master detail tables facts
» High % of NULL's - yet they want it!
» Null values in facts, yes or no?
» Fact with eff / exp dates referencing dimension with eff / exp dates
» the master detail tables facts
» High % of NULL's - yet they want it!
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum