Dates as NULLS in Fact Table
+3
Jeff Smith
VHF
abbi
7 posters
Page 1 of 1
Dates as NULLS in Fact Table
I have inherited a Dimensional model that uses SSAS to build the cube. It has been designed as having a Date Dimension where the primary key is the date field. All of the existing fact tables have an event date which is not nullable. The event date is a FK to the Date Dimension. This seems to work fine until now. I have a new Fact table that has a date field, Expiration date, that can contain nulls. I understand the best way to design this is to have a surrogate key for the Date Dimension instead of using the business key, date. This way I can have a number of dates such as invalid date, unknown date, etc. But this was not how it was designed.
I think I only have three choices, none of which I don't like. Please let me know if there is a better one.
1) Redesign the Dimensional model which means:
a) creating a surrogate key for the date dimension
b) add unknown field to date dimension
b) changing all of our fact tables to have a FK point to the surrogate key of the date dimension.
c) changing all of our programs and reports to always join to the Date Dimension... (I don't think so)
2) Keep the new Fact table as is and not be able to join on the expiration date.
(not happy)
3) Add a surrogate key for the date dimension but none of the tables join to the surrogate key except for the new table.
I would have to regenerate the entire date dimension since the date field will now be a nullable field.... for the unknown.
I'm new to SSAS and I wonder if I would have to change the cube for the existing Fact tables that no longer are pointing at the Date Key but at the nonkey column.
(maybe but don't know all of the implications this may cause)
Please advise on the best way to handle this dilemma.
Thanks in advance!
Abbi
I think I only have three choices, none of which I don't like. Please let me know if there is a better one.
1) Redesign the Dimensional model which means:
a) creating a surrogate key for the date dimension
b) add unknown field to date dimension
b) changing all of our fact tables to have a FK point to the surrogate key of the date dimension.
c) changing all of our programs and reports to always join to the Date Dimension... (I don't think so)
2) Keep the new Fact table as is and not be able to join on the expiration date.
(not happy)
3) Add a surrogate key for the date dimension but none of the tables join to the surrogate key except for the new table.
I would have to regenerate the entire date dimension since the date field will now be a nullable field.... for the unknown.
I'm new to SSAS and I wonder if I would have to change the cube for the existing Fact tables that no longer are pointing at the Date Key but at the nonkey column.
(maybe but don't know all of the implications this may cause)
Please advise on the best way to handle this dilemma.
Thanks in advance!
Abbi
abbi- Posts : 8
Join date : 2010-10-11
Re: Dates as NULLS in Fact Table
What about
4) Add a record your date dimension to represent a null date. The key would need to be a valid date such as 12/31/2999.
Although this might not be the first-choice design solution, it sounds like a practical minimal-impact fit for the existing design. In fact, you could define several date values to handle invalid date, unknown date, etc. Not quite as pretty as an integer SK, but still fully functional.
4) Add a record your date dimension to represent a null date. The key would need to be a valid date such as 12/31/2999.
Although this might not be the first-choice design solution, it sounds like a practical minimal-impact fit for the existing design. In fact, you could define several date values to handle invalid date, unknown date, etc. Not quite as pretty as an integer SK, but still fully functional.
Last edited by VHF on Thu Apr 14, 2011 4:37 pm; edited 2 times in total (Reason for editing : typo; clarity)
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
Re: Dates as NULLS in Fact Table
VHF's suggestion is the way I would go.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Dates as NULLS in Fact Table
Not every date attribute in fact table has to point to the date dimension, especially when the attribute is part of effective dated fact. Setting 12/31/9999 for an open end date is a simple and not bad solution in your case. There are a lot of arguments around the date attributes in fact and dimension tables and they all seem to be reasonable. My approach is, take it easy and do what you feel is easy and natural for writing the query efficiently. Since date attribute has such unique place in dimensional modeling, what is really important is the consistency. As long as the current design works fine without major flaw, then follow it.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Dates as NULLS in Fact Table
The nulls in the date can be tricky. 12/31/9999 is a good option, but I wonder if it's appropriate in every instance. We use 12/31/9999 for Termination and End Dates when the client (or whatever) hasn't Terminated.
But I'm not sure if it's the right thing to use for missing dates, which is kind of a different animal. I think when a date is truly missing, then the old -1 might be the right thing to do. It prevents the missing date from causing problems in calulations - average days or years, for example. If 1% of the dates are missing and you put in real date in place of a null, it is possible for counts of days to be incorrect.
I think there may be different solutions for different situations with the missing date.
By the way, whenever I see 12/31/9999 used as the default, I can't help but think about the Y10K problems that it's going to cause :).
But I'm not sure if it's the right thing to use for missing dates, which is kind of a different animal. I think when a date is truly missing, then the old -1 might be the right thing to do. It prevents the missing date from causing problems in calulations - average days or years, for example. If 1% of the dates are missing and you put in real date in place of a null, it is possible for counts of days to be incorrect.
I think there may be different solutions for different situations with the missing date.
By the way, whenever I see 12/31/9999 used as the default, I can't help but think about the Y10K problems that it's going to cause :).
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Dates as NULLS in Fact Table
Newer Kimball methodology has replaced integer surrogate keys with actual SQL date fields as the primary key/surrogate key field in the date dimension. The original designers you inherited from seem to have adopted this new thinking.
Having said that, in the past I've used specific dates to represent high and low dates of date ranges as well as missing dates. For example, 1/1/1900 as a low date. 12/31/9999 as a high date. 12/31/1899 as a missing date.
It's true this could mess up some date calculations. What I've done in the past is set up logic to filter out the bad dates. For example, you might want to calculate a customer's current age. Your code would take the customer's birthday and count the number of days or years between today. Just place a filter in the logic then that says "WHERE BIRTHDATE != '1899-12-31'" or something similar. Maybe you trap these and force the age to zero because it's obviously missing.
Either way, it doesn't matter. The point is that if you're consistent with the dates that represent high/low/missing then you can always trap and handle them. You'd have to do it anyway, even if you had a surrogate key in your table. Since it's a foreign key in your fact table, the fact table field should never be null. You'll likely assign a -1 (missing) surrogate key to that field, and that -1 row in the date dim is going to have some kind of date in it that you'll have to deal with.
Hope this helps.
Having said that, in the past I've used specific dates to represent high and low dates of date ranges as well as missing dates. For example, 1/1/1900 as a low date. 12/31/9999 as a high date. 12/31/1899 as a missing date.
It's true this could mess up some date calculations. What I've done in the past is set up logic to filter out the bad dates. For example, you might want to calculate a customer's current age. Your code would take the customer's birthday and count the number of days or years between today. Just place a filter in the logic then that says "WHERE BIRTHDATE != '1899-12-31'" or something similar. Maybe you trap these and force the age to zero because it's obviously missing.
Either way, it doesn't matter. The point is that if you're consistent with the dates that represent high/low/missing then you can always trap and handle them. You'd have to do it anyway, even if you had a surrogate key in your table. Since it's a foreign key in your fact table, the fact table field should never be null. You'll likely assign a -1 (missing) surrogate key to that field, and that -1 row in the date dim is going to have some kind of date in it that you'll have to deal with.
Hope this helps.
BrianJarrett- Posts : 61
Join date : 2009-02-03
Age : 50
Location : St. Louis, MO
Re: Dates as NULLS in Fact Table
BrianJarrett wrote:Newer Kimball methodology has replaced integer surrogate keys with actual SQL date fields as the primary key/surrogate key field in the date dimension.
Can you point us to any published books/articles/design tips that advocate using a date field as the key of the date dimension?
Last year I considered switching to this approach when we adopted SQL 2008 due to the new 3-byte Date data type. I started this thread:
http://forum.kimballgroup.com/t436-sql-server-2008-date-data-type-as-dimension-key
The consensus of that discussion clearly leaned in favor of using an integer SK for the date dimension. So I took the "safe" route and kept my YYYYMMDD key.
However, as has been pointed out, even when using an integer SK for the date dimension, you still need to come up with "special" date values to populate the Date data type field in the date dimension for non-date rows. So from that perspective it really doesn't matter whether the key to the date dimension is YYYYMMDD or a date data type field. Either way you need special values for NA, Unspecified, Hasn't Happend Yet, etc.
It seems like the big advantage of using a date data type as the dimension key is that any date field in the fact table can work either as a DD (when used by itself) or as the key to the date dimension--particularly handy for those fact tables with numerous dates which most of the time are not used for slice-and-dice. At any point any one of those dates could be used as the key to the date dimension--instant role-playing dimension--at least in the hand-coded SQL world. But this advantage largely disappears if working with a cube or BI semantic layer where the join from the fact table date field to the date dimension would need to be defined in advance in order to use that date field in a dimensional manner.
Also, I am still a little nervous about making the switch to a date type key myself for fear that I might later encounter a BI tool that doesn't properly support a non-integer key to the date dimension.
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
Re: Dates as NULLS in Fact Table
This was communicated to us by Margy Ross in the Kimball Data Warehouse Lifecycle Training in Chicago, October 2007. I haven't looked for any articles but I would presume the Kimball Group has something formally written on it.VHF wrote:Can you point us to any published books/articles/design tips that advocate using a date field as the key of the date dimension?
The bottom line is that a meaningless surrogate key works, a "smart" integer works, and a date field works. It's a judgement call really; I tend to lean toward using the date. If you never do any hand-coded SQL against your fact tables then the value tends to disappear. In the past I've taken a hybrid approach; I used a meaningless surrogate key for the primary key of the date dimension table (and the corresponding foreign key in the fact table) but also included the date itself in a separate field on the fact table. It eats more disk, but it provides maximum flexibility.
BrianJarrett- Posts : 61
Join date : 2009-02-03
Age : 50
Location : St. Louis, MO
Thanks for the excellent replies!
Sorry for the late reply. I didn't set up my question to notify me when a reply was posted!
Thanks for the excellent replies and thank you VHF! This is the way I will go in the situation I have. I will add 12/31/9999 to represent nulls. That is only 'pseudo' date I need right now and I believe it will solve my current issue.
Thanks!
Thanks for the excellent replies and thank you VHF! This is the way I will go in the situation I have. I will add 12/31/9999 to represent nulls. That is only 'pseudo' date I need right now and I believe it will solve my current issue.
Thanks!
abbi- Posts : 8
Join date : 2010-10-11
Keep it simple
Your current model seems to have the principle that date fields are not null. In that case, I would act accordingly and keep the time dimension intact and use a 'sysdate' for you 'Expiration date' when it's null. In addition to this, you must add a column to your fact table, for instance, 'expiration is null' and you should fill it with a boolean or just 'yes' and 'no'.
The advantage is that:
1. you can always make a link to your time dimension
2. you can easily filter out the empty expiration dates
3. most important, you keep you time dimension intact, as it is
The advantage of the last point has a lot to do with reporting. Your application has been around for some time (if I understand correctly), so if you would add an artificial date like '31-12-2999', it might compromise reports that use things like 'maxdate'. Of course you can filter that too. But I think my solution is more elegant. And what's more important, you stick to the general principle that date fields are not null.
The advantage is that:
1. you can always make a link to your time dimension
2. you can easily filter out the empty expiration dates
3. most important, you keep you time dimension intact, as it is
The advantage of the last point has a lot to do with reporting. Your application has been around for some time (if I understand correctly), so if you would add an artificial date like '31-12-2999', it might compromise reports that use things like 'maxdate'. Of course you can filter that too. But I think my solution is more elegant. And what's more important, you stick to the general principle that date fields are not null.
Model still intact
Andre,
Yes, the current model does have the principle that date fields are not null except for this first case, expiration date. So I do like the idea of adding a flag when it is null for reporting purposes.
Thanks,
Abbi
Yes, the current model does have the principle that date fields are not null except for this first case, expiration date. So I do like the idea of adding a flag when it is null for reporting purposes.
Thanks,
Abbi
abbi- Posts : 8
Join date : 2010-10-11
Re: Dates as NULLS in Fact Table
You are not supposed to throw attributes into a fact table like that, even if it's a flag. Adding attributes directly in fact table will unnecessarily increase the size of fact table for almost nothing but performance overhead. If you need a flag, include it in the relevant dimension or combine it into a junk dimension.
However, sometimes a simple solution could be the best solution but unfortunately always overlooked. As far as this topic goes, there are a few easy and good options but none of them is perfect. Our job as a dimensional modeller is to pick up a reasonable one without over-engineering and major impact to the existing workable design. To me two options fit into this category if you leave date attribute as a self contained degenerate dimension in the fact table. Using 31/12/9999 to replace null expiry date field will make most of your queries work simpler and faster, and it's unlikely you ever need to do max on this attribute. The second reasonable and simple option is to allow null value for an open expiry date in the fact table and utilise COALESCE function in your query. So you might end up something like COALESCE(ExpiryDate, '31/12/9999'). You would ask why not just use the option 1 which seems even simpler. However I would leave it to you to make your choice after understanding a less known but powerful feature of the function demonstrated by this example, COALESCE(ExpiryDate1, ExpiryDate2, ExpiryDate3, '31/12/9999').
However, sometimes a simple solution could be the best solution but unfortunately always overlooked. As far as this topic goes, there are a few easy and good options but none of them is perfect. Our job as a dimensional modeller is to pick up a reasonable one without over-engineering and major impact to the existing workable design. To me two options fit into this category if you leave date attribute as a self contained degenerate dimension in the fact table. Using 31/12/9999 to replace null expiry date field will make most of your queries work simpler and faster, and it's unlikely you ever need to do max on this attribute. The second reasonable and simple option is to allow null value for an open expiry date in the fact table and utilise COALESCE function in your query. So you might end up something like COALESCE(ExpiryDate, '31/12/9999'). You would ask why not just use the option 1 which seems even simpler. However I would leave it to you to make your choice after understanding a less known but powerful feature of the function demonstrated by this example, COALESCE(ExpiryDate1, ExpiryDate2, ExpiryDate3, '31/12/9999').
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Virtual solution
What we're not supposed to do either is changing a time dimension for the purpose of one fact table when it works perfectly fine for all the others.
It's arguable if adding an extra flag item to a fact table results in such a thing as performance overhead. But if a junk dimension already exists then you can add the flag there.
However, if you're using an Oracle 11g database think about using a virtual column based on your "expire date". You can use the virtual column to implement the foreign key relation to your time dimension. Of course, you still need a "not null" value for your "expire date" but you don't need a flag column because you can refer to the real "expiration date" column to select the null values.
It's arguable if adding an extra flag item to a fact table results in such a thing as performance overhead. But if a junk dimension already exists then you can add the flag there.
However, if you're using an Oracle 11g database think about using a virtual column based on your "expire date". You can use the virtual column to implement the foreign key relation to your time dimension. Of course, you still need a "not null" value for your "expire date" but you don't need a flag column because you can refer to the real "expiration date" column to select the null values.
Excellent thread
WOW...what a great thread/dialogue. I have nothing really to add other than we role play our time dimension for those date columns that need to be used in OLAP, we use SK in the Date DIMS, and i really want to know how it all worked out for Abbi??? (the one who initially posted)
sheridan06- Posts : 1
Join date : 2011-05-24
Re: Dates as NULLS in Fact Table
I ended up going with the simplest solution so there is not a major redesign by adding a row to the Date Dimension for '12/31/9999' to represent a null date as suggested by the first response, VHF. Did not add a Flag to the Fact table for expiration is null... So far we haven't had any issues...
Thanks!
Thanks!
abbi- Posts : 8
Join date : 2010-10-11
Similar topics
» How to manage nulls in Fact talble
» Fact Table that is mostly dates
» Accumulating Fact Table Dates as Role Play Dimension and Descriptions
» Combination of SCD2, partial snowflake, Effective and end dates in fact table
» Fact with eff / exp dates referencing dimension with eff / exp dates
» Fact Table that is mostly dates
» Accumulating Fact Table Dates as Role Play Dimension and Descriptions
» Combination of SCD2, partial snowflake, Effective and end dates in fact table
» Fact with eff / exp dates referencing dimension with eff / exp dates
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum