How to manage nulls in Fact talble
3 posters
Page 1 of 1
How to manage nulls in Fact talble
Hello, I am new with data warehousing. I need some help with following questions
1. should the nulls be left as nulls in fact table or should they be replaced with some number that might never be used like -7 or -1. If they should be replced with -1/-7 what's the logic behind this?
2.Secondly how to handle nulls in a fact table with datatype as date . We can not insert a number like -7. Is it a good idea to change the date data type columns present in fact table to varchar2 or char as they are just there for display purpose this way we will be able to insert -7/-1.
Thaks in advance.
1. should the nulls be left as nulls in fact table or should they be replaced with some number that might never be used like -7 or -1. If they should be replced with -1/-7 what's the logic behind this?
2.Secondly how to handle nulls in a fact table with datatype as date . We can not insert a number like -7. Is it a good idea to change the date data type columns present in fact table to varchar2 or char as they are just there for display purpose this way we will be able to insert -7/-1.
Thaks in advance.
wizard- Posts : 13
Join date : 2010-11-30
How to manage nulls in the Fact table
First, we have to be clear about what columns in the fact table might have nulls. If the column is a foreign key column to one of the dimensions, it must have a value in it. Leaving it null means the fact row will not be included in the results set in a standard inner join between the fact table and the dimension. The -1 idea comes from the way we generally manage surrogate keys in our dimension tables. We usually start with 1 and increment by 1. That leaves the negative numbers available. You would add a row to the dimension with the key of -1 and fill in the descriptive attributes with appropriate entries, like "Missing Product", and "Missing Region".
Here's a link to an old design tip on this topic: Dealing With Nulls In The Dimensional Model
If, on the other hand, the Null values are measures in the fact table, you have to be very careful what you do with them. -1, or -7 would not be appropriate because they look like real measures and would be included in any aggregates. For example, if the field is SalesQty and you put in -7, clearly any user query will get the wrong answer for Total SalesQty. These should probably be left null. Only put a real number in a measure field after careful deliberation with your business folks.
The third possibility is you have various values in your fact table that are not measures and do not join to dimensions. You mention a date as an example. I would argue that you should try to convert the date to an integer surrogate key and join it back to your Date dimension. That way, you can add a -1 row to your Date dimension as described in the first paragraph above.
If you don't have a dimension join to translate the meaning of a -1 row, substituting any value for Nulls will most likely lead to confusion.
--Warren
Here's a link to an old design tip on this topic: Dealing With Nulls In The Dimensional Model
If, on the other hand, the Null values are measures in the fact table, you have to be very careful what you do with them. -1, or -7 would not be appropriate because they look like real measures and would be included in any aggregates. For example, if the field is SalesQty and you put in -7, clearly any user query will get the wrong answer for Total SalesQty. These should probably be left null. Only put a real number in a measure field after careful deliberation with your business folks.
The third possibility is you have various values in your fact table that are not measures and do not join to dimensions. You mention a date as an example. I would argue that you should try to convert the date to an integer surrogate key and join it back to your Date dimension. That way, you can add a -1 row to your Date dimension as described in the first paragraph above.
If you don't have a dimension join to translate the meaning of a -1 row, substituting any value for Nulls will most likely lead to confusion.
--Warren
warrent- Posts : 41
Join date : 2008-08-18
How to manage nulls in the Fact table
Thanks for you prompt reply. This clears lots of ambiguities in my mind.
Regarding third possibility, values in your fact table that are not measures and do not join to dimensions, you have suggested to join these values back to the dimension so that -1 one can be inserted. Here is why I decided to goahead without joining to them to dimension.
These values will always only be displayed and never be queried upon meaning they will never be in a where caluse. Joining them to a dimension will always create an extra join. Secondly granularity level of these dates is to a min which will make date dimension huge as we have dates from 1850 till 2015.
Please let me know if I am missing out anything.
Thanks Again
--Wizard
Regarding third possibility, values in your fact table that are not measures and do not join to dimensions, you have suggested to join these values back to the dimension so that -1 one can be inserted. Here is why I decided to goahead without joining to them to dimension.
These values will always only be displayed and never be queried upon meaning they will never be in a where caluse. Joining them to a dimension will always create an extra join. Secondly granularity level of these dates is to a min which will make date dimension huge as we have dates from 1850 till 2015.
Please let me know if I am missing out anything.
Thanks Again
--Wizard
wizard- Posts : 13
Join date : 2010-11-30
Re: How to manage nulls in Fact talble
I would still use date key for these date fields in the fact as they are not measures and degenerated dimensions, so they are normal dimensions. Using smart surrogate integer key yyyymmdd and -1 for nonexistence date can make the key self deriving and avoid null date value. The advantage of FK to the date dimension is that you can show more descriptive information for any odd dates stored in much smaller dimension instead of repeating textual values in the fact.
If 1850 and 2015 are a valid years in the fact table, then extending the date dimension to cover enough history and future dates is quite reasonable approach. A date dimension with tens of thousand records is still a small dimension compared to what we refer to as big dimensions. You don't have to populate all the date attributes for those out-of-range dates as long as referential integrity is ensured. You could incrementally extend your date dimension based on the dates in the fact if that makes the date dimension significantly smaller.
If 1850 and 2015 are a valid years in the fact table, then extending the date dimension to cover enough history and future dates is quite reasonable approach. A date dimension with tens of thousand records is still a small dimension compared to what we refer to as big dimensions. You don't have to populate all the date attributes for those out-of-range dates as long as referential integrity is ensured. You could incrementally extend your date dimension based on the dates in the fact if that makes the date dimension significantly smaller.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Similar topics
» Dates as NULLS in Fact Table
» NULLs in source lead to duplicates in Fact table
» Question - Nulls as Dimension Attributes
» How to manage Conformed dimensions
» Bridge table to manage customer multi interests
» NULLs in source lead to duplicates in Fact table
» Question - Nulls as Dimension Attributes
» How to manage Conformed dimensions
» Bridge table to manage customer multi interests
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum