Avoiding Nulls in Dimension Tables
4 posters
Page 1 of 1
Avoiding Nulls in Dimension Tables
I am trying to prevent any of the columns in a dimension table from containing nulls and came across an interesting situation.
I have First Name, Last Name, Suffix. The Suffix is be null most of the time. If the 3 fields have to be concatenated in an report or ad hoc query and the user isn't careful, the results of concatenating the 3 fields can be null.
I should probably add in a field called full name and have the concatenation already done, but besides that, what's the best way of handling nulls in fields that have meaning such as a Suffix or Prefix in a Name? I would like to avoid, "NA. John Doe, NA".
Is it OK to enter in a space? Doing so makes my skin crawl but it's got to be better than "NA. John Doe, NA".
I have First Name, Last Name, Suffix. The Suffix is be null most of the time. If the 3 fields have to be concatenated in an report or ad hoc query and the user isn't careful, the results of concatenating the 3 fields can be null.
I should probably add in a field called full name and have the concatenation already done, but besides that, what's the best way of handling nulls in fields that have meaning such as a Suffix or Prefix in a Name? I would like to avoid, "NA. John Doe, NA".
Is it OK to enter in a space? Doing so makes my skin crawl but it's got to be better than "NA. John Doe, NA".
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Avoiding Nulls in Dimension Tables
In my way of thinking, any good methodology or best practice could be just as detrimental as bad ones if used dogmatically. NULL value is generally bad for query, but it still has its place in database in general and DW in particular, otherwise it would be deprecated in all new versions of database systems.
I can come up a number of reasons why NULL is irreplaceable by any other values. For numeric field, you could use 0 to replace NULL, but we all know they are totally different things. NULL value can make aggregate functions, including AVG, work properly, 0 will not. In your query, NULL can make calculations like A/NULL work properly, however A/0 will spoil the whole query.
Another example is for strings, again NULL will never be the same as blank string. NULL has its unambiguous meaning, while a blank string could be what, empty string, or one space or many spaces. And it is even more confusing to SQL, as spaces will be automatically trimmed off in some operations.
The last example, but not the least important, is the powerful ANSI standard function COALESCE(), aimed to replace vender specific functions, DECODE in Oracle and ISNULL in SQL Server, and sometimes awkward CASE statements. Bear in mind, COALESCE takes variable number of parameters which can replace a messy nested CASE statements much more elegantly and efficiently. However by throwing out the concept of NULL, you basically give up leveraging the power of COALESCE function within DW. Trust me, once you have started using it, giving it up is a huge pain.
In any types of project developments, I don’t like people set rules using the best practice as excuse. I tend to refer to rules as guidelines and try not to rule out giving a go to different approaches. So my question is, have you ever tried relaxing non NULL rule and see how bad it makes your DW. Based on my experience, it would only make my life a lot easier.
I can come up a number of reasons why NULL is irreplaceable by any other values. For numeric field, you could use 0 to replace NULL, but we all know they are totally different things. NULL value can make aggregate functions, including AVG, work properly, 0 will not. In your query, NULL can make calculations like A/NULL work properly, however A/0 will spoil the whole query.
Another example is for strings, again NULL will never be the same as blank string. NULL has its unambiguous meaning, while a blank string could be what, empty string, or one space or many spaces. And it is even more confusing to SQL, as spaces will be automatically trimmed off in some operations.
The last example, but not the least important, is the powerful ANSI standard function COALESCE(), aimed to replace vender specific functions, DECODE in Oracle and ISNULL in SQL Server, and sometimes awkward CASE statements. Bear in mind, COALESCE takes variable number of parameters which can replace a messy nested CASE statements much more elegantly and efficiently. However by throwing out the concept of NULL, you basically give up leveraging the power of COALESCE function within DW. Trust me, once you have started using it, giving it up is a huge pain.
In any types of project developments, I don’t like people set rules using the best practice as excuse. I tend to refer to rules as guidelines and try not to rule out giving a go to different approaches. So my question is, have you ever tried relaxing non NULL rule and see how bad it makes your DW. Based on my experience, it would only make my life a lot easier.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
add in a field called full name and have the concatenation already done
Hi Jeff,
I'd definitely recommend: "add in a field called full name and have the concatenation already done". I think the dimension designer should be making these decisions and not leaving them to various query/report authors to come up with their own logic. The coding/technical issue you highlight is only one of several - what if one author prefers a "Lastname, Prefix Firstname" format? You have eroded the consistency that your DW/DM is supposed to encourage.
FWIW I rely on "Unspecified" or similar to replace NULLs. If that makes it hard for report authors to code their own derivations, that's a good thing in my view - it will drive them back to the DM/DW to use the data provided or request new derivations in that layer.
Good luck!
Mike
I'd definitely recommend: "add in a field called full name and have the concatenation already done". I think the dimension designer should be making these decisions and not leaving them to various query/report authors to come up with their own logic. The coding/technical issue you highlight is only one of several - what if one author prefers a "Lastname, Prefix Firstname" format? You have eroded the consistency that your DW/DM is supposed to encourage.
FWIW I rely on "Unspecified" or similar to replace NULLs. If that makes it hard for report authors to code their own derivations, that's a good thing in my view - it will drive them back to the DM/DW to use the data provided or request new derivations in that layer.
Good luck!
Mike
Re: Avoiding Nulls in Dimension Tables
I agree with Jeff and Mike. Showing bad data forces the business to clean up its act.
If you want to get around adding an "Unknown" value, I'd go with a computed column for the FullName.
If you want to get around adding an "Unknown" value, I'd go with a computed column for the FullName.
Similar topics
» Fact and dimension tables - avoiding same number of rows in both
» Nulls vs Blanks in dimension tables
» Question - Nulls as Dimension Attributes
» Storing Date Keys in dimension tables versus fact tables
» Number of Columns in Fact Tables vs. Dimension Tables
» Nulls vs Blanks in dimension tables
» Question - Nulls as Dimension Attributes
» Storing Date Keys in dimension tables versus fact tables
» Number of Columns in Fact Tables vs. Dimension Tables
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum