Unified Method for Handling Unknown Attributes
2 posters
Page 1 of 1
Unified Method for Handling Unknown Attributes
Hello,
I am having trouble deciding on how to handle unknown dimensional attribute values universally between an RDBMS (SQL Server) and OLAP (SQL Server Analysis Services 2008 R2) environment within a dimensional data mart environment.
As a common scenario, assume I will have an EDW environment (RDBMS) that is populated first with source data, which is then fed into an RDBMS dimensional database environment. A SQL Server Analysis Services database (MOLAP) would then reference this RDBMS data and build the needed OLAP cubes.
I originally was proposing to adhere to the following business rules when formatting “unknown” attribute values that are referenced from the source system:
Dimensional Data Mart - RDBMS:
Dimensional Data Mart - OLAP Cubes
Now, within the SQL Server Analysis Services OLAP environment, I am proposing the following business rules for presenting unknown attribute values:
The problem I am having is that if I have a conformed dimension that contains both numerical (stored as numerical data types) and text based attributes, if the user wishes to browse this dimension, there are 2 instances of the value “Unknown” being shown. It appears SSAS only lets you configure the “Unknown” member at the dimension level, and so if one or more dimensions use this special “Unknown” member, all dimensions will display it.
This can prove confusing to users if they see 2 instances of “Unknown” being displayed as attribute values even if one ties to actual data.
In an effort to standardize things, I was thinking of proposing the following:
Dimensional Data Mart - RDBMS:
Dimensional Data Mart - OLAP Cubes
I would like to ask if this is a sensible, scalable, and good practice oriented approach for unifying “Unknown” values across 2 physical environments encompassing my dimensional data mart (RDBMS and OLAP).
I would think this approach has the following benefits:
Please let me know if you agree with this approach, and/or any other suggestions that would be considered good practice for this business scenario.
Thanks!
I am having trouble deciding on how to handle unknown dimensional attribute values universally between an RDBMS (SQL Server) and OLAP (SQL Server Analysis Services 2008 R2) environment within a dimensional data mart environment.
As a common scenario, assume I will have an EDW environment (RDBMS) that is populated first with source data, which is then fed into an RDBMS dimensional database environment. A SQL Server Analysis Services database (MOLAP) would then reference this RDBMS data and build the needed OLAP cubes.
I originally was proposing to adhere to the following business rules when formatting “unknown” attribute values that are referenced from the source system:
Dimensional Data Mart - RDBMS:
- Text Attributes: If the source EDW value is unknown (null, empty, missing), set the value being inserted into the RDBMS tables to a string literal value of “Unknown”
- This business rule would hopefully conform to design tip# 43: (“Instead, we recommend substituting an appropriately descriptive string, like “Unknown” or “Not provided”)
- Numerical Attributes (Dates, Integers, Decimals): If the source EDW value is unknown (null, empty, missing), set the value being inserted into the RDBMS tables to NULL (ANSI null).
- Hopefully this business rule would conform to design tip# 128:
Dimensional Data Mart - OLAP Cubes
Now, within the SQL Server Analysis Services OLAP environment, I am proposing the following business rules for presenting unknown attribute values:
- Text Attributes: If the source RDBMS value is set to a string literal value of “Unknown”, carry over the value in its original, unmodified form.
- If the value is not applicable, we will be setting the value to a string literal value of “Not Applicable”.
- Numerical Attributes (Dates, Integers, Decimals): If the source RDBMS value is null, use the special SSAS “Unknown” member attribute value.
- The special SSAS “Unknown” member attribute is discussed here: http://msdn.microsoft.com/en-us/library/ms170707(v=sql.105).aspx
The problem I am having is that if I have a conformed dimension that contains both numerical (stored as numerical data types) and text based attributes, if the user wishes to browse this dimension, there are 2 instances of the value “Unknown” being shown. It appears SSAS only lets you configure the “Unknown” member at the dimension level, and so if one or more dimensions use this special “Unknown” member, all dimensions will display it.
This can prove confusing to users if they see 2 instances of “Unknown” being displayed as attribute values even if one ties to actual data.
In an effort to standardize things, I was thinking of proposing the following:
Dimensional Data Mart - RDBMS:
- Within the RDBMS data mart environment, set all unknown attribute values within the core dimension DB tables to ANSI NULL, regardless of the data type.
- Create a SQL view over the tables which would convert NULL values tied to text data type columns to a string literal of “Unknown” which would be presented to end users.
Dimensional Data Mart - OLAP Cubes
- All NULL values referenced from the RDBMS data mart tables would be set to the special SSAS “Unknown” member.
- Being that no string literal values of “Unknown” are being published within the RDBMS data mart environment, there would only be one instance of the value “Unknown” being present within the dimensional attribute in the SSAS cube.
I would like to ask if this is a sensible, scalable, and good practice oriented approach for unifying “Unknown” values across 2 physical environments encompassing my dimensional data mart (RDBMS and OLAP).
I would think this approach has the following benefits:
- I am following Ralph Kimball’s recommendations (design tips 43 and 128) with ensuring that text based attributes are never published to end users as null within the RDBMS environment.
- Users browsing dimensional attributes within the OLAP environment would be presented with the “Unknown” member for unknown values that originated from the source system.
- When processing potentially millions of dimension rows, the SSAS cube reprocessing task would reference dimension attribute columns as is without needing to format them back to null in order for SSAS to recognize the values as null, and to utilize the special “Unknown” member value.
- I would hate to incur the performance penalty of placing SQL CASE statements for each dimension column if I were to set the core RDBMS table column values to a string literal value of “Unknown” for unknown/missing values.
- A value of “Unknown” is theoretically unified between both RDBMS and OLAP environments.
Please let me know if you agree with this approach, and/or any other suggestions that would be considered good practice for this business scenario.
Thanks!
BICorvette- Posts : 7
Join date : 2012-03-04
Re: Unified Method for Handling Unknown Attributes
That should work fine. I, however, don't go to great lengths to address non-identifying attributes that are null. That is, if I receive a null for an attribute that is a natural key for a dimension, then the fact FK gets assigned to the unknown dimension row. All other non-identifying null dimension attributes go in as nulls. Fact metric attributes (i.e. numerics) that come in null also get loaded as null.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Similar topics
» Unified Information System
» dimension table design question for around 100 attributes and higher level calculated attributes
» Null value for dates
» Best BI Software for Kimball Method?
» Date dimension, "unknown" entry
» dimension table design question for around 100 attributes and higher level calculated attributes
» Null value for dates
» Best BI Software for Kimball Method?
» Date dimension, "unknown" entry
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum