Unified Method for Handling Unknown Attributes

Go down

Unified Method for Handling Unknown Attributes

Post  BICorvette on Mon Jun 16, 2014 6:46 pm


I am having trouble deciding on how to handle unknown dimensional attribute values universally between an RDBMS (SQL Server) and OLAP (SQL Server Analysis Services 2008 R2) environment within a dimensional data mart environment.

As a common scenario, assume I will have an EDW environment (RDBMS) that is populated first with source data, which is then fed into an RDBMS dimensional database environment.  A SQL Server Analysis Services database (MOLAP) would then reference this RDBMS data and build the needed OLAP cubes.

I originally was proposing to adhere to the following business rules when formatting “unknown” attribute values that are referenced from the source system:

Dimensional Data Mart - RDBMS:

  • Text Attributes: If the source EDW value is unknown (null, empty, missing), set the value being inserted into the RDBMS tables to a string literal value of “Unknown”
  • This business rule would hopefully conform to design tip# 43: (“Instead, we recommend substituting an appropriately descriptive string, like “Unknown” or “Not provided”)  

  • Numerical Attributes (Dates, Integers, Decimals): If the source EDW value is unknown (null, empty, missing), set the value being inserted into the RDBMS tables to NULL (ANSI null).
  • Hopefully this business rule would conform to design tip# 128:

Dimensional Data Mart - OLAP Cubes

Now, within the SQL Server Analysis Services OLAP environment, I am proposing the following business rules for presenting unknown attribute values:

  • Text Attributes: If the source RDBMS value is set to a string literal value of “Unknown”, carry over the value in its original, unmodified form.
  • If the value is not applicable, we will be setting the value to a string literal value of “Not Applicable”.

The problem I am having is that if I have a conformed dimension that contains both numerical (stored as numerical data types) and text based attributes, if the user wishes to browse this dimension, there are 2 instances of the value “Unknown” being shown.  It appears SSAS only lets you configure the “Unknown” member at the dimension level, and so if one or more dimensions use this special “Unknown” member, all dimensions will display it.

This can prove confusing to users if they see 2 instances of “Unknown” being displayed as attribute values even if one ties to actual data.

In an effort to standardize things, I was thinking of proposing the following:

Dimensional Data Mart - RDBMS:

  1. Within the RDBMS data mart environment, set all unknown attribute values within the core dimension DB tables to ANSI NULL, regardless of the data type.
  2. Create a SQL view over the tables which would convert NULL values tied to text data type columns to a string literal of “Unknown” which would be presented to end users.

Dimensional Data Mart - OLAP Cubes

  1. All NULL values referenced from the RDBMS data mart tables would be set to the special SSAS “Unknown” member.
  2. Being that no string literal values of “Unknown” are being published within the RDBMS data mart environment, there would only be one instance of the value “Unknown” being present within the dimensional attribute in the SSAS cube.

I would like to ask if this is a sensible, scalable, and good practice oriented approach for unifying “Unknown” values across 2 physical environments encompassing my dimensional data mart (RDBMS and OLAP).

I would think this approach has the following benefits:

  1. I am following Ralph Kimball’s recommendations (design tips 43 and 128) with ensuring that text based attributes are never published to end users as null within the RDBMS environment.
  2. Users browsing dimensional attributes within the OLAP environment would be presented with the “Unknown” member for unknown values that originated from the source system.
  3. When processing potentially millions of dimension rows, the SSAS cube reprocessing task would reference dimension attribute columns as is without needing to format them back to null in order for SSAS to recognize the values as null, and to utilize the special “Unknown” member value.
  4.                I would hate to incur the performance penalty of placing SQL CASE statements for each dimension column if I were to set the core RDBMS table column values to a string literal value of “Unknown” for unknown/missing values.
  5. A value of “Unknown” is theoretically unified between both RDBMS and OLAP environments.

Please let me know if you agree with this approach, and/or any other suggestions that would be considered good practice for this business scenario.



Posts : 7
Join date : 2012-03-04

View user profile

Back to top Go down

Re: Unified Method for Handling Unknown Attributes

Post  BoxesAndLines on Tue Jun 17, 2014 8:29 am

That should work fine. I, however, don't go to great lengths to address non-identifying attributes that are null. That is, if I receive a null for an attribute that is a natural key for a dimension, then the fact FK gets assigned to the unknown dimension row. All other non-identifying null dimension attributes go in as nulls. Fact metric attributes (i.e. numerics) that come in null also get loaded as null.

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Back to top

- Similar topics

Permissions in this forum:
You cannot reply to topics in this forum