Modeling for adhoc queries across measures and using nonconforming dimensions
3 posters
Page 1 of 1
Modeling for adhoc queries across measures and using nonconforming dimensions
We are in the early planning stages for creating the necessary data models for a webbased app that will allow users to query our data in a guided adhoc fashion.
What makes this interesting is that the data consists in a large number of measurements where each measurement has its own set of applicable dimensions and where only a few dimensions (mainly date/time) are shared among all measurements.
For example measurement M_a has dimensions Date, D_1 and D_2 while measurement M_b has dimensions Date, D_3 and D_4.
Users would e.g. want to create a report that shows the monthly sums of measurements M_a and M_b side by side where D_1 is restricted to some value and D_3 and D_4 are restricted to some other value.
Moreover, each measurement may be described by sets of nonconforming dimensions. So there will be many combinations where for measurement M_a the dimensions D_1 has some value, but there is no applicable value for D_2, and viceversa.
We have hundreds of different dimensions and dozens of measurements and for each measurement, several dozens of dimensions may apply but overall, only a few dimensions are fully conforming for all or several of the measurements.
We are now puzzled by how this data should get best modeled into first a set of fact tables with associated dimensions in a star shape and subsequenetly maybe into cubes to take the biggest advantage of preexisting tools or approaches to implement the web GUI for interactively creating the queries, specifying how to tabulate measures down and across etc.
The obvious way to do this seems to be to essentially factor this into all combinations of measures and dimensions such that each fact table only contains compatible measures and compatible dimensions  this will result in many hundreds of tables though.
But then, which preexisting tools would be available for dealing with query generation, drilling down, drilling across etc. especially when it has to be done across different fact tables or involves noncompatible dimension?
This is really two questions, I suppose:
 what is a stateofthe art, clean and scaling way to model data like this?
 how can one benefit from as many preexisting tools and solutions as possible for implementing a query app based on such data?
What makes this interesting is that the data consists in a large number of measurements where each measurement has its own set of applicable dimensions and where only a few dimensions (mainly date/time) are shared among all measurements.
For example measurement M_a has dimensions Date, D_1 and D_2 while measurement M_b has dimensions Date, D_3 and D_4.
Users would e.g. want to create a report that shows the monthly sums of measurements M_a and M_b side by side where D_1 is restricted to some value and D_3 and D_4 are restricted to some other value.
Moreover, each measurement may be described by sets of nonconforming dimensions. So there will be many combinations where for measurement M_a the dimensions D_1 has some value, but there is no applicable value for D_2, and viceversa.
We have hundreds of different dimensions and dozens of measurements and for each measurement, several dozens of dimensions may apply but overall, only a few dimensions are fully conforming for all or several of the measurements.
We are now puzzled by how this data should get best modeled into first a set of fact tables with associated dimensions in a star shape and subsequenetly maybe into cubes to take the biggest advantage of preexisting tools or approaches to implement the web GUI for interactively creating the queries, specifying how to tabulate measures down and across etc.
The obvious way to do this seems to be to essentially factor this into all combinations of measures and dimensions such that each fact table only contains compatible measures and compatible dimensions  this will result in many hundreds of tables though.
But then, which preexisting tools would be available for dealing with query generation, drilling down, drilling across etc. especially when it has to be done across different fact tables or involves noncompatible dimension?
This is really two questions, I suppose:
 what is a stateofthe art, clean and scaling way to model data like this?
 how can one benefit from as many preexisting tools and solutions as possible for implementing a query app based on such data?
josmi Posts : 3
Join date : 20121113
Re: Modeling for adhoc queries across measures and using nonconforming dimensions
Hi josmi,
I'm imagining you could combine all your measurements into one fact, with an additional Measurement Type dimension (to keep track of what each row means).
I'd probably keep the dimensions separate, but look for opportunities to consolidate multiple dimensions into one dimension with multiple attributes, if they can be related in some way (even as a Junk dimension). Presenting a flat list of "hundreds" of choices will not be useful in any UI.
Each dimension needs an "Unknown" member row. Default your "Unknown" SK value (e.g 1) into the fact FK columns for the dimensions which are not relevant to that fact row. Then the one fact table can be joined to all the dimensions.
My favourite tech for this would be the Microsoft BI stack i.e. SQL Server Analysis Services cubes with Excel (published to Excel Web Services and Power View) and SQL Server Reporting Services as the analysis and reporting tools.
Good luck!
Mike
I'm imagining you could combine all your measurements into one fact, with an additional Measurement Type dimension (to keep track of what each row means).
I'd probably keep the dimensions separate, but look for opportunities to consolidate multiple dimensions into one dimension with multiple attributes, if they can be related in some way (even as a Junk dimension). Presenting a flat list of "hundreds" of choices will not be useful in any UI.
Each dimension needs an "Unknown" member row. Default your "Unknown" SK value (e.g 1) into the fact FK columns for the dimensions which are not relevant to that fact row. Then the one fact table can be joined to all the dimensions.
My favourite tech for this would be the Microsoft BI stack i.e. SQL Server Analysis Services cubes with Excel (published to Excel Web Services and Power View) and SQL Server Reporting Services as the analysis and reporting tools.
Good luck!
Mike
Re: Modeling for adhoc queries across measures and using nonconforming dimensions
Hi Mike,
many thanks for this answer! To keep everything is as few fact tables (or even a single one) would have been what we originally would have liked most, but is this not creating restrictions as to what kind of reports one is able to create? As I pointed out, Dimensions are not conformant and we have measurements which are subdivided by some dimension A but not B and viceversa.
Example1:
For date YYYYMMDD we have a measurement A which is identified by dimension DA1 having value va1 and DA2 having va2. For the same date YYYYMMDD we have a measurement B which is identified by dimension DB1 having value vb1. Both A and B also have for the conformant dimension C the value vc.
If I split this into two fact tables I would get (showing directly the dimension value instead of the corresponding key of the dimension table)
One report would require to show, for each year accumulated in the rows, the measurements A where DA1=va1 und DA2=va2 in comparison with measurement B where DB1=vb1, possible that comparison done for all values of DC in the columns like so:
How could one get the original data into a single fact table and still be able to create this report?
One way to create the fact table that I see is this:
With this method, all measurements in a row where we have dimensions that do not apply to this measurement would be null and all dimensions that apply to null measurements would be N/A (not applicable).
Is there another way to do this that would solve all my problems?
Excuse me if I am overlooking the obvious here (i somehow have the feeling that I must be overlooking something basic here), but how does one construct cubes that deal with this so that we can get reports as illustrated with the sample report?
many thanks for this answer! To keep everything is as few fact tables (or even a single one) would have been what we originally would have liked most, but is this not creating restrictions as to what kind of reports one is able to create? As I pointed out, Dimensions are not conformant and we have measurements which are subdivided by some dimension A but not B and viceversa.
Example1:
For date YYYYMMDD we have a measurement A which is identified by dimension DA1 having value va1 and DA2 having va2. For the same date YYYYMMDD we have a measurement B which is identified by dimension DB1 having value vb1. Both A and B also have for the conformant dimension C the value vc.
If I split this into two fact tables I would get (showing directly the dimension value instead of the corresponding key of the dimension table)
YYYYMMDD  A  va1  va2  vc1 
YYYYMMDD  B  vb1  vc1 
One report would require to show, for each year accumulated in the rows, the measurements A where DA1=va1 und DA2=va2 in comparison with measurement B where DB1=vb1, possible that comparison done for all values of DC in the columns like so:
Year  DC=vc1  DC=vc1  DC=vc2  DC=vc2 
A  B  A  B  
2000  12  13  22  27 
2001  10  11  21  21 
How could one get the original data into a single fact table and still be able to create this report?
One way to create the fact table that I see is this:
Date  MeasA  MeasB  DimDA1  DimA2  DimB1  DimC 
YYYYMMDD  A  null  va1  va2  N/A  vc1 
YYYYMMDD  null  B  N/A  N/A  vb1  vc1 
Is there another way to do this that would solve all my problems?
Excuse me if I am overlooking the obvious here (i somehow have the feeling that I must be overlooking something basic here), but how does one construct cubes that deal with this so that we can get reports as illustrated with the sample report?
josmi Posts : 3
Join date : 20121113
Re: Modeling for adhoc queries across measures and using nonconforming dimensions
A fundamental premise of a relational or dimensional model is there exists attributes by which data can be associated with other data.
In a dimensional framework, its call conformance. Without it, all you have are independent measures.
Forget about using cubes. They require conforming attributes, otherwise they are not of much use.
You basically have two options: write a whole bunch of ugly SQL queries, or go back to the drawing board and rethink the situation. One fact table or a hundred fact tables isn't going to make any difference.
In a dimensional framework, its call conformance. Without it, all you have are independent measures.
Forget about using cubes. They require conforming attributes, otherwise they are not of much use.
You basically have two options: write a whole bunch of ugly SQL queries, or go back to the drawing board and rethink the situation. One fact table or a hundred fact tables isn't going to make any difference.
Re: Modeling for adhoc queries across measures and using nonconforming dimensions
Thank you for this advice!
It has already dawned on us that because of the many nonconforming dimenions, we won't be able to create useful cubes or use many of the standard tools for analysis and reports.
It is still a bit hard to believe that there is essentially nothing apart from doing it all ourselves that would support analysis, comparison, drill down, aggregation etc. in the case of multiple independet measures. In our domain, comparing those is exactly what is needed to get the required insights.
I am a bit puzzled about your second option:
It has already dawned on us that because of the many nonconforming dimenions, we won't be able to create useful cubes or use many of the standard tools for analysis and reports.
It is still a bit hard to believe that there is essentially nothing apart from doing it all ourselves that would support analysis, comparison, drill down, aggregation etc. in the case of multiple independet measures. In our domain, comparing those is exactly what is needed to get the required insights.
I am a bit puzzled about your second option:
In what way could one rethink the situation? Are there any tools or documented approaches how to best model data in such a situation or is the only way to make progress to sit down, think hard and essentially reinvent all the wheels necessary (figure out the data model, write tools for ETL, querying, reporting, managing metadata etc.)?ngalemmo wrote:
You basically have two options: write a whole bunch of ugly SQL queries, or go back to the drawing board and rethink the situation. One fact table or a hundred fact tables isn't going to make any difference.
josmi Posts : 3
Join date : 20121113
Similar topics
» Modeling Help: SemiAdditive Measures?
» Measures having non applicable dimensions
» Measures Additive Over Some Dimensions
» Measures as Dimensions: Banding Approach Help
» Modeling two dimensions
» Measures having non applicable dimensions
» Measures Additive Over Some Dimensions
» Measures as Dimensions: Banding Approach Help
» Modeling two dimensions
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum

