Hierarchies in fact tables.
2 posters
Page 1 of 1
Hierarchies in fact tables.
I've recently had to create dimensional schemas where there are no numerically additive facts. The pertinent information is usually an event, and the reporting filters are usually lots of dates. So, a Service_Call has a created date, scheduled date, actual date, etc. In most of these cases, there is a hierarchy of events. So, an Service_Order may contain several Install_Jobs, and an Install_Job several Equipment_Installs. A 1:m hierarchy in most cases. And MOST of the same dimensions and same fact table columns are used at all levels of the hierarchy. And the dates at each level are distinct, and don't follow down the hierarchy (the Service_date at the Service_Oder may not be the same as the same date at the Install_Job, Equipment_Install, etc. All distinct values of basically the same attributes at each level.
My design options seem to be two:
1. Create a separate fact table at each level (so Service_Order_Fact, Install_Job_Fact, and Equipment_Install_Fact). For most reporting, all three will have to be joined together.
And 80% of the columns are the same at each level, so all three facts look very much alike).
2. Create one denormalized Fact table to contain all rows at all levels, and use a type indicator to separate them (S, I, and E say). Date columns would be reused (there would be one Service_Date column, populated for the appropriate level of that particular row).
I find most modelers used to normalized modeling tend toward 1, since it's more normalized in structure. But I hate joining facts, so I often start with approach 2.
What do you all think? One multilevel fact or separate facts for each level, each with most of the same columns?
My design options seem to be two:
1. Create a separate fact table at each level (so Service_Order_Fact, Install_Job_Fact, and Equipment_Install_Fact). For most reporting, all three will have to be joined together.
And 80% of the columns are the same at each level, so all three facts look very much alike).
2. Create one denormalized Fact table to contain all rows at all levels, and use a type indicator to separate them (S, I, and E say). Date columns would be reused (there would be one Service_Date column, populated for the appropriate level of that particular row).
I find most modelers used to normalized modeling tend toward 1, since it's more normalized in structure. But I hate joining facts, so I often start with approach 2.
What do you all think? One multilevel fact or separate facts for each level, each with most of the same columns?
spthomas- Posts : 3
Join date : 2011-11-16
Re: Hierarchies in fact tables.
Separate fact tables seem more sound to me. 80% common dimensionality does not warrant mixing the facts in a single table which is prone to confusion and double counting. I guess you would have service order as degenerate dimension in all the relevant fact tables as a key conformance point.
Interestingly, the option 2 is what is called normalized facts, and it is really over normalized by many modelers. Fact normalization would apply to many (>10) types transactions that are mixed in a business event.
Interestingly, the option 2 is what is called normalized facts, and it is really over normalized by many modelers. Fact normalization would apply to many (>10) types transactions that are mixed in a business event.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Hierarchies in fact tables.
My concern about separate fact tables is that most of the reporting will involve all levels. A Service Call report needs the details mostly, or at least counts of the lower level items. So we're building fact tables that will be joined 80% of the time. But, just the idea of multi-level facts seems wrong. And if there were any additive facts I'd agree. Does this situation (an event fact with multi-level hierarchies) fall between the cracks? It's not a conventional transaction fact due to the multi-levels. It's not a snapshot fact because it describes an ongoing chain of events. But it's not an accumulating snapshot fact because any futher work on the event might result in updating several rows.
The anomaly here seems to be the level indicator. to query this fact, you have to constrain on a level indicator for that row, because all rows have mostly the same columns, just at different levels (the Scheduled Date appears at all three levels, as does the completed date, etc.). But is this different than constraining on a dimension key value? I guess for a more pure implementation I could create a LEVEL_DIM with 3 rows in it, and then it would look like all the other dimension keys, but that seems a bit overdone.
I have found that modelers tend to normalize things too much. It seems inelegant to have mixed levels and redundancy in a dimension, so they snowflake, often 2 or 3 levels deep. While a dimensional model is based on the theoretical framework of the relational model (facts are related to dimensions), the normalization rules don't always fit.
So thanks for the comments. Still thinking about this one.
The anomaly here seems to be the level indicator. to query this fact, you have to constrain on a level indicator for that row, because all rows have mostly the same columns, just at different levels (the Scheduled Date appears at all three levels, as does the completed date, etc.). But is this different than constraining on a dimension key value? I guess for a more pure implementation I could create a LEVEL_DIM with 3 rows in it, and then it would look like all the other dimension keys, but that seems a bit overdone.
I have found that modelers tend to normalize things too much. It seems inelegant to have mixed levels and redundancy in a dimension, so they snowflake, often 2 or 3 levels deep. While a dimensional model is based on the theoretical framework of the relational model (facts are related to dimensions), the normalization rules don't always fit.
So thanks for the comments. Still thinking about this one.
spthomas- Posts : 3
Join date : 2011-11-16
Similar topics
» Relationship between scd hierarchies and then between fact tables
» Modelling question - Multiple fact tables with one to many relationships and querying from many fact tables
» Single fact table vs multiple fact tables - what is the right way in a dimensional model?
» Storing Date Keys in dimension tables versus fact tables
» Multiple Fact Tables vs. Consolidated Fact Table
» Modelling question - Multiple fact tables with one to many relationships and querying from many fact tables
» Single fact table vs multiple fact tables - what is the right way in a dimensional model?
» Storing Date Keys in dimension tables versus fact tables
» Multiple Fact Tables vs. Consolidated Fact Table
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum