Factless fact table versus Dimension (downstream implications)
3 posters
Page 1 of 1
Factless fact table versus Dimension (downstream implications)
Factless fact table versus Dimension (downstream implications)
Let's keep this simple for discussion purposes (I’m leaving off dimensions and surrogate key fields on fact table, it really isn’t this small)
We are trying to build a dimensional DW for a source system which
tracks developer tasks and the time being logged to those tasks.
Example (OLTP tables)
TASK
TaskId
ProjectId
TaskClassification
TaskSection
TaskDescription
PROJECT
ProjectId
ProjectType
ProjectName
ProjectOwnerEmpId
TASK_HOURS_WORKED
TaskId
DeveloperId
WorkedDt
WorkedHours
Dimension design option #1 – Treat the TASK as a factless task table itself.
FactTask
ProjectKey
TaskClassificationKey
TaskSectionKey
DimProject
ProjectKey
ProjectTypeDesc
ProjectOwnerName
ProjectOwnerDepartment
ProjectName
DimTaskClassification
TaskClassKey
TaskClassificationId
ClassificationType
DimSection
SectionKey
SectionGroup
SectionType
SectionName
ClassificationGroup
ClassificationName
Above seems correct / workable when treating the instance of a task as a “fact”
However, When I add in the FactTaskWorkedHours table I find myself reproducing/supporting the same dimensional keys on that fact table which already exist on the FactTask. Leads me to believe that FactTask, DimSection and DimClassification should not exist and should be part of a DimTask table to store those attributes and thus treat DimTask as a fact table when needed.
The down side to easier maintenance that goes along with a “DimTask” as both a dimension and a Fact table would be; I don’t have a “master list” of available Classifications and can only rely on the classifications that are actually assigned to a task, not to mention the expense of going thru a larger “dim table” just to get an small attribute like classification (100,000 tasks versus 20-30 classifications)
Let's keep this simple for discussion purposes (I’m leaving off dimensions and surrogate key fields on fact table, it really isn’t this small)
We are trying to build a dimensional DW for a source system which
tracks developer tasks and the time being logged to those tasks.
Example (OLTP tables)
TASK
TaskId
ProjectId
TaskClassification
TaskSection
TaskDescription
PROJECT
ProjectId
ProjectType
ProjectName
ProjectOwnerEmpId
TASK_HOURS_WORKED
TaskId
DeveloperId
WorkedDt
WorkedHours
Dimension design option #1 – Treat the TASK as a factless task table itself.
FactTask
ProjectKey
TaskClassificationKey
TaskSectionKey
DimProject
ProjectKey
ProjectTypeDesc
ProjectOwnerName
ProjectOwnerDepartment
ProjectName
DimTaskClassification
TaskClassKey
TaskClassificationId
ClassificationType
DimSection
SectionKey
SectionGroup
SectionType
SectionName
ClassificationGroup
ClassificationName
Above seems correct / workable when treating the instance of a task as a “fact”
However, When I add in the FactTaskWorkedHours table I find myself reproducing/supporting the same dimensional keys on that fact table which already exist on the FactTask. Leads me to believe that FactTask, DimSection and DimClassification should not exist and should be part of a DimTask table to store those attributes and thus treat DimTask as a fact table when needed.
The down side to easier maintenance that goes along with a “DimTask” as both a dimension and a Fact table would be; I don’t have a “master list” of available Classifications and can only rely on the classifications that are actually assigned to a task, not to mention the expense of going thru a larger “dim table” just to get an small attribute like classification (100,000 tasks versus 20-30 classifications)
dkbunkers- Posts : 3
Join date : 2010-04-06
Re: Factless fact table versus Dimension (downstream implications)
However, When I add in the FactTaskWorkedHours table I find myself reproducing/supporting the same dimensional keys on that fact table which already exist on the FactTask. Leads me to believe that FactTask, DimSection and DimClassification should not exist and should be part of a DimTask table to store those attributes and thus treat DimTask as a fact table when needed.
What do you need to accompish? There is nothing inherently wrong with using dimensions in more than one fact table.
From what I understand, the FactTask table exists to record all possible tasks associated with a project, while the other fact table is to record hours posted against specific tasks for the project. If you need both to support user requirements, so be it. You are tracking two different things relating to the same set of dimensions... that's just a normal day at the office. Two fact tables is the correct approach.
Re: Factless fact table versus Dimension (downstream implications)
Thanks for input. Trying to weigh the pros/cons of dimensional theory and shared dimensions between the 2 facts (Task itself and logged work time on the task) against the the maintenance of having 10-15 dimensions hanging off FactTask that also need to hang off of FactTaskWorkedHours since I care about those dimensions as attributes of the worked hours (since they are attributes of the task itself). The relational oltp side of me things life is simpler if DimTask exist with all the "descriptors" within DimTask itself versus de-normalizing those same attributes as part of many dimensions any time I want to measure something that is related to the task as well). So you would agree that dim theory says that if/when a new Task Attritbute(s) is created in OLTP system that I'd be pulling an new dimension and adding keys back to dimension on both FactTask and FactTaskWorkedHours?
dkbunkers- Posts : 3
Join date : 2010-04-06
Re: Factless fact table versus Dimension
I'm not quite sure this was covered above so I'll put this out for discussioni:
What about the scenario where one business requirement might be best satisfied by defining a factless table, yet another best be satisfied by a dimension?
Example: Policy. I could go either way with this one. It may be more efficient to decompose all elements of a policy (status, broker, etc) into various dimensions. It equally makes sense to make Policy a conformed dimension, allowing you to then attach it to multiple fact tables when there's interest in measuring premium (or case counts, or elig lives counts, etc).
* * *
1. Any general guidelines that would tell me when a concept (like Policy) be a factless table or a dim?
2. Can both a Policy_Factless_Fact and a Policy_Dim coexist in a system, depending on their usage? Pros and cons?
-Seadog
What about the scenario where one business requirement might be best satisfied by defining a factless table, yet another best be satisfied by a dimension?
Example: Policy. I could go either way with this one. It may be more efficient to decompose all elements of a policy (status, broker, etc) into various dimensions. It equally makes sense to make Policy a conformed dimension, allowing you to then attach it to multiple fact tables when there's interest in measuring premium (or case counts, or elig lives counts, etc).
* * *
1. Any general guidelines that would tell me when a concept (like Policy) be a factless table or a dim?
2. Can both a Policy_Factless_Fact and a Policy_Dim coexist in a system, depending on their usage? Pros and cons?
-Seadog
seadog2010- Posts : 23
Join date : 2010-03-04
Re: Factless fact table versus Dimension (downstream implications)
Exactly the scenario I was trying to futher understand regarding best practices or guidelines , although worded much better.
dkbunkers- Posts : 3
Join date : 2010-04-06
Re: Factless fact table versus Dimension (downstream implications)
Thanks for the feedback.
Most dimensions to me are clear-cut - the static, 'reference' ones - status, priority, jurisdiction, occupation, and the 'master data' ones - customer, product, etc.
The ones I struggle with (those that have been implemented prior to my joining the company) are the ones more 'transactional' in nature, implemented as dimensions, and are quite volatile and thereform FCD's. It might just be the way they were modeled that is causing them to be FCD's (i.e., they contain multiple cols (paid-to-date, policy status date) that to me are transactional and therefore very volatile. The dimensional models were orig built by developers (not architects) familiar with relational, not dimensional, modeling, so I'm having to undo years of muddy design and understanding. When all dates are put into dimensions, you have little choice but to link dimension to dimension (the dreaded snowflake) in order to take advantage of the robust attributes of the Date dimension.
To me, a policy is a transactional concept. Pulling the various components together - customer, product, benefits, broker, plan, etc - seems to be the purpose of a factless table. So many of these components rely on dates (prod eff date, broker eff date, benefit eff date, etc) that are constantly in flux. On the other hand, it could also be considered a living entity, like a Customer, that changes over time.
Yes, another day at the office. I just want to get a better sense of the 'Great Taste - Less Filling' debate of when a concept (like Policy) is a transaction and when is it a dimension..... and whether the two can coexist when they may represent the same thing.
-Seadog
Most dimensions to me are clear-cut - the static, 'reference' ones - status, priority, jurisdiction, occupation, and the 'master data' ones - customer, product, etc.
The ones I struggle with (those that have been implemented prior to my joining the company) are the ones more 'transactional' in nature, implemented as dimensions, and are quite volatile and thereform FCD's. It might just be the way they were modeled that is causing them to be FCD's (i.e., they contain multiple cols (paid-to-date, policy status date) that to me are transactional and therefore very volatile. The dimensional models were orig built by developers (not architects) familiar with relational, not dimensional, modeling, so I'm having to undo years of muddy design and understanding. When all dates are put into dimensions, you have little choice but to link dimension to dimension (the dreaded snowflake) in order to take advantage of the robust attributes of the Date dimension.
To me, a policy is a transactional concept. Pulling the various components together - customer, product, benefits, broker, plan, etc - seems to be the purpose of a factless table. So many of these components rely on dates (prod eff date, broker eff date, benefit eff date, etc) that are constantly in flux. On the other hand, it could also be considered a living entity, like a Customer, that changes over time.
Yes, another day at the office. I just want to get a better sense of the 'Great Taste - Less Filling' debate of when a concept (like Policy) is a transaction and when is it a dimension..... and whether the two can coexist when they may represent the same thing.
-Seadog
seadog2010- Posts : 23
Join date : 2010-03-04
Re: Factless fact table versus Dimension (downstream implications)
ngalemmo;
any comments on the above?
-Seadog
any comments on the above?
-Seadog
seadog2010- Posts : 23
Join date : 2010-03-04
Re: Factless fact table versus Dimension (downstream implications)
The basic function of a fact table is to record business activity or business states. The relationship between policy and broker, for example, is a business state. If policy is a conforming dimension, then if the intent is to track things, such as, which policies a broker carries, then a factless fact table makes sense.
But, if 'policy' actually represents a unique combination of products, deductables, riders and other conditions specific to a sale, then from a sales perspective, policy (like order number) is a degenerate dimension. The sales fact would contain a combination of dimensions and facts relating to what makes up the policy. In such a situation, tying the definition of a policy to a sale (or quote) make sense because a policy (in this context) does not exist unless a sale is made.
But, if 'policy' actually represents a unique combination of products, deductables, riders and other conditions specific to a sale, then from a sales perspective, policy (like order number) is a degenerate dimension. The sales fact would contain a combination of dimensions and facts relating to what makes up the policy. In such a situation, tying the definition of a policy to a sale (or quote) make sense because a policy (in this context) does not exist unless a sale is made.
Re: Factless fact table versus Dimension (downstream implications)
The statement that a fact table functions to “record business activity or business states” is key.
Most certainly a Policy represents a unique combination of products, deductibles, riders, etc. It also contains strong transactional aspects as well (changing status, billing dates, etc). Your note seems to suggest that Policy could be modeled either way, as a factless table or as a conformed dimension. Could it be both?
I'm leanding towards Policy as a data mart, with fact table and surrounding component dimensions, rather than as an individual conformed dimension. If used as a DD in other fact tables, Policy ID could be used to drill-across to this Policy Data Mart for all info about that Policy. This takes care of the volatile dates within the dimension. Would you agree with this approach?
But, are there ever times where it makes sense to model Policy as both a data mart (factless or with facts, like premium) and as a conformed dimension, for purposes depending on the query? Or is this overkill in your mind?
Thanks.
-Seadog
Most certainly a Policy represents a unique combination of products, deductibles, riders, etc. It also contains strong transactional aspects as well (changing status, billing dates, etc). Your note seems to suggest that Policy could be modeled either way, as a factless table or as a conformed dimension. Could it be both?
I'm leanding towards Policy as a data mart, with fact table and surrounding component dimensions, rather than as an individual conformed dimension. If used as a DD in other fact tables, Policy ID could be used to drill-across to this Policy Data Mart for all info about that Policy. This takes care of the volatile dates within the dimension. Would you agree with this approach?
But, are there ever times where it makes sense to model Policy as both a data mart (factless or with facts, like premium) and as a conformed dimension, for purposes depending on the query? Or is this overkill in your mind?
Thanks.
-Seadog
seadog2010- Posts : 23
Join date : 2010-03-04
Re: Factless fact table versus Dimension (downstream implications)
If a policy is the insurance equivalent of an order, trying to make a dimension of it, to represent all the characteristics of a policy, would drive you crazy... literally. Sure, there is a policy number, but that is about it. Treat it as a degenerate dimension for the fact tables.
What matters is the fact tables and what they represent. It would seem to me there would be multiple facts at different grains representing different events and states to cover the full lifecycle. I wouldn't try for some uber-table that condenses the entire lifecycle until I have at least built all the components.
What matters is the fact tables and what they represent. It would seem to me there would be multiple facts at different grains representing different events and states to cover the full lifecycle. I wouldn't try for some uber-table that condenses the entire lifecycle until I have at least built all the components.
Re: Factless fact table versus Dimension (downstream implications)
Thanks, I think I'm straight on this now.
-Seadog
-Seadog
seadog2010- Posts : 23
Join date : 2010-03-04
Similar topics
» Fact Table or Factless Table: Please Suggest
» Factless Fact table or 1:M Dimension Relation
» Wide and large Dimension or Survey Factless Fact Table
» Help with design of Factless Fact Table SCD
» Factless Fact table or 1:M Dimension Relation
» Wide and large Dimension or Survey Factless Fact Table
» Help with design of Factless Fact Table SCD
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum