Design Tip #25 Modeling Issue
3 posters
Page 1 of 1
Design Tip #25 Modeling Issue
I have some questions related to Design Tip #25 and Design Tip #95. I have a scenario almost near to Order & Order Items scenario but with different issues.
1. We have a system that has multiple parent-child relationships between many tables, sometimes parent tables are transaction tables and sometimes are look-up tables. The parent key of the transaction table is used in other transaction tables like the scenario mentioned in book DWH Toolkit page #93 section "Complementary Procurement Snapshot”. But in our case the different transaction tables are not related in business process (is not an accumulation processes) only joined together with the parent key to refer to the main source. For example, Transaction table "Customs_Declaration" has a relation with other transaction table related to other business process like "Inspections".
2. Main transaction tables has multiple attributes that reach 30 attribute
3. I am attaching a sample of the relations between tables that we have in the source data in image "Source DB.jpg"
and my suggested star schema in image "Star Schema.jpg" as per Kimball's recommendation in book DWH Toolkit page #94 section "Complementary Procurement Snapshot" Figure 4.2.
** My main idea is to store the main primary key (Natural key) of the transaction table as a "Degenerate Dimension" in other related Fact tables and when querying 2 facts together we have to specify the DD to filter other fact records.
** My colleague's concern on this idea is that:
1. Order – which includes some other attributes - is parent for other child business process (ex. Order Shipping) and in most cases we’ll have to query Order information (such as Vendor, Product, Price … etc.) to describe any of the related child business processes. Consequently, we’ll have a complex query to retrieve Order Shipping information plus information of the Order itself. i.e. when we need to query both fact tables to aggregate the measures ,we will need to filter the parent Fact on the "Natural Key" and filter the related fact on the "Degenerate Dimension" and also we will have multiple attributes as separate dimensions duplicated (Shared) between facts which will make the query more complex .
2. Most of the attributes related to main transaction table “More than 30 attribute” will be separate shared dimensions between other fact tables and this will duplicate dimensions more than needed.
My colleague has different thinking of modeling the DWH as following:
1. He suggest to have an Order dimension – in addition to Order Fact table - where Order Date, Vendor, Product and Store are represented as Outriggers and Order Price and Order Cost are represented as attributes for Order Dimension. Order Dimension will be joined to other business processes facts like Order Shipping Fact table
So now, I and my colleague have a conflict and we need your help for a resolution what is your recommendation on the following:
1. Dimensional Model: What is the best practice for modeling this DWH?
1. We have a system that has multiple parent-child relationships between many tables, sometimes parent tables are transaction tables and sometimes are look-up tables. The parent key of the transaction table is used in other transaction tables like the scenario mentioned in book DWH Toolkit page #93 section "Complementary Procurement Snapshot”. But in our case the different transaction tables are not related in business process (is not an accumulation processes) only joined together with the parent key to refer to the main source. For example, Transaction table "Customs_Declaration" has a relation with other transaction table related to other business process like "Inspections".
2. Main transaction tables has multiple attributes that reach 30 attribute
3. I am attaching a sample of the relations between tables that we have in the source data in image "Source DB.jpg"
and my suggested star schema in image "Star Schema.jpg" as per Kimball's recommendation in book DWH Toolkit page #94 section "Complementary Procurement Snapshot" Figure 4.2.
** My main idea is to store the main primary key (Natural key) of the transaction table as a "Degenerate Dimension" in other related Fact tables and when querying 2 facts together we have to specify the DD to filter other fact records.
** My colleague's concern on this idea is that:
1. Order – which includes some other attributes - is parent for other child business process (ex. Order Shipping) and in most cases we’ll have to query Order information (such as Vendor, Product, Price … etc.) to describe any of the related child business processes. Consequently, we’ll have a complex query to retrieve Order Shipping information plus information of the Order itself. i.e. when we need to query both fact tables to aggregate the measures ,we will need to filter the parent Fact on the "Natural Key" and filter the related fact on the "Degenerate Dimension" and also we will have multiple attributes as separate dimensions duplicated (Shared) between facts which will make the query more complex .
2. Most of the attributes related to main transaction table “More than 30 attribute” will be separate shared dimensions between other fact tables and this will duplicate dimensions more than needed.
My colleague has different thinking of modeling the DWH as following:
1. He suggest to have an Order dimension – in addition to Order Fact table - where Order Date, Vendor, Product and Store are represented as Outriggers and Order Price and Order Cost are represented as attributes for Order Dimension. Order Dimension will be joined to other business processes facts like Order Shipping Fact table
So now, I and my colleague have a conflict and we need your help for a resolution what is your recommendation on the following:
1. Dimensional Model: What is the best practice for modeling this DWH?
a_sherbeeny- Posts : 15
Join date : 2009-02-04
Re: Design Tip #25 Modeling Issue
For all practical purposes both models would work BUT, which one will provide the users with quicker analytics or reporting? Outriggers tend to slow things down by requiring more joins. From the user’s performance standpoint, fewer joins are better. I’d suggest reducing facts, bridges, and outriggers where ever possible.
Going off the deep end with this suggestion… If there’s any possible way to reduce the amount of fact tables, do it! Though, possibly confusing, Shipping Notices and Warehouse Receipts could be housed in the same fact with an indicator column describing if the row is a shipping or warehouse transaction. This has potentially heavy repercussions though… but it is an option.
Two ideas about the illustrate model.
First, in the proposed model there is not a dimension to naturally tie the fact tables together (at their most granular levels). This issue could possibly be alleviated by creating a dimension housing all valid permutations of Purchase Req Num, Shipping Not Num, and Purchase Order Num. This would create some conformity among the facts and capacity to naturally drill across related fact tables. (This could come in handy in the BI layer.) BUT… this is not a necessary dimension… things may work fine without it.
Second, for conformity among the fact tables dates should all be looked up against the same dimension.
Hope this helps!
Going off the deep end with this suggestion… If there’s any possible way to reduce the amount of fact tables, do it! Though, possibly confusing, Shipping Notices and Warehouse Receipts could be housed in the same fact with an indicator column describing if the row is a shipping or warehouse transaction. This has potentially heavy repercussions though… but it is an option.
Two ideas about the illustrate model.
First, in the proposed model there is not a dimension to naturally tie the fact tables together (at their most granular levels). This issue could possibly be alleviated by creating a dimension housing all valid permutations of Purchase Req Num, Shipping Not Num, and Purchase Order Num. This would create some conformity among the facts and capacity to naturally drill across related fact tables. (This could come in handy in the BI layer.) BUT… this is not a necessary dimension… things may work fine without it.
Second, for conformity among the fact tables dates should all be looked up against the same dimension.
Hope this helps!
KS_EDW- Posts : 20
Join date : 2011-09-07
Age : 49
Location : Kansas
Re: Design Tip #25 Modeling Issue
Your model looks fine, may need some additional dimensions (store, warehouse, etc), but is headed in the right direction. You may want to consider an aggregate fact table to provide an end-to-end view if query performance becomes an issue.
Don't like the order dimension idea. The order dimension, as described, just sounds like a fat fact table.
Don't like the order dimension idea. The order dimension, as described, just sounds like a fat fact table.
Re: Design Tip #25 Modeling Issue
Thank you KS_EDW for your reply,I really appreciate it.
Thank you ngalemmo valuable reply ,but I have some questions :
-- When should Order be considered as dimensions instead of Fact ??
-- In my model ,What about querying different measures from different Facts?,in this case I have to put filter on all Degenerate Dimensions in all fact tables selected ? right ?? Is there a workaround for this part ?
Thanks you again for your help.
Thank you ngalemmo valuable reply ,but I have some questions :
-- When should Order be considered as dimensions instead of Fact ??
-- In my model ,What about querying different measures from different Facts?,in this case I have to put filter on all Degenerate Dimensions in all fact tables selected ? right ?? Is there a workaround for this part ?
Thanks you again for your help.
a_sherbeeny- Posts : 15
Join date : 2009-02-04
Re: Design Tip #25 Modeling Issue
The basic modeling principle is that facts represent business events or state and dimensions provide context to the facts. An order is an business event, the context of that event are attributes related to the order.
The issue is how do you represent the order attributes? As an order dimension or as multiple dimensions? Most attributes carried on a order header more often than not are covered by existing dimensions... customer, address, dates, leaving a few leftovers. The leftovers are then placed in dimensions of their own or junk dimensions as appropriate. There are two reasons for this, one is performance as an order dimension would be very large relative to the facts. the second is dimensional conformance with other facts. If you have an order dimension which snowflaked customer, how do you integrate order facts with other facts about the customer? You wind up have to perform a fairly convoluted query to get everything together and your model lacks the clarity and simplicity a proper star schema provides.
As far as combining facts go, there are two basic methods:
1. Separate aggregate queries on both tables, grouping on common dimensions, then joining the two sets on those dimensions. or
2. A union query.
The issue is how do you represent the order attributes? As an order dimension or as multiple dimensions? Most attributes carried on a order header more often than not are covered by existing dimensions... customer, address, dates, leaving a few leftovers. The leftovers are then placed in dimensions of their own or junk dimensions as appropriate. There are two reasons for this, one is performance as an order dimension would be very large relative to the facts. the second is dimensional conformance with other facts. If you have an order dimension which snowflaked customer, how do you integrate order facts with other facts about the customer? You wind up have to perform a fairly convoluted query to get everything together and your model lacks the clarity and simplicity a proper star schema provides.
As far as combining facts go, there are two basic methods:
1. Separate aggregate queries on both tables, grouping on common dimensions, then joining the two sets on those dimensions. or
2. A union query.
Re: Design Tip #25 Modeling Issue
Dear ngalemmo , Thank you very much for your wonderful answer it was very clear and descriptive..Thank you again.
a_sherbeeny- Posts : 15
Join date : 2009-02-04
Similar topics
» Modeling issue
» User fields design issue
» Unusual(?) parent/child Fact table design issue
» Data Vault v's Dimensional Model
» Dimensional Modeling design
» User fields design issue
» Unusual(?) parent/child Fact table design issue
» Data Vault v's Dimensional Model
» Dimensional Modeling design
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum