Fact Table(s) for Invoice Lines with Further Granularity
2 posters
Page 1 of 1
Fact Table(s) for Invoice Lines with Further Granularity
I’ve been struggling while developing a fact table for invoice transactions and could use some guidance on how to proceed.
Our report users have two basic types of questions:
1. What is the profitability of our invoiced sales?
2. Which items were sold for invoiced sales?
The answers to these are complicated by the fact that our business definition of an “item sold” is different from the definition in our ERP system where the invoice transactions originate.
Our ERP system allows for the creation of assembly items that incorporate other items as components. When an assembly is completed prior to a sale, the individual components cease to exist and inventory costs are rolled up into the final assembly. The only item recorded on invoice transactions is the finished assembly. As far as the ERP system is concerned, components are never invoiced and there is no robust relationship between invoice lines for assembly items and the components that made up those assembly items at the time an invoice was created. Accordingly, no price is recorded for these components which makes profit calculations impossible unless prices are allocated from the invoice line level.
My problem is that our organization makes no distinction between items sold outright on invoices and items that are sold as components of assemblies. Many of our assemblies are really just kits of parts that are often sold together. There is a certain type of order transaction (production orders) for which the component items associated with invoice lines can be determined with reasonable reliability. Report users want as many components as are known to be included together with the actual invoice items when evaluating facts like quantity, profit, price, and cost for invoiced sales.
I’m considering the following approaches to building fact tables. I hope that these examples will help to further clarify the problems I’m trying to overcome
--------------------------------------------
Option 1: Use two fact tables. For the sake of discussion, I’ve only shown one foreign key (item_key) for a dimension with a grain of product SKU.
fact_sales_invoice_line (Grain: Invoice Line)
• item_key
• invoice_number
• invoice_line_number
• quantity_shipped
• price
• cost
• profit_amount
fact_sales_invoice_line_component (Grain: Component)
• item_key
• invoice_number
• invoice_line_number
• invoice_line_component_number (This is an invented column that does not exist anywhere outside of this table)
• quantity_shipped
• cost
In this model, the fact_sales_invoice_line table is a faithful representation of the real-world invoice lines generated by our ERP system. The fact_sales_invoice_line_component would be populated with one rowfor every component associated with an invoice line item. In addition, a row will also be inserted that relates to the actual invoice line item, and this row will have an invoice_line_component_number of zero. Price and calculated profit columns are omitted from the second table because this information does not exist at the fact_sales_invoice_line_component table’s granularity.
The invoice line fact table can be used to answer profitability questions, while the component fact table can be used to answer questions about which items were sold, how many of them were there, and what the cost was.
--------------------------------------------
Option 2: Single fact table with NULL values
fact_sales_invoice_line_component (Grain: Invoice Line Component)
• item_key
• invoice_number
• invoice_line_number
• invoice_line_component_number (derived column)
• quantity_shipped
• price (nullable)
• cost (nullable)
• profit_amount (nullable)
The fact columns would be populated with NULL values to allow for correct aggregation as shown below:
item_key invoice_number invoice_line_number invoice_line_component_number quantity price cost profit_amount
1 10001 1 0 1 1000 NULL 970
2 10001 1 1 1 NULL 10 NULL
3 10001 1 2 1 NULL 20 NULL
1 10002 1 0 1 3000 6 2994
4 10002 2 0 1 4000 16 3984
4 10003 1 0 1 5000 NULL 4945
5 10004 1 0 1 2000 NULL 1967
3 10004 1 1 1 NULL 33 NULL
1 10004 2 0 1 9000 77 8923
It looks odd to have price and profit values in a row with NULL cost, but I think what I’ve shown above causes the math to work out correctly in all dimensions.
--------------------------------------------
Option 3: Allocate price to components per page 121 of The Data Warehouse Toolkit book
I ruled this out because I do not believe that a proper parent-child relationship exists between invoice lines and components. Our users need invoice lines to be considered in calculations at the same grain as the components. There is a derived parent-child relationship in our ERP system data, but the business relationship places them at the same level.
--------------------------------------------
How would you handle this situation?
Do you prefer any one option above over the others?
Do you know of an alternative that would work better?
Is there anything important I've overlooked?
Our report users have two basic types of questions:
1. What is the profitability of our invoiced sales?
2. Which items were sold for invoiced sales?
The answers to these are complicated by the fact that our business definition of an “item sold” is different from the definition in our ERP system where the invoice transactions originate.
Our ERP system allows for the creation of assembly items that incorporate other items as components. When an assembly is completed prior to a sale, the individual components cease to exist and inventory costs are rolled up into the final assembly. The only item recorded on invoice transactions is the finished assembly. As far as the ERP system is concerned, components are never invoiced and there is no robust relationship between invoice lines for assembly items and the components that made up those assembly items at the time an invoice was created. Accordingly, no price is recorded for these components which makes profit calculations impossible unless prices are allocated from the invoice line level.
My problem is that our organization makes no distinction between items sold outright on invoices and items that are sold as components of assemblies. Many of our assemblies are really just kits of parts that are often sold together. There is a certain type of order transaction (production orders) for which the component items associated with invoice lines can be determined with reasonable reliability. Report users want as many components as are known to be included together with the actual invoice items when evaluating facts like quantity, profit, price, and cost for invoiced sales.
I’m considering the following approaches to building fact tables. I hope that these examples will help to further clarify the problems I’m trying to overcome
--------------------------------------------
Option 1: Use two fact tables. For the sake of discussion, I’ve only shown one foreign key (item_key) for a dimension with a grain of product SKU.
fact_sales_invoice_line (Grain: Invoice Line)
• item_key
• invoice_number
• invoice_line_number
• quantity_shipped
• price
• cost
• profit_amount
fact_sales_invoice_line_component (Grain: Component)
• item_key
• invoice_number
• invoice_line_number
• invoice_line_component_number (This is an invented column that does not exist anywhere outside of this table)
• quantity_shipped
• cost
In this model, the fact_sales_invoice_line table is a faithful representation of the real-world invoice lines generated by our ERP system. The fact_sales_invoice_line_component would be populated with one rowfor every component associated with an invoice line item. In addition, a row will also be inserted that relates to the actual invoice line item, and this row will have an invoice_line_component_number of zero. Price and calculated profit columns are omitted from the second table because this information does not exist at the fact_sales_invoice_line_component table’s granularity.
The invoice line fact table can be used to answer profitability questions, while the component fact table can be used to answer questions about which items were sold, how many of them were there, and what the cost was.
--------------------------------------------
Option 2: Single fact table with NULL values
fact_sales_invoice_line_component (Grain: Invoice Line Component)
• item_key
• invoice_number
• invoice_line_number
• invoice_line_component_number (derived column)
• quantity_shipped
• price (nullable)
• cost (nullable)
• profit_amount (nullable)
The fact columns would be populated with NULL values to allow for correct aggregation as shown below:
item_key invoice_number invoice_line_number invoice_line_component_number quantity price cost profit_amount
1 10001 1 0 1 1000 NULL 970
2 10001 1 1 1 NULL 10 NULL
3 10001 1 2 1 NULL 20 NULL
1 10002 1 0 1 3000 6 2994
4 10002 2 0 1 4000 16 3984
4 10003 1 0 1 5000 NULL 4945
5 10004 1 0 1 2000 NULL 1967
3 10004 1 1 1 NULL 33 NULL
1 10004 2 0 1 9000 77 8923
It looks odd to have price and profit values in a row with NULL cost, but I think what I’ve shown above causes the math to work out correctly in all dimensions.
--------------------------------------------
Option 3: Allocate price to components per page 121 of The Data Warehouse Toolkit book
I ruled this out because I do not believe that a proper parent-child relationship exists between invoice lines and components. Our users need invoice lines to be considered in calculations at the same grain as the components. There is a derived parent-child relationship in our ERP system data, but the business relationship places them at the same level.
--------------------------------------------
How would you handle this situation?
Do you prefer any one option above over the others?
Do you know of an alternative that would work better?
Is there anything important I've overlooked?
jonc- Posts : 2
Join date : 2012-10-25
Re: Fact Table(s) for Invoice Lines with Further Granularity
The design makes sense. However, I'm not sure you need the extra row for the actual product in your component table, unless the product sold was a component (i.e. has no other components), and would carry quantity and cost.
Re: Fact Table(s) for Invoice Lines with Further Granularity
Thanks for the reply ngalemmo.
I assume you are referring to the tables described in Option#1. About 20% of the items we sale are components of assemblies, and the remainder have no components. The same SKU can be sold as a component of something else or sold by itself, and both scenarios happen quite frequently.
It is my intention to place rows in the component table for all of these items, including the ones that don’t have components. The reason is to allow aggregations such as average cost by SKU to be calculated over a set consisting of all instances when an item was sold.
For the same reason, I’m also considering placing the assembly header item (the item on the real invoice transaction) in this table because it is desirable to analyze these sales along with the others. Any given SKU that is used as an assembly with components on one invoice line can also be purchased and sold directly without components on another invoice line. If I go this route, I’m unsure whether it would be correct to place the actual cost on a header item row or set it to zero because the cost is already represented by the components; setting it to anything other than zero would not allow cost to be correctly rolled up using the invoice_number degraded dimension or various other dimensions.
I’m trying to decide between Option #1 and Option #2.
Option #1 is beneficial because invoice line data is recorded in a way that corresponds to the actual business process surrounding invoicing. Any component data that is thought to be related to these invoice lines is placed in a separate fact table with a different purpose. The disadvantage is that it would force our users to run two separate analyses to get the information they need. They can know the profitability of a sale, or they can know the items sold, but they won’t be able to know both at once.
Option #2 attempts to overcome the drawbacks of the first option by recording price and profit while forcing an invoice line into the same fact table as an invoice line component. NULL values are placed in fact columns that are not applicable to a particular row. For instance, price and profit will be populated for invoice lines that have components, but cost will be NULL. Cost will be populated for components of invoice lines, but price and profit will be NULL. The only rows that will have all values populated are items that are sold outright with no components.
The first option seems more correct to me.
I assume you are referring to the tables described in Option#1. About 20% of the items we sale are components of assemblies, and the remainder have no components. The same SKU can be sold as a component of something else or sold by itself, and both scenarios happen quite frequently.
It is my intention to place rows in the component table for all of these items, including the ones that don’t have components. The reason is to allow aggregations such as average cost by SKU to be calculated over a set consisting of all instances when an item was sold.
For the same reason, I’m also considering placing the assembly header item (the item on the real invoice transaction) in this table because it is desirable to analyze these sales along with the others. Any given SKU that is used as an assembly with components on one invoice line can also be purchased and sold directly without components on another invoice line. If I go this route, I’m unsure whether it would be correct to place the actual cost on a header item row or set it to zero because the cost is already represented by the components; setting it to anything other than zero would not allow cost to be correctly rolled up using the invoice_number degraded dimension or various other dimensions.
I’m trying to decide between Option #1 and Option #2.
Option #1 is beneficial because invoice line data is recorded in a way that corresponds to the actual business process surrounding invoicing. Any component data that is thought to be related to these invoice lines is placed in a separate fact table with a different purpose. The disadvantage is that it would force our users to run two separate analyses to get the information they need. They can know the profitability of a sale, or they can know the items sold, but they won’t be able to know both at once.
Option #2 attempts to overcome the drawbacks of the first option by recording price and profit while forcing an invoice line into the same fact table as an invoice line component. NULL values are placed in fact columns that are not applicable to a particular row. For instance, price and profit will be populated for invoice lines that have components, but cost will be NULL. Cost will be populated for components of invoice lines, but price and profit will be NULL. The only rows that will have all values populated are items that are sold outright with no components.
The first option seems more correct to me.
Last edited by jonc on Fri Oct 26, 2012 9:05 am; edited 1 time in total (Reason for editing : typo)
jonc- Posts : 2
Join date : 2012-10-25
Similar topics
» Granularity - One Fact Table or Two
» Granularity of Fact table
» Defining the granularity for a Fact Table
» combine order, invoice, and backlog detail in one fact table
» 'Routing' the grain of the fact table to multpile members of multiple dimensions causes the fact table to 'explode'
» Granularity of Fact table
» Defining the granularity for a Fact Table
» combine order, invoice, and backlog detail in one fact table
» 'Routing' the grain of the fact table to multpile members of multiple dimensions causes the fact table to 'explode'
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum