Parent-Child FactTabletechnique
4 posters
Page 1 of 1
Parent-Child FactTabletechnique
Hello everyone,
I am just starting out building datawarehouses, but I can't get my head around the parent-child facttable.
My problem is that I have a parent-child structure(table1 -< table2) and according to design tip #25 you should merge these two tables into one fact table. My problem with this is that the parent values in a row won't be additves so how would you go about this ? Am I missing the point ? And what kind of relation will the dimensions have on the fact table ? A one-to-one ?
I'd Liked to get this technique down, since this will be essential if you'd want to create a datawarehouse design on something like table1 -< table2 -< and so on.
This is the kind of table I am expecting, but if this is the case the parent value (P_value) isn't an additive anymore. (ignore dots svp)
+-----+---------+------+---------+
| P_ID | P_value | C_ID | C_value |
+-----+---------+------+---------+
| 1.....| 2000......| 1......| 12........|
+-----+---------+------+---------+
| 1.....| 2000......| 2......| 23........|
+-----+---------+------+---------+
Any help will be appreciated !
I am just starting out building datawarehouses, but I can't get my head around the parent-child facttable.
My problem is that I have a parent-child structure(table1 -< table2) and according to design tip #25 you should merge these two tables into one fact table. My problem with this is that the parent values in a row won't be additves so how would you go about this ? Am I missing the point ? And what kind of relation will the dimensions have on the fact table ? A one-to-one ?
I'd Liked to get this technique down, since this will be essential if you'd want to create a datawarehouse design on something like table1 -< table2 -< and so on.
This is the kind of table I am expecting, but if this is the case the parent value (P_value) isn't an additive anymore. (ignore dots svp)
+-----+---------+------+---------+
| P_ID | P_value | C_ID | C_value |
+-----+---------+------+---------+
| 1.....| 2000......| 1......| 12........|
+-----+---------+------+---------+
| 1.....| 2000......| 2......| 23........|
+-----+---------+------+---------+
Any help will be appreciated !
ObjectiveC- Posts : 25
Join date : 2011-03-18
Parent-Child FactTabletechnique
Hi,
My suggestions are:
The most simple scenario. If the parent table contains a value that is the sum of the values held at the child level then you may not need to hold a separate column. An example of this would be where you are dealing with Order and Line item levels. E.g. Order 1 has two line items 1 + 2, the value on item 1 = 10 and the value on item 2 = 20. This would mean the value at the order (parent) level is 30. In this case the value at the parent level can be derived through aggregation from the child values.
If the values can't be aggregated up neatly here are two other options to consider.
Firstly, hold the data in two separate fact tables - A base level fact table and an aggregated parent level fact table. In this instance the base table would hold only C_Value, whilst the parent/ aggregated fact table would hold an aggregated C_Value and the P_Value. Obviously your parent fact table may lack the dimensional grain of the child table.
Secondly, apportion the parent level values down to the granularity of the child fact table. Using your example, P_Value would be apportioned to hold 1000 in record one and 1000 in record two. Be careful with this approach, your users would need to appreciate that P_Value is being used like this.
Cheers
My suggestions are:
The most simple scenario. If the parent table contains a value that is the sum of the values held at the child level then you may not need to hold a separate column. An example of this would be where you are dealing with Order and Line item levels. E.g. Order 1 has two line items 1 + 2, the value on item 1 = 10 and the value on item 2 = 20. This would mean the value at the order (parent) level is 30. In this case the value at the parent level can be derived through aggregation from the child values.
If the values can't be aggregated up neatly here are two other options to consider.
Firstly, hold the data in two separate fact tables - A base level fact table and an aggregated parent level fact table. In this instance the base table would hold only C_Value, whilst the parent/ aggregated fact table would hold an aggregated C_Value and the P_Value. Obviously your parent fact table may lack the dimensional grain of the child table.
Secondly, apportion the parent level values down to the granularity of the child fact table. Using your example, P_Value would be apportioned to hold 1000 in record one and 1000 in record two. Be careful with this approach, your users would need to appreciate that P_Value is being used like this.
Cheers
MJGascoyne- Posts : 2
Join date : 2011-02-03
Re: Parent-Child FactTabletechnique
Hello MJGascoyne,
First off, thanks for replying, I really appreciate it !
The first option is a no-go since that is not the case.
The second option is something I have my doubts with. Because if you would divide the parent value with the number of children rows, things get complex very quickly and a single parent value would mean nothing anymore.
Furthermore, how can the right choice be giving both dimensions there own fact_table ? Would this not introduce an extra join which, in the end, will give you poor performance ?
I mean, if I have a structure like table1 -< table2 -< table3 and all of them have additives, should that mean that I create a fact table for all of them (at a different granularity) ? If that is the case then I'll end up with six tables and that just can't be good on performance, or can it ?
Does anyone have any suggestions or design tips on this topic ?
Perhaps a design you used (or would use) ?
Any help will be appreciated !
First off, thanks for replying, I really appreciate it !
The first option is a no-go since that is not the case.
The second option is something I have my doubts with. Because if you would divide the parent value with the number of children rows, things get complex very quickly and a single parent value would mean nothing anymore.
Furthermore, how can the right choice be giving both dimensions there own fact_table ? Would this not introduce an extra join which, in the end, will give you poor performance ?
I mean, if I have a structure like table1 -< table2 -< table3 and all of them have additives, should that mean that I create a fact table for all of them (at a different granularity) ? If that is the case then I'll end up with six tables and that just can't be good on performance, or can it ?
Does anyone have any suggestions or design tips on this topic ?
Perhaps a design you used (or would use) ?
Any help will be appreciated !
ObjectiveC- Posts : 25
Join date : 2011-03-18
Re: Parent-Child FactTabletechnique
Here's a pattern I've used successfully in the past:
Essentially I split the data into a fact and a dimension table, then rely on a good OLAP tool to handle the aggregation issue.
So following your example, the Dimension would need a concatenated Key (to ensure uniqueness):
Then the fact table just has to list the values for each node, e.g.
The magic is in the OLAP tool (I use SQL Server Analysis Services) where you can define a true Parent-Child dimension and control the aggregation.
I find this method is very quick to implement and test, and doesn't rely on complex structures, code and "user understanding" - it just presents the right results every time.
You can query this data (with MDX) in various ways that cover most scenarios, e.g.:
http://hccmsbi.blogspot.com/2007/07/data-member-part-1.html
http://hccmsbi.blogspot.com/2007/07/data-member-part-2.html
Good luck!
Mike
Essentially I split the data into a fact and a dimension table, then rely on a good OLAP tool to handle the aggregation issue.
So following your example, the Dimension would need a concatenated Key (to ensure uniqueness):
My_Dim_ID | Reports_To_My_Dim_ID | My_Node_Name |
P1 | NULL | Parent 1 |
C1 | P1 | Child 1 |
C2 | P1 | Child 2 |
Then the fact table just has to list the values for each node, e.g.
My_Dim_ID | Value |
P1 | 2000 |
C1 | 12 |
C2 | 23 |
The magic is in the OLAP tool (I use SQL Server Analysis Services) where you can define a true Parent-Child dimension and control the aggregation.
I find this method is very quick to implement and test, and doesn't rely on complex structures, code and "user understanding" - it just presents the right results every time.
You can query this data (with MDX) in various ways that cover most scenarios, e.g.:
http://hccmsbi.blogspot.com/2007/07/data-member-part-1.html
http://hccmsbi.blogspot.com/2007/07/data-member-part-2.html
Good luck!
Mike
Re: Parent-Child FactTabletechnique
It would be nice if there was more background as it is difficult to come up with the proper solution discussing this generically. But...
If you have measures with different grains, as you represent, then you need to create two fact tables.
With that said, there are approaches that you can use to avoid this, particularly if there is nothing special about the header itself. In a sales order model, for example, you may have freight charges at the header. A common method to integrate it with the line would be to create a freight charge 'product' and place the charges on its own line. The product dimension it references would have appropriate flags to make it easy to segregate such charges when reporting.
If you have measures with different grains, as you represent, then you need to create two fact tables.
With that said, there are approaches that you can use to avoid this, particularly if there is nothing special about the header itself. In a sales order model, for example, you may have freight charges at the header. A common method to integrate it with the line would be to create a freight charge 'product' and place the charges on its own line. The product dimension it references would have appropriate flags to make it easy to segregate such charges when reporting.
Re: Parent-Child FactTabletechnique
@mikehoney
This is an awesome idea. This design will definitely help later on. Thanks !
@ngalemmo
You could compare it with a recipe that has many ingredients (the background will consist of a lot of domain specific knowledge).
I did another analysis today and came across a way to actually aggregrate some (important ones) of the parent values (a great breakthrough :-) ). Now my question is if I should aggregate the values from the fact_table or if I should store the values in the parent dimension. What is better on performance and what is considered to be best practice ? My first thought would be to aggregate them from the fact_table, but I don't know how this will effect performance.
What are your thoughts on this ?
Again, thanks for taking the time to help me out. Appreciate it !
This is an awesome idea. This design will definitely help later on. Thanks !
@ngalemmo
You could compare it with a recipe that has many ingredients (the background will consist of a lot of domain specific knowledge).
I did another analysis today and came across a way to actually aggregrate some (important ones) of the parent values (a great breakthrough :-) ). Now my question is if I should aggregate the values from the fact_table or if I should store the values in the parent dimension. What is better on performance and what is considered to be best practice ? My first thought would be to aggregate them from the fact_table, but I don't know how this will effect performance.
What are your thoughts on this ?
Again, thanks for taking the time to help me out. Appreciate it !
ObjectiveC- Posts : 25
Join date : 2011-03-18
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum