Parent-Child FactTabletechnique

ObjectiveC Fri Mar 18, 2011 7:21 pm

Hello everyone,

I am just starting out building datawarehouses, but I can't get my head around the parent-child facttable.

My problem is that I have a parent-child structure(table1 -< table2) and according to design tip #25 you should merge these two tables into one fact table. My problem with this is that the parent values in a row won't be additves so how would you go about this ? Am I missing the point ? And what kind of relation will the dimensions have on the fact table ? A one-to-one ?

I'd Liked to get this technique down, since this will be essential if you'd want to create a datawarehouse design on something like table1 -< table2 -< and so on.

This is the kind of table I am expecting, but if this is the case the parent value (P_value) isn't an additive anymore. (ignore dots svp)
+-----+---------+------+---------+
| P_ID | P_value | C_ID | C_value |
+-----+---------+------+---------+
| 1.....| 2000......| 1......| 12........|
+-----+---------+------+---------+
| 1.....| 2000......| 2......| 23........|
+-----+---------+------+---------+

Any help will be appreciated !

MJGascoyne Sat Mar 19, 2011 4:31 pm

Hi,

My suggestions are:

The most simple scenario. If the parent table contains a value that is the sum of the values held at the child level then you may not need to hold a separate column. An example of this would be where you are dealing with Order and Line item levels. E.g. Order 1 has two line items 1 + 2, the value on item 1 = 10 and the value on item 2 = 20. This would mean the value at the order (parent) level is 30. In this case the value at the parent level can be derived through aggregation from the child values.

If the values can't be aggregated up neatly here are two other options to consider.
Firstly, hold the data in two separate fact tables - A base level fact table and an aggregated parent level fact table. In this instance the base table would hold only C_Value, whilst the parent/ aggregated fact table would hold an aggregated C_Value and the P_Value. Obviously your parent fact table may lack the dimensional grain of the child table.
Secondly, apportion the parent level values down to the granularity of the child fact table. Using your example, P_Value would be apportioned to hold 1000 in record one and 1000 in record two. Be careful with this approach, your users would need to appreciate that P_Value is being used like this.

Cheers

ObjectiveC Sun Mar 20, 2011 2:12 pm

Hello MJGascoyne,

First off, thanks for replying, I really appreciate it !

The first option is a no-go since that is not the case.

The second option is something I have my doubts with. Because if you would divide the parent value with the number of children rows, things get complex very quickly and a single parent value would mean nothing anymore.

Furthermore, how can the right choice be giving both dimensions there own fact_table ? Would this not introduce an extra join which, in the end, will give you poor performance ?

I mean, if I have a structure like table1 -< table2 -< table3 and all of them have additives, should that mean that I create a fact table for all of them (at a different granularity) ? If that is the case then I'll end up with six tables and that just can't be good on performance, or can it ?

Does anyone have any suggestions or design tips on this topic ?
Perhaps a design you used (or would use) ?

Any help will be appreciated !

Mike Honey Sun Mar 20, 2011 7:58 pm

Here's a pattern I've used successfully in the past:

Essentially I split the data into a fact and a dimension table, then rely on a good OLAP tool to handle the aggregation issue.

So following your example, the Dimension would need a concatenated Key (to ensure uniqueness):

My_Dim_ID	Reports_To_My_Dim_ID	My_Node_Name
P1	NULL	Parent 1
C1	P1	Child 1
C2	P1	Child 2

Then the fact table just has to list the values for each node, e.g.

My_Dim_ID	Value
P1	2000
C1	12
C2	23

The magic is in the OLAP tool (I use SQL Server Analysis Services) where you can define a true Parent-Child dimension and control the aggregation.

I find this method is very quick to implement and test, and doesn't rely on complex structures, code and "user understanding" - it just presents the right results every time.

You can query this data (with MDX) in various ways that cover most scenarios, e.g.:
http://hccmsbi.blogspot.com/2007/07/data-member-part-1.html
http://hccmsbi.blogspot.com/2007/07/data-member-part-2.html

Good luck!
Mike

ngalemmo Mon Mar 21, 2011 11:39 am

It would be nice if there was more background as it is difficult to come up with the proper solution discussing this generically. But...

If you have measures with different grains, as you represent, then you need to create two fact tables.

With that said, there are approaches that you can use to avoid this, particularly if there is nothing special about the header itself. In a sales order model, for example, you may have freight charges at the header. A common method to integrate it with the line would be to create a freight charge 'product' and place the charges on its own line. The product dimension it references would have appropriate flags to make it easy to segregate such charges when reporting.

ObjectiveC Tue Mar 22, 2011 4:43 am

@mikehoney
This is an awesome idea. This design will definitely help later on. Thanks !

@ngalemmo
You could compare it with a recipe that has many ingredients (the background will consist of a lot of domain specific knowledge).

I did another analysis today and came across a way to actually aggregrate some (important ones) of the parent values (a great breakthrough :-) ). Now my question is if I should aggregate the values from the fact_table or if I should store the values in the parent dimension. What is better on performance and what is considered to be best practice ? My first thought would be to aggregate them from the fact_table, but I don't know how this will effect performance.

What are your thoughts on this ?

Again, thanks for taking the time to help me out. Appreciate it !

ngalemmo Wed Mar 23, 2011 12:17 pm

Do not store measures in a dimension table.

Sponsored content

Parent-Child FactTabletechnique

Parent-Child FactTabletechnique

Parent-Child FactTabletechnique

Re: Parent-Child FactTabletechnique

Re: Parent-Child FactTabletechnique

Re: Parent-Child FactTabletechnique

Re: Parent-Child FactTabletechnique

Re: Parent-Child FactTabletechnique

Re: Parent-Child FactTabletechnique