Dimension Modeling Design challenge - Help Needed
Page 1 of 1
Dimension Modeling Design challenge - Help Needed
Guys,
Help me out identifying what’s wrong with my model. I am able to get to the right data, but size of the fact table is arguably high and not gladly acceptable.
Let me show you the picture (tip of the iceberg)
Data table in transaction db is my meter data table, which records every hour's meters data.
All these meters actually spread like nodes in the tree, so tree has meters.
Meters are part of group and group can be inside other group that can be inside another group forming a tree structure. Tree is such that each of the Childs has one and only one parent.
SO
Let's assume A is parent of B which is parent of C which in turn has got 3 meters m1, m2, m3
in transaction db, groups are stored separate and readings are separate. For simplicity I have not included time but to understand spread i did include 1 extra reading for a day.
Group Table
child ------------parent------------ meter
A ------------ Null------------ null
B ------------ A ------------ null
C ------------ B ------------ m1
C ------------ B ------------ m2
C ------------ B ------------ m3
Meter data Table
Meter ------------ Reading ------------ date
m1 ------------ 100 ------------ d1
m2 ------------ 24 ------------ d1
m3 ------------ 50 ------------ d1
m3 ------------ 56 ------------ d2
work like charm,
now we plan to make a datawarehouse:
we can't have same structure in dimensional model.
on resolving Group Table from above (i.e. generating an exploded group table)
Also give this table a surrogated Unique key which will be used later.
bridge table
parent -----subsidiary child --------meter -------bridgeID
A ------------ A ------------ null ------------ 1
A ------------B ------------null ------------2
A ------------C ------------ m1------------ 3
A ------------C ------------m2 ------------4
A ------------C ------------m3 ------------5
B ------------B ------------null ------------6
B ------------C ------------m1 ------------7
B ------------C ------------m2 ------------8
B ------------C ------------m3 ------------9
C ------------C ------------m1 ------------10
C ------------C ------------m2 ------------11
C ------------C ------------m3 ------------12
So Group Table is connected to Bridge Table and bridge is connected to fact.
MeterData table from transaction db can’t become a fact because it has many to many relations on meterID.
Now we combine bridge table with the meter data table above and combining with bridge key generate the fact table.
The join is done with a condition such as WHERE BridgeDimension.Meter=MeterData.Meter
Fact Table comes out to be:
Fact Table
FKbridgeID -------Meter ------ reading --------- date
1 ------------999 ------------0 ------------d1
2 ------------999 ------------0 ------------d1
3 ------------m1 ------------100 ------------d1
4 ------------m2 ------------24 ------------d1
5 ------------m3 ------------50 ------------d1
6 ------------999 ------------0 ------------d1
7 ------------m1 ------------100 ------------d1
8 ------------m2 ------------24 ------------d1
9 ------------m3 ------------50 ------------d1
10 ------------m1 ------------100 ------------d1
11 ------------m2 ------------24 ------------d1
12 ------------m3 ------------50 ------------d1
1 ------------999 ------------0 ------------d2
2 ------------999 ------------0 ------------d2
3 ------------m1 ------------0 ------------d2
4 ------------m2 ------------0 ------------d2
5 ------------m3 ------------56 ------------d2
6 ------------999 ------------0 ------------d2
7 ------------m1 ------------0 ------------d2
8 ------------m2 ------------0 ------------d2
9 ------------m3 ------------56 ------------d2
10 ------------m1 ------------0 ------------d2
11 ------------m2 ------------0 ------------d2
12 ------------m3 ------------56 ------------d2
999 is represent fake meterid
So we clearly see that rows in fact are 6 times the rows in meter table. And I needed to break the recursive group table to resolve many to many relation so I made the bridge. Then I gave bridge a surrogated key to use in fact table and this helped me in getting rid of a many to many between fact and bridge.
Problem is my meterData table has 20 GB of records and fact has 10 times i.e. 200 GB and now if I index on any column the size of the table increase with a magnitude of 60 GB. Time taken to do this simple operation is few hours and server freezes for all the while.
If I establish parent child relation with Fact table in diagram and try to save it, system freezes again and needs overnight to say a yes or no for the relationship.
you can't imagine how frustrating is this !!!
Help me out identifying what’s wrong with my model. I am able to get to the right data, but size of the fact table is arguably high and not gladly acceptable.
Let me show you the picture (tip of the iceberg)
Data table in transaction db is my meter data table, which records every hour's meters data.
All these meters actually spread like nodes in the tree, so tree has meters.
Meters are part of group and group can be inside other group that can be inside another group forming a tree structure. Tree is such that each of the Childs has one and only one parent.
SO
Let's assume A is parent of B which is parent of C which in turn has got 3 meters m1, m2, m3
in transaction db, groups are stored separate and readings are separate. For simplicity I have not included time but to understand spread i did include 1 extra reading for a day.
Group Table
child ------------parent------------ meter
A ------------ Null------------ null
B ------------ A ------------ null
C ------------ B ------------ m1
C ------------ B ------------ m2
C ------------ B ------------ m3
Meter data Table
Meter ------------ Reading ------------ date
m1 ------------ 100 ------------ d1
m2 ------------ 24 ------------ d1
m3 ------------ 50 ------------ d1
m3 ------------ 56 ------------ d2
work like charm,
now we plan to make a datawarehouse:
we can't have same structure in dimensional model.
on resolving Group Table from above (i.e. generating an exploded group table)
Also give this table a surrogated Unique key which will be used later.
bridge table
parent -----subsidiary child --------meter -------bridgeID
A ------------ A ------------ null ------------ 1
A ------------B ------------null ------------2
A ------------C ------------ m1------------ 3
A ------------C ------------m2 ------------4
A ------------C ------------m3 ------------5
B ------------B ------------null ------------6
B ------------C ------------m1 ------------7
B ------------C ------------m2 ------------8
B ------------C ------------m3 ------------9
C ------------C ------------m1 ------------10
C ------------C ------------m2 ------------11
C ------------C ------------m3 ------------12
So Group Table is connected to Bridge Table and bridge is connected to fact.
MeterData table from transaction db can’t become a fact because it has many to many relations on meterID.
Now we combine bridge table with the meter data table above and combining with bridge key generate the fact table.
The join is done with a condition such as WHERE BridgeDimension.Meter=MeterData.Meter
Fact Table comes out to be:
Fact Table
FKbridgeID -------Meter ------ reading --------- date
1 ------------999 ------------0 ------------d1
2 ------------999 ------------0 ------------d1
3 ------------m1 ------------100 ------------d1
4 ------------m2 ------------24 ------------d1
5 ------------m3 ------------50 ------------d1
6 ------------999 ------------0 ------------d1
7 ------------m1 ------------100 ------------d1
8 ------------m2 ------------24 ------------d1
9 ------------m3 ------------50 ------------d1
10 ------------m1 ------------100 ------------d1
11 ------------m2 ------------24 ------------d1
12 ------------m3 ------------50 ------------d1
1 ------------999 ------------0 ------------d2
2 ------------999 ------------0 ------------d2
3 ------------m1 ------------0 ------------d2
4 ------------m2 ------------0 ------------d2
5 ------------m3 ------------56 ------------d2
6 ------------999 ------------0 ------------d2
7 ------------m1 ------------0 ------------d2
8 ------------m2 ------------0 ------------d2
9 ------------m3 ------------56 ------------d2
10 ------------m1 ------------0 ------------d2
11 ------------m2 ------------0 ------------d2
12 ------------m3 ------------56 ------------d2
999 is represent fake meterid
So we clearly see that rows in fact are 6 times the rows in meter table. And I needed to break the recursive group table to resolve many to many relation so I made the bridge. Then I gave bridge a surrogated key to use in fact table and this helped me in getting rid of a many to many between fact and bridge.
Problem is my meterData table has 20 GB of records and fact has 10 times i.e. 200 GB and now if I index on any column the size of the table increase with a magnitude of 60 GB. Time taken to do this simple operation is few hours and server freezes for all the while.
If I establish parent child relation with Fact table in diagram and try to save it, system freezes again and needs overnight to say a yes or no for the relationship.
you can't imagine how frustrating is this !!!
jaiveeru- Posts : 14
Join date : 2009-04-16
Similar topics
» Advice needed on modeling Partnership Dimension
» Abstract Generic Dimension - Help Needed
» credit card program design challenge
» Why not snowflake if the aggregate dimension is needed for a different fact table?
» Customer Dimension is always needed?
» Abstract Generic Dimension - Help Needed
» credit card program design challenge
» Why not snowflake if the aggregate dimension is needed for a different fact table?
» Customer Dimension is always needed?
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum