How to model 1:N relation with daily history for milions of rows
3 posters
Page 1 of 1
How to model 1:N relation with daily history for milions of rows
Hi all,
imagine following simple case.
You have to model history relation between PARTY (user, client) and a SERVICE (product). One Party can have 0-N Products.
Your job is to be able to answer following question: give me the list of SERVICES for given PARTY in given day?
In 3NF, I would create a simple relation table:
ID_PARTY
ID_SERVICE
DATE_VALID_FROM
DATE_VALID_TO
The ETL loading would track changes per day and create a new record if something has changed.
Then we could answer the question with simple query:
select ID_SERVICE from RELATIONAL_TABLE_3NF where ID_PARTY = 334232 and 11.12.2012 between DATE_VALID_FROM and DATE_VALID_TO
But how would you model this relationship table in dimensional model?
There could be a factless fact table
ID_DAY_PERIOD
ID_PARTY
ID_SERVICE
and you would create a record of relation for each day (11.12.2012, 12.12.2012, 13.12.2012, etc) - even if the is no change. Then the answer would be given using select
select ID_SERVICE from FACTLESS_FACT_TABLE where ID_PARTY = 334232 and ID_DAY_PERIOD = 11.12.2012
Is this correct approach? But what if there is 5mil of PARTIES and each of them has 10 services. You would have 5mil x 10 records per day. With daily history, you would have to store 1500mil records per month, which is too much to store considering there is very little number of changes per day.
Thanks,
Marian
imagine following simple case.
You have to model history relation between PARTY (user, client) and a SERVICE (product). One Party can have 0-N Products.
Your job is to be able to answer following question: give me the list of SERVICES for given PARTY in given day?
In 3NF, I would create a simple relation table:
ID_PARTY
ID_SERVICE
DATE_VALID_FROM
DATE_VALID_TO
The ETL loading would track changes per day and create a new record if something has changed.
Then we could answer the question with simple query:
select ID_SERVICE from RELATIONAL_TABLE_3NF where ID_PARTY = 334232 and 11.12.2012 between DATE_VALID_FROM and DATE_VALID_TO
But how would you model this relationship table in dimensional model?
There could be a factless fact table
ID_DAY_PERIOD
ID_PARTY
ID_SERVICE
and you would create a record of relation for each day (11.12.2012, 12.12.2012, 13.12.2012, etc) - even if the is no change. Then the answer would be given using select
select ID_SERVICE from FACTLESS_FACT_TABLE where ID_PARTY = 334232 and ID_DAY_PERIOD = 11.12.2012
Is this correct approach? But what if there is 5mil of PARTIES and each of them has 10 services. You would have 5mil x 10 records per day. With daily history, you would have to store 1500mil records per month, which is too much to store considering there is very little number of changes per day.
Thanks,
Marian
msedlacek- Posts : 4
Join date : 2013-09-13
Re: How to model 1:N relation with daily history for milions of rows
A fact table is a simple relationship table. What's wrong with your first example?
Sure, you could blow it out to one row per relationship per day, but unless you have a good reason to do so, what's the point?
Sure, you could blow it out to one row per relationship per day, but unless you have a good reason to do so, what's the point?
Re: How to model 1:N relation with daily history for milions of rows
First example is fine. It's practical. But can I use it in dimensional model with star schemas? I thought there are only following types of fact tables:
- periodic snapshot
- accumulating snapshot
- detailed transactional fact table
which of these type would be the first example?
thanks
- periodic snapshot
- accumulating snapshot
- detailed transactional fact table
which of these type would be the first example?
thanks
msedlacek- Posts : 4
Join date : 2013-09-13
Re: How to model 1:N relation with daily history for milions of rows
The number of changes are irrelevant. Would you feel better if half the data changed? Other options are to partition the fact and keep only the month end snapshot (in addition to 31 days of daily snapshots). If changes are small, the business shouldn't care about intra month activity after the month is over. You can create a transaction fact that has a row for each "add service" and "remove service". This will allow you to sum up activity over time, identify churn, etc. You could partition by adds and disconnects. Lots of options based on what type of metrics you're trying to capture.msedlacek wrote:... You would have 5mil x 10 records per day. With daily history, you would have to store 1500mil records per month, which is too much to store considering there is very little number of changes per day.
Thanks,
Marian
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Similar topics
» Handling History tables in Dimensional Model
» Complex Dimensional Model Help - With History Product to Part
» OLTP Dailty Pricing tables: How best to model ceiling, floor and average daily price data?
» Attribute data held as rows in source - how to dimensionally model???
» Model Design best practice - add columns or pivot data for multiple rows ?
» Complex Dimensional Model Help - With History Product to Part
» OLTP Dailty Pricing tables: How best to model ceiling, floor and average daily price data?
» Attribute data held as rows in source - how to dimensionally model???
» Model Design best practice - add columns or pivot data for multiple rows ?
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum