How to model 1:N relation with daily history for milions of rows

msedlacek Wed Nov 20, 2013 8:31 am

Hi all,

imagine following simple case.

You have to model history relation between PARTY (user, client) and a SERVICE (product). One Party can have 0-N Products.

Your job is to be able to answer following question: give me the list of SERVICES for given PARTY in given day?

In 3NF, I would create a simple relation table:
ID_PARTY
ID_SERVICE
DATE_VALID_FROM
DATE_VALID_TO

The ETL loading would track changes per day and create a new record if something has changed.
Then we could answer the question with simple query:

select ID_SERVICE from RELATIONAL_TABLE_3NF where ID_PARTY = 334232 and 11.12.2012 between DATE_VALID_FROM and DATE_VALID_TO

But how would you model this relationship table in dimensional model?

There could be a factless fact table
ID_DAY_PERIOD
ID_PARTY
ID_SERVICE

and you would create a record of relation for each day (11.12.2012, 12.12.2012, 13.12.2012, etc) - even if the is no change. Then the answer would be given using select

select ID_SERVICE from FACTLESS_FACT_TABLE where ID_PARTY = 334232 and ID_DAY_PERIOD = 11.12.2012

Is this correct approach? But what if there is 5mil of PARTIES and each of them has 10 services. You would have 5mil x 10 records per day. With daily history, you would have to store 1500mil records per month, which is too much to store considering there is very little number of changes per day.

Thanks,
Marian

ngalemmo Wed Nov 20, 2013 3:36 pm

A fact table is a simple relationship table. What's wrong with your first example?

Sure, you could blow it out to one row per relationship per day, but unless you have a good reason to do so, what's the point?

msedlacek Thu Nov 21, 2013 2:13 am

First example is fine. It's practical. But can I use it in dimensional model with star schemas? I thought there are only following types of fact tables:
- periodic snapshot
- accumulating snapshot
- detailed transactional fact table

which of these type would be the first example?

thanks

ngalemmo Thu Nov 21, 2013 9:08 am

A snapshot.

BoxesAndLines Thu Nov 21, 2013 10:28 am

msedlacek wrote:... You would have 5mil x 10 records per day. With daily history, you would have to store 1500mil records per month, which is too much to store considering there is very little number of changes per day.

Thanks,
Marian

The number of changes are irrelevant. Would you feel better if half the data changed? Other options are to partition the fact and keep only the month end snapshot (in addition to 31 days of daily snapshots). If changes are small, the business shouldn't care about intra month activity after the month is over. You can create a transaction fact that has a row for each "add service" and "remove service". This will allow you to sum up activity over time, identify churn, etc. You could partition by adds and disconnects. Lots of options based on what type of metrics you're trying to capture.

Sponsored content

How to model 1:N relation with daily history for milions of rows

How to model 1:N relation with daily history for milions of rows

Re: How to model 1:N relation with daily history for milions of rows

Re: How to model 1:N relation with daily history for milions of rows

Re: How to model 1:N relation with daily history for milions of rows

Re: How to model 1:N relation with daily history for milions of rows

Re: How to model 1:N relation with daily history for milions of rows