Daily Snapshot question
5 posters
Page 1 of 1
Daily Snapshot question
Hi All
I have a snapshot requirement around Fleet Management. The metric is how many Vans are hired to Customers at a given point in time. So the measure is Van Count, the dimensions are Van, Customer, Date, Location.
There are two snapshot requirements. The first is for end of month reporting, how many Vans has each customer got. Then the second is the more difficult, its how many Vans has the customer got today. That means the report could be run on the 13th, 14th or say 20th of the month - it doesn't matter - the user just always wants to get the current Count of Vans today.
This is pointing me towards a daily snapshot. But the trouble is there that the data will not change very often. E.g. Customer A has 10 Vans today, it is likely that they will have 10 vans on every day until an ad-hoc point where they may add another van. So the fact table would look something like this:
CustomerKey | VanKey | DateKey | VanCount
7 |120 | 20150513 | 1
7 |121 | 20150513 | 1
7 |120 | 20150514 | 1
7 |121 | 20150514 | 1
7 |120 | 20150514 | 1
7 |121 | 20150514 | 1
It may end up being a factless fact table, but that's not my concern. My concern is that the unchanged data will be repeated every day as the actual number of vans changes very slowly.
Do you think a daily snapshot is the right approach here? The volumes are around 54,000 rows per day, so about 20 million per year. So we're really not talking very large volumes, but it seems like a bit of a waste.
Alternatively I'm considering holding a monthly and daily snapshot. I could write the balance of the month to the monthly snapshot and then truncate the daily snapshot fresh for the start of the month. Is that a reasonable approach do you think?
Thank you
I have a snapshot requirement around Fleet Management. The metric is how many Vans are hired to Customers at a given point in time. So the measure is Van Count, the dimensions are Van, Customer, Date, Location.
There are two snapshot requirements. The first is for end of month reporting, how many Vans has each customer got. Then the second is the more difficult, its how many Vans has the customer got today. That means the report could be run on the 13th, 14th or say 20th of the month - it doesn't matter - the user just always wants to get the current Count of Vans today.
This is pointing me towards a daily snapshot. But the trouble is there that the data will not change very often. E.g. Customer A has 10 Vans today, it is likely that they will have 10 vans on every day until an ad-hoc point where they may add another van. So the fact table would look something like this:
CustomerKey | VanKey | DateKey | VanCount
7 |120 | 20150513 | 1
7 |121 | 20150513 | 1
7 |120 | 20150514 | 1
7 |121 | 20150514 | 1
7 |120 | 20150514 | 1
7 |121 | 20150514 | 1
It may end up being a factless fact table, but that's not my concern. My concern is that the unchanged data will be repeated every day as the actual number of vans changes very slowly.
Do you think a daily snapshot is the right approach here? The volumes are around 54,000 rows per day, so about 20 million per year. So we're really not talking very large volumes, but it seems like a bit of a waste.
Alternatively I'm considering holding a monthly and daily snapshot. I could write the balance of the month to the monthly snapshot and then truncate the daily snapshot fresh for the start of the month. Is that a reasonable approach do you think?
Thank you
jryan- Posts : 33
Join date : 2010-09-27
Re: Daily Snapshot question
I would have two date keys - one for when the van was hired and one for when it was returned (with the returned one set to a date of your choosing for vans still on hire; picking a date far in the future probably makes queries easier)
You can then easily report of the hired state of the vans on any particular date.
Create a monthly snapshot if you want for performance reasons but the basic fact table will provide the information you need - just query for any fact record where the hire date <=month end and return date > month end.
Hope this helps?
You can then easily report of the hired state of the vans on any particular date.
Create a monthly snapshot if you want for performance reasons but the basic fact table will provide the information you need - just query for any fact record where the hire date <=month end and return date > month end.
Hope this helps?
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: Daily Snapshot question
Thank you Nick
Do you mean a sort of transactional fact table, with a Start Date and End Date?
I guess that would help with the requirement for the Number of Vans on hand today, but wouldn't help with the "end of month reporting". They need to know (for example) that in April a customer had 10 Vans, but in May that increased to 12.
Maybe I could make a "transactional" fact table around the one that you're saying and then a Monthly snapshot fact table also?
Do you mean a sort of transactional fact table, with a Start Date and End Date?
I guess that would help with the requirement for the Number of Vans on hand today, but wouldn't help with the "end of month reporting". They need to know (for example) that in April a customer had 10 Vans, but in May that increased to 12.
Maybe I could make a "transactional" fact table around the one that you're saying and then a Monthly snapshot fact table also?
jryan- Posts : 33
Join date : 2010-09-27
Re: Daily Snapshot question
Hi,
yes to the transactional fact table.
Month End report: it depends what your requirements are. Do you want to know, for example,
How do you count:
Whatever your requirements, I suspect they can be supported by the Transactional Fact table; either directly or using the Txn Fact table to populate your monthly snapshot
yes to the transactional fact table.
Month End report: it depends what your requirements are. Do you want to know, for example,
- how many vans were on hire on the last day of the month
- how many vans were on hire at some point in the month
- how many van-days of hire there were in the month (e.g. 10 vans hired for the whole of a 30 day month = 300 van-days)
How do you count:
- Vans hired last month and returned this month?
- Vans hired and returned in the same month?
Whatever your requirements, I suspect they can be supported by the Transactional Fact table; either directly or using the Txn Fact table to populate your monthly snapshot
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: Daily Snapshot question
Nick, why do you think transactional rather than accumulating snapshot?
A date-hired and date-returned, inserted when hired, updated when returned, would seem to be a good solution to this problem.
Interested in your thinking.
A date-hired and date-returned, inserted when hired, updated when returned, would seem to be a good solution to this problem.
Interested in your thinking.
Re: Daily Snapshot question
For monthly summaries a snapshot is fine if there is a need for it. It can be calculated from the detail, but if performance is an issue, you implement an aggregate, which the snapshot would be. Wither it is accumulating or not, is an implementation choice.
A daily snapshot to support the ad-hoc date ranges, would probably be overkill. I doubt the aggregate would be that much smaller than the detail, and the detail gives you the ability to analyze the data without dimensional compromise.
A daily snapshot to support the ad-hoc date ranges, would probably be overkill. I doubt the aggregate would be that much smaller than the detail, and the detail gives you the ability to analyze the data without dimensional compromise.
Re: Daily Snapshot question
Thanks all for the replies.
Nick - to answer your questions - for the month end reporting we do know how many vans were on hire on the last day of the month. And that's what they want to report. The users don't want to know the vans that were on hire at some point in the month, just the end month balance.
To answer your question about how we do the counting, its all done on contract start and end dates, as these vans are hired via a contract. Therefore you don't get a return date - the source data isn't great. Hired and returned in the same month they're not that interested in, as its more the month end balance that they want.
For the "current" on hire requirements they need to know the #vans that each customer has today. But they're not interested in comparing today with yesterday, for example.
I like the idea of a snapshot for ease of querying. This will be in a cube and to get the month end balances I think its not ideal for them to filter on contract start and end dates. But with a snapshot fact table you would get a "Snapshot Date" dimension or similar, which would allow you to just pick the month of "May" for example and see all of the vans that are hired out.
Does this extra info I've provided about how the calculations work change anything? Or do you still think its a transactional fact table with dimensions for Start and End Date?
Nick - to answer your questions - for the month end reporting we do know how many vans were on hire on the last day of the month. And that's what they want to report. The users don't want to know the vans that were on hire at some point in the month, just the end month balance.
To answer your question about how we do the counting, its all done on contract start and end dates, as these vans are hired via a contract. Therefore you don't get a return date - the source data isn't great. Hired and returned in the same month they're not that interested in, as its more the month end balance that they want.
For the "current" on hire requirements they need to know the #vans that each customer has today. But they're not interested in comparing today with yesterday, for example.
I like the idea of a snapshot for ease of querying. This will be in a cube and to get the month end balances I think its not ideal for them to filter on contract start and end dates. But with a snapshot fact table you would get a "Snapshot Date" dimension or similar, which would allow you to just pick the month of "May" for example and see all of the vans that are hired out.
Does this extra info I've provided about how the calculations work change anything? Or do you still think its a transactional fact table with dimensions for Start and End Date?
jryan- Posts : 33
Join date : 2010-09-27
Re: Daily Snapshot question
The transactional fact table will support your monthly snapshot - just query where the start date <= the last day of the month and end date >= last day of the month (adjust the = on each side as necessary for your particular requirements).
You can just run this query whenever you want this month-end information - or use run the query once a month to feed a snapshot/cube if you want.
You can just run this query whenever you want this month-end information - or use run the query once a month to feed a snapshot/cube if you want.
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: Daily Snapshot question
Thanks Nick, Yes I do agree with that, although I'm not worried about how to calculate the snapshot - it will be relatively easy.
More my concern is what I present to the user. It looks like I will have to have 2 fact tables. 1)To do the end of month count (the Monthly Snapshot), then 2)something to get the current, up to the day count of vans.
Its "fact table 2" that I'm not sure about. But maybe it just needs to be dimensioned by Contract Start Date, Contract End Date, Van, Customer etc.
I don't like the user having to go to different places (snapshot for the month end balances, transactional for the current #vans on hand) to get what is the same measure. But maybe that's acceptable.
The alternative I suppose would be to update the monthly snapshot each day. Sort of a morph between an accumulating snapshot and a periodic snapshot. But that feels a bit wrong!
More my concern is what I present to the user. It looks like I will have to have 2 fact tables. 1)To do the end of month count (the Monthly Snapshot), then 2)something to get the current, up to the day count of vans.
Its "fact table 2" that I'm not sure about. But maybe it just needs to be dimensioned by Contract Start Date, Contract End Date, Van, Customer etc.
I don't like the user having to go to different places (snapshot for the month end balances, transactional for the current #vans on hand) to get what is the same measure. But maybe that's acceptable.
The alternative I suppose would be to update the monthly snapshot each day. Sort of a morph between an accumulating snapshot and a periodic snapshot. But that feels a bit wrong!
jryan- Posts : 33
Join date : 2010-09-27
Re: Daily Snapshot question
Sorry - I may not have been clear...
It's the transactional fact table "dimensioned by Contract Start Date, Contract End Date, Van, Customer etc." that you have to build and which will support both your daily and monthly queries. If you want to build a monthly snapshot by querying the transactional fact table that's up to you - but it's not required. The only benefit of the monthly snapshot is performance - it provides no additional functionality - and if querying the transactional fact table performs acceptably (and it should) then I wouldn't bother with the snapshot.
It's the transactional fact table "dimensioned by Contract Start Date, Contract End Date, Van, Customer etc." that you have to build and which will support both your daily and monthly queries. If you want to build a monthly snapshot by querying the transactional fact table that's up to you - but it's not required. The only benefit of the monthly snapshot is performance - it provides no additional functionality - and if querying the transactional fact table performs acceptably (and it should) then I wouldn't bother with the snapshot.
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: Daily Snapshot question
I see, thanks, yes I slightly misunderstood.
However I go back to the point of Start and End Dates not being that user great for analysis, at least not in a cube. Imagine you have a contract that starts on 01/02/2015 and finishes on 31/08/2015.
If you wanted to know how many Vans/Contracts you had active today, then you have to pick an End Date of greater than today. You will basically have to ensure that every day/month after today is selected to get the number of Vans that are still part of active contract.
Maybe I'm worrying about that too much....maybe that's acceptable - it just feels a bit nicer to somehow have a single date, pick today, and then be told how many are active.
However I go back to the point of Start and End Dates not being that user great for analysis, at least not in a cube. Imagine you have a contract that starts on 01/02/2015 and finishes on 31/08/2015.
If you wanted to know how many Vans/Contracts you had active today, then you have to pick an End Date of greater than today. You will basically have to ensure that every day/month after today is selected to get the number of Vans that are still part of active contract.
Maybe I'm worrying about that too much....maybe that's acceptable - it just feels a bit nicer to somehow have a single date, pick today, and then be told how many are active.
jryan- Posts : 33
Join date : 2010-09-27
Re: Daily Snapshot question
Your user would pick the single date they want to report for.
It's up to you to configure your BI app so that this selected date is passed as a parameter to your query i.e. replace the '?' with the selected date in the following (example) WHERE clause
WHERE 'contract start date' <= ? and 'contact end date' >= ?
If you are allowing your users direct access to tables/cubes, rather than through a BI app, then presumably they have the skills to write this WHERE clause for themselves
It's up to you to configure your BI app so that this selected date is passed as a parameter to your query i.e. replace the '?' with the selected date in the following (example) WHERE clause
WHERE 'contract start date' <= ? and 'contact end date' >= ?
If you are allowing your users direct access to tables/cubes, rather than through a BI app, then presumably they have the skills to write this WHERE clause for themselves
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: Daily Snapshot question
There's definitely a few ways to do it in SSAS, but you can't get them to pick a single date unfortunately. They will be able to select the appropriate filters though.
I'm just a bit surprised that the general consensus is to approach this with a start and end date, especially as I can't find a good example of this in the Kimball book. Although it does seem to be the consensus on this forum:
http://forum.kimballgroup.com/t2359-how-to-model-customers-having-contracts-factless-fact-table
Thanks again for your help
I'm just a bit surprised that the general consensus is to approach this with a start and end date, especially as I can't find a good example of this in the Kimball book. Although it does seem to be the consensus on this forum:
http://forum.kimballgroup.com/t2359-how-to-model-customers-having-contracts-factless-fact-table
Thanks again for your help
jryan- Posts : 33
Join date : 2010-09-27
Re: Daily Snapshot question
I guess it's due to the shear size of daily snapshot table with high data redundancy. Start/End date, as Nick suggested, is the most efficient approach to store daily snapshot without losing information if change tracking is not required. It's still quite effective for slow change tracking. Whenever you need a daily snapshot, you can always cross join it to date dimension to produce a view of daily snapshot or monthly snapshot and feed it to SSAS cube. In relational DB, you will never have a massive daily snapshot table. The snapshot may be stored in the cube, but in a highly compressed format.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Similar topics
» Daily Snapshot or SCD 2
» snapshot facts: daily versus effective from & to
» Daily snapshot fact table-any chance to reduce data volume?
» Modeling Question - Accumulating Snapshot vs. Event Transactions
» Accumulating Snapshot and Transaction Fact tables : question to design and use them together
» snapshot facts: daily versus effective from & to
» Daily snapshot fact table-any chance to reduce data volume?
» Modeling Question - Accumulating Snapshot vs. Event Transactions
» Accumulating Snapshot and Transaction Fact tables : question to design and use them together
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum