Daily Snapshots Best Practices
4 posters
Page 1 of 1
Daily Snapshots Best Practices
We have some daily count fact tables where we store hardware or products owned by customer by region by date by category etc...This happens every day irrelevant of the change in data. Lets say if a customer owns 100 products today tomorrow and day after tomorrow, his record will be counted every day.
Above mentioned approach is making these fact tables grow like crazy. I was wondering what are the best practices to handle a scenerio like this.
Above mentioned approach is making these fact tables grow like crazy. I was wondering what are the best practices to handle a scenerio like this.
DilMustafa- Posts : 49
Join date : 2009-02-03
Location : Calgary, Canada
Re: Daily Snapshots Best Practices
On our biggest snapshot tables you get five days of history. That's how I handle it. After that, you only get aggregated counts.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Daily Snapshots Best Practices
This is one of the challenges of snapshot tables. By their very nature they periodically represent the entire state of whatever it is you are trying to track.
There are physical accomodations you can do in the database, such as partitioning, that make dealing with tables like this more palatable, but the basic underlying question is how many snapshots does the business need? To keep such snapshots for a long period of time (at least on a daily basis) has questionable business value as the data gets older.
So, it begs the question: How long do you maintain daily snapshots? If the business insists on many years, so be it. If they are willing to compromise, there are various approaches you can take. One would be to maintain month-end snapshots for historical record and daily for some much shorter period of time (say, not more than 3 months).
Another, trickier, way to do it is to populate a table with monthly snapshots and daily changes, and a companion bridge table (sort of like a hierarchy bridge). The bridge table would contain date key pairs, with the desired date and the date(s) that need to be summed to get the appropriate point in time. For example, on 6/30/09 you have a full month-end snapshot. You have deltas for 7/1, 7/2 and 7/3. In the bridge table you have the following pairs:
6/30 - 6/30
7/1 - 6/30
7/1 - 7/1
7/2 - 6/30
7/2 - 7/1
7/2 - 7/2
7/3 - 6/30
7/3 - 7/1
7/3 - 7/2
7/3 - 7/3
Now, if someone wants a report for 7/2, they would join through the bridge and sum all days for 7/2, in other words, it would sum 6/30 (the last complete snapshot), 7/1 and 7/2. You trade off a slightly more complex query for a much smaller fact table. Most BI tools will hide this complexity, or you can simply create a view that joins the bridge to the facts and exposes the key date from the bridge instead of the fact table. Query performance shouldn't be much different if changes are not too great. If the change volume is high, causing performance issues, you could snapshot more frequently (say, weekly or twice a month).
There are physical accomodations you can do in the database, such as partitioning, that make dealing with tables like this more palatable, but the basic underlying question is how many snapshots does the business need? To keep such snapshots for a long period of time (at least on a daily basis) has questionable business value as the data gets older.
So, it begs the question: How long do you maintain daily snapshots? If the business insists on many years, so be it. If they are willing to compromise, there are various approaches you can take. One would be to maintain month-end snapshots for historical record and daily for some much shorter period of time (say, not more than 3 months).
Another, trickier, way to do it is to populate a table with monthly snapshots and daily changes, and a companion bridge table (sort of like a hierarchy bridge). The bridge table would contain date key pairs, with the desired date and the date(s) that need to be summed to get the appropriate point in time. For example, on 6/30/09 you have a full month-end snapshot. You have deltas for 7/1, 7/2 and 7/3. In the bridge table you have the following pairs:
6/30 - 6/30
7/1 - 6/30
7/1 - 7/1
7/2 - 6/30
7/2 - 7/1
7/2 - 7/2
7/3 - 6/30
7/3 - 7/1
7/3 - 7/2
7/3 - 7/3
Now, if someone wants a report for 7/2, they would join through the bridge and sum all days for 7/2, in other words, it would sum 6/30 (the last complete snapshot), 7/1 and 7/2. You trade off a slightly more complex query for a much smaller fact table. Most BI tools will hide this complexity, or you can simply create a view that joins the bridge to the facts and exposes the key date from the bridge instead of the fact table. Query performance shouldn't be much different if changes are not too great. If the change volume is high, causing performance issues, you could snapshot more frequently (say, weekly or twice a month).
Re: Daily Snapshots Best Practices
Thanks for the feedback. Can you please, give an example of the bridge table. If my Fact table has 20 SK's do I need to include all of them in the Bridge table.
DilMustafa- Posts : 49
Join date : 2009-02-03
Location : Calgary, Canada
Re: Daily Snapshots Best Practices
The bridge would only contain date pairs (reporting date key/snapshot table date key). You would join the date key on the snapshot table to the snapshot table date key in the bridge and use the reporting date key to select the desired date. You would filter and summarize on other dimensions using the existing FKs in the snapshot table.
How much this approach helps is contigent on the sparsity of changes at the granularity of the snapshot. Since you have 20 FK's the granularity is probably high enough that the frequency of changes would be relatively small compared to a full snapshot.
How much this approach helps is contigent on the sparsity of changes at the granularity of the snapshot. Since you have 20 FK's the granularity is probably high enough that the frequency of changes would be relatively small compared to a full snapshot.
Re: Daily Snapshots Best Practices
Another way to store snapshot table only for dates when changes occurs. It`s intresting to compare that method with method offered by ngalemmo (bridge table)
kirillss- Posts : 2
Join date : 2010-02-01
Re: Daily Snapshots Best Practices
kirillss wrote:Another way to store snapshot table only for dates when changes occurs. It`s intresting to compare that method with method offered by ngalemmo (bridge table)
I'm not sure how well that would work. If you mean to create a complete snapshot of everthing when anything changes, I doubt it would be any different than simply doing a daily snapshot. If you mean to only create a new snapshot of those specific measures that changed, querying such a table would be a real challenge.
For example, take a simple inventory snapshot for a warehouse, product and date. Assume there is a product X in warehouses A, B & C and we only store a new inventory count at the end of the day when the inventory count changes. Such a snapshot table may look something like this:
WH Prd Date Cnt
A x 1/1 10
B x 1/1 20
C x 1/1 15
A x 1/3 30
C x 1/10 12
A x 1/10 23
B x 1/12 16
B x 1/15 8
C X 1/17 27
While such a table would be much smaller than a complete snapshot at the end of the day, how do you query it for a specific date? If I wanted to know the inventory count on 1/14 how do I do it? It is possible to create such a query (you need to use the count on the max date less than or equal to 1/14 for the given warehouse and product and any other dimension there may be), but it would be quite complex and probably slow, particularly as the number of dimensions in the table expand. (Quantites of X for A, B and C should be 23, 16 and 12 respectively on 1/14). The advantage of storing deltas against a base count is that the query is very simple, you just sum the days referenced by the bridge.
The delta version of the same events would look like this:
WH Prd Date Cnt
A x 1/1 10
B x 1/1 20
C x 1/1 15
A x 1/3 20
C x 1/10 -3
A x 1/10 -7
B x 1/12 -4
B x 1/15 -8
C X 1/17 15
To get the counts for 1/14 you simply sum Cnt where date is between 1/1 (base count) and 1/14, grouping by warehouse. The purpose of the bridge table is for operational flexibility. A base count can occur at any time and the bridge will provide the appropriate range of dates for any target date. If you can guarantee a base count is always taken at a specific point (such as the first day of the month) and can easily derive that date, you don't need the bridge table.
Re: Daily Snapshots Best Practices
ngalemmo wrote:
... Such a snapshot table may look something like this:
WH Prd Date Cnt
A x 1/1 10
B x 1/1 20
C x 1/1 15
A x 1/3 30
C x 1/10 12
A x 1/10 23
B x 1/12 16
B x 1/15 8
C X 1/17 27
While such a table would be much smaller than a complete snapshot at the end of the day, how do you query it for a specific date?
Let`s slightly change outline of the table
WH Prd DateFrom DateTo Cnt
A x 1/1 1/2 10
B x 1/1 1/11 20
C x 1/1 1/9 15
A x 1/3 1/9 30
C x 1/10 1/16 12
A x 1/10 1/1/2050 23
B x 1/12 1/14 16
B x 1/15 1/1/2050 8
C X 1/17 1/1/2050 27
We added column DateTo to mitigates problems with quering the table. Value 1/1/2050 in DateTo stands for current balance and depends on DBMS. Moreover, this version of snapshot avoid of any tables join. The only problem with that schema is more trickiest ETL (especially, when retrospective changes happens)
kirillss- Posts : 2
Join date : 2010-02-01
Similar topics
» Daily Snapshots of all facts and Dimensions
» Periodic snapshots - what is better?
» Snapshots and Date Dimensions
» Updating historic transactions and snapshots
» Accumulating Snapshots: Can we update the non-milestone dimensional data associated with the shapshot?
» Periodic snapshots - what is better?
» Snapshots and Date Dimensions
» Updating historic transactions and snapshots
» Accumulating Snapshots: Can we update the non-milestone dimensional data associated with the shapshot?
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum