Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Daily Snapshots Best Practices

4 posters

Go down

Daily Snapshots Best Practices Empty Daily Snapshots Best Practices

Post  DilMustafa Fri Aug 07, 2009 4:40 pm

We have some daily count fact tables where we store hardware or products owned by customer by region by date by category etc...This happens every day irrelevant of the change in data. Lets say if a customer owns 100 products today tomorrow and day after tomorrow, his record will be counted every day.

Above mentioned approach is making these fact tables grow like crazy. I was wondering what are the best practices to handle a scenerio like this.

DilMustafa

Posts : 49
Join date : 2009-02-03
Location : Calgary, Canada

Back to top Go down

Daily Snapshots Best Practices Empty Re: Daily Snapshots Best Practices

Post  BoxesAndLines Fri Aug 07, 2009 10:07 pm

On our biggest snapshot tables you get five days of history. That's how I handle it. After that, you only get aggregated counts.
BoxesAndLines
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

Back to top Go down

Daily Snapshots Best Practices Empty Re: Daily Snapshots Best Practices

Post  ngalemmo Tue Aug 11, 2009 11:09 am

This is one of the challenges of snapshot tables. By their very nature they periodically represent the entire state of whatever it is you are trying to track.

There are physical accomodations you can do in the database, such as partitioning, that make dealing with tables like this more palatable, but the basic underlying question is how many snapshots does the business need? To keep such snapshots for a long period of time (at least on a daily basis) has questionable business value as the data gets older.

So, it begs the question: How long do you maintain daily snapshots? If the business insists on many years, so be it. If they are willing to compromise, there are various approaches you can take. One would be to maintain month-end snapshots for historical record and daily for some much shorter period of time (say, not more than 3 months).

Another, trickier, way to do it is to populate a table with monthly snapshots and daily changes, and a companion bridge table (sort of like a hierarchy bridge). The bridge table would contain date key pairs, with the desired date and the date(s) that need to be summed to get the appropriate point in time. For example, on 6/30/09 you have a full month-end snapshot. You have deltas for 7/1, 7/2 and 7/3. In the bridge table you have the following pairs:

6/30 - 6/30
7/1 - 6/30
7/1 - 7/1
7/2 - 6/30
7/2 - 7/1
7/2 - 7/2
7/3 - 6/30
7/3 - 7/1
7/3 - 7/2
7/3 - 7/3

Now, if someone wants a report for 7/2, they would join through the bridge and sum all days for 7/2, in other words, it would sum 6/30 (the last complete snapshot), 7/1 and 7/2. You trade off a slightly more complex query for a much smaller fact table. Most BI tools will hide this complexity, or you can simply create a view that joins the bridge to the facts and exposes the key date from the bridge instead of the fact table. Query performance shouldn't be much different if changes are not too great. If the change volume is high, causing performance issues, you could snapshot more frequently (say, weekly or twice a month).
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Daily Snapshots Best Practices Empty Re: Daily Snapshots Best Practices

Post  DilMustafa Tue Jan 26, 2010 6:11 pm

Thanks for the feedback. Can you please, give an example of the bridge table. If my Fact table has 20 SK's do I need to include all of them in the Bridge table.

DilMustafa

Posts : 49
Join date : 2009-02-03
Location : Calgary, Canada

Back to top Go down

Daily Snapshots Best Practices Empty Re: Daily Snapshots Best Practices

Post  ngalemmo Tue Jan 26, 2010 6:29 pm

The bridge would only contain date pairs (reporting date key/snapshot table date key). You would join the date key on the snapshot table to the snapshot table date key in the bridge and use the reporting date key to select the desired date. You would filter and summarize on other dimensions using the existing FKs in the snapshot table.

How much this approach helps is contigent on the sparsity of changes at the granularity of the snapshot. Since you have 20 FK's the granularity is probably high enough that the frequency of changes would be relatively small compared to a full snapshot.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Daily Snapshots Best Practices Empty Re: Daily Snapshots Best Practices

Post  kirillss Mon Feb 01, 2010 9:09 am

Another way to store snapshot table only for dates when changes occurs. It`s intresting to compare that method with method offered by ngalemmo (bridge table)

kirillss

Posts : 2
Join date : 2010-02-01

Back to top Go down

Daily Snapshots Best Practices Empty Re: Daily Snapshots Best Practices

Post  ngalemmo Tue Feb 02, 2010 5:00 pm

kirillss wrote:Another way to store snapshot table only for dates when changes occurs. It`s intresting to compare that method with method offered by ngalemmo (bridge table)

I'm not sure how well that would work. If you mean to create a complete snapshot of everthing when anything changes, I doubt it would be any different than simply doing a daily snapshot. If you mean to only create a new snapshot of those specific measures that changed, querying such a table would be a real challenge.

For example, take a simple inventory snapshot for a warehouse, product and date. Assume there is a product X in warehouses A, B & C and we only store a new inventory count at the end of the day when the inventory count changes. Such a snapshot table may look something like this:

WH Prd Date Cnt
A x 1/1 10
B x 1/1 20
C x 1/1 15
A x 1/3 30
C x 1/10 12
A x 1/10 23
B x 1/12 16
B x 1/15 8
C X 1/17 27

While such a table would be much smaller than a complete snapshot at the end of the day, how do you query it for a specific date? If I wanted to know the inventory count on 1/14 how do I do it? It is possible to create such a query (you need to use the count on the max date less than or equal to 1/14 for the given warehouse and product and any other dimension there may be), but it would be quite complex and probably slow, particularly as the number of dimensions in the table expand. (Quantites of X for A, B and C should be 23, 16 and 12 respectively on 1/14). The advantage of storing deltas against a base count is that the query is very simple, you just sum the days referenced by the bridge.

The delta version of the same events would look like this:

WH Prd Date Cnt
A x 1/1 10
B x 1/1 20
C x 1/1 15
A x 1/3 20
C x 1/10 -3
A x 1/10 -7
B x 1/12 -4
B x 1/15 -8
C X 1/17 15

To get the counts for 1/14 you simply sum Cnt where date is between 1/1 (base count) and 1/14, grouping by warehouse. The purpose of the bridge table is for operational flexibility. A base count can occur at any time and the bridge will provide the appropriate range of dates for any target date. If you can guarantee a base count is always taken at a specific point (such as the first day of the month) and can easily derive that date, you don't need the bridge table.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Daily Snapshots Best Practices Empty Re: Daily Snapshots Best Practices

Post  kirillss Wed Feb 03, 2010 9:25 am

ngalemmo wrote:
... Such a snapshot table may look something like this:

WH Prd Date Cnt
A x 1/1 10
B x 1/1 20
C x 1/1 15
A x 1/3 30
C x 1/10 12
A x 1/10 23
B x 1/12 16
B x 1/15 8
C X 1/17 27

While such a table would be much smaller than a complete snapshot at the end of the day, how do you query it for a specific date?

Let`s slightly change outline of the table
WH Prd DateFrom DateTo Cnt
A x 1/1 1/2 10
B x 1/1 1/11 20
C x 1/1 1/9 15
A x 1/3 1/9 30
C x 1/10 1/16 12
A x 1/10 1/1/2050 23
B x 1/12 1/14 16
B x 1/15 1/1/2050 8
C X 1/17 1/1/2050 27

We added column DateTo to mitigates problems with quering the table. Value 1/1/2050 in DateTo stands for current balance and depends on DBMS. Moreover, this version of snapshot avoid of any tables join. The only problem with that schema is more trickiest ETL (especially, when retrospective changes happens)

kirillss

Posts : 2
Join date : 2010-02-01

Back to top Go down

Daily Snapshots Best Practices Empty Re: Daily Snapshots Best Practices

Post  ngalemmo Wed Feb 03, 2010 1:41 pm

Ok. Fair enough.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Daily Snapshots Best Practices Empty Re: Daily Snapshots Best Practices

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum