Summary Tables

kclark Wed Dec 01, 2010 5:18 pm

Hello,

I would like any advice for summary tables... Where are they built, in the database or in the BI Tool? What's it's purpose? Granularity limitations? etc.?

warrent Wed Dec 01, 2010 6:03 pm

Hi, kclark,

Summary tables, also known as aggregate tables, are physically instantiated summary queries against a detailed base fact table. You create them for performance reasons.

For example, if you have a 5 billion row Sales Fact table with 5 years worth of data in it, and someone asks for total Sales by year, it will take some time for the database to sum up all the rows in the detail to get the 5 rows requested. On the other had, if you had created an aggregate table that only had Sales Amount by Date, it would only have one row per day for the last 5 years (about 1,826 rows, or so). Getting total Sales by Year from this table would clearly be much faster.

Ideally, you want the system to redirect the query to the best possible available aggregate table automatically, without any user involvement or awareness. We call this query redirection aggregate navigation.

How you do all this depends on the technology you have available to you. Most relational databases have the concept of a materialized view which is a physically instantiated summary table that the optimizer can redirect the query to because it understands the relationship between the detail and summary table based on the view definition.

OLAP databases, such as Microsoft Analysis Services and Oracle's Hyperion Essbase (I'm not sure what they call it now), provide aggregate definition, maintenance, and navigation as part of their base engine. It happens by default within the engine and is part of how they get great performance.

Other tools provide the same functionality in a middle tier server. This "ROLAP" category includes MicroStrategy and IBM's OLAP Accelerator (name might have changed).

Whatever tool you use, you want the aggregates to be in a part of your architecture that is accessible transparently by all front end (BI) tools. It's not a great idea to be creating, distributing, and managing lots of redundant aggregates around the enterprise. The same aggregate table should be available to anyone who asks for total Sales by year. It should be tightly tied to the detail data so you keep the single version of the truth.

So, there are several ways to provide the performance enhancement of aggregates along with the transparency of aggregate navigation, so users won't have to figure out which aggregate table is the best one for each query.

Ralph wrote a detailed article on the underlying concepts of creating and navigating aggregates almost fifteen years ago. You can see it on the Kimball Group website. Also, try googling "data warehouse aggregate tables" for other articles.

--Warren

kclark Wed Dec 01, 2010 6:59 pm

Wow! Thanks Warren. My database is sql server 2008 and using a Cognos tool, Framework Manager. So creating a view in SSMS is the way to go. Excellent. I understand what you were saying about the sales BUT what if the summary would be in the combination of 3 fact tables...? Quick breakdown of the setup. We have on table...Orders: OrderID, ProductID, StockCollectionID, OrderType, QuantityRequested, Total, etc.
Then we have Agent StockCollection Efficency: AgentID, StockCollectionID, ProductID, ReorderThreshold, OptimalQuantity, etc. and Location StockCollection Efficiency: LocationID, StockCollectionID, ProductID, ReorderThreshold, OptimalQuantity, etc.
The difference between the last two is one is Agent and the other Location. The customer wants to know the # of Orders by StockCollectionType, Item, and OrderType.
So, I began working on creating a view that combines the two efficiency tables and includes the OrderID from teh Orders table to obtain this count. Is this correct???

ngalemmo Thu Dec 02, 2010 12:11 am

What you desribe is an aggregate table... aggregate is an interesting word... it can mean to combine disparate things into a single mass (such as concrete) or to summarize data. I tend to use aggregate in the former sense, and summary fact to describe a fact table that is simply a reduction in grain of a single atomic fact.

Creating aggregates is a common technique to improve performance when users typically combine multiple atomic facts.

kclark Mon Dec 06, 2010 11:51 am

Oh ok, I didn't know they were different terms. I was looking at some tips for aggregate tables and I am getting a little confused as to what to include in them. I just need only what the user is requesting...so no additional information like other dimensions or measures. Just the count the specified "categories"...is that right?

Sponsored content

Summary Tables

Summary Tables

Re: Summary Tables

Re: Summary Tables

Re: Summary Tables

Re: Summary Tables

Re: Summary Tables