Guidelines for FACT Table Design for High-Transaction Volume and High # of Measures ...
Hi All -
This inquiry could actually be two questions, but the requirement is really the presentation of row-wise, high-volume transactional data in a column-wise (pivoted) FACT table.
Here is a sample transactional data set:
Interval Date Time | Unit ID | Channel | Value |
7/2/2009 12:00:00.000000 AM | 24781 | 1 | 43.75 |
7/2/2009 12:00:00.000000 AM | 24781 | 3 | 52.17 |
7/2/2009 12:05:00.000000 AM | 24781 | 1 | 52.35 |
7/2/2009 12:05:00.000000 AM | 24781 | 3 | 51.98 |
7/2/2009 12:10:00.000000 AM | 24781 | 1 | 51.96 |
7/2/2009 12:10:00.000000 AM | 24781 | 3 | 56.78 |
7/2/2009 12:15:00.000000 AM | 24781 | 1 | 59.77 |
7/2/2009 12:15:00.000000 AM | 24781 | 3 | 62.75 |
7/2/2009 12:20:00.000000 AM | 24781 | 1 | 65.44 |
7/2/2009 12:20:00.000000 AM | 24781 | 3 | 65.53 |
7/2/2009 12:25:00.000000 AM | 24781 | 1 | 65.75 |
7/2/2009 12:25:00.000000 AM | 24781 | 3 | 64.78 |
7/2/2009 12:30:00.000000 AM | 24781 | 1 | 65.76 |
7/2/2009 12:30:00.000000 AM | 24781 | 3 | 65.76 |
Here are some data requirements:
1. Pivot Interval Date Time into columns, one column per 5-minute increment for up to a day (288 columns), with DIM levels in rows;
2. Pivot Channel data into columns, with Interval data in rows; consider up to 47 measure columns (a rough SQL sketch follows the samples below);
1. Here is a sample of columnar interval dates with channel '1' values (requirement #1):
              | 7/02 12:00am | 7/02 12:05am | 7/02 12:10am | 7/02 12:15am | 7/02 12:20am | 7/02 12:25am | 7/02 12:30am |
Level Value 1 | 43.75        | 52.35        | 51.96        | 59.77        | 65.44        | 65.75        | 65.76        |
Level Value 2 | 99.99        | ...          | ...          | ...          | ...          | ...          | ...          |
2. Here is a sample of columnar measures with channels, interval date time as rows (requirement #2):
             | Channel 1 Amt | Channel 2 Amt | Channel 3 Amt | Channel 4 Amt | Channel 5 Amt | Channel 6 Amt | ...   |
7/02 12:00am | 43.75         | 52.35         | 51.96         | 59.77         | 65.44         | 65.75         | 65.76 |
7/02 12:05am | 99.99         | ...           | ...           | ...           | ...           | ...           | ...   |
With regard to guidelines, I am trying to understand design thresholds for FACT tables within a dimensional model (it happens to be an Oracle 11g RDBMS) - how many FACT measures is 'too many', and so on. I understand my alternatives in columnar-store DBs / appliances, but I had to ask this question anyway.
If there are any high-volume gurus out there that have modeled for heavy pivot data and understand these thresholds, I would definitely owe you one! Any referenceable periodical would certainly help too (yes, I've read all of Dr. Kimball's work!).
Take care.
kminboston
Re: Guidelines for FACT Table Design for High-Transaction Volume and High # of Measures ...
The right way to do this is in steps...
The first thing is an atomic fact table for the measures being collected. It would essentially mimic the transaction feed with a date dimension, time dimension, unit dimension, channel dimension and value measure.
The date dimension would be by day and the time dimension would be independent of day and at whatever level of granularity you need (for example, by minute). You may also consider storing the timestamp itself if you need the true time for other purposes.
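As a rough sketch (every name here is illustrative, not a prescription), the atomic grain could look something like this - one row per unit, per channel, per 5-minute interval:

-- Illustrative only: table and column names are placeholders.
CREATE TABLE interval_reading_fact (
  date_key      NUMBER(8)    NOT NULL,  -- FK to the day-grain date dimension
  time_key      NUMBER(4)    NOT NULL,  -- FK to a time-of-day dimension (e.g. 288 five-minute rows)
  unit_key      NUMBER(10)   NOT NULL,  -- FK to the unit dimension
  channel_key   NUMBER(6)    NOT NULL,  -- FK to the channel dimension
  reading_ts    TIMESTAMP,              -- optional: the true timestamp, if needed downstream
  reading_value NUMBER(18,4) NOT NULL   -- the single additive measure
);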
The pivots, on the other hand, are aggregations of the facts. The question is, do you need to materialize them? Any decent BI tool will do the pivots and perform well against the base fact table as it is. The need to actually build aggregate tables will depend on your service level requirements. But if you have the atomic fact table, building aggregates is a simple task. If you are using Oracle, it has SQL extensions that will pivot the data for you. Building a table with 300 columns is no big deal from a database standpoint... but it can be a problem from a BI tool / user interface point of view.
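For example (again a sketch only - it reuses the placeholder names from the fact table above and assumes an interval_label column on the time-of-day dimension; a real version of requirement #1 would list all 288 interval columns):

-- Sketch only: names carried over from the illustrative fact table above.
SELECT *
FROM (
    SELECT f.unit_key, t.interval_label, f.reading_value
    FROM   interval_reading_fact f
    JOIN   time_of_day_dim t ON t.time_key = f.time_key
    WHERE  f.channel_key = 1
)
PIVOT (
    MAX(reading_value)
    FOR interval_label IN ('00:00' AS i_0000, '00:05' AS i_0005, '00:10' AS i_0010)
);

Wrap that in a CREATE TABLE ... AS or a materialized view only if your service levels say you must persist it.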
I don't get your comment: "I understand my alternatives for columnar store DBs / appliances, had to ask this question." ... These really don't make a significant difference from a modelling standpoint (other than physical considerations).
By the way, what is "high volume"? 288 data points/day/unit/channel for how many units and channels?
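Just to frame it (figures purely hypothetical): 288 intervals/day x 47 channels x 10,000 units would be about 135 million fact rows per day, or roughly 49 billion rows per year - the modelling guidance doesn't change much at that scale, but the partitioning and aggregation strategy certainly would.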