Guidelines for FACT Table Design for High-Transaction Volume and High # of Measures ...
Hi All -
This inquiry could actually be two questions, but the requirement is really the presentation of row-wise, high-volume transactional data in a column-wise (pivoted) FACT table.
Here is a sample transactional data set:
Interval Date Time | Unit ID | Channel | Value |
7/2/2009 12:00:00.000000 AM | 24781 | 1 | 43.75 |
7/2/2009 12:00:00.000000 AM | 24781 | 3 | 52.17 |
7/2/2009 12:05:00.000000 AM | 24781 | 1 | 52.35 |
7/2/2009 12:05:00.000000 AM | 24781 | 3 | 51.98 |
7/2/2009 12:10:00.000000 AM | 24781 | 1 | 51.96 |
7/2/2009 12:10:00.000000 AM | 24781 | 3 | 56.78 |
7/2/2009 12:15:00.000000 AM | 24781 | 1 | 59.77 |
7/2/2009 12:15:00.000000 AM | 24781 | 3 | 62.75 |
7/2/2009 12:20:00.000000 AM | 24781 | 1 | 65.44 |
7/2/2009 12:20:00.000000 AM | 24781 | 3 | 65.53 |
7/2/2009 12:25:00.000000 AM | 24781 | 1 | 65.75 |
7/2/2009 12:25:00.000000 AM | 24781 | 3 | 64.78 |
7/2/2009 12:30:00.000000 AM | 24781 | 1 | 65.76 |
7/2/2009 12:30:00.000000 AM | 24781 | 3 | 65.76 |
Here are some data requirements:
1. Pivot Interval Date Time into columns, one column per 5-minute increment for up to a day (288 columns), with DIM levels in rows;
2. Pivot Channel data into columns, with Interval data in rows; consider up to 47 measure columns (a rough SQL sketch follows the samples below);
1. Here is a sample of columnar interval dates with channel '1' values (requirement #1):
              | 7/02 12:00am | 7/02 12:05am | 7/02 12:10am | 7/02 12:15am | 7/02 12:20am | 7/02 12:25am | 7/02 12:30am |
Level Value 1 | 43.75        | 52.35        | 51.96        | 59.77        | 65.44        | 65.75        | 65.76        |
Level Value 2 | 99.99        | ...          | ...          | ...          | ...          | ...          | ...          |
2. Here is a sample of columnar measures with channels, interval date time as rows (requirement #2):
             | Channel 1 Amt | Channel 2 Amt | Channel 3 Amt | Channel 4 Amt | Channel 5 Amt | Channel 6 Amt | ...   |
7/02 12:00am | 43.75         | 52.35         | 51.96         | 59.77         | 65.44         | 65.75         | 65.76 |
7/02 12:05am | 99.99         | ...           | ...           | ...           | ...           | ...           | ...   |
With regard to guidelines, I am trying to understand design thresholds for FACT tables within a dimensional model (it happens to be an Oracle 11g RDBMS) - how many FACT measures is 'too many', and so on. I understand my alternatives in columnar-store DBs / appliances, but I had to ask this question anyway.
If there are any high-volume gurus out there that have modeled for heavy pivot data and understand these thresholds, I would definitely owe you one! Any referenceable periodical would certainly help too (yes, I've read all of Dr. Kimball's work!).
Take care.
kminboston
Re: Guidelines for FACT Table Design for High-Transaction Volume and High # of Measures ...
The right way to do this is in steps...
The first thing is an atomic fact table for the measures being collected. It would essentially mimic the transaction feed with a date dimension, time dimension, unit dimension, channel dimension and value measure.
The date dimension would be by day and the time dimension would be independent of day and at whatever level of granularity you need (for example, by minute). You may also consider storing the timestamp itself if you need the true time for other purposes.
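As a rough sketch (every name here is illustrative, not a prescription), the atomic grain could look something like this - one row per unit, per channel, per 5-minute interval:

-- Illustrative only: table and column names are placeholders.
CREATE TABLE interval_reading_fact (
  date_key      NUMBER(8)    NOT NULL,  -- FK to the day-grain date dimension
  time_key      NUMBER(4)    NOT NULL,  -- FK to a time-of-day dimension (e.g. 288 five-minute rows)
  unit_key      NUMBER(10)   NOT NULL,  -- FK to the unit dimension
  channel_key   NUMBER(6)    NOT NULL,  -- FK to the channel dimension
  reading_ts    TIMESTAMP,              -- optional: the true timestamp, if needed downstream
  reading_value NUMBER(18,4) NOT NULL   -- the single additive measure
);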
The pivots, on the other hand, are aggregations of the facts. The question is, do you need to materialize them? Any decent BI tool will do the pivots and perform well against the base fact table as it is. The need to actually build aggregate tables will depend on your service level requirements. But if you have the atomic fact table, building aggregates is a simple task. If you are using Oracle, it has SQL extensions that will pivot the data for you. Building a table with 300 columns is no big deal from a database standpoint... but it can be a problem from a BI tool / user interface point of view.
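For example (again a sketch only - it reuses the placeholder names from the fact table above and assumes an interval_label column on the time-of-day dimension; a real version of requirement #1 would list all 288 interval columns):

-- Sketch only: names carried over from the illustrative fact table above.
SELECT *
FROM (
    SELECT f.unit_key, t.interval_label, f.reading_value
    FROM   interval_reading_fact f
    JOIN   time_of_day_dim t ON t.time_key = f.time_key
    WHERE  f.channel_key = 1
)
PIVOT (
    MAX(reading_value)
    FOR interval_label IN ('00:00' AS i_0000, '00:05' AS i_0005, '00:10' AS i_0010)
);

Wrap that in a CREATE TABLE ... AS or a materialized view only if your service levels say you must persist it.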
I don't get your comment: "I understand my alternatives for columnar store DBs / appliances, had to ask this question." ... These really don't make a significant difference from a modelling standpoint (other than physical considerations).
By the way, what is "high volume"? 288 data points/day/unit/channel for how many units and channels?
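Just to frame it (figures purely hypothetical): 288 intervals/day x 47 channels x 10,000 units would be about 135 million fact rows per day, or roughly 49 billion rows per year - the modelling guidance doesn't change much at that scale, but the partitioning and aggregation strategy certainly would.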