Kimball Dimensions
I got into a nice heated debate the other day. The topic was flattened dimensions. I’d like to know your opinions (or facts) about the following. Some of this is common sense, but please still give me some feedback. Any article citations would be greatly appreciated.
1. Flattening data (such as many [but limited] codes describing one event) can entail creating a column for each code and populating the rows with a flag to indicate whether the value occurred. Correct?
2. Flattening data will bring the dimension’s data to the granularity of the related fact table (one row per fact row = one row in the dimension). Correct?
3. Flattening the data for all permutations can save data storage space and ETL processing time. Correct?
4. The flattened dimension could be considered friendlier for customers interacting directly with the fact and dimension tables (enterprise users working through Business Objects and SAS). Correct?
5. The “Kimball camp” prefers to flatten data (dimensions) unless there is a data, business, or technical requirement or limitation forcing an outrigger or bridge scenario. Correct?
6. All granularity of the dimensional model is determined by the fact tables; this is the level at which the “measured amounts” occurred in the source system. Flattening the dimension’s data is not “changing the granularity”; rather, it’s bringing the dimension’s data to the established granularity. Correct?
In statements 1, 2, and 3, let’s say the codes are static (they don’t change) and are limited to six different codes: AA, AB, AC, BB, BC, BD. With no requirements for either format, which layout would the “Kimball camp” prefer?
Method A: (This table will have the same row count as the source system, millions of rows, and requires an ETL load for every DW update.)
Surrogate | Code | Code Description
1234      | AA   | Entry sent to Account Sys
1234      | AB   | Entry sent to Reconcile Sys
1234      | AC   | Entry sent to Archive Sys
4321      | BB   | Entry sent to Audit Sys
4321      | BD   | Entry sent to Federal Review Sys
6677      | BC   | Entry sent to State Review Sys
Method B: (This table is built once for all permutations, 64 rows, and requires no ETL updates; see the sketch after the table.)
Surrogate | AA Account | AB Reconcile | AC Archive | BB Audit | BC State | BD Federal
1234      | Y          | Y            | Y          | N        | N        | N
4321      | N          | N            | N          | Y        | N        | Y
6677      | N          | N            | N          | N        | Y        | N
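As an illustration only, here is a minimal Python sketch of pre-building the 64-row flag dimension in one shot (the column names mirror Method B; the function name and surrogate-key assignment are assumptions, not part of the actual design):

from itertools import product

# The six static codes from the example, one flag column per code.
CODES = ["aa_account", "ab_reconcile", "ac_archive",
         "bb_audit", "bc_state", "bd_federal"]

def build_flag_dimension():
    """Return one row per permutation of the six Y/N flags (2**6 = 64 rows)."""
    rows = []
    for surrogate, flags in enumerate(product("NY", repeat=len(CODES)), start=1):
        row = {"surrogate_key": surrogate}
        row.update(dict(zip(CODES, flags)))   # e.g. {"aa_account": "Y", ...}
        rows.append(row)
    return rows

if __name__ == "__main__":
    dim = build_flag_dimension()
    print(len(dim))   # 64
    print(dim[-1])    # the all-'Y' permutation

Because every permutation already exists, the incremental ETL only has to look up the matching surrogate key for each fact row; the dimension table itself is never reloaded.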
Thanks in advance for any feedback.
kdot
Re: Kimball Dimensions
Sure, this will work if each attribute is well defined and the number of attributes is relatively small. It makes sense in the scenario you defined.
It doesn't work well when there are a lot of potential code values, all representing the same thing, and multiple values apply to the same fact, i.e. a many-to-many relationship. Situations like that require a bridge table; a common example is the set of diagnoses related to a medical claim (see the sketch below).
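To make the bridge-table idea concrete, here is a small illustrative Python sketch (the claim and diagnosis keys and names are invented for the example, not taken from this thread): the fact row carries a group key, the bridge maps that group to each member diagnosis, and a query fans out through the bridge.

# Fact row pointing at a diagnosis group rather than a single diagnosis.
claim_fact = [
    {"claim_key": 1, "diagnosis_group_key": 10, "claim_amount": 500.0},
]

# Bridge: one row per (group, diagnosis) pair resolves the many-to-many.
diagnosis_bridge = [
    {"diagnosis_group_key": 10, "diagnosis_key": 100},
    {"diagnosis_group_key": 10, "diagnosis_key": 101},
]

# Diagnosis dimension keyed by diagnosis_key.
diagnosis_dim = {
    100: "Hypertension",
    101: "Type 2 diabetes",
}

# Fact -> bridge -> dimension: one claim fans out to all of its diagnoses.
for fact in claim_fact:
    diagnoses = [diagnosis_dim[b["diagnosis_key"]]
                 for b in diagnosis_bridge
                 if b["diagnosis_group_key"] == fact["diagnosis_group_key"]]
    print(fact["claim_key"], fact["claim_amount"], diagnoses)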