Number of Columns in Fact Tables vs. Dimension Tables

mostafa_mahrous75 Mon Dec 01, 2014 5:37 am

Hi There

Fact Table is assumed to have large number of records, that is why it should include only numeric columns such as Foreign Keys, Measures and Degenerate Dimensions (if any) in order to minimize IO for accessing such large tables, for character columns we should move it to one of the surrounding dimensions in Star Schema (ex. in a Junk Dimension)

On the other hand, Dimension Table should theoretically include all the attributes that describe the entity (numeric, alpha-numeric and character columns) resulting in a wide dimension table and therefore less number of records can be stored in a single data block which consequently increases IO for accessing such wide Dimension Tables. This issue is obvious with Monster Dimensions such as Customer or Account Dimensions

Mini-dimension technique is a good practice to enhance Dimension Table IO for low cardinality attributes

Now I'm thinking of high cardinality attributes such as address, notes/comments in addition to the other less important attributes, what about if we split it into another table that is 1-1 joined to the main Dimension? This way, the number of attributes in the main dimension is less resulting in more rows to be stored in a single data block, however we "snow flaking" the dimension which results in more complex model and low performance query (in case we need to any attributes from the "extra" dimension table

Appreciating your feedback to assess the suggested technique and propose any suitable technique to resolve that dilemma

nick_white Mon Dec 01, 2014 7:33 am

Hi,
you seem to be discussing a theoretical situation rather than an actual situation - if that is the case then I would stick to the standard Kimball best practice and not worry about theoretical situations until they actually occur. Splitting some attributes out of the main dimension might improve query performance - but if both approaches result in sub-second queries then why spend time looking at different approaches? Keeps things as simple as possible unless you know you have to make them more complex.
If you do have query performance issues then before changing your dimensional design I would look at all the more usual performance tuning techniques: has the hardware been spec'd appropriately; is the partitioning/indexing strategy correct; have appropriate aggregates been built; etc?

If you do need to change your dimensional model then I doubt there is any best practice or template you could follow - any solution is going to be (experience-driven) trial and error on your part. The reason for this is that performance is so dependent on data structures, data volumes, hardware, types of queries, volumes of queries,etc. that what works for one implementation won't necessarily work for another.
One person might split some attributes out of a main dimension and see a massive improvement while another person do something that appears to be similar but then sees worse performance.

Number of Columns in Fact Tables vs. Dimension Tables

Number of Columns in Fact Tables vs. Dimension Tables

Re: Number of Columns in Fact Tables vs. Dimension Tables