Multiple different grain fact tables with lot of common dimensions.

Abhiraizada Wed Oct 12, 2011 5:38 am

Hi,

I am faced with one scenario where we have 3 fact tables -

1. Case fact - one record per case
2. Case status fact - one record per case per status change with time
3. Case status aggregate - one record per case per status ( kind of aggregate where data is rolled up from Case status fact however resulting in not so impressive compression ratio)

Apart from that we have large number of attributes called as "Case attributes" present in Case fact such as - Case type, Case state, Case owner user, Case Priority etc. ( 20 + such Case attributes), and all are implemented through Reference Code Dimension. (Let me know if more description is required about its design, for better understanding).

Problem -

We have customers creating reporting using common dimensions and "Case attributes" from Case fact and taking measures from Case status aggregate or Case status fact which is resulting in "No Data" as these case attributes (implemented through Reference Code Dimension) are not conformed across fact tables.

Hence to address such requirements (which are many !) we are planning to propagate these Case attributes to all the other fact tables, which will enable them to create such reports with out any problem. Doing this will not cause any grain mismatch as all listed fact tables have CASE_ID (PK for Case) and can accommodate these "Case attributes". Please provide your suggestions on this approach, any other suggestions are also welcomed.

Regards,
Abhiraizada

BoxesAndLines Wed Oct 12, 2011 11:08 am

When you say propogate, what exactly are you planning on doing? Adding dimensions or copying columns?

ngalemmo Wed Oct 12, 2011 11:29 am

Any fact table should contain references to its appropriate dimensions. Integration across facts can only occur where common dimensions and/or attributes exist. So, if a dimension is appropriate for a fact, it should be included.

I am not a fan of this "reference code" dimension you refer to. I assume what you are talking about is a generalized code/description table with an FK on the fact to reference the particular code and desription that applies. Where there may be 20+ codes involved, you have a corresponding 20+ foreign keys in the fact. Is my assumption correct?

hkandpal Wed Oct 12, 2011 8:18 pm

Hi,

if you want to measure the facts against a common dimension then it should be present in all the fact tables.
How many different dimension will you be including ?
Could you please give us more detail about how the different fact tables are related to each other(Case fact, Case status fact and Case status aggregate ).

Thanks

Himanshu

Abhiraizada Thu Oct 13, 2011 1:47 am

BoxesAndLines wrote:When you say propogate, what exactly are you planning on doing? Adding dimensions or copying columns?

We are planning to add more dimensions to the fact tables ( Case status aggregate and Case status Fact). However all of the new dimension fk's in fact tables will only refer to Reference Code Dimension.

Hope his help !

Regards,
Abhiraizada

Abhiraizada Thu Oct 13, 2011 1:53 am

ngalemmo wrote:Any fact table should contain references to its appropriate dimensions. Integration across facts can only occur where common dimensions and/or attributes exist. So, if a dimension is appropriate for a fact, it should be included.

I am not a fan of this "reference code" dimension you refer to. I assume what you are talking about is a generalized code/description table with an FK on the fact to reference the particular code and desription that applies. Where there may be 20+ codes involved, you have a corresponding 20+ foreign keys in the fact. Is my assumption correct?

Thanks for this quick response...now i can start working on implementing its logical design.

And yes the 2 fact tables - Case status Fact and Case status aggregate will have 20+ additional Fk's from Reference Code Dimension.

I can also understand why you don't like the common "Reference Code" dimension, but in our scenario its easy way out as something similar has been implemented in source system and having this simplifies things for me.

Regards,
Abhiraizada

hang Thu Oct 13, 2011 2:26 am

I would get rid of Reference Code Dimension and regroup the attributes into a number of junk dimensions for all the low cardinality attributes, and create a few stand alone dimensions for high cardinality attributes. Hopefully that will cut down the number of dimensions singnificantly so that you can conform your fact tables on minimal number of dimensions.

BoxesAndLines Thu Oct 13, 2011 9:26 am

Big, unified code tables are bad design in OLTP and dimensional modeling. +1 on what Hang said.

Abhiraizada Thu Oct 13, 2011 11:52 am

BoxesAndLines wrote:Big, unified code tables are bad design in OLTP and dimensional modeling. +1 on what Hang said.

What are major draw backs because of such code tables ? Where we have 3 major columns as - Code type, Code value and Code desc. Currently our reference code table has around 30000 records which is never going to increase as its configurable data.

The major advantage i see with them being present in OLTP/OLAP as it standardize all such Ref. Codes together which is better in terms of reducing number of tables and also make it more flexible in terms of extension/standardization/translation etc.

Let me know if i am missing any point here
Regards,
Abhiraizada

BoxesAndLines Thu Oct 13, 2011 5:17 pm

It removes the ability of the database to enforce business rules. One code is varchar, another code is numeric. I can no longer enforce the minimum basics of selecting the appropriate datatype. Additionally, I can no longer enforce RI, add hierarchies, and may introduce concurrency issues since everyone in the company now uses the same table.

ngalemmo Thu Oct 13, 2011 5:54 pm

From a performance standpoint, having a wide fact table will hurt. It can also be problematic for some BI tools to deal with it.

As it has been suggested, you are better off grouping correlated or low cardinality attributes into junk dimensions to significantly reduce the number of FKs in the fact table.

To make change detection easier, you could maintain the code table in the DW to support the ETL process, but do not expose it to users and do not reference it from fact tables. Another use for it would be to support 'list of values' type dropdowns in a BI tool.

hang Thu Oct 13, 2011 8:01 pm

Almost all the legacy systems have this type of generic code table in both OLTP and DW systems due to ignorance of industry best practice. As a dimensional modeler for DW system, it's crucial to not get influenced by OLTP design, let alone a bad one like code table. Instead, you should remodel the same data in dimensional mindset.

What do you gain by reducing many entities into one table? very little but at huge price. Firstly for OLTP system, it confuses model users as they have to guess how to use the table or rely on some documentation outside the ER diagram. The design goes against normalization principle which is supposed to break up big anomaly tables into more normalized tables. The only possible benefit is that it somehow could help front-end developer to centralize their code. However data modeling should not compromise its principle to just serve the front-end at price of an unclear model.

More importantly in dimensional modeling, this over-normalized 20+ dimension code table should be the very target that needs to be flatend/denormalized into fewer dimensions to achieve two major goals in a DW system. A ease of use, as the new dimensional model becomes clear. B performance, as number of FK in fact table is minimized hence saving the space and joins.

Sponsored content

Multiple different grain fact tables with lot of common dimensions.

Multiple different grain fact tables with lot of common dimensions.

Re: Multiple different grain fact tables with lot of common dimensions.

Re: Multiple different grain fact tables with lot of common dimensions.

Multiple different grain fact tables with lot of common dimensions.

Re: Multiple different grain fact tables with lot of common dimensions.

Re: Multiple different grain fact tables with lot of common dimensions.

Re: Multiple different grain fact tables with lot of common dimensions.

Re: Multiple different grain fact tables with lot of common dimensions.

Re: Multiple different grain fact tables with lot of common dimensions.

Re: Multiple different grain fact tables with lot of common dimensions.

Re: Multiple different grain fact tables with lot of common dimensions.

Re: Multiple different grain fact tables with lot of common dimensions.

Re: Multiple different grain fact tables with lot of common dimensions.