Centipede Ok in this situation?

makeitso Tue Apr 12, 2011 2:21 pm

Hi,

I am attempting to model Public Safety type incidents (span many years up thru current incidents) and have run into the "centipede" design issue. Incidents have about 30 "codes" that can have from 2 to 100 distinct values. I looked at using a junk dimension, but there are just too many possible combinations. Should I build a fact table (incident) with over 30 dimensions, one for each of the codes?

For example some of the codes associated with each incident are:
Heat source code
Area of origin code
Cause of ignition code
etc.

The fact table would then have a HeatSourceId which links to the HeatSourceType table where both the HeatSourceCode and HeatSourceDescription, etc. would be found (?)

Thanks for any help on this...

caderoux Tue Apr 12, 2011 4:02 pm

You could always use several junk dimensions (i.e. a compromise between one junk dimension and 30 dimensions). If you have 30 dimension attributes with 100 values each, then it is 10^60 (i.e. 100^30) combinations in a single dimension, but if you break that up into 3 dimensions, each with 10 attributes with 100 values, then it is 10^20 (100^10) in each - go to 6 and it's 10^10 (100^5).

And there many be combinations which never occur, which can severely cut down on the actual storage in any given junk dimension. Choice of which columns to go in which junk dimension would then also be important - and of course, the closer you start to ascribing any meaning to a particular dimension based upon what you've chosen to go in there, the less it looks like a junk dimension...

makeitso Tue Apr 12, 2011 4:39 pm

Thanks caderoux, that may be what I do. Here's something else I thought of doing....

Since there is not going to be any other information about these "codes" other than their description, I could just make both the "code" and its description attributes of the main incident fact table. Would this be a limit to BI tools when it comes to querying on these attributes from the fact table rather than having them in dimensions? I plan on using MS Analysis Services as the front end slice and dice tool.

John Simon Tue Apr 12, 2011 7:19 pm

Do not include these attributes in your fact table - it will adversely affect performance for both SQL and SSAS.
You will also need to create separate dimensions for these attributes anyway, so save yourself the heart-ache and do it the right way from the begininning.
Caderoux's advice sounds appropriate to me.

hang Tue Apr 12, 2011 7:59 pm

makeitso wrote:I could just make both the "code" and its description attributes of the main incident fact table.

I would not do that, simply because it’s not a good practice. although SSAS can be configured to use fact (degenerate) dimensions. It seems to me that all your “codes” are of low cardinality which are unnecessary waste of storage due to high data redundancy on the textual attributes in the fact table. You will eventually pay the price for bad performance of your fact table.

I agree with Caderous and John. Combine low cardinality (<10) attributes into 1 or several junk dimensions and put slightly high cardinality (>10) dimensions into separate dimensions and have FKs in the fact table. Only put attributes as degenerate dimension in the fact table that are off very high cardinality comparable to the level of fact itself.

makeitso Wed Apr 13, 2011 12:10 pm

Thanks for the recommendations John and hang. "Junk" dimensions seem to be the way to go.

BrianJarrett Sat Apr 16, 2011 2:25 am

Code:: SELECT HEAT_SOURCE_CODE_DIM.CODE_ID AS HEAT_SOURCE_CODE_ID ,HEAT_SOURCE_CODE_DIM.DESCRIPTION AS HEAT_SOURCE_CODE_DESCRIPTION ,AREA_OF_ORIGIN_DIM.DESCRIPTION AS AREA_OF_ORIGIN_DESCRIPTION ,FACT.QTY FROM YOUR_FACT_TABLE FACT INNER JOIN CODE_DIM HEAT_SOURCE_CODE_DIM ON HEAT_SOURCE_CODE_DIM.CODE_DIM_KEY = FACT.HEAT_SOURCE_CODE_DIM_PK INNER JOIN CODE_DIM AREA_OF_ORIGIN_CODE_DIM ON AREA_OF_ORIGIN_DIM.CODE_DIM_PK = FACT.AREA_OF_ORIGIN_CODE_DIM_PK WHERE ...

Then do this aliasing of CODE_DIM for however many different codes you have stored in your fact table. This avoids having to physically build and maintain 30 different code dimension tables. If you have a tool like Business Objects you can create these aliased tables as actual objects in the semantic layer, then it'll build the queries properly for you without having to write the SQL again. We do this where I work now and it works great.

Also where I work we have a fact table with almost 70 surrogate keys. It performs fine, but the reality is that we're not joining to every single surrogate key field in every single query. Centipede avoidance is a good rule of thumb, but in reality it's doable if necessary.

Hope this helps.

Sponsored content

Centipede Ok in this situation?

Centipede Ok in this situation?

Re: Centipede Ok in this situation?

Re: Centipede Ok in this situation?

Re: Centipede Ok in this situation?

Re: Centipede Ok in this situation?

Re: Centipede Ok in this situation?

Re: Centipede Ok in this situation?

Re: Centipede Ok in this situation?