Centipede Ok in this situation?
5 posters
Page 1 of 1
Centipede Ok in this situation?
Hi,
I am attempting to model Public Safety type incidents (span many years up thru current incidents) and have run into the "centipede" design issue. Incidents have about 30 "codes" that can have from 2 to 100 distinct values. I looked at using a junk dimension, but there are just too many possible combinations. Should I build a fact table (incident) with over 30 dimensions, one for each of the codes?
For example some of the codes associated with each incident are:
Heat source code
Area of origin code
Cause of ignition code
etc.
The fact table would then have a HeatSourceId which links to the HeatSourceType table where both the HeatSourceCode and HeatSourceDescription, etc. would be found (?)
Thanks for any help on this...
I am attempting to model Public Safety type incidents (span many years up thru current incidents) and have run into the "centipede" design issue. Incidents have about 30 "codes" that can have from 2 to 100 distinct values. I looked at using a junk dimension, but there are just too many possible combinations. Should I build a fact table (incident) with over 30 dimensions, one for each of the codes?
For example some of the codes associated with each incident are:
Heat source code
Area of origin code
Cause of ignition code
etc.
The fact table would then have a HeatSourceId which links to the HeatSourceType table where both the HeatSourceCode and HeatSourceDescription, etc. would be found (?)
Thanks for any help on this...
makeitso- Posts : 3
Join date : 2011-04-12
Re: Centipede Ok in this situation?
You could always use several junk dimensions (i.e. a compromise between one junk dimension and 30 dimensions). If you have 30 dimension attributes with 100 values each, then it is 10^60 (i.e. 100^30) combinations in a single dimension, but if you break that up into 3 dimensions, each with 10 attributes with 100 values, then it is 10^20 (100^10) in each - go to 6 and it's 10^10 (100^5).
And there many be combinations which never occur, which can severely cut down on the actual storage in any given junk dimension. Choice of which columns to go in which junk dimension would then also be important - and of course, the closer you start to ascribing any meaning to a particular dimension based upon what you've chosen to go in there, the less it looks like a junk dimension...
And there many be combinations which never occur, which can severely cut down on the actual storage in any given junk dimension. Choice of which columns to go in which junk dimension would then also be important - and of course, the closer you start to ascribing any meaning to a particular dimension based upon what you've chosen to go in there, the less it looks like a junk dimension...
caderoux- Posts : 8
Join date : 2009-02-03
Re: Centipede Ok in this situation?
Thanks caderoux, that may be what I do. Here's something else I thought of doing....
Since there is not going to be any other information about these "codes" other than their description, I could just make both the "code" and its description attributes of the main incident fact table. Would this be a limit to BI tools when it comes to querying on these attributes from the fact table rather than having them in dimensions? I plan on using MS Analysis Services as the front end slice and dice tool.
Since there is not going to be any other information about these "codes" other than their description, I could just make both the "code" and its description attributes of the main incident fact table. Would this be a limit to BI tools when it comes to querying on these attributes from the fact table rather than having them in dimensions? I plan on using MS Analysis Services as the front end slice and dice tool.
makeitso- Posts : 3
Join date : 2011-04-12
Re: Centipede Ok in this situation?
Do not include these attributes in your fact table - it will adversely affect performance for both SQL and SSAS.
You will also need to create separate dimensions for these attributes anyway, so save yourself the heart-ache and do it the right way from the begininning.
Caderoux's advice sounds appropriate to me.
You will also need to create separate dimensions for these attributes anyway, so save yourself the heart-ache and do it the right way from the begininning.
Caderoux's advice sounds appropriate to me.
Re: Centipede Ok in this situation?
I would not do that, simply because it’s not a good practice. although SSAS can be configured to use fact (degenerate) dimensions. It seems to me that all your “codes” are of low cardinality which are unnecessary waste of storage due to high data redundancy on the textual attributes in the fact table. You will eventually pay the price for bad performance of your fact table.makeitso wrote:I could just make both the "code" and its description attributes of the main incident fact table.
I agree with Caderous and John. Combine low cardinality (<10) attributes into 1 or several junk dimensions and put slightly high cardinality (>10) dimensions into separate dimensions and have FKs in the fact table. Only put attributes as degenerate dimension in the fact table that are off very high cardinality comparable to the level of fact itself.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Centipede Ok in this situation?
Thanks for the recommendations John and hang. "Junk" dimensions seem to be the way to go.
makeitso- Posts : 3
Join date : 2011-04-12
Re: Centipede Ok in this situation?
If the only attribute about these codes is their description, couldn't you push them all into the same dimension table, then alias them for each reference in the fact table? So you'd first mash all the codes into a single dimension, including the type. Let's call it CODE_DIM.
CODE_DIM_KEY | CODE_TYPE | CODE_ID | DESCRIPTION
================================
1 | HEAT_SOURCE | A | some a descr
2 | HEAT_SOURCE | B | some b descr
3 | HEAT_SOURCE | C | some c descr
4 | AREA_OF_ORIGIN | X | some x descr
5 | AREA_OF_ORIGIN | y | some y descr
...and so on.
(Your natural key on this table is CODE_TYPE and CODE_ID.)
Your fact table will have 30 discreet code dim key fields, named differently for each type of code. Then alias your code dim table in your queries for each fact table surrogate key:
Then do this aliasing of CODE_DIM for however many different codes you have stored in your fact table. This avoids having to physically build and maintain 30 different code dimension tables. If you have a tool like Business Objects you can create these aliased tables as actual objects in the semantic layer, then it'll build the queries properly for you without having to write the SQL again. We do this where I work now and it works great.
Also where I work we have a fact table with almost 70 surrogate keys. It performs fine, but the reality is that we're not joining to every single surrogate key field in every single query. Centipede avoidance is a good rule of thumb, but in reality it's doable if necessary.
Hope this helps.
CODE_DIM_KEY | CODE_TYPE | CODE_ID | DESCRIPTION
================================
1 | HEAT_SOURCE | A | some a descr
2 | HEAT_SOURCE | B | some b descr
3 | HEAT_SOURCE | C | some c descr
4 | AREA_OF_ORIGIN | X | some x descr
5 | AREA_OF_ORIGIN | y | some y descr
...and so on.
(Your natural key on this table is CODE_TYPE and CODE_ID.)
Your fact table will have 30 discreet code dim key fields, named differently for each type of code. Then alias your code dim table in your queries for each fact table surrogate key:
- Code:
SELECT
HEAT_SOURCE_CODE_DIM.CODE_ID AS HEAT_SOURCE_CODE_ID
,HEAT_SOURCE_CODE_DIM.DESCRIPTION AS HEAT_SOURCE_CODE_DESCRIPTION
,AREA_OF_ORIGIN_DIM.DESCRIPTION AS AREA_OF_ORIGIN_DESCRIPTION
,FACT.QTY
FROM
YOUR_FACT_TABLE FACT
INNER JOIN CODE_DIM HEAT_SOURCE_CODE_DIM ON
HEAT_SOURCE_CODE_DIM.CODE_DIM_KEY = FACT.HEAT_SOURCE_CODE_DIM_PK
INNER JOIN CODE_DIM AREA_OF_ORIGIN_CODE_DIM ON
AREA_OF_ORIGIN_DIM.CODE_DIM_PK = FACT.AREA_OF_ORIGIN_CODE_DIM_PK
WHERE
...
Then do this aliasing of CODE_DIM for however many different codes you have stored in your fact table. This avoids having to physically build and maintain 30 different code dimension tables. If you have a tool like Business Objects you can create these aliased tables as actual objects in the semantic layer, then it'll build the queries properly for you without having to write the SQL again. We do this where I work now and it works great.
Also where I work we have a fact table with almost 70 surrogate keys. It performs fine, but the reality is that we're not joining to every single surrogate key field in every single query. Centipede avoidance is a good rule of thumb, but in reality it's doable if necessary.
Hope this helps.
BrianJarrett- Posts : 61
Join date : 2009-02-03
Age : 50
Location : St. Louis, MO
Similar topics
» What's the best dimentional design for this situation?
» How to handle the following situation? (Sorry, I cannot come up with an appropriate title for this...)
» Confused on how to model a certain situation.
» Modelling situation with Task, Person and Document in unpredictable business processes
» How to handle the following situation? (Sorry, I cannot come up with an appropriate title for this...)
» Confused on how to model a certain situation.
» Modelling situation with Task, Person and Document in unpredictable business processes
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum