How to handle multiple many-to-many dimensions in a single star schema
2 posters
Page 1 of 1
How to handle multiple many-to-many dimensions in a single star schema
Hello Everyone,
This is my first post here. I'm looking for an answer to the following question:
The bridge table technique can be used to model many-to-many dimensions. This probably makes queries a bit less efficient than when the dimension table is directly linked to the fact table. If there are only one or two such many-to-many dimensions, may be the resulting lower performance is acceptable. I'm wondering what if there are quite a few such dimensions (about a dozen).
Further what if the individual rows in the bridge table has multi valued descriptors that need to be included in the model?
What modeling techniques does one follow in such situations and how does one ensure performance?
Thanx for any answers.
DataAwareHouse
This is my first post here. I'm looking for an answer to the following question:
The bridge table technique can be used to model many-to-many dimensions. This probably makes queries a bit less efficient than when the dimension table is directly linked to the fact table. If there are only one or two such many-to-many dimensions, may be the resulting lower performance is acceptable. I'm wondering what if there are quite a few such dimensions (about a dozen).
Further what if the individual rows in the bridge table has multi valued descriptors that need to be included in the model?
What modeling techniques does one follow in such situations and how does one ensure performance?
Thanx for any answers.
DataAwareHouse
DataAwareHouse- Posts : 3
Join date : 2010-03-11
Re: How to handle multiple many-to-many dimensions in a single star schema
Many to many to many? Whew!
Can you give an example? It may be you need to rethink your approach... maybe your facts are not at the proper grain.
Can you give an example? It may be you need to rethink your approach... maybe your facts are not at the proper grain.
Re: How to handle multiple many-to-many dimensions in a single star schema
Thanx ngalemmo,
Let's consider a couple of scenarios where we might encounter 'many-to-many-to-many' (as you put it) situations.
Example 1. 'Patient visit' as a fact grain: Here the diagnosis being a many-to-many dimension is well known. But during the same 'visit' the patient could see multiple doctors, undergo multiple lab tests, interact with multiple 'other' health care professionals such as nurses, therapists, nutritionists, etc. One can think of many more such multi-valued dimensions. You have suggested that the grain of the fact table needs to be re-visited. I'm wondering how. Now, to the second part of the question. Each of the diagnoses could have multiple treatments or drugs/therapies administered, or each of the lab tests could have multiple test results, etc. How do we include these multi-valued descriptors of the multi-valued dimensions in the model?
Example 2: Adverse Event Report (AER) as a fact grain in drug safety domain: AERs are generated when patients take medicines and develop adverse reactions and this is reported to the manufacturer of the drug or the health care authorities. Each AER can have more than one drug, more than one adverse reaction, more than one disease/indication for which the drug(s) were prescribed in the first place, more than one persons processing the AER, etc. Similarly as above, each of these many-to-many dimensions could have multi-valued descriptors, for example, each administered drug can have more than one ingredient in it, or more than one dosage prescribed, etc.
Thanx in advance for any help.
Let's consider a couple of scenarios where we might encounter 'many-to-many-to-many' (as you put it) situations.
Example 1. 'Patient visit' as a fact grain: Here the diagnosis being a many-to-many dimension is well known. But during the same 'visit' the patient could see multiple doctors, undergo multiple lab tests, interact with multiple 'other' health care professionals such as nurses, therapists, nutritionists, etc. One can think of many more such multi-valued dimensions. You have suggested that the grain of the fact table needs to be re-visited. I'm wondering how. Now, to the second part of the question. Each of the diagnoses could have multiple treatments or drugs/therapies administered, or each of the lab tests could have multiple test results, etc. How do we include these multi-valued descriptors of the multi-valued dimensions in the model?
Example 2: Adverse Event Report (AER) as a fact grain in drug safety domain: AERs are generated when patients take medicines and develop adverse reactions and this is reported to the manufacturer of the drug or the health care authorities. Each AER can have more than one drug, more than one adverse reaction, more than one disease/indication for which the drug(s) were prescribed in the first place, more than one persons processing the AER, etc. Similarly as above, each of these many-to-many dimensions could have multi-valued descriptors, for example, each administered drug can have more than one ingredient in it, or more than one dosage prescribed, etc.
Thanx in advance for any help.
DataAwareHouse- Posts : 3
Join date : 2010-03-11
Re: How to handle multiple many-to-many dimensions in a single star schema
I kind of suspected this was related to health care...
For visits, you need multiple fact tables to record different events, and 'visit' may not be one of them. During a visit (basically a dimension, degenerate or otherwise) you have procedures and tests, both of which are related to similar dimensions (visit, patient, diagnoses (multi-valued), date, facility). The procedures fact would record each action taken and by whom and the testing fact would record each individual test. As for diagnosis, it is not clear if you would even know, from the visit record, which procedures were performed for which specific diagnosis. If you do, each fact would related to the appropriate group of diagnoses or if you don't, the whole group determined by the initial examination. There may also be another fact table containing facilities related charges relating to the visit (or in-patient stay if it was the case). This would have visit, patient, diagnosis, date, facility, procedure (CPT) as some of the dimensions.
As far as who was involved in a procedure, the procedure fact itself should refer to the primary care giver (which is usually the person of interest in most analysis). If you need to record secondary participants (for deeper, less frequent analysis), that could be a second multi-valued dimension, or it could be a factless fact table with visit, procedure, role, person, date, patient etc... as dimensions. I would include the primary care giver in this as well, with the appropriate role value. ("role" could be as simple as a Y/N flag indicating if the person is the primary or not).
In an AER you would have two related multivalued dimensions, the diagnoses and the list of drugs the patient is taking. The drugs would relate to the formulary dimension and the componds related to a drug would be a fact table containing drug, compond and quantity.
For visits, you need multiple fact tables to record different events, and 'visit' may not be one of them. During a visit (basically a dimension, degenerate or otherwise) you have procedures and tests, both of which are related to similar dimensions (visit, patient, diagnoses (multi-valued), date, facility). The procedures fact would record each action taken and by whom and the testing fact would record each individual test. As for diagnosis, it is not clear if you would even know, from the visit record, which procedures were performed for which specific diagnosis. If you do, each fact would related to the appropriate group of diagnoses or if you don't, the whole group determined by the initial examination. There may also be another fact table containing facilities related charges relating to the visit (or in-patient stay if it was the case). This would have visit, patient, diagnosis, date, facility, procedure (CPT) as some of the dimensions.
As far as who was involved in a procedure, the procedure fact itself should refer to the primary care giver (which is usually the person of interest in most analysis). If you need to record secondary participants (for deeper, less frequent analysis), that could be a second multi-valued dimension, or it could be a factless fact table with visit, procedure, role, person, date, patient etc... as dimensions. I would include the primary care giver in this as well, with the appropriate role value. ("role" could be as simple as a Y/N flag indicating if the person is the primary or not).
In an AER you would have two related multivalued dimensions, the diagnoses and the list of drugs the patient is taking. The drugs would relate to the formulary dimension and the componds related to a drug would be a fact table containing drug, compond and quantity.
Re: How to handle multiple many-to-many dimensions in a single star schema
Thanx for the insights,
The examples apart, if there are multiple many-to-many dimensions, is it always a case of wrong grain? My question was more on how to handle such situations using dimensional modeling, and what if the rows in the bridge tables had further multi valued descriptors. May be I don't have a good example scenario, and probably the question is hypothetical. Any answers and thoughts would be appreciated.
And, the bridge table makes use of a group key, for example diagnosis_group_key in the diagnosis bridge table. These group key values are not present in the corresponding dimension table (the diagnosis dimension in this case). Do we need a look up table to list all the occurrences of the group keys?
Thanx for any answers and thoughts.
The examples apart, if there are multiple many-to-many dimensions, is it always a case of wrong grain? My question was more on how to handle such situations using dimensional modeling, and what if the rows in the bridge tables had further multi valued descriptors. May be I don't have a good example scenario, and probably the question is hypothetical. Any answers and thoughts would be appreciated.
And, the bridge table makes use of a group key, for example diagnosis_group_key in the diagnosis bridge table. These group key values are not present in the corresponding dimension table (the diagnosis dimension in this case). Do we need a look up table to list all the occurrences of the group keys?
Thanx for any answers and thoughts.
DataAwareHouse- Posts : 3
Join date : 2010-03-11
Re: How to handle multiple many-to-many dimensions in a single star schema
if there are multiple many-to-many dimensions, is it always a case of wrong grain?
No, not always...but it needs to be considered when working out a model.
Do we need a look up table to list all the occurrences of the group keys?
It is my opinion that you do. The natural key to the lookup table would be a concatenation of the codes involved (make sure you put a delimiter between codes). You can use either the code values or the dimension surrogate keys for the natural key (whichever gives you the shortest string). If the sequence of codes does not matter, I sort the codes before building the string to reduce the total number of groups. In the case of diagnosis, the primary diagnosis has significiance, while in most cases the ordering of seconday diagnosis does not matter, so I would build the string with the primary diagnosis first followed by a sorted list of secondary diagnoses. Again, this is to keep the group table as small as possible. When ordering is significant, the bridge table should have a flag or sequence number attribute to reflect that significance.
It is my experience that group tables, after an initial period of rapid growth, stabilize and thereafter grow very slowly. And for something like diagnosis, the number of groups would be much smaller (< 5%) than the number of claims or encounters.
In most cases, the group table is there to support the ETL process and is not referenced in user queries. An exception would be a keyword/phrase structure where the phrases are the groups and the bridge is the relationship between the phrase and words in the phrase. The phrase table would be an available dimension should the user wish to see the text.
Similar topics
» Correlated - Separate Dimensions OR Single Dimensions ?
» Schema Roadblock: Dimension vs Fact [Dimensions w/FKs to other Dimensions?]
» Multiple dimensions Vs. Single dimension and hierarchy
» star schema designing
» multiple hierarchy : single dimension vs multiple
» Schema Roadblock: Dimension vs Fact [Dimensions w/FKs to other Dimensions?]
» Multiple dimensions Vs. Single dimension and hierarchy
» star schema designing
» multiple hierarchy : single dimension vs multiple
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum