How do you handle weighting when you have multiple mulitvalued dimensions?
3 posters
Page 1 of 1
How do you handle weighting when you have multiple mulitvalued dimensions?
I've been reading The Data Warehouse Toolkit (2 ed.) and I can't seem to find a section that discusses how to handle weighting when more than one multivalued dimension is included in the design.
I work with healthcare claims; we're in the process of developing a new data mart to replace our current mash of data tables. Chapter 13 of the DW Toolkit focuses on healthcare. It is suggested to develop a bridge table for multiple diagnoses with a weighting factor tucked in with a diagnosis group key and diagnosis key.
The catch, however, is that there can also be multiple ICD9 procedure codes as well as multiple modification codes to a single CPT Procedure code. What this all means is that a single healthcare claim contains multiple multivalued descriptions. Off the top of my head, I suspect that at least three multivalued dimensions are needed; that translates into multiple group bridge tables.
Example: If a single claim has a billed amount of $100, and the claim has three diagnoses, two ICD9 procedure codes, and three CPT modification codes, how do I ensure that I will correctly calculate the $100 billed amount when I use any combination of the multivalued dimensions in a query or BI solution like OLAP?
Thanks in advance.
-- pete
I work with healthcare claims; we're in the process of developing a new data mart to replace our current mash of data tables. Chapter 13 of the DW Toolkit focuses on healthcare. It is suggested to develop a bridge table for multiple diagnoses with a weighting factor tucked in with a diagnosis group key and diagnosis key.
The catch, however, is that there can also be multiple ICD9 procedure codes as well as multiple modification codes to a single CPT Procedure code. What this all means is that a single healthcare claim contains multiple multivalued descriptions. Off the top of my head, I suspect that at least three multivalued dimensions are needed; that translates into multiple group bridge tables.
Example: If a single claim has a billed amount of $100, and the claim has three diagnoses, two ICD9 procedure codes, and three CPT modification codes, how do I ensure that I will correctly calculate the $100 billed amount when I use any combination of the multivalued dimensions in a query or BI solution like OLAP?
Thanks in advance.
-- pete
pzajkowski- Posts : 31
Join date : 2009-08-10
Re: How do you handle weighting when you have multiple mulitvalued dimensions?
Do you have to allocate the $ to the procedure code level? Can you just carry the amount at the grain of the fact table (normally claim line level)?
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: How do you handle weighting when you have multiple mulitvalued dimensions?
Hmm, your response provides food for thought regarding the actual dimensional modeling of our claims data, but I'm still interested in understanding how to deal with multiple multivalued dimensions.
Seems to me that having more than one multivalued dimension will create utter mathematical chaos. I just don't see how facts will be calculated correctly when two or more bridge tables come into play. So, I'm hoping to learn from the community who have implemented this approach.
Seems to me that having more than one multivalued dimension will create utter mathematical chaos. I just don't see how facts will be calculated correctly when two or more bridge tables come into play. So, I'm hoping to learn from the community who have implemented this approach.
pzajkowski- Posts : 31
Join date : 2009-08-10
Re: How do you handle weighting when you have multiple mulitvalued dimensions?
Having multiple multivalued dimensions is no different than having one. You just have more of them. Each dimension stands on its own.
If you do have allocation factors on both (usually expressed as a number from 0 to 1... i.e. a percentage), then if you do use the combination of the two, the specific allocation factor would be the product of both ( factor a * factor b)... not chaos.
But, does any of this make sense in your case? I wonder. Can you reasonably allocate cost to multiple diagnoses at the claim level? Are there not itemized charges for procedures performed as part of the claim, and would they not predicate another fact table? Besides, you are not going to get ICD9 and CPT procedure codes on the same bill, correct? One is from the doctor, the other is from the facility, so you have only one multi-valued dimension (procedure) not two... the coding system used shouldn't necessarily predicate another dimension table.
If you do have allocation factors on both (usually expressed as a number from 0 to 1... i.e. a percentage), then if you do use the combination of the two, the specific allocation factor would be the product of both ( factor a * factor b)... not chaos.
But, does any of this make sense in your case? I wonder. Can you reasonably allocate cost to multiple diagnoses at the claim level? Are there not itemized charges for procedures performed as part of the claim, and would they not predicate another fact table? Besides, you are not going to get ICD9 and CPT procedure codes on the same bill, correct? One is from the doctor, the other is from the facility, so you have only one multi-valued dimension (procedure) not two... the coding system used shouldn't necessarily predicate another dimension table.
Re: How do you handle weighting when you have multiple mulitvalued dimensions?
Ah, indeed you caught my error of suggesting that a single claim could have both ICD9 procedure codes and a CPT code. Shouldn't happen. My typing was faster than my thinking during my previous post -- I was merely trying to identify the various items in our claims data that could end up as multivalued.
Here's some background: Our current data warehouse is essentially relational, but to simplify writing sql queries and boost performance we have a very wide single reporting table that contains all claims data from facility, professional, and pharmacy data sources. (This single table is built off of the source relational tables.) Even descriptive fields for all codes are tucked into this table -- completely denormalized. The upside is one-stop shopping: every field I need for reporting is there, no joins to other tables required, all facts exist on their own (i.e., $100 is a $100 -- no need to allocate). Our disease identification system relies on this reporting table.
The Director of IT Services wants us to move towards a true dimensional modeled data mart. As I have been charged with initially modeling the data mart, I need to understand how facts will be calculated in this new data mart. The dollar amounts for facility claims are at the claim level. The dollar amounts for professional claims, however, are at the services rendered level.
Any facility claim or professional services claim can have multiple diagnoses; facility claims can have also have multiple ICD9 procedure codes; professional claims will have a single CPT procedure code, but could have multiple CPT Modifier codes. So, three multivalued dimensions could come into play.
QUESTIONS:
1) In our new data mart world, is it feasible to have facility, professional, and pharmacy claims in a single fact table, similar to the single reporting table we use now? (Obviously, the fact table would not have descriptions embedded -- that's the point of having dimensions.)
2) I didn't quite understand what you meant by "the coding system used shouldn't necessarily predicate another dimension table." Can you explain?
3) It seems to me that allocation is required for all of these multivalued dimensions. Would you agree?
4) Here's a typical query that the new data mart needs to support: find any patient who has at least two office visits in six months for diabetes, or 1 ED visit and 1 hospital admission for diabetes in 4 months, or 2 ED visits in 3 months for diabetes who does not have a prescription for X. If a data mart can be built to support this type of query, I'll feel more at ease -- what do you think?
Here's some background: Our current data warehouse is essentially relational, but to simplify writing sql queries and boost performance we have a very wide single reporting table that contains all claims data from facility, professional, and pharmacy data sources. (This single table is built off of the source relational tables.) Even descriptive fields for all codes are tucked into this table -- completely denormalized. The upside is one-stop shopping: every field I need for reporting is there, no joins to other tables required, all facts exist on their own (i.e., $100 is a $100 -- no need to allocate). Our disease identification system relies on this reporting table.
The Director of IT Services wants us to move towards a true dimensional modeled data mart. As I have been charged with initially modeling the data mart, I need to understand how facts will be calculated in this new data mart. The dollar amounts for facility claims are at the claim level. The dollar amounts for professional claims, however, are at the services rendered level.
Any facility claim or professional services claim can have multiple diagnoses; facility claims can have also have multiple ICD9 procedure codes; professional claims will have a single CPT procedure code, but could have multiple CPT Modifier codes. So, three multivalued dimensions could come into play.
QUESTIONS:
1) In our new data mart world, is it feasible to have facility, professional, and pharmacy claims in a single fact table, similar to the single reporting table we use now? (Obviously, the fact table would not have descriptions embedded -- that's the point of having dimensions.)
2) I didn't quite understand what you meant by "the coding system used shouldn't necessarily predicate another dimension table." Can you explain?
3) It seems to me that allocation is required for all of these multivalued dimensions. Would you agree?
4) Here's a typical query that the new data mart needs to support: find any patient who has at least two office visits in six months for diabetes, or 1 ED visit and 1 hospital admission for diabetes in 4 months, or 2 ED visits in 3 months for diabetes who does not have a prescription for X. If a data mart can be built to support this type of query, I'll feel more at ease -- what do you think?
pzajkowski- Posts : 31
Join date : 2009-08-10
Similar topics
» Same attribute in multiple Dimensions ?
» One Dimension or Multiple Dimensions
» Do I need multiple fact tables or dimensions
» Multiple dimensions Vs. Single dimension and hierarchy
» Same attribute in multiple dimensions or Create new dimension?
» One Dimension or Multiple Dimensions
» Do I need multiple fact tables or dimensions
» Multiple dimensions Vs. Single dimension and hierarchy
» Same attribute in multiple dimensions or Create new dimension?
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum