Provider Dimension Modeling
3 posters
Page 1 of 1
Provider Dimension Modeling
Hey, new here to the forum. Been lurking and reading a bit.
We're building a data warehouse around healthcare data. We are starting with providers (doctors). The first source is the NPI provider file which can be downloaded at cms.gov. It's 300+ columns of doctor attributes. Some columns are multiples of addresses, licenses, and specialties. The first objective is to expose a file of providers for searching. This will be a monthly load. Here's an example of the NPI file and some columns I'm concerned with.
NPI_Num
Name
Address
[Healthcare Provider Taxonomy Code_1] - these next 4 columns go 15 deep
[Provider License Number_1]
[Provider License Number State Code_1]
[Healthcare Provider Primary Taxonomy Switch_1]
... all the way to _15
[Other Provider Identifier_1] - these next 4 columns go 50 deep
[Other Provider Identifier Type Code_1]
[Other Provider Identifier State_1]
[Other Provider Identifier Issuer_1]
... all the way to _50
My questions are:
1. Some here are suggesting to (sort of forcing me to) build an ODS, which, reluctantly, I've begun doing. It's becoming a normalized mess. Is an ODS a valid option? What are some reasons to avoid doing it this way? Again, just provider info for searching, monthly load, no measurements, exporting a flat file. Seems like a done deal for a provider dim!
2. In a dim/fact setup, how would I handle those sets of 4 columns that go 15 and 50 rows deep? Snowflake or keep a wide dim?
I'm building a claims fact next, so I'll be building a provider dimension anyway. We just have some that are hell bent on having an ODS, which may very well be needed. I'm just not sure it suits this data and objective very well.
Thanks for any help!
We're building a data warehouse around healthcare data. We are starting with providers (doctors). The first source is the NPI provider file which can be downloaded at cms.gov. It's 300+ columns of doctor attributes. Some columns are multiples of addresses, licenses, and specialties. The first objective is to expose a file of providers for searching. This will be a monthly load. Here's an example of the NPI file and some columns I'm concerned with.
NPI_Num
Name
Address
[Healthcare Provider Taxonomy Code_1] - these next 4 columns go 15 deep
[Provider License Number_1]
[Provider License Number State Code_1]
[Healthcare Provider Primary Taxonomy Switch_1]
... all the way to _15
[Other Provider Identifier_1] - these next 4 columns go 50 deep
[Other Provider Identifier Type Code_1]
[Other Provider Identifier State_1]
[Other Provider Identifier Issuer_1]
... all the way to _50
My questions are:
1. Some here are suggesting to (sort of forcing me to) build an ODS, which, reluctantly, I've begun doing. It's becoming a normalized mess. Is an ODS a valid option? What are some reasons to avoid doing it this way? Again, just provider info for searching, monthly load, no measurements, exporting a flat file. Seems like a done deal for a provider dim!
2. In a dim/fact setup, how would I handle those sets of 4 columns that go 15 and 50 rows deep? Snowflake or keep a wide dim?
I'm building a claims fact next, so I'll be building a provider dimension anyway. We just have some that are hell bent on having an ODS, which may very well be needed. I'm just not sure it suits this data and objective very well.
Thanks for any help!
beatrixkiddo- Posts : 22
Join date : 2013-10-22
Re: Provider Dimension Modeling
Ok, so reading through my data warehouse toolkit, it looks like a multivalued dimension is what I have and a bridge table is what I need. I was really over-thinking this one.
beatrixkiddo- Posts : 22
Join date : 2013-10-22
Re: Provider Dimension Modeling
Another great source for this situation is the book "Star Schema: The complete reference" by Chris Adamson. Chapter 9 discusses multi-valued dimensions and multi-valued attributes with some great examples.
scabral- Posts : 58
Join date : 2012-05-02
Re: Provider Dimension Modeling
Awesome, thanks for the resource.
beatrixkiddo- Posts : 22
Join date : 2013-10-22
Re: Provider Dimension Modeling
Multi-valued dimension refers to relationships from a dimension to a fact table, as in there is more than one. What you have is something different which would require some sort of hierarchy to consolidate to a single dimension. I would start with identifying which provider attributes are important from a business perspective instead of I want all provider attributes from CMS. Most folks aren't interested in 300 attributes for a given provider.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Provider Dimension Modeling
Yea, I'm not interested in all of them. The ones I'm stuck on are like Specialty, up to 15 of them. The users do want to see all those. This is the hierarchy you're talking about?BoxesAndLines wrote:What you have is something different which would require some sort of hierarchy to consolidate to a single dimension. I would start with identifying which provider attributes are important from a business perspective instead of I want all provider attributes from CMS.
beatrixkiddo- Posts : 22
Join date : 2013-10-22
Re: Provider Dimension Modeling
So I'm in the old Normalized vs Dimensional debate. These people don't know about dimensional modeling. I've mentioned simple queries, less joins, performance, etc. They don't know much about querying and aren't convinced the performance is better. I have handled the one-to-many provider address issue by creating an address dim and a factless fact to relate providers to their addresses. Maybe our approach is unorthodox, starting with building a dimension. This is the data they want first, and it's very much a "let's see what you can do" sort of project.
I've only used RDBMS for app development. I'm having a hard time explaining to them why we should avoid normalization in the DW. Can you help me formulate a better argument?
Thanks!
I've only used RDBMS for app development. I'm having a hard time explaining to them why we should avoid normalization in the DW. Can you help me formulate a better argument?
Thanks!
beatrixkiddo- Posts : 22
Join date : 2013-10-22
Re: Provider Dimension Modeling
The solution is simple but time consuming. First build out the normalized version of the model and then build out the dimensional version. Then ask simple questions of the model. e.g. How many claims from provider Dr. Bob for condition xyz for the past (week, month, quarter, etc) at this provider location. Compare this provider cost to other claims at same provider location, etc. You get the idea. Then demonstrate how to navigate the different models to answer the questions.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Provider Dimension Modeling
Thanks. That sounds like a good strategy. Do you avoid normalization completely, or is there a good scenario for it? Like I said, I've only normalized for app development. The change tracking on a normalized system with audit tables has always seemed really messy, but it does seem to work for its intended purpose.
I'm looking for good points that explain why we shouldn't normalize. To me, the dimensional model really sells itself. Then again, they aren't writing queries or maintaining the system and have zero experience with dimensional modeling. They're actually thinking it's a "newer technology". I'm sure at some point they'll want users to be able to build reports. I've brought up that point as well.
I want to really hammer on why normalization isn't needed, and also why dimensional modeling is a perfect fit. Your advice should help with that, but if you can think of anything to add to the argument, please throw it out here. I've been on two teams supporting fairly large EDW's and neither had anything normalized. The fact that we're even debating this is so frustrating.
I'm looking for good points that explain why we shouldn't normalize. To me, the dimensional model really sells itself. Then again, they aren't writing queries or maintaining the system and have zero experience with dimensional modeling. They're actually thinking it's a "newer technology". I'm sure at some point they'll want users to be able to build reports. I've brought up that point as well.
I want to really hammer on why normalization isn't needed, and also why dimensional modeling is a perfect fit. Your advice should help with that, but if you can think of anything to add to the argument, please throw it out here. I've been on two teams supporting fairly large EDW's and neither had anything normalized. The fact that we're even debating this is so frustrating.
beatrixkiddo- Posts : 22
Join date : 2013-10-22
Re: Provider Dimension Modeling
You're not arguing against normalization vs not normalized. You're arguing dimensional vs. normalized. You pick dimensional because it is easier to query historically due to the fact dimension design pattern. It's also highly performant. That's normally enough to close the deal.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Provider Dimension Modeling
Ok. I think the main hurdle here is people are set in their normalized ways. Thanks b.
beatrixkiddo- Posts : 22
Join date : 2013-10-22
Similar topics
» Modeling an Employee Dimension to a Fact which has two columns relating to the Dimension
» Dimension Modeling
» Dimension - Fact Modeling
» Modeling the Customer Dimension
» Modeling help on dimension with many boolean value
» Dimension Modeling
» Dimension - Fact Modeling
» Modeling the Customer Dimension
» Modeling help on dimension with many boolean value
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum