Location in Each Dimension?

beatrixkiddo Thu Aug 14, 2014 1:56 pm

This seems so basic, but I can't wrap my head around it.

I have multiple Facilities thus a Hospital Dimension. I have other dims like Physician and Financial Class. I did not put Hospital in all my other dims. I have one hospital's data loaded into these dims and fact. Once I load the next Hospital's Financial Classes into DimFinancialClass, how would I query this dim for Financial Classes for only one location? Should DimFinancialClass FK to DimHospital.HospitalKey?

Seems like it would be common to query all dimensions by location. Should they all FK to DimHospital?

Thanks

Location in Each Dimension? 8dpsIEj

ngalemmo Thu Aug 14, 2014 2:04 pm

Facts represent business transactions (activity) or states. A physician's relationship with a facility is a state. It would be represented by another fact table, which may include other dimensions, such as the nature of the relationship and the effective period of the relationship.

A patient's relationship to a facility, on the other hand, is a matter of the transaction (encounter, out-patient or in-inpatient activity). The existing fact table is appropriate for such reporting.

beatrixkiddo Thu Aug 14, 2014 4:43 pm

Thanks, NG.

Is the state table a factless fact?

Is it reasonable to query dimension tables only, say for a list of Physicians from one Hospital. Are dimensional models not a good source for these types of queries, or is there a recommended way to incorporate Hospital into the star schema?

beatrixkiddo Fri Aug 15, 2014 10:21 am

Is what I'm talking about just a matter of creating a Location-level hierarchy in each dimension I would need to query by Location by itself, or should I be putting these relationships in a factless fact?

nick_white Fri Aug 15, 2014 11:03 am

If you are loading data from multiple hospitals then I would expect there to be an FK to your Hospital Dim in (potentially) every Fact table.

This then allows you to report on any combination of Hospital and Financial Class, that has been used together in an event, by querying any fact table that has both of these Dims associated to it.

Obviously this won't necessarily tell you all of the Financial classes associated to a Hospital as you can't guarantee that every allowable Financial class for a Hospital has appeared in a fact record - so if you do have this sort of reporting requirement then this is when you'd consider creating a factless fact table to model the relationships between Dims.
However if all you have is this one relationship (Hospital and Financial class) and there are no other Dims you could bring into the factless fact then creating a fact table seems a bit like overkill to me - especially if there are lots of similar relationships, outside of events, that you need to report on. In which case, I have two thoughts - but I'm not sure that either is the correct solution:

1) This doesn't really sound like an analytical query to me - you're really just querying the master data from you operational systems. In which case may be a dimensional model is not the right place to get this information. It's much easier just to query the operational/master data system(s). Remember a DM is not necessarily the solution to everything

2) You could add the Hospital Dim as a snowflake to the Dims that have hospital-specific data. Obviously only works if e.g. a Financial Class can only ever be associated to one Hospital. Snowflakes are usually a bad idea and if I'm thinking about including one I normally assume that there is a more fundamental issue with my design - but in this case there may be an argument for snowflaking

ngalemmo Fri Aug 15, 2014 11:10 am

Yes, it would be a factless fact table.

A dimensional model is a specific form and methodology to create data models for analytics. Entity Relationship modeling is another form and methodology (normalization) to crete data models for other purposes.

If you choose dimensional modeling, you should stick to the form.

beatrixkiddo Fri Aug 15, 2014 11:26 am

nick_white wrote:1) This doesn't really sound like an analytical query to me - you're really just querying the master data from you operational systems. In which case may be a dimensional model is not the right place to get this information. It's much easier just to query the operational/master data system(s). Remember a DM is not necessarily the solution to everything

2) You could add the Hospital Dim as a snowflake to the Dims that have hospital-specific data. Obviously only works if e.g. a Financial Class can only ever be associated to one Hospital. Snowflakes are usually a bad idea and if I'm thinking about including one I normally assume that there is a more fundamental issue with my design - but in this case there may be an argument for snowflaking

I hear you. Since one of the main requirements is to combine all this data into "the source of truth", I think I will go with factless facts, but only on an as-needed basis and see how that works out.

So is it not an analytical query to ask How many Physicians do we have at Hospital A compared to B?

Thanks for your input.

beatrixkiddo Fri Aug 15, 2014 11:28 am

ngalemmo wrote:Yes, it would be a factless fact table.

A dimensional model is a specific form and methodology to create data models for analytics. Entity Relationship modeling is another form and methodology (normalization) to crete data models for other purposes.

If you choose dimensional modeling, you should stick to the form.

Got it. Do you use factless facts or avoid them altogether?

ngalemmo Fri Aug 15, 2014 12:20 pm

You use them when you need to. In your case a Physician may have admitting privileges at a number of hospitals. If it is important you track those arrangements for analysis, then yes, you need one.

beatrixkiddo Fri Aug 15, 2014 12:38 pm

Ok, thanks.

beatrixkiddo Fri Aug 15, 2014 11:01 pm

I said location, but what I meant was source system. I found this old debate, and I am going with the unique NK combination of LocationID and NK. Most likely in each dimension.

http://forum.kimballgroup.com/t1916-dimension-with-different-sources-multiple-business-keys

So if I have Source A with payer BCBS and Source B with payer BC-BS, I'll have a row for each in the dim. Then, I create a MDM rule so each belong to BCBS or whatever we call it as another attribute in the dim. If for some reason someone wants a list of payers from Source B, I can query the dim by the LocationID portion of the unique NK.

On the right track?

nick_white Mon Aug 18, 2014 2:51 am

Hi,
if BCBS and BC-BS are, in business terms, the same record then you should be de-duping them before you bring them into your DM - you wouldn't create separate records in a Dim table.

Regards,

beatrixkiddo Mon Aug 18, 2014 9:18 am

That was my initial train of thought, too, but reading through that conversation (here), NG makes some valid points for keeping both rows for "BCBS".

If there is an error in standardizing the attributes, it is an easy matter of updating rows. If you only create standard rows (i.e. not source specific rows) you have to rekey facts if an error is found in the standardization (source rows mapped to the wrong standard version).
This is the very reason why you should keep a row for each source/natural key combination in the dimension. It allows you to construct and load the dimension, and field the data warehouse, without having to wait for the perfect cross reference. The dimension itself becomes the cross reference and it can be easily corrected and adjusted as errors are discovered or new sources are brought in.
The issue is if you have two or more disparate systems feeding the same dimension, the dimension members and the fact relating to them will have a different set of natural keys than the other system(s). I would argue that you want to retain these natural keys and the relationship between the fact and dimension the same. That is the way the relationship is defined in the source system, so it should also be good enough for the data warehouse.

I'm thinking through the load of the Financial Class dim using the de-dupe method and one row per Payer.

Load dim with Source A value of BC BS (looking up MDM rule to use BCBS)
Lookup Source B value BC-BS, get MDM value of BCBS
Query dim to see if BCBS exists
If yes, then no action
If no, insert BCBS into dim

This seems like more work and none of the benefits NG mentioned. Also, once we bring in the other 10-15 source systems, having to troubleshoot an issue with a payer but being confined to one row in the dim, thus one FK in whatever fact has the issue seems like a rabbit hole I don't want to go down.

If I keep each source systems "BCBS" row, the load of the dim consists of a MDM lookup, insert into dim, and done. The MDM rule value populates the common Payer attribute. So BC BS and BC-BS belong to BCBS.

Is the source system not a valid attribute of each dimension anyway? This seems very useful. You guys are light years ahead of me, so please keep providing feedback! Thanks.

nick_white Tue Aug 19, 2014 3:09 am

Regarding the MDM question, I guess both approaches have pros and cons...

If the 'same' record coming from two different source systems has identical attributes then combining them into a single record probably doesn't have much benefit. Also, if you decide that two records currently defined as the same aren't and you want to unmatch them for all your historic data then keeping them as separate records would make that much easier.

However, if you are getting different attributes for the same record from different systems or the same attribute can have different values depending on which system supplied it (and you need to decide which value takes precedence) then I think it is easier to do all this in your staging area and only have a single record in the Dimension. Also, if you need to unmatch records but only want to implement that going forward and not for historic data it's simple even if you have merged dimensional records.
The way I do this is to create a reference table that has columns for Source System, Source System PK and Dim Artificial NK. When I get a new source record I decide whether it matches an existing record, if it does I put a record in this table with the existing Dim NK and if it doesn't then I create a new Dim record and create a new record in the lookup table with this new Dim's NK. If I want to split up previously matched records then I change my MDM logic and delete the relevant record in this lookup table (or flag it as inactive) and the next time the record is loaded from the source it will create a new record in the Dim - as it won't find a record in the lookup and my changed MDM logic won't match it to an existing record.

If you have a lookup table like this for each Dim that can take data from multiple sources then you don't need the source in the Dim. If you are not merging your Dim records then I'd definitely put the source system as an attribute in each Dim.

beatrixkiddo Wed Aug 20, 2014 8:35 am

I like your reference table setup. I'll do some testing with that when I have a chance. Thanks Nick!

Sponsored content

Location in Each Dimension?

Location in Each Dimension?

Re: Location in Each Dimension?

Re: Location in Each Dimension?

Re: Location in Each Dimension?

Re: Location in Each Dimension?

Re: Location in Each Dimension?

Re: Location in Each Dimension?

Re: Location in Each Dimension?

Re: Location in Each Dimension?

Re: Location in Each Dimension?

Re: Location in Each Dimension?

Re: Location in Each Dimension?

Re: Location in Each Dimension?

Re: Location in Each Dimension?

Re: Location in Each Dimension?

Re: Location in Each Dimension?