Help designing star schema

avgjoe Mon Aug 25, 2014 5:01 pm

Relational DB report writer swinging into dimensional modeling, so please be patient.

Essentially I am creating a Fact Table which holds the Numerator and Denominator for various healthcare Measures per Physician.
Each Measure has a Goal and a Score.
This will serve to populate a Scorecard which each Physician will receive. (In the report, we'll compare the % from Num/Den to the Goal and Score appropriately.)
A Physician has the usual demographic info (Name, Practice Name, etc.) and a Primary Specialty.

Modeling this seems relatively simple as I read up on Dimensions and Facts. The tricky part, which is causing me to stumble, is that there are different types of Scorecards. The Scorecard a Physician receives is dependent upon their Primary Specialty. The Scorecard itself has a subset of Measures.

So it's this relationship between Physician, Scorecard, and Measure that's throwing me off.
Physician to Scorecard is many to one. Scorecard to Measure is many to many.
While we may store some Measurements for some Physicians, they won't be part of the Scorecard report.

Before running into this, I imagined a Dimension for each of the following Quarter, Physician, Measure around the Fact Table for Measurements (Num and Denom).
Now I need help working in the Scorecard with its relationships in mind. Maybe I'm over-thinking, maybe I haven't come across the complexity in my reading, but any help would be appreciated!

Thank you in advance!
Joe

nick_white Tue Aug 26, 2014 9:03 am

If you have a many-to-many relationship between Dimensions then the solution is probably to implement a bridge table.
So in your case you'd create a Scorecard Dimension and then a bridge table that relates Measures to Scorecards

Regards

avgjoe Tue Aug 26, 2014 9:39 am

But the Measurement Fact Table holds information about Measures, not Scorecards. Not sure how a bridge helps accomplish this.

BoxesAndLines Tue Aug 26, 2014 9:55 am

Isn't a scorecard just a report? If so, why would you try to model it?

nick_white Tue Aug 26, 2014 10:05 am

avgjoe wrote:But the Measurement Fact Table holds information about Measures, not Scorecards. Not sure how a bridge helps accomplish this.

Sorry, I was assuming that a Physician could get many different scorecards; if they can...
The joins would be from your fact table to your Measures Dim to the Bridge to the Scorecard Dim.
If you want to report on a particular Scorecard then you would constrain your query to the appropriate Scorecard record, this would in turn only give you the measures related to that Scorecard and therefore only the Facts related to it.
Obviously you can also constrain by any other dimension e.g. The measures for a particular Scorecard for a particular Physician

If a Physician can only get one type of Scorecard, and will only be given measures relevant to that Scorecard then isn't the Scorecard just an attribute of the Physician?

Regards,

avgjoe Tue Aug 26, 2014 10:22 am

Yes, a Physician would only have one Scorecard. If I follow the line of thought that Scorecard is just a report, then the report itself would be responsible to use the Physician attribute Scorecard to determine which Measures to use, correct? Which would require lots of coding in the report.

nick_white Tue Aug 26, 2014 10:31 am

Can you clarify: do you have data for measures for a physician that don't appear on that physician's scorecard?

If the answer is no then you don't need to worry about which measures apply to which scorecard - as you don't have any measures that don't apply to the physician's scorecard.
If the answer is yes then my first question is why? If measures don't appear on a physician's scorecard why are they being collected for that physician?
Regardless of the reason why you would have measures unallocated to a scorecard, using a bridge table to link Scorecard to Measures would allow you to constrain a report to only those measures that apply to a specific scorecard

Feel free to keep asking questions if you don't think I have understood the issue correctly

BoxesAndLines Tue Aug 26, 2014 10:33 am

Maybe. Most fact tables don't have data driven facts. That is, the measures within a fact table generally apply to all dimensions. That in part drives the grain of the fact table. Here, your measures vary by dimension value, which strikes me as a performance hit in maintenance and reporting as well. I'm not sure I completely understand your situation and could be completely off base (happens a lot these days).

avgjoe Tue Aug 26, 2014 10:47 am

Most measures are available from the source (picture an excel spreadsheet with Physicians along the Y-axis and all Measurement Num and Denom across the X-axis), but maybe in ETL, we should only capture Measurements relevant to each Physician's Scorecard.

ngalemmo Tue Aug 26, 2014 11:15 am

You are mixing two things here. On one hand you have measures relating to physicians, and you have reports (scorecards) the present one or more measures. Is the intent to create a single fact table that can only produce predefined scorecards or do you want a fact table that can support general analysis and reporting as well as produce predefined scorecard reports?

If you want the former, the design is fine. You will need to repeat measures that appear on more than one scorecard.

If you want the latter, drop scorecard as a dimension. Have measure type as a dimension and use a bridge between measure type and scorecard to control which measures are used for a particular report.

avgjoe Wed Aug 27, 2014 8:14 am

Initially this is just to produce scorecards, but I think they will want to do other analysis down the road so I'd prefer to work towards the latter.

If I have:
DimPhysicians
DimQuarters
DimMeasures->bridgeMeasuresScorecards->DimScorecards
FactMeasurements

Is it permissible/wise to have a ForeignKey to DimScorecard within DimPhysicians?
My thinking: On the report, I need to show only those measurements associated with a physician's scorecard.

As an aside, I will need to to show all of those measures whether they have measurements or not. How is this accomplished?

avgjoe Thu Aug 28, 2014 4:15 pm

I think if I map this out in an ER Diagram, I have a circular reference going on.
Provider->Scorecard->Measurements->Measures->Provider
I probably need to fix that before trying to move to DM, if that's even appropriate.

BoxesAndLines Thu Aug 28, 2014 4:18 pm

That is a strategy I often use when I'm uncertain on the dimensional model design, build out the logical model so you can clearly identify the dependencies.

ngalemmo Thu Aug 28, 2014 8:02 pm

It's circular, but it works. You would join the fact to provider, provider to scorecard, scorecard to measure and measure to the fact. That will ensure you have facts related to the physician and scorecard. So, yes, two bridges: provider/scorecard and scorecard/measure.

avgjoe Wed Sep 03, 2014 3:23 pm

I modeled this as DimProvider<-BridgeProviderScorecard->DimScorecard<-BridgeScorecardMeasure->DimMeasure<-FactMeasurements(links also to DimProvider and DimQuarter).

Running a simple query of course gets only those Measures for an appropriate providers scorecard which have recordings. I need to also include ones without recordings. This makes me look into a factless table (which I think is called a Coverage Table.) This make me wonder if Scorecard itself should be instead modeled as a coverage table.

nick_white Fri Sep 05, 2014 10:12 am

Hi - I would say it depends on how many measures a physician can have. If you have a relatively small number of measures then I would just create a record for each one in your fact table and include a "Not recorded" flag somewhere. You'd need to make sure that these records didn't skew your queries incorrectly - i.e. you can exclude them when required.

A coverage table and a factless fact table are not necessarily the same thing - though a coverage fact table could be a subtype of a factless fact table if you decide not to include a measure in it.

The classic example of using a coverage table is a supermarket sales promotion - where 1000s of products in every store could be part of the promotion - so creating a dummy sales records for every possible promotion/store/product would generate a huge number of fact records which would clog up your sales fact table; so instead you move all the possible sales to a coverage table.

Nothing to stop you creating a coverage table for your measures but if you only have a few 10s (or possibly 100s) then the overhead of maintaining the coverage table and the additional query complexity of unioning your measure fact and your coverage fact queries probably means just having a single fact table is a better solution

Sponsored content

Help designing star schema

Help designing star schema

Re: Help designing star schema

re:suggestion

Re: Help designing star schema

Re: Help designing star schema

re:above

Re: Help designing star schema

Re: Help designing star schema

re:accumulating measures

Re: Help designing star schema

follow-up

sigh

Re: Help designing star schema

Re: Help designing star schema

Coverage table?

Re: Help designing star schema

Re: Help designing star schema