How to model Checks

Jeff Smith Tue Dec 15, 2009 5:40 pm

We are an Insurance Company. We mail out checks. I assume that Checks are a fact. The Check Fact Table would include the Check Number (a Degenerate Dimension), a Bank Account Dimension, and a Bank Dimension. The Check Number is not unigue as we have several Accounts from several banks. To identify a unique Check requires the Check Number, Bank Account Number, and Bank Number or routing number.

I also have a Claim Fact Table. I need to link the Check to the Claim. One method is to include the Check Number as a Degenerative Dimension on the Claim Fact Table along with the Bank Dimension Key and the Bank Account Dimension Key. However, the Claim Fact table is getting a little on the wide side and I would like to reduce the number of Dimension Keys.

I could create a Check dimension table that contained the Check Number, Account Number, and Routing Number, everything that makes the Check unique, but then the Check Dimension table has the same number of records as the Check Fact Table.

Another solution is to add a surrogate key on the Check Fact Table's original design (bank account dimension and bank dimension) and put the Check Fact Table's surrogate Key on the Claim Fact Table, essentially creating a snow flake.

Is it kosher to stick a surrogate Key on the Check Fact Table and use this key as dimension key on the Claim Fact Table? Warren Thornwaite once said that a Type 1 SCD is basically a fact table. If that's true, then can't a fact table act like a Type 1 SCD dimension table?

ngalemmo Tue Dec 15, 2009 6:47 pm

What you propose will work, but, you may want to rethink the problem.

Are you trying to connect the claim to the check or the check to the claim? Would not carrying the claim key in the check fact serve the same purpose as storing the check key in the claim fact?

Jeff Smith Wed Dec 16, 2009 10:44 am

Putting the Claim Key on the Check fact table doesn't work as one check can be for multiple claims. We send out checks to health care providers once a week. The check covers all of the claims processed for the provider during the week. By the way, this is similar to how mortgage companies pay property taxes. They will send one check to cover the Tax Payment on all of the mortgages they service in a City or County.

ngalemmo Wed Dec 16, 2009 12:53 pm

But, are there not line items on the check? How much to claim A, claim B and so on? Is it not possible to make additional payments or other adjustmens against a claim that may appear on future checks?

If you capture check information at the line (advice) level, it makes the load simpler (the fact load is focused to the specific business event and you don't need to go back and update the claim) and provides much more information as to what took place.

If you must, the idea of using a surrogate degenerate dimension for the check in the claims fact will work, but it would not be my first choice. My second choice would be to store the check number and bank account key and payment date in the claim if all you need to know is if the claim was paid and with what. This avoids having to join to the check facts at all, which, if it is as you describe, doesn't really provide any additional information useful to the claim.

VHF Wed Dec 16, 2009 1:19 pm

It sounds like Claims and Checks might both be dimensions rather than facts. Bank, Bank Account, Routing Number, etc. simpley become attributes of the Check dimension, so there is no need to snowflake.

Your fact table would then be "check disbursement". There would be as many fact records as needed to allocate the amount of a check across multiple claims. SUM(Amount) of the fact records for a given check would equal the amount of the check.

FactCheckDisbursement
-----------------------
CheckKey
ClaimKey
Amount

I realize both the Claims dimension and Checks dimension will grow rapidly, but the fact table will still grow faster (in terms of record count) and it is very narrow.

Given that there will likley be many thousands of checks written on a given account, it might be tempting to snowflake out Bank Account as its own dimension. You should run some estimates on storage size with each approach and also take into consideration how well your platform handles the joins requried for queries against a snowflaked schema. The traditional (and Kimball-recommended) approach is to avoid snowflaking when possible; duplicate data in the form of repeated attribute data is not a bad thing if it makes for fast queries. However, very large dimensions may benefit from a partialy snowflaked design (aka outrigger dimension.)

Jeff Smith Wed Dec 16, 2009 3:51 pm

Unfortunately, in the source data, the Check is linked to the Claim, not the line. The amount on the check is for multiple claims. But, the amount on the check may not necessarily equal the "Amount Paid" on the claims associated with the Check. There are situations in which the Provider owes us money, in which case we hold back funds. The total amount of 4 claims maybe $100, but if the provider owes us $30, the check will be for $70.

Linking the Claim to the Check is primarily needed for audit purposes, which happens about once per year.

In instances in which a Claim is re-opened and an adjustment is made, we clone the original record in the fact table, changing the paid date on the clone to the paid date to the new version of the claim and make all of the amounts negative values. We then add the new version of the claim to the fact table.

I appreciate the advice.

There are 3 rules that I always try to obey. 1) An attribute field never appears in more than one dimension table unless it's part of an identifier, 2) No snowflaking, and 3) keep the number of dimension keys on the fact table to no more than 10 or 11. Sometimes it's hard to obey both rules 2 and 3.

I may take a look at some of the other dimension tables for opportunities to combine a few into a larger junk dimension.

Sponsored content

How to model Checks

How to model Checks

Re: How to model Checks

Re: How to model Checks

Re: How to model Checks

Re: How to model Checks

Re: How to model Checks

Re: How to model Checks