Modeling invoice payment

wawanco Wed May 28, 2014 10:48 am

Hi there,

I am modeling a data warehouse that represents the payment of invoices. The fact table register the outstanding for each invoice (or in other word the amount 'yet-to-be-paid')
For instance, let's say I issue an invoice of $100.
The purchaser n°1 (the debtor) pays $10, then $20 and finally clear the debt with a $70 payment.
In this case I will have four facts in my fact table:

Code:: invoice_key purchaser_key date outstanding 1 1 yy/mm/dd 100 1 1 yy/mm/dd 90 1 1 yy/mm/dd 70 1 1 yy/mm/dd 0

I have a dimension with information about the invoice such as functional key, initial amount, due date etc..

I wonder if I should put the invoice information into the fact table rather than into a dimension table?

The reasons why I am asking are :
1. I have an amount in the invoice dimension which looks like a measure
2. Each invoice generate an average of 2 or 3 payment events, consequently my invoice dimension grows almost as fast as my fact tables
3. If I need to know the purchaser for a specific invoice I had to go through the fact table which costs me a join
4. On the other hand fact table is huge (1 millions lines) and I'd rather limit the number of column in it (particularly VARCHAR column like functionnal key)

What is your opinion on that point ?

nick_white Wed May 28, 2014 12:14 pm

My thoughts on your points:

1. Why not put the invoice amount, amount paid and outstanding amount on each fact record? You then have all the information available on each record to support any business queries
2. Your Dim grows at half or third of the rate of your fact - there's nothing wrong with this. Many examples that model an invoicing process have a grain of the invoice line rather than the invoice that you have and so have relatively much higher record counts
3. Fact tables are how you relate Dimensions in a dimensional model. Joining through the fact table is unlikely to have a significant impact assuming proper design, indexing, performance tuning. However, rules are there to be broken! If you want to start creating FKs between dimensions because you have a valid reason to and you fully understand the impact of what you are doing then go ahead.
4. 1 million records in a fact table is not huge (I'm working on a project likely to have to handle 10s of millions of fact records being created per day). However, as you say, putting varchar columns in a fact table should be avoided if possible: a varchar(20) column holding an invoice number as a degenerate dimension is probably not a big deal; a varchar(250) column holding a text comment is probably not a good idea

Modeling invoice payment

Modeling invoice payment

Re: Modeling invoice payment