Modelling Fact Tables That Change

Tyberious Funk Thu Nov 12, 2015 12:50 am

I've been struggling with an a challenge in modelling a fact table where the facts will occasionally change causing headaches. I was reading through design tip #25, which talks about building a fact table for sales. In this tip, Kimball states;

... we can “decorate” a measurement with everything that is known to be true at the time of the measurement. So our single line item grain fact table has the following dimensions:

Date of overall invoice (dimension)
Sales agent (dimension)
Customer (dimension)
Payment terms (dimension)
Product (dimension)
Promotion (dimension)

The gist of my issue is when something we know at the time of measurement changes. For example, a sale is incorrectly recorded against a particular agent, and so it gets modified in the source system. If I run my sales report at the end of January, I might have 6 sales recorded for Bill and 4 sales recorded for Gary. But later in the month, one of Bill's sales gets shifted to Gary (after the error gets detected). Now, if I run a sales report, both Bill and Gary have 5 sales each... which is the correct answer.

But it's different to the previous sales report.

Consequently, I get complaints from stakeholders that "the numbers keep changing". While there is nothing I can do about the numbers changing, the way the fact table gets updated makes it difficult to show how and when the numbers are change. eg, it would be good to be able to run January's sales report as it the data currently appears, and as it appeared at the end of January.

Is there a preferred approach for modelling this?

nick_white Thu Nov 12, 2015 8:06 am

Can a fact change only once? Do you only care what it looked like at the point it was created and now - rather than at any point in time between when it was created and now?

If you want to know the state of a fact at any point in time then put effective start and end dates on the fact - this is a standard design pattern that Kimball describes. Basically you have a slowly changing fact.

If you can only ever have two versions of a fact then I might just put flags on the facts to show "original" and "current" - as these are easier to query than "date between effective start and end dates".
When you first create a fact both flags would be set to Y. When a change comes in create new fact record with original = N and Current = Y and set the Current on the first fact record to N

zoom Thu Nov 12, 2015 11:05 am

I would recommend creating a “Current flag” column to identify correct data in the fact table. You can create a data quality report on the issue you described and send it to the business user to identify which agent data is the correct one. Based on that information you can set the Flag (Y/N) value. Or if your source system's most current data is considered correct, then set the flag value to ‘Y’ and update previously loaded data flag to ‘N’.

ngalemmo Thu Nov 12, 2015 1:39 pm

Another method is to store the data transactionally. In other words, when a change occurs, create a fact row that reverses the previous one (reverse the sign on all measures, same dimensions) and insert the new version. All rows would be timestamped as to when they were inserted. You would use this timestamp to recreate reports as of a particular point in time.

Tyberious Funk Thu Nov 12, 2015 6:08 pm

nick_white wrote:Can a fact change only once? Do you only care what it looked like at the point it was created and now - rather than at any point in time between when it was created and now?

For the sake of brevity, I simplified my example. But in reality, yes, the facts can change... for example, each fact has various stages they go through, so their status can change (including, potentially going backwards). And as per my example, staff members allocated to a fact can potentially change (usually as a result of incorrect data being updated).

If you want to know the state of a fact at any point in time then put effective start and end dates on the fact - this is a standard design pattern that Kimball describes. Basically you have a slowly changing fact.

This is what I was thinking, though I didn't realise this was a standard design pattern. My concern was that I would need a new entry in a fact table just because a single dimension had changed... while this is relatively simple to query, it (potentially) creates relatively large fact tables with lots of duplicated material.

My other concern was that an unwary user, not realising they would need to limit by dates, would potentially over-count volumes. My solution to this was to essentially create two fact tables -- eg, FACT_Sales_History and FACT_Sales_Current (or something similar). For convenience, FACT_Sales_Current could just be a view derived from FACT_Sales_History.

Does this seem reasonable?

nick_white Fri Nov 13, 2015 3:38 am

Nothing wrong with creating two fact tables but obviously that doesn't help if data volumes are a concern.
Personally I would create one fact table that holds the all the information that could be required and then put views on top of this that show current position, position at first creation (or whatever else you need) and then give your users access to the views rather than the table

Sponsored content

Modelling Fact Tables That Change

Modelling Fact Tables That Change

Re: Modelling Fact Tables That Change

Re: Modelling Fact Tables That Change

Re: Modelling Fact Tables That Change

Re: Modelling Fact Tables That Change

Re: Modelling Fact Tables That Change

Re: Modelling Fact Tables That Change