Accumulating Snapshot Fact table

remiby Thu Apr 12, 2012 8:41 am

I have some questions about modeling an accumulating snapshot fact table.
I will take as an example a loan application analysis process.
1- I will have a 1-1 relationship between dim_application and fact_application_process (accumulating snapshot). So in this case should I merge everything in the same table (fact_application_process ) to avoid useless joins?
2- My fact table will contain a client_key FK to dim_client. Giving dim_client is a SCD2, will I have to update all the client_key of my fact table every time a change occurs in client_dim? I would say yes. What do you think?

ngalemmo Thu Apr 12, 2012 10:07 am

1. No. Question is, do you need an application dimension, or can it be handled by a combination of smaller dimensions?

2. Definitely no. If you are going to do that, why implement a type 2?

remiby Thu Apr 12, 2012 10:57 am

1. This is actually the point. I am not sure I need a dim_application. Here are briefly the attributes of an application:
General attributes:
the application number, a client, a loan type, a branch, a currency, a purpose
Step attributes:
At every step (request,review,approval,disbursement) I would have a date, an amount, a term and a comment.

So I could put the general attributes in a dim_application or everything in my fact_application_process

2. I need to keep client changes history. So I have to model dim_client as a SCD2. Then to keep my fact_application_process table up to date I will have to keep updated the client_key to the most actual client record.
For example, if a client changes of branch then a new row with a new SK will be added in my dim_client table. Then I will need my applications corresponding to that client to be updated in fact_application_process. Am I missing something?

TheNJDevil Thu Apr 12, 2012 12:47 pm

I think what you are missing is that the Fact table joins to the client dimension using the SK, but your where clause would not filter by the SK. That is why the fact table would not need updated. If you were trying to filter by a particular client, you would be using the client's natural key. That natural key will be represented multiple times in the dimension table.

select sum(loanAmt) from factLoan l join dimClient c on l.ClientKey = c.ClientKey where clientSalesForceID = 777389

I don't care that my client switched branches 6 times, and has 6 client records in the dimClient, the natural key stayed the same so my query works to get total loan amount for this client.
Hope that makes it a little clearer.

remiby Thu Apr 12, 2012 12:59 pm

It is not clear. If in my cube (let's say application process) I need to display the branch of my client it will get the wrong branch if I did not update the client_key in my fact table.
I want to specify that this question makes sense only with an accumulating snapshot fact table because we are making updates not only inserts.

TheNJDevil Thu Apr 12, 2012 1:12 pm

I'm not that familiar with cube generation, but if the client dimension has a current_record flag, we still get the result of bringing back everything when filtering on the natural key. The client cube dimension would use the current_record flag to bring back proper client record separately.

If this is incorrect in cube generation, then you should look into making client an SCD 3 if at all possible.

ngalemmo Thu Apr 12, 2012 1:16 pm

remiby wrote:1. This is actually the point. I am not sure I need a dim_application. Here are briefly the attributes of an application:
General attributes:
the application number, a client, a loan type, a branch, a currency, a purpose
Step attributes:
At every step (request,review,approval,disbursement) I would have a date, an amount, a term and a comment.

So I could put the general attributes in a dim_application or everything in my fact_application_process

You could have other dimensions, such as client, branch, currency, etc... and reference them as FKs on the fact. Stuff like type and purpose can be combined into a single junk dimension. You then place the application number as a degenerate dimension on the fact. You do not create a big wide fact table, as performance would be terrible.

2. I need to keep client changes history. So I have to model dim_client as a SCD2. Then to keep my fact_application_process table up to date I will have to keep updated the client_key to the most actual client record.
For example, if a client changes of branch then a new row with a new SK will be added in my dim_client table. Then I will need my applications corresponding to that client to be updated in fact_application_process. Am I missing something?

You do not change fact keys. The point of a type 2 dimension is not to keep dimension history, it is to represent facts using historical context. If there is no need to represent facts in a historical context, there is no need for dimensional history. But that's beside the point... you get current values from a type 2 using a self join on the dimension using the natural key to locate the current row version. You do not re-key facts. The FK on the fact represents the version of the dimension at the time of the event represented by the fact.

remiby Fri Apr 13, 2012 4:26 am

performance would be terrible

Why performance would be terrible since I have only one row per application in the fact table?

Thanks so much for your help.

remiby Fri Apr 13, 2012 6:00 am

Searching the web I tumbled on this article confirming my initial thought:
blog.oaktonsoftware.com/2010/12/deeper-into-accumulating-snapshot.html

Here is what the author says:

If the item or entity being processed undergoes slow changes, update its surrogate key in its corresponding fact table row. This will be sure the fact table always points to the most recent version of the item.

BoxesAndLines Fri Apr 13, 2012 8:49 am

Well if you found the answer you were looking for on the internet, it must be best practice! Try to find that answer anywhere on Kimball's site.

ngalemmo Fri Apr 13, 2012 5:44 pm

remiby wrote:Searching the web I tumbled on this article confirming my initial thought:
blog.oaktonsoftware.com/2010/12/deeper-into-accumulating-snapshot.html

Here is what the author says:
If the item or entity being processed undergoes slow changes, update its surrogate key in its corresponding fact table row. This will be sure the fact table always points to the most recent version of the item.

It is incorrect. It is not best practice. Anything that requires retroactive updates of fact FKs should be avoided.

Anyway, you are building a cube. If you need current dimensional state from a type 2, handle it in the query that feeds the cube. Perform a self join on the dimension table as previously mentioned. Its very simple.

hang Fri Apr 13, 2012 6:28 pm

Here you go: "Both foreign keys and measured facts may be changed during the revisit" on Accumulating Snapshot for Admissions Tracking, p244, chapter 12 Education, Kimball's dimensional modeling toolkit.

My take is, if you use accumulating snapshot to track status and measures in columns, all the columns in such a fact table are supposed to be updated by current value. It's obvious for date key to change, mostly from 0/-1 (null), but it seems a bit uncomfortable for other FK's. I guess having SCD2 SK in this type of fact table is purely for RI, and NK combined with SCD current status would be more pragmatic approach, saving some ETL work but compromising on modeling sanity.

Sponsored content

Accumulating Snapshot Fact table

Accumulating Snapshot Fact table

Re: Accumulating Snapshot Fact table

Re: Accumulating Snapshot Fact table

Re: Accumulating Snapshot Fact table

Re: Accumulating Snapshot Fact table

Re: Accumulating Snapshot Fact table

Re: Accumulating Snapshot Fact table

Re: Accumulating Snapshot Fact table

Re: Accumulating Snapshot Fact table

Re: Accumulating Snapshot Fact table

Re: Accumulating Snapshot Fact table

Re: Accumulating Snapshot Fact table

Re: Accumulating Snapshot Fact table