How to Track SCD Type 2 for Accumlating or Periodic Snapshot

kris Tue Jan 04, 2011 12:13 pm

Hi,

I am looking to create a Periodic Snapshot fact and Accumulating snapshot fact for Backlog of orders and order completion. I tried searching in this forum as to how to model this scenario SCD type 2, but unable to find any. If I have to carry over the total backlog from previous month fact to the current month and since the dimension keys could be changing every month for a customer how will I be able to track the customer as the fact does not have the reference to natural keys and as the facts will have the reference of expired created Customer Dimension Key. How will I be able to retrieve the snapshot information of the earlier cycle. Does it mean that I have to include natural keys as part of fact along with surrogate keys.

I will really appreciate you help on this.

Thanks and Regards,
Kris

ngalemmo Wed Jan 05, 2011 2:30 am

how will I be able to track the customer as the fact does not have the reference to natural keys

Of course jt does! The natural key values are in the dimension, are they not?

Facts are integrated, summarized and reported by dimensional ATTRIBUTES, not dimensional KEYS. Customer mey be a type 2, and, over time, some attributes may change, but not all.

sgudavalli Wed Jan 05, 2011 3:13 am

Can you post sample schema to visulaize the problem?

kris Wed Jan 05, 2011 4:01 pm

Thanks for the reply.

Actually its not a simple fact since its accumlating or snapshot fact the reference to the dimension keys may change. Please verify my scenario below:

Periodic Snapshot
Schma

Month Key
Customer Key
Product Key
Total Backlog Units

Month = 1
Customer Key = 20
Product Key = 24
Total Backlog Units = 19880 Tons

Month = 2
Customer Key = 25 (This is the same customer for previos month so SCD Type 2 with previous month Customer Key = 20)
Product Key = 26
Total Backlog = 234563 Tons

In the above scenario lets say Month 1 backlog needs to added to the month 2 backlog so Month 2 backlog = Month1 + Month2 backlog.
In order to carry forward the previous month backlog if the the customer key now changes to 25 when it was 20 in the last month, How will I be able to using the natual key I will not be able to derive the previous month customer key as it has changed.
How will I know the pick up the right value for previous customer. then should I use the fact table as a look up in my ETL which could be huge.

If there is another to way to do this then please help me to see what is the best way to deal with this scenario both modelling wise and ETL wise.

Thanks so much once again for all your help once again.

Jeff Smith Wed Jan 05, 2011 5:24 pm

I have a member dimension that contains multiple rows for each member. The individual records for a member roll up to a single row (the Member Number identifies the member). On the member_dimension table, I added a column called Current_Member_Key. Whenever a member is listed multiple times in the Member Dimension, it will have the same Current_Member_Key. The Current_Member_Key joins back to the Member_Dimension.

When I need to rollup fact data to the member level into an aggregate table, I join to the member dimension and aggregate to the Current_Member_Key.

This also comes in handy if I want to aggregate different facts in the same member level aggregate. I may have a claim fact, revenue fact, etc, with all of the fact tables having the Member_Key. Roll them up to the Current_Member_Key, join on the Current_member_key and loaf the aggregate.

The danger with this is that the elements that caused new rows to appear in the member dimension for the same member may no longer be valid.

ngalemmo Wed Jan 05, 2011 8:02 pm

If the question is "How do I report the current state of the customer using historical facts with a type 2 customer dimension", then Jeff describes one of three ways to do it.

The original method was to do a self-join on the dimension using the natural key and filtering on the current flag. This tends to be clumbsy and slow.

A second method is to simply maintain both a type 1 and type 2 version of the dimension table and carry keys to both in the fact table.

What Jeff describes is a variation on the second method where you only have one dimension table, with two keys: a type 2 primary key and a type 1 alternate key (filtering on the current flag). Again, you carry both keys on the fact table. To maintain such a table, you set the type1 key to the type 2 key value when you create the first dimension row for that natural key. As the dimension is updated, you carry the same original type 1 key value on all subsequent rows of the same natural key. When querying facts, you use either the type 2 or type 1 version of the FK depending on the context of the query.

kris Thu Jan 06, 2011 2:34 am

Thanks so much for the reply. I think I got my solution.

krishgenius Tue Jun 03, 2014 6:56 pm

whatever be the case ( in this case 2 SKs for the same customer 20 and 25), the calculations are driven by the dimension table. In this case, would grouping by customer natural key help? Please let me know if I am missing anything here.

The reporting query would something like:

select sum(backlog) , customer natural key
from cust_dim,
where cust_dim.cust_sk=.cust_sk and
time between and
group by customer natural key

Sponsored content

How to Track SCD Type 2 for Accumlating or Periodic Snapshot

How to Track SCD Type 2 for Accumlating or Periodic Snapshot

Re: How to Track SCD Type 2 for Accumlating or Periodic Snapshot

Re: How to Track SCD Type 2 for Accumlating or Periodic Snapshot

Re: How to Track SCD Type 2 for Accumlating or Periodic Snapshot

Re: How to Track SCD Type 2 for Accumlating or Periodic Snapshot

Re: How to Track SCD Type 2 for Accumlating or Periodic Snapshot

Re: How to Track SCD Type 2 for Accumlating or Periodic Snapshot

Re: How to Track SCD Type 2 for Accumlating or Periodic Snapshot

Re: How to Track SCD Type 2 for Accumlating or Periodic Snapshot