SCD Type 2 dimensions and facts

rajeshwarr59 Tue Jul 21, 2015 10:55 pm

I just started onto DW/ETL and need some clarification on implementing SCD type 2 dimensions and loading the fact tables. I am working on developing ETL for few dimension tables/fact and the table structure of each of the dimensions looks like below:

Product Dim: Prod_Key(SK), prod_num(natural key), category, name etc, eff_dt, expir_dt, cur_row_ind
Sales Fact: Sales_key(SK),cust_key(fk), product_ key(fk), ord_key(fk), date_key(fk), quantity sold etc

The dimensions that I have are all type 2 dimensions and so we would expect to see rows being expired and added new everytime. MY question with regards to the fact table is for example in fact I already loaded a record for with product key of 100. And next day, there was category change for this product so expired the row with prod key of 100 and inserted a new record which is currently active with value of 101. How can I reflect this in fact? Do I insert a new record with prod key value of 101? And what is the strategy we follow when we have any other metrics like amounts or something on the fact? Like do we update the amount of the old row to 0 or something?

ngalemmo Tue Jul 21, 2015 11:17 pm

You don't change dimension keys.

The key represents the state of the dimension at the time the transaction was received by the system (or at the time of the transaction).

To get another version of the dimension row, the original method was to perform a self-join of the dimension using it's natural key to locate any other version of the dimension row (usually to get the current version, which is why it has a current flag).

Other techniques have evolved. There is the notion of a stable surrogate key (basically a Type 1 key) that acts as an alternate key to the dimension. You would either use that key for the self join (basically to improve join performance versus using a natural key) or you can store it as an additional foreign key on the fact to avoid the self join completely.

You have the option of joining to the type 2 table using the alternate key and filtering on the current flag, or, you can field a separate Type 1 table, depending on your needs.

As far as how you update measures, it depends on your need.

You can update in place (snapshot), insert deltas (transactional), or maintain historical versions (accumulating snapshot). Any of these will handle whatever comes up.

rajeshwarr59 Thu Jul 23, 2015 10:31 pm

Thank you, makes sense.

rohit.kataria06 Fri Mar 04, 2016 4:59 am

@ngalemmo I am new to the datawarehousing. can you please explain with example what you have written above.

ngalemmo Fri Mar 04, 2016 8:13 pm

The self join method:

The dimension has a type 2 primary key, its natural key, and a current row flag. The fact contains the type 2 foreign key. A FROM clause would look something like:

FROM fact f
join dim d on f.dim_key = d.dim_key
join dim cd on d.natural_key = cd.natural_key and current_flag (is true)

The dim aliased by cd will contain the current version of the row. Alias d will contain the historical version of the row.

The stable key method:

The stable key is a surrogate for the natural key. Basically, when used in a type 2 dimension the process is to set this value to the primary key value when the dimension row is first created. You then carry this same value on all future versions of the dimension row. The dimension has a type 2 primary key, the stable key , and a current row flag. The fact contains the type 2 foreign key, and the stable key. A FROM clause to get the current row would look something like:

FROM fact f
join dim d on f.dim_stable_key = d.stable_key and current_flag

This eliminates the dimension self join.

Sponsored content

SCD Type 2 dimensions and facts

SCD Type 2 dimensions and facts

Re: SCD Type 2 dimensions and facts

Re: SCD Type 2 dimensions and facts

Re: SCD Type 2 dimensions and facts

Re: SCD Type 2 dimensions and facts

Re: SCD Type 2 dimensions and facts