SCD Type 2 dimensions and facts
3 posters
Page 1 of 1
SCD Type 2 dimensions and facts
I just started onto DW/ETL and need some clarification on implementing SCD type 2 dimensions and loading the fact tables. I am working on developing ETL for few dimension tables/fact and the table structure of each of the dimensions looks like below:
Product Dim: Prod_Key(SK), prod_num(natural key), category, name etc, eff_dt, expir_dt, cur_row_ind
Sales Fact: Sales_key(SK),cust_key(fk), product_ key(fk), ord_key(fk), date_key(fk), quantity sold etc
The dimensions that I have are all type 2 dimensions and so we would expect to see rows being expired and added new everytime. MY question with regards to the fact table is for example in fact I already loaded a record for with product key of 100. And next day, there was category change for this product so expired the row with prod key of 100 and inserted a new record which is currently active with value of 101. How can I reflect this in fact? Do I insert a new record with prod key value of 101? And what is the strategy we follow when we have any other metrics like amounts or something on the fact? Like do we update the amount of the old row to 0 or something?
Product Dim: Prod_Key(SK), prod_num(natural key), category, name etc, eff_dt, expir_dt, cur_row_ind
Sales Fact: Sales_key(SK),cust_key(fk), product_ key(fk), ord_key(fk), date_key(fk), quantity sold etc
The dimensions that I have are all type 2 dimensions and so we would expect to see rows being expired and added new everytime. MY question with regards to the fact table is for example in fact I already loaded a record for with product key of 100. And next day, there was category change for this product so expired the row with prod key of 100 and inserted a new record which is currently active with value of 101. How can I reflect this in fact? Do I insert a new record with prod key value of 101? And what is the strategy we follow when we have any other metrics like amounts or something on the fact? Like do we update the amount of the old row to 0 or something?
rajeshwarr59- Posts : 21
Join date : 2015-06-26
Re: SCD Type 2 dimensions and facts
You don't change dimension keys.
The key represents the state of the dimension at the time the transaction was received by the system (or at the time of the transaction).
To get another version of the dimension row, the original method was to perform a self-join of the dimension using it's natural key to locate any other version of the dimension row (usually to get the current version, which is why it has a current flag).
Other techniques have evolved. There is the notion of a stable surrogate key (basically a Type 1 key) that acts as an alternate key to the dimension. You would either use that key for the self join (basically to improve join performance versus using a natural key) or you can store it as an additional foreign key on the fact to avoid the self join completely.
You have the option of joining to the type 2 table using the alternate key and filtering on the current flag, or, you can field a separate Type 1 table, depending on your needs.
As far as how you update measures, it depends on your need.
You can update in place (snapshot), insert deltas (transactional), or maintain historical versions (accumulating snapshot). Any of these will handle whatever comes up.
The key represents the state of the dimension at the time the transaction was received by the system (or at the time of the transaction).
To get another version of the dimension row, the original method was to perform a self-join of the dimension using it's natural key to locate any other version of the dimension row (usually to get the current version, which is why it has a current flag).
Other techniques have evolved. There is the notion of a stable surrogate key (basically a Type 1 key) that acts as an alternate key to the dimension. You would either use that key for the self join (basically to improve join performance versus using a natural key) or you can store it as an additional foreign key on the fact to avoid the self join completely.
You have the option of joining to the type 2 table using the alternate key and filtering on the current flag, or, you can field a separate Type 1 table, depending on your needs.
As far as how you update measures, it depends on your need.
You can update in place (snapshot), insert deltas (transactional), or maintain historical versions (accumulating snapshot). Any of these will handle whatever comes up.
Re: SCD Type 2 dimensions and facts
Thank you, makes sense.
rajeshwarr59- Posts : 21
Join date : 2015-06-26
Re: SCD Type 2 dimensions and facts
@ngalemmo I am new to the datawarehousing. can you please explain with example what you have written above.
Last edited by rohit.kataria06@gmail.com on Fri Mar 04, 2016 5:00 am; edited 1 time in total (Reason for editing : spelling)
rohit.kataria06- Posts : 1
Join date : 2016-03-04
Age : 35
Re: SCD Type 2 dimensions and facts
The self join method:
The dimension has a type 2 primary key, its natural key, and a current row flag. The fact contains the type 2 foreign key. A FROM clause would look something like:
FROM fact f
join dim d on f.dim_key = d.dim_key
join dim cd on d.natural_key = cd.natural_key and current_flag (is true)
The dim aliased by cd will contain the current version of the row. Alias d will contain the historical version of the row.
The stable key method:
The stable key is a surrogate for the natural key. Basically, when used in a type 2 dimension the process is to set this value to the primary key value when the dimension row is first created. You then carry this same value on all future versions of the dimension row. The dimension has a type 2 primary key, the stable key , and a current row flag. The fact contains the type 2 foreign key, and the stable key. A FROM clause to get the current row would look something like:
FROM fact f
join dim d on f.dim_stable_key = d.stable_key and current_flag
This eliminates the dimension self join.
The dimension has a type 2 primary key, its natural key, and a current row flag. The fact contains the type 2 foreign key. A FROM clause would look something like:
FROM fact f
join dim d on f.dim_key = d.dim_key
join dim cd on d.natural_key = cd.natural_key and current_flag (is true)
The dim aliased by cd will contain the current version of the row. Alias d will contain the historical version of the row.
The stable key method:
The stable key is a surrogate for the natural key. Basically, when used in a type 2 dimension the process is to set this value to the primary key value when the dimension row is first created. You then carry this same value on all future versions of the dimension row. The dimension has a type 2 primary key, the stable key , and a current row flag. The fact contains the type 2 foreign key, and the stable key. A FROM clause to get the current row would look something like:
FROM fact f
join dim d on f.dim_stable_key = d.stable_key and current_flag
This eliminates the dimension self join.
Similar topics
» Aggregating type 2 attributes for all facts
» No of Dimensions and Facts
» What Impact do dimensions have that actually are facts ?
» Arriving at Facts and Dimensions
» How long should -1 dummy records exist in fact tables?
» No of Dimensions and Facts
» What Impact do dimensions have that actually are facts ?
» Arriving at Facts and Dimensions
» How long should -1 dummy records exist in fact tables?
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum