Fact load with SCD2- join on current row or between dates
3 posters
Page 1 of 1
Fact load with SCD2- join on current row or between dates
Hi - still learning so bear with pls.....
I've inherited a DW in which a sales fact table is truncated and reloaded every day.
Previsouly i've loaded accumulating fact tables with SCD2 Dimensions by just joining to the dimension record which has "currentrow=1". Based on a nightly load this means that the fact is joined to the appropriate dimension record with an accuracy of 1 day. This assumes the ETL suceeds every night and the fact table never gets corrupted/truncated etc.
The way I've just come across for loading a snapshot seems to be to join on the fact event date being between the effective dates of the dimension record.
What i want to know is
a) are these both valid methods?
b) if so which method is more appropriate to which scenerios?
c) what are the pros and cons of each method?
As a start I'm thinking that the currentrow method is fragile in that it requires frequent regular loading to work. Conversely the between dates method relies on having an event date(ID) in the fact table and then there may be multiple dates to choose between.
I've inherited a DW in which a sales fact table is truncated and reloaded every day.
Previsouly i've loaded accumulating fact tables with SCD2 Dimensions by just joining to the dimension record which has "currentrow=1". Based on a nightly load this means that the fact is joined to the appropriate dimension record with an accuracy of 1 day. This assumes the ETL suceeds every night and the fact table never gets corrupted/truncated etc.
The way I've just come across for loading a snapshot seems to be to join on the fact event date being between the effective dates of the dimension record.
What i want to know is
a) are these both valid methods?
b) if so which method is more appropriate to which scenerios?
c) what are the pros and cons of each method?
As a start I'm thinking that the currentrow method is fragile in that it requires frequent regular loading to work. Conversely the between dates method relies on having an event date(ID) in the fact table and then there may be multiple dates to choose between.
gettingthere2- Posts : 5
Join date : 2011-06-30
Re: Fact load with SCD2- join on current row or between dates
gettingthere2 wrote:Hi - still learning so bear with pls.....
I've inherited a DW in which a sales fact table is truncated and reloaded every day.
Previsouly i've loaded accumulating fact tables with SCD2 Dimensions by just joining to the dimension record which has "currentrow=1". Based on a nightly load this means that the fact is joined to the appropriate dimension record with an accuracy of 1 day. This assumes the ETL suceeds every night and the fact table never gets corrupted/truncated etc.
The way I've just come across for loading a snapshot seems to be to join on the fact event date being between the effective dates of the dimension record.
What i want to know is
a) are these both valid methods?
b) if so which method is more appropriate to which scenerios?
c) what are the pros and cons of each method?
As a start I'm thinking that the currentrow method is fragile in that it requires frequent regular loading to work. Conversely the between dates method relies on having an event date(ID) in the fact table and then there may be multiple dates to choose between.
Bump. Cam anyone help?
Thanks
gettingthere2- Posts : 5
Join date : 2011-06-30
Re: Fact load with SCD2- join on current row or between dates
Truncating and reloading the fact table daily is not a bad practice for a small-to-medium DW. It eliminates a lot complexity in CDC (chagne data capture) and ETL and ensures that the DW reflects the latest changes in the source system.
As far as which SCD2 dimension records to which to join when you have an accumulating snapshot with multiple dates, I think you need to pick which date you want to represent the "as of" date for that record. Could be the first date in the accumulating snapshot (ex: order received) or could be the most recent date that is populated (ex: order shipped). But you're going to have to determine some rule for when in the life of an order you want to "lock in" your view of the customer so to speak.
If you always use the current customer record when reloading your fact table, that is not different than SCD1 (which is simpler if there is no requirement for SCD2 tracking of customer (or other dimension) attributes.)
As far as which SCD2 dimension records to which to join when you have an accumulating snapshot with multiple dates, I think you need to pick which date you want to represent the "as of" date for that record. Could be the first date in the accumulating snapshot (ex: order received) or could be the most recent date that is populated (ex: order shipped). But you're going to have to determine some rule for when in the life of an order you want to "lock in" your view of the customer so to speak.
If you always use the current customer record when reloading your fact table, that is not different than SCD1 (which is simpler if there is no requirement for SCD2 tracking of customer (or other dimension) attributes.)
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
Re: Fact load with SCD2- join on current row or between dates
You could load either way, but the question is, what is the business expecting?
Why is customer a type 2 dimension if the fact table is being loaded the way they are? Was there ever an expectation of customer or sales history? I would question why sales are being loaded the way they are. A combination of a type 2 customer dimension and a fact table that is reloaded daily leads me to think either something wasn't designed right or expectations are not being met. Probably both.
Why is customer a type 2 dimension if the fact table is being loaded the way they are? Was there ever an expectation of customer or sales history? I would question why sales are being loaded the way they are. A combination of a type 2 customer dimension and a fact table that is reloaded daily leads me to think either something wasn't designed right or expectations are not being met. Probably both.
Re: Fact load with SCD2- join on current row or between dates
Thanks for your replies. Reading them has brought some clarity to my mind.
It seems that it depends on whether the business wants the date the fact record was created (more or less the same as the record creation date in the OLTP system) or another date which relates to the fact.
It is actually the Staff dimension that is an SCD2 not the customer. The ability to track staff sales performance as individuals move around is sought, hence the SCD2.
I don't think there is anything wrong with the current arrangement. The Staff SCD2 allows sales history by staff member to be reported, whilst the truncating and reloading of the fact table ensures there is no need to deal with the CDC (ie changes of status and other sale attributes). Joining on (historic) fact and effective dates allows this.
Am i missing something?
It seems that it depends on whether the business wants the date the fact record was created (more or less the same as the record creation date in the OLTP system) or another date which relates to the fact.
It is actually the Staff dimension that is an SCD2 not the customer. The ability to track staff sales performance as individuals move around is sought, hence the SCD2.
Why is customer a type 2 dimension if the fact table is being loaded the way they are? Was there ever an expectation of customer or sales history? I would question why sales are being loaded the way they are. A combination of a type 2 customer dimension and a fact table that is reloaded daily leads me to think either something wasn't designed right or expectations are not being met. Probably both.
I don't think there is anything wrong with the current arrangement. The Staff SCD2 allows sales history by staff member to be reported, whilst the truncating and reloading of the fact table ensures there is no need to deal with the CDC (ie changes of status and other sale attributes). Joining on (historic) fact and effective dates allows this.
Am i missing something?
Maybe I was not being clear - previsouly on another project I was loading the fact incrementally and dealing with the CDC and had it joined to various SCD2s.If you always use the current customer record when reloading your fact table, that is not different than SCD1 (which is simpler if there is no requirement for SCD2 tracking of customer (or other dimension) attributes.)
gettingthere2- Posts : 5
Join date : 2011-06-30
Similar topics
» Dimensions with effective and end dates - best design and best way to join to fact tables?
» Combination of SCD2, partial snowflake, Effective and end dates in fact table
» Current and Historic Dimensions (one table or two?)
» can i use other two effective dates column in SCD2
» Best practice for SCD2 start and end dates
» Combination of SCD2, partial snowflake, Effective and end dates in fact table
» Current and Historic Dimensions (one table or two?)
» can i use other two effective dates column in SCD2
» Best practice for SCD2 start and end dates
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum