Optimal SCD type 2 dimension design
5 posters
Page 1 of 1
Optimal SCD type 2 dimension design
I am wondering if there are advantages or an optimal way to design dimension? Specifically, is a single "effective_date" attribute ok? Or do you recommend a start_date and end_date? Is it necessary to have a "current_flag" indicator or can you just use the max(effective_date) to get that?
Take customer for example,
customer_id <-- surrogate key
custno <-- natural key
effective_date
cust_email
.
Take customer for example,
customer_id <-- surrogate key
custno <-- natural key
effective_date
cust_email
.
kjfischer- Posts : 28
Join date : 2011-05-04
Re: Optimal SCD type 2 dimension design
At a minimum you should have "begin effective date" and "end effective date" fields for each row. Designate a special far-future value such as 2999-12-31 to place in the "end effective date" field for current records. Any BETWEEN queries you do to pull records as-of a particular point in time will work OK, and you could use WHERE End_Effective_Date = #2999-12-31# to select current rows.
That said, many modelers also implement an "is current" indicator flag to faciliate identification of current rows.
That said, many modelers also implement an "is current" indicator flag to faciliate identification of current rows.
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
Re: Optimal SCD type 2 dimension design
True !
Ans also, in many cases you would need filter to extract on the current version of the record for which the max(effective_date) will not be efficient as you might end up in writing an inline view in your reporting layer rather the end_date as mentioned in the above post will do the trick.
Ans also, in many cases you would need filter to extract on the current version of the record for which the max(effective_date) will not be efficient as you might end up in writing an inline view in your reporting layer rather the end_date as mentioned in the above post will do the trick.
datamodeller- Posts : 9
Join date : 2010-07-25
Re: Optimal SCD type 2 dimension design
I have been told that the Oracle window/partition functions can handle this effeciently with just an effective_date; that is why I am asking...
kjfischer- Posts : 28
Join date : 2011-05-04
Re: Optimal SCD type 2 dimension design
As far as SCD 2 is concerned, a pair of SCD dates is a bare minimum by definition, just as VHF rightly said, anything less would not be called SCD 2.
Trying to minimise the number of SCD housekeeping attributes to achieve similar functionality at expense of ease of dimension use and performance is against the principle of dimensional modeling. Obviously max(effective_date) is highly costly for SCD, and hence not recommended.
The Window/partition functions are ANSI SQL extension, not just in Oracle, and designed to achieve some limited OLAP functionalities.
Trying to minimise the number of SCD housekeeping attributes to achieve similar functionality at expense of ease of dimension use and performance is against the principle of dimensional modeling. Obviously max(effective_date) is highly costly for SCD, and hence not recommended.
The Window/partition functions are ANSI SQL extension, not just in Oracle, and designed to achieve some limited OLAP functionalities.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Optimal SCD type 2 dimension design
kjfischer wrote:I have been told that the Oracle window/partition functions can handle this effeciently with just an effective_date; that is why I am asking...
There are some fundimental problems using window functions such as LEAD and LAG to calculate the expiration date.
If you have rows that expire and not superceded by a new row (in other words, they just end), you cannot detect that in a window function. You would need some kind of flag indicating there is no 'next' row.
LEAD and LAG do not perform well in views. My experience has been that predicates may not be pushed do to a view containing LEAD or LAG. This means that the view will be fully materialized before predicates are applied. The reason for this is that predicates can affect the results of the window function. Views must present the set as defined in the view, so predicates are ignored to avoid ambiguous results.
Similar topics
» Type 2 dimension or type 2 column?
» Design Standards - Numeric Type 2 Dimension Attributes
» Design Alternatives for Late Arriving/Updated Type 2 Dimension Data
» rationale behind dimension with Type 0 and missing Type 5
» How to handle a Type I or II dimension with a snowflaked customer sub dimension (kimball book page 337, 338)
» Design Standards - Numeric Type 2 Dimension Attributes
» Design Alternatives for Late Arriving/Updated Type 2 Dimension Data
» rationale behind dimension with Type 0 and missing Type 5
» How to handle a Type I or II dimension with a snowflaked customer sub dimension (kimball book page 337, 338)
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum
|
|