How to split the dimension members with SCD Type 2?
3 posters
Page 1 of 1
How to split the dimension members with SCD Type 2?
Hi all,
I want to create dimension table, where will be some members
and each of them has its own history (so it has own versions in time),
where at one time is one and only one always valid.
So it is commonly solved by SCD Type 2.
For example DimTable:
SurrKey  KeyDurable  Code  Description  StartDate  EndDate IsValid
1  1  A  this is first version of A member  2010101  20100131  0
2  1  A  this is second version of A member  2010201  99991231  0
3  1  A  this is third version of A member  2010301  99991231  1
4  4  B  this is first version of B member  2010101  20100131  0
5  4  B  this is second version of B member  2010201  20100228  0
6  4  B  this is third version of B member  2010301  99991231  1
Where:
 SurrKey = surrogate key = PK of DimTable;
 KeyDurable = durable surrogate key which identiffies each dimension member
(but not its concrete version);
 Code = natural key = business key from source table;
 StartDate = start date of validity of concrete version of dimension member;
 EndDate = end date of validity of concrete version of dimension member;
 IsValid = 1/0 like Yes/No = sign of current validity
There are 2 dimension members (A and B) and each of them has three versions
(first, second and third) and just now is valid the third version of them.
So till now this is the basic SCD Type 2 for dimension attributes like Description.
But my members will vary not only through its Description attribute,
but also it can split during time into more parts. Every dimension member
can split.
So here is my question: How can I do this?
For example Athird version (SurrKey=3) will some day
split into two "sons" like A1 (SurrKey=7) and A2 (SurrKey=8)
[these are two "normal" dimension members, each of them can have its own history,
but they "belong" by its origin to A member].
And then A2 will split into three "sons" like A2.1 (SurrKey=9),
A2.2 (SurrKey=10) and A2.3 (SurrKey=10). Etc.
So DimTable changes now into hierarchy table: so should I
add new column (ParentKeySurr or ParentKeyDurable ??  which of them, or both of them??)
into DimTable like in ragged hierarchy?
Say I have some FactTable with 3 keys and one measure:
 DimSurrKey (FK into DimTable),
 DimKeyDurable (FK into DimTable),
 DateOfTransaction (date of transaction in source system where is stored Number value),
 Number (measure which is a result of some transaction of source system).
I would like to generate some reports, where I would like to see
some SUMS (for values of Number column of FactTable)
for A member like A1 member + A2 member or
for A2 member like A2.1 member + A2.1 member + A2.1 member.
Please how to solve this?
Thank you very much.
Mirek
I want to create dimension table, where will be some members
and each of them has its own history (so it has own versions in time),
where at one time is one and only one always valid.
So it is commonly solved by SCD Type 2.
For example DimTable:
SurrKey  KeyDurable  Code  Description  StartDate  EndDate IsValid
1  1  A  this is first version of A member  2010101  20100131  0
2  1  A  this is second version of A member  2010201  99991231  0
3  1  A  this is third version of A member  2010301  99991231  1
4  4  B  this is first version of B member  2010101  20100131  0
5  4  B  this is second version of B member  2010201  20100228  0
6  4  B  this is third version of B member  2010301  99991231  1
Where:
 SurrKey = surrogate key = PK of DimTable;
 KeyDurable = durable surrogate key which identiffies each dimension member
(but not its concrete version);
 Code = natural key = business key from source table;
 StartDate = start date of validity of concrete version of dimension member;
 EndDate = end date of validity of concrete version of dimension member;
 IsValid = 1/0 like Yes/No = sign of current validity
There are 2 dimension members (A and B) and each of them has three versions
(first, second and third) and just now is valid the third version of them.
So till now this is the basic SCD Type 2 for dimension attributes like Description.
But my members will vary not only through its Description attribute,
but also it can split during time into more parts. Every dimension member
can split.
So here is my question: How can I do this?
For example Athird version (SurrKey=3) will some day
split into two "sons" like A1 (SurrKey=7) and A2 (SurrKey=8)
[these are two "normal" dimension members, each of them can have its own history,
but they "belong" by its origin to A member].
And then A2 will split into three "sons" like A2.1 (SurrKey=9),
A2.2 (SurrKey=10) and A2.3 (SurrKey=10). Etc.
So DimTable changes now into hierarchy table: so should I
add new column (ParentKeySurr or ParentKeyDurable ??  which of them, or both of them??)
into DimTable like in ragged hierarchy?
Say I have some FactTable with 3 keys and one measure:
 DimSurrKey (FK into DimTable),
 DimKeyDurable (FK into DimTable),
 DateOfTransaction (date of transaction in source system where is stored Number value),
 Number (measure which is a result of some transaction of source system).
I would like to generate some reports, where I would like to see
some SUMS (for values of Number column of FactTable)
for A member like A1 member + A2 member or
for A2 member like A2.1 member + A2.1 member + A2.1 member.
Please how to solve this?
Thank you very much.
Mirek
mirek.1 Posts : 3
Join date : 20110110
Location : Prague, Czech Republic, Europe
Re: How to split the dimension members with SCD Type 2?
One comment: why do you store the so called DimKeyDurable in your fact table?
In case of dimension splitting you are right: your dimension table will have a new column: the surrogate key of the parent dimension.
In case of dimension splitting you are right: your dimension table will have a new column: the surrogate key of the parent dimension.
gvarga Posts : 43
Join date : 20101215
Re: How to split the dimension members with SCD Type 2?
Hi gvarga,
I want to have DimSurrKey and DimKeyDurable in FactTable
because it will be easier to get correct version of member for:
1) as of ... I can use FactTable.DimSurrKey = DimTable.SurrKey
in JOIN condition of SELECT
and I will get correct CHANGING historical
version of dimension member (depending on transaction date);
2) as was ... I can use FactTable.DimKeyDurable=DimTable.KeyDurable
AND some 'calendar date' BETWEEN dimTable.StartDay AND dimTable.EndDay
in JOIN condition of SELECT
and I will get STATIC version of dimension member
which was valid at 'calendar date' (and not at date of transaction).
User can specify 'calendar date' like his "reference point of view" here.
This attitude was described by Joy Mundy at its article
http://www.rkimball.com/html/articles_search/articles2007/07012IE.html
(but he uses operational natural key plus the Row Start Date
and I want to use durable surrogate key plus the Row Start Date,
but the principe is the same  using BETWEEN for date).
Mirek
I want to have DimSurrKey and DimKeyDurable in FactTable
because it will be easier to get correct version of member for:
1) as of ... I can use FactTable.DimSurrKey = DimTable.SurrKey
in JOIN condition of SELECT
and I will get correct CHANGING historical
version of dimension member (depending on transaction date);
2) as was ... I can use FactTable.DimKeyDurable=DimTable.KeyDurable
AND some 'calendar date' BETWEEN dimTable.StartDay AND dimTable.EndDay
in JOIN condition of SELECT
and I will get STATIC version of dimension member
which was valid at 'calendar date' (and not at date of transaction).
User can specify 'calendar date' like his "reference point of view" here.
This attitude was described by Joy Mundy at its article
http://www.rkimball.com/html/articles_search/articles2007/07012IE.html
(but he uses operational natural key plus the Row Start Date
and I want to use durable surrogate key plus the Row Start Date,
but the principe is the same  using BETWEEN for date).
Mirek
mirek.1 Posts : 3
Join date : 20110110
Location : Prague, Czech Republic, Europe
Re: How to split the dimension members with SCD Type 2?
Hi Mirek,
I try to suggest the following star model:
Dim table
SurrKey  KeyDurable  Code  Description  StartDate  EndDate IsValid
1  1  A  this is first version of A member  2010101  20100131  0
2  1  A  this is second version of A member  2010201  20100228  0
3  1  A  this is third version of A member  2010301  99991231  1
4  4  B  this is first version of B member  2010101  20100131  0
5  4  B  this is second version of B member  2010201  20100228  0
6  4  B  this is third version of B member  2010301  99991231  1
Datum table with some rows
DateKey__Datum
1________20100101
2________20100102
3________20100103
32_______20100202 ..
Fact table with some rows
DateKey__DimSurrKey___Fact
1________1_____________N1
2________1_____________N2
..
32_______2_____________N30
Query1: you should sum facts from 20100101 to 20100202 by Dimensions as they were in the time of the transaction
There will be 2 rows concerning the dim A
A.First version of A ________ N1+N2+
A.Second version of A ______N30
SELECT Dim.Description, sum(Fact.fact)
FROM Dim dim, Fact fact, Datum datum
WHERE datum.Datum between to_date('20100101', 'YYYYMMDD') and to_date('20100202', 'YYYYMMDD')
and Fact.DateKey = Datum.dateKey
and Fact.DimSurrKey = Dim.SurrKey
GROUP by Dim.Description
Query2: you should count facts from 20100101 to 20100228 by Dimensions and you want to report for the common Code
A________N1+N2+..N30 (This time you will group the fact data for the Code).
SELECT Dim.code, sum(Fact.fact)
FROM Dim dim, Fact fact, Datum datum
WHERE datum.Datum between to_date('20100101', 'YYYYMMDD') and to_date('20100202', 'YYYYMMDD')
and Fact.DateKey = Datum.dateKey
and Fact.DimSurrKey = Dim.SurrKey
GROUP by Dim.code
I try to suggest the following star model:
Dim table
SurrKey  KeyDurable  Code  Description  StartDate  EndDate IsValid
1  1  A  this is first version of A member  2010101  20100131  0
2  1  A  this is second version of A member  2010201  20100228  0
3  1  A  this is third version of A member  2010301  99991231  1
4  4  B  this is first version of B member  2010101  20100131  0
5  4  B  this is second version of B member  2010201  20100228  0
6  4  B  this is third version of B member  2010301  99991231  1
Datum table with some rows
DateKey__Datum
1________20100101
2________20100102
3________20100103
32_______20100202 ..
Fact table with some rows
DateKey__DimSurrKey___Fact
1________1_____________N1
2________1_____________N2
..
32_______2_____________N30
Query1: you should sum facts from 20100101 to 20100202 by Dimensions as they were in the time of the transaction
There will be 2 rows concerning the dim A
A.First version of A ________ N1+N2+
A.Second version of A ______N30
SELECT Dim.Description, sum(Fact.fact)
FROM Dim dim, Fact fact, Datum datum
WHERE datum.Datum between to_date('20100101', 'YYYYMMDD') and to_date('20100202', 'YYYYMMDD')
and Fact.DateKey = Datum.dateKey
and Fact.DimSurrKey = Dim.SurrKey
GROUP by Dim.Description
Query2: you should count facts from 20100101 to 20100228 by Dimensions and you want to report for the common Code
A________N1+N2+..N30 (This time you will group the fact data for the Code).
SELECT Dim.code, sum(Fact.fact)
FROM Dim dim, Fact fact, Datum datum
WHERE datum.Datum between to_date('20100101', 'YYYYMMDD') and to_date('20100202', 'YYYYMMDD')
and Fact.DateKey = Datum.dateKey
and Fact.DimSurrKey = Dim.SurrKey
GROUP by Dim.code
gvarga Posts : 43
Join date : 20101215
Re: How to split the dimension members with SCD Type 2?
Do a self join on the member dimension on the KeyDurable. Make sure you filter the desired date to be between the Start Date and End Date.
Jeff Smith Posts : 471
Join date : 20090203
Re: How to split the dimension members with SCD Type 2?
Thank you very much to you gvarga for fully concrete example! It follows classic SCD Type 2.
So it suits to standard, natural, "historical", transaction date, and it works very nice.
And thank you very much to you Jeff. Your advice solves "artificial" date, that is the date
which can user select like other, ad hoc, "reference" date of interpretation od dimension members.
If I use Jeff advice to gvarga example I will get this solution:
SELECT Dim2.Description, SUM(Fact.fact)  here is Description column for second Dim table
FROM Dim dim, Fact fact, Datum datum, Dim dim2  here is second Dim table
WHERE datum.Datum BETWEEN to_date('20100101', 'YYYYMMDD') AND to_date('20100202', 'YYYYMMDD')
AND Fact.DateKey = Datum.dateKey
AND Fact.DimSurrKey = Dim.SurrKey
AND dim.KeyDurable = dim2.KeyDurable  here is JOIN condition for self join
AND to_date ('20100215', 'YYYYMMDD')  here is date of user choice (for second version of A)
BETWEEN dim2.StartDate AND dim2.EndDate
GROUP BY Dim2.Description
And it works nice  it returns one row: total sum (N1+N2+...+N30) with description for SECOND version of A.
And if I will change '20100215' to '20100105',
I will get one row: total sum (N1+N2+...+N30) with description for FIRST version of A.
But I have one comment to it.
I can avoid using self join in SELECT
if I add DimKeyDurable column into Fact table like second foreign key to Dim table:
DimKeyDurable will be constant for january, february etc  it will be number 1 for A member
and number 4 for B member
Then I can reformulate SELECT into simplier form like this:
SELECT Dim.Description, SUM(Fact.fact)
FROM Dim dim, Fact fact, Datum datum
WHERE datum.Datum BETWEEN to_date('20100101', 'YYYYMMDD') AND to_date('20100202', 'YYYYMMDD')
AND Fact.DateKey = Datum.dateKey
AND Fact.DimKeyDurable = dim.KeyDurable
AND to_date ('20100215', 'YYYYMMDD') BETWEEN dim.StartDate AND dim.EndDate
GROUP BY Dim.Description
So SELECT is easier to write and  I hope  will be rapidly executed.
User can also imitate SCD Type 1, if he uses 'TODAY' (in Oracle SYSDATE)
instead of some concrete date (like '20100215').
But there are two minuses:
1) Fact table will be increased by using of durable key (DimKeyDurable) = 4 bytes (INT) for every dimension.
2) ETL process will be a little more complicated.
But all we have solved up to this moment was only the beginning, because
we have concerned only process of changing of dimension attributes (Description column)
but the identity of dimension member was solid (unchanged) for all the time:
The dimension member with code 'A' was persisted during changing of its Description column.
But I am interested how to solve potentional process of splitting of one member dimension (like 'A')
into say two dimension members (like A1 and A2) and later splitting A2 into three dimension members
(like A2.1, A2.2 and A2.3) etc. Then I want that my SELECT will sum measures columns from Fact table (N1 + N2 + ...)
for A + A1 + A2 etc., because A1 and A2 "belongs" to A, A1 and A2 are its "sons".
This is more complicated task and I think it can be solved only with help of some sort of hierarchy,
maybe with ragged hierarchy  so I will be forced to add new column into Dim table,
maybe SurrKeyParent (or KeyDurableParent ??). But how to continue with creating of SELECTs?
Have you any next idea?
Thanks in advance.
Mirek
So it suits to standard, natural, "historical", transaction date, and it works very nice.
And thank you very much to you Jeff. Your advice solves "artificial" date, that is the date
which can user select like other, ad hoc, "reference" date of interpretation od dimension members.
If I use Jeff advice to gvarga example I will get this solution:
SELECT Dim2.Description, SUM(Fact.fact)  here is Description column for second Dim table
FROM Dim dim, Fact fact, Datum datum, Dim dim2  here is second Dim table
WHERE datum.Datum BETWEEN to_date('20100101', 'YYYYMMDD') AND to_date('20100202', 'YYYYMMDD')
AND Fact.DateKey = Datum.dateKey
AND Fact.DimSurrKey = Dim.SurrKey
AND dim.KeyDurable = dim2.KeyDurable  here is JOIN condition for self join
AND to_date ('20100215', 'YYYYMMDD')  here is date of user choice (for second version of A)
BETWEEN dim2.StartDate AND dim2.EndDate
GROUP BY Dim2.Description
And it works nice  it returns one row: total sum (N1+N2+...+N30) with description for SECOND version of A.
And if I will change '20100215' to '20100105',
I will get one row: total sum (N1+N2+...+N30) with description for FIRST version of A.
But I have one comment to it.
I can avoid using self join in SELECT
if I add DimKeyDurable column into Fact table like second foreign key to Dim table:
DimKeyDurable will be constant for january, february etc  it will be number 1 for A member
and number 4 for B member
Then I can reformulate SELECT into simplier form like this:
SELECT Dim.Description, SUM(Fact.fact)
FROM Dim dim, Fact fact, Datum datum
WHERE datum.Datum BETWEEN to_date('20100101', 'YYYYMMDD') AND to_date('20100202', 'YYYYMMDD')
AND Fact.DateKey = Datum.dateKey
AND Fact.DimKeyDurable = dim.KeyDurable
AND to_date ('20100215', 'YYYYMMDD') BETWEEN dim.StartDate AND dim.EndDate
GROUP BY Dim.Description
So SELECT is easier to write and  I hope  will be rapidly executed.
User can also imitate SCD Type 1, if he uses 'TODAY' (in Oracle SYSDATE)
instead of some concrete date (like '20100215').
But there are two minuses:
1) Fact table will be increased by using of durable key (DimKeyDurable) = 4 bytes (INT) for every dimension.
2) ETL process will be a little more complicated.
But all we have solved up to this moment was only the beginning, because
we have concerned only process of changing of dimension attributes (Description column)
but the identity of dimension member was solid (unchanged) for all the time:
The dimension member with code 'A' was persisted during changing of its Description column.
But I am interested how to solve potentional process of splitting of one member dimension (like 'A')
into say two dimension members (like A1 and A2) and later splitting A2 into three dimension members
(like A2.1, A2.2 and A2.3) etc. Then I want that my SELECT will sum measures columns from Fact table (N1 + N2 + ...)
for A + A1 + A2 etc., because A1 and A2 "belongs" to A, A1 and A2 are its "sons".
This is more complicated task and I think it can be solved only with help of some sort of hierarchy,
maybe with ragged hierarchy  so I will be forced to add new column into Dim table,
maybe SurrKeyParent (or KeyDurableParent ??). But how to continue with creating of SELECTs?
Have you any next idea?
Thanks in advance.
Mirek
mirek.1 Posts : 3
Join date : 20110110
Location : Prague, Czech Republic, Europe
Similar topics
» Tip #136 Adding a MiniDimension to a Bridge Table: What if you can't split the dimension?
» Dimension Attribute that has 13.5million members
» How to model a financial security dimension where we may find out members are actually the same after the fact
» Person Dimension  Split or Consolidate
» rationale behind dimension with Type 0 and missing Type 5
» Dimension Attribute that has 13.5million members
» How to model a financial security dimension where we may find out members are actually the same after the fact
» Person Dimension  Split or Consolidate
» rationale behind dimension with Type 0 and missing Type 5
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum

