How to split the dimension members with SCD Type 2?
3 posters
Page 1 of 1
How to split the dimension members with SCD Type 2?
Hi all,
I want to create dimension table, where will be some members
and each of them has its own history (so it has own versions in time),
where at one time is one and only one always valid.
So it is commonly solved by SCD Type 2.
For example DimTable:
SurrKey | KeyDurable | Code | Description | StartDate | EndDate |IsValid
1 | 1 | A | this is first version of A member | 2010101 | 20100131 | 0
2 | 1 | A | this is second version of A member | 2010201 | 99991231 | 0
3 | 1 | A | this is third version of A member | 2010301 | 99991231 | 1
4 | 4 | B | this is first version of B member | 2010101 | 20100131 | 0
5 | 4 | B | this is second version of B member | 2010201 | 20100228 | 0
6 | 4 | B | this is third version of B member | 2010301 | 99991231 | 1
Where:
- SurrKey = surrogate key = PK of DimTable;
- KeyDurable = durable surrogate key which identiffies each dimension member
(but not its concrete version);
- Code = natural key = business key from source table;
- StartDate = start date of validity of concrete version of dimension member;
- EndDate = end date of validity of concrete version of dimension member;
- IsValid = 1/0 like Yes/No = sign of current validity
There are 2 dimension members (A and B) and each of them has three versions
(first, second and third) and just now is valid the third version of them.
So till now this is the basic SCD Type 2 for dimension attributes like Description.
But my members will vary not only through its Description attribute,
but also it can split during time into more parts. Every dimension member
can split.
So here is my question: How can I do this?
For example A-third version (SurrKey=3) will some day
split into two "sons" like A1 (SurrKey=7) and A2 (SurrKey=8)
[these are two "normal" dimension members, each of them can have its own history,
but they "belong" by its origin to A member].
And then A2 will split into three "sons" like A2.1 (SurrKey=9),
A2.2 (SurrKey=10) and A2.3 (SurrKey=10). Etc.
So DimTable changes now into hierarchy table: so should I
add new column (ParentKeySurr or ParentKeyDurable ?? - which of them, or both of them??)
into DimTable like in ragged hierarchy?
Say I have some FactTable with 3 keys and one measure:
- DimSurrKey (FK into DimTable),
- DimKeyDurable (FK into DimTable),
- DateOfTransaction (date of transaction in source system where is stored Number value),
- Number (measure which is a result of some transaction of source system).
I would like to generate some reports, where I would like to see
some SUMS (for values of Number column of FactTable)
for A member like A1 member + A2 member or
for A2 member like A2.1 member + A2.1 member + A2.1 member.
Please how to solve this?
Thank you very much.
Mirek
I want to create dimension table, where will be some members
and each of them has its own history (so it has own versions in time),
where at one time is one and only one always valid.
So it is commonly solved by SCD Type 2.
For example DimTable:
SurrKey | KeyDurable | Code | Description | StartDate | EndDate |IsValid
1 | 1 | A | this is first version of A member | 2010101 | 20100131 | 0
2 | 1 | A | this is second version of A member | 2010201 | 99991231 | 0
3 | 1 | A | this is third version of A member | 2010301 | 99991231 | 1
4 | 4 | B | this is first version of B member | 2010101 | 20100131 | 0
5 | 4 | B | this is second version of B member | 2010201 | 20100228 | 0
6 | 4 | B | this is third version of B member | 2010301 | 99991231 | 1
Where:
- SurrKey = surrogate key = PK of DimTable;
- KeyDurable = durable surrogate key which identiffies each dimension member
(but not its concrete version);
- Code = natural key = business key from source table;
- StartDate = start date of validity of concrete version of dimension member;
- EndDate = end date of validity of concrete version of dimension member;
- IsValid = 1/0 like Yes/No = sign of current validity
There are 2 dimension members (A and B) and each of them has three versions
(first, second and third) and just now is valid the third version of them.
So till now this is the basic SCD Type 2 for dimension attributes like Description.
But my members will vary not only through its Description attribute,
but also it can split during time into more parts. Every dimension member
can split.
So here is my question: How can I do this?
For example A-third version (SurrKey=3) will some day
split into two "sons" like A1 (SurrKey=7) and A2 (SurrKey=8)
[these are two "normal" dimension members, each of them can have its own history,
but they "belong" by its origin to A member].
And then A2 will split into three "sons" like A2.1 (SurrKey=9),
A2.2 (SurrKey=10) and A2.3 (SurrKey=10). Etc.
So DimTable changes now into hierarchy table: so should I
add new column (ParentKeySurr or ParentKeyDurable ?? - which of them, or both of them??)
into DimTable like in ragged hierarchy?
Say I have some FactTable with 3 keys and one measure:
- DimSurrKey (FK into DimTable),
- DimKeyDurable (FK into DimTable),
- DateOfTransaction (date of transaction in source system where is stored Number value),
- Number (measure which is a result of some transaction of source system).
I would like to generate some reports, where I would like to see
some SUMS (for values of Number column of FactTable)
for A member like A1 member + A2 member or
for A2 member like A2.1 member + A2.1 member + A2.1 member.
Please how to solve this?
Thank you very much.
Mirek
mirek.1- Posts : 3
Join date : 2011-01-10
Location : Prague, Czech Republic, Europe
Re: How to split the dimension members with SCD Type 2?
One comment: why do you store the so called DimKeyDurable in your fact table?
In case of dimension splitting you are right: your dimension table will have a new column: the surrogate key of the parent dimension.
In case of dimension splitting you are right: your dimension table will have a new column: the surrogate key of the parent dimension.
gvarga- Posts : 43
Join date : 2010-12-15
Re: How to split the dimension members with SCD Type 2?
Hi gvarga,
I want to have DimSurrKey and DimKeyDurable in FactTable
because it will be easier to get correct version of member for:
1) as of ... I can use FactTable.DimSurrKey = DimTable.SurrKey
in JOIN condition of SELECT
and I will get correct CHANGING historical
version of dimension member (depending on transaction date);
2) as was ... I can use FactTable.DimKeyDurable=DimTable.KeyDurable
AND some 'calendar date' BETWEEN dimTable.StartDay AND dimTable.EndDay
in JOIN condition of SELECT
and I will get STATIC version of dimension member
which was valid at 'calendar date' (and not at date of transaction).
User can specify 'calendar date' like his "reference point of view" here.
This attitude was described by Joy Mundy at its article
http://www.rkimball.com/html/articles_search/articles2007/07012IE.html
(but he uses operational natural key plus the Row Start Date
and I want to use durable surrogate key plus the Row Start Date,
but the principe is the same - using BETWEEN for date).
Mirek
I want to have DimSurrKey and DimKeyDurable in FactTable
because it will be easier to get correct version of member for:
1) as of ... I can use FactTable.DimSurrKey = DimTable.SurrKey
in JOIN condition of SELECT
and I will get correct CHANGING historical
version of dimension member (depending on transaction date);
2) as was ... I can use FactTable.DimKeyDurable=DimTable.KeyDurable
AND some 'calendar date' BETWEEN dimTable.StartDay AND dimTable.EndDay
in JOIN condition of SELECT
and I will get STATIC version of dimension member
which was valid at 'calendar date' (and not at date of transaction).
User can specify 'calendar date' like his "reference point of view" here.
This attitude was described by Joy Mundy at its article
http://www.rkimball.com/html/articles_search/articles2007/07012IE.html
(but he uses operational natural key plus the Row Start Date
and I want to use durable surrogate key plus the Row Start Date,
but the principe is the same - using BETWEEN for date).
Mirek
mirek.1- Posts : 3
Join date : 2011-01-10
Location : Prague, Czech Republic, Europe
Re: How to split the dimension members with SCD Type 2?
Hi Mirek,
I try to suggest the following star model:
Dim table
SurrKey | KeyDurable | Code | Description | StartDate | EndDate |IsValid
1 | 1 | A | this is first version of A member | 2010101 | 20100131 | 0
2 | 1 | A | this is second version of A member | 2010201 | 20100228 | 0
3 | 1 | A | this is third version of A member | 2010301 | 99991231 | 1
4 | 4 | B | this is first version of B member | 2010101 | 20100131 | 0
5 | 4 | B | this is second version of B member | 2010201 | 20100228 | 0
6 | 4 | B | this is third version of B member | 2010301 | 99991231 | 1
Datum table with some rows
DateKey__Datum
1________20100101
2________20100102
3________20100103
32_______20100202 ..
Fact table with some rows
DateKey__DimSurrKey___Fact
1________1_____________N1
2________1_____________N2
..
32_______2_____________N30
Query1: you should sum facts from 20100101 to 20100202 by Dimensions as they were in the time of the transaction
There will be 2 rows concerning the dim A
A.First version of A ________ N1+N2+
A.Second version of A ______N30
SELECT Dim.Description, sum(Fact.fact)
FROM Dim dim, Fact fact, Datum datum
WHERE datum.Datum between to_date('20100101', 'YYYYMMDD') and to_date('20100202', 'YYYYMMDD')
and Fact.DateKey = Datum.dateKey
and Fact.DimSurrKey = Dim.SurrKey
GROUP by Dim.Description
Query2: you should count facts from 20100101 to 20100228 by Dimensions and you want to report for the common Code
A________N1+N2+..N30 (This time you will group the fact data for the Code).
SELECT Dim.code, sum(Fact.fact)
FROM Dim dim, Fact fact, Datum datum
WHERE datum.Datum between to_date('20100101', 'YYYYMMDD') and to_date('20100202', 'YYYYMMDD')
and Fact.DateKey = Datum.dateKey
and Fact.DimSurrKey = Dim.SurrKey
GROUP by Dim.code
I try to suggest the following star model:
Dim table
SurrKey | KeyDurable | Code | Description | StartDate | EndDate |IsValid
1 | 1 | A | this is first version of A member | 2010101 | 20100131 | 0
2 | 1 | A | this is second version of A member | 2010201 | 20100228 | 0
3 | 1 | A | this is third version of A member | 2010301 | 99991231 | 1
4 | 4 | B | this is first version of B member | 2010101 | 20100131 | 0
5 | 4 | B | this is second version of B member | 2010201 | 20100228 | 0
6 | 4 | B | this is third version of B member | 2010301 | 99991231 | 1
Datum table with some rows
DateKey__Datum
1________20100101
2________20100102
3________20100103
32_______20100202 ..
Fact table with some rows
DateKey__DimSurrKey___Fact
1________1_____________N1
2________1_____________N2
..
32_______2_____________N30
Query1: you should sum facts from 20100101 to 20100202 by Dimensions as they were in the time of the transaction
There will be 2 rows concerning the dim A
A.First version of A ________ N1+N2+
A.Second version of A ______N30
SELECT Dim.Description, sum(Fact.fact)
FROM Dim dim, Fact fact, Datum datum
WHERE datum.Datum between to_date('20100101', 'YYYYMMDD') and to_date('20100202', 'YYYYMMDD')
and Fact.DateKey = Datum.dateKey
and Fact.DimSurrKey = Dim.SurrKey
GROUP by Dim.Description
Query2: you should count facts from 20100101 to 20100228 by Dimensions and you want to report for the common Code
A________N1+N2+..N30 (This time you will group the fact data for the Code).
SELECT Dim.code, sum(Fact.fact)
FROM Dim dim, Fact fact, Datum datum
WHERE datum.Datum between to_date('20100101', 'YYYYMMDD') and to_date('20100202', 'YYYYMMDD')
and Fact.DateKey = Datum.dateKey
and Fact.DimSurrKey = Dim.SurrKey
GROUP by Dim.code
gvarga- Posts : 43
Join date : 2010-12-15
Re: How to split the dimension members with SCD Type 2?
Do a self join on the member dimension on the KeyDurable. Make sure you filter the desired date to be between the Start Date and End Date.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: How to split the dimension members with SCD Type 2?
Thank you very much to you gvarga for fully concrete example! It follows classic SCD Type 2.
So it suits to standard, natural, "historical", transaction date, and it works very nice.
And thank you very much to you Jeff. Your advice solves "artificial" date, that is the date
which can user select like other, ad hoc, "reference" date of interpretation od dimension members.
If I use Jeff advice to gvarga example I will get this solution:
SELECT Dim2.Description, SUM(Fact.fact) -- here is Description column for second Dim table
FROM Dim dim, Fact fact, Datum datum, Dim dim2 -- here is second Dim table
WHERE datum.Datum BETWEEN to_date('20100101', 'YYYYMMDD') AND to_date('20100202', 'YYYYMMDD')
AND Fact.DateKey = Datum.dateKey
AND Fact.DimSurrKey = Dim.SurrKey
AND dim.KeyDurable = dim2.KeyDurable -- here is JOIN condition for self join
AND to_date ('20100215', 'YYYYMMDD') -- here is date of user choice (for second version of A)
BETWEEN dim2.StartDate AND dim2.EndDate
GROUP BY Dim2.Description
And it works nice - it returns one row: total sum (N1+N2+...+N30) with description for SECOND version of A.
And if I will change '20100215' to '20100105',
I will get one row: total sum (N1+N2+...+N30) with description for FIRST version of A.
But I have one comment to it.
I can avoid using self join in SELECT
if I add DimKeyDurable column into Fact table like second foreign key to Dim table:
DimKeyDurable will be constant for january, february etc - it will be number 1 for A member
and number 4 for B member
Then I can reformulate SELECT into simplier form like this:
SELECT Dim.Description, SUM(Fact.fact)
FROM Dim dim, Fact fact, Datum datum
WHERE datum.Datum BETWEEN to_date('20100101', 'YYYYMMDD') AND to_date('20100202', 'YYYYMMDD')
AND Fact.DateKey = Datum.dateKey
AND Fact.DimKeyDurable = dim.KeyDurable
AND to_date ('20100215', 'YYYYMMDD') BETWEEN dim.StartDate AND dim.EndDate
GROUP BY Dim.Description
So SELECT is easier to write and - I hope - will be rapidly executed.
User can also imitate SCD Type 1, if he uses 'TODAY' (in Oracle SYSDATE)
instead of some concrete date (like '20100215').
But there are two minuses:
1) Fact table will be increased by using of durable key (DimKeyDurable) = 4 bytes (INT) for every dimension.
2) ETL process will be a little more complicated.
But all we have solved up to this moment was only the beginning, because
we have concerned only process of changing of dimension attributes (Description column)
but the identity of dimension member was solid (unchanged) for all the time:
The dimension member with code 'A' was persisted during changing of its Description column.
But I am interested how to solve potentional process of splitting of one member dimension (like 'A')
into say two dimension members (like A1 and A2) and later splitting A2 into three dimension members
(like A2.1, A2.2 and A2.3) etc. Then I want that my SELECT will sum measures columns from Fact table (N1 + N2 + ...)
for A + A1 + A2 etc., because A1 and A2 "belongs" to A, A1 and A2 are its "sons".
This is more complicated task and I think it can be solved only with help of some sort of hierarchy,
maybe with ragged hierarchy - so I will be forced to add new column into Dim table,
maybe SurrKeyParent (or KeyDurableParent ??). But how to continue with creating of SELECTs?
Have you any next idea?
Thanks in advance.
Mirek
So it suits to standard, natural, "historical", transaction date, and it works very nice.
And thank you very much to you Jeff. Your advice solves "artificial" date, that is the date
which can user select like other, ad hoc, "reference" date of interpretation od dimension members.
If I use Jeff advice to gvarga example I will get this solution:
SELECT Dim2.Description, SUM(Fact.fact) -- here is Description column for second Dim table
FROM Dim dim, Fact fact, Datum datum, Dim dim2 -- here is second Dim table
WHERE datum.Datum BETWEEN to_date('20100101', 'YYYYMMDD') AND to_date('20100202', 'YYYYMMDD')
AND Fact.DateKey = Datum.dateKey
AND Fact.DimSurrKey = Dim.SurrKey
AND dim.KeyDurable = dim2.KeyDurable -- here is JOIN condition for self join
AND to_date ('20100215', 'YYYYMMDD') -- here is date of user choice (for second version of A)
BETWEEN dim2.StartDate AND dim2.EndDate
GROUP BY Dim2.Description
And it works nice - it returns one row: total sum (N1+N2+...+N30) with description for SECOND version of A.
And if I will change '20100215' to '20100105',
I will get one row: total sum (N1+N2+...+N30) with description for FIRST version of A.
But I have one comment to it.
I can avoid using self join in SELECT
if I add DimKeyDurable column into Fact table like second foreign key to Dim table:
DimKeyDurable will be constant for january, february etc - it will be number 1 for A member
and number 4 for B member
Then I can reformulate SELECT into simplier form like this:
SELECT Dim.Description, SUM(Fact.fact)
FROM Dim dim, Fact fact, Datum datum
WHERE datum.Datum BETWEEN to_date('20100101', 'YYYYMMDD') AND to_date('20100202', 'YYYYMMDD')
AND Fact.DateKey = Datum.dateKey
AND Fact.DimKeyDurable = dim.KeyDurable
AND to_date ('20100215', 'YYYYMMDD') BETWEEN dim.StartDate AND dim.EndDate
GROUP BY Dim.Description
So SELECT is easier to write and - I hope - will be rapidly executed.
User can also imitate SCD Type 1, if he uses 'TODAY' (in Oracle SYSDATE)
instead of some concrete date (like '20100215').
But there are two minuses:
1) Fact table will be increased by using of durable key (DimKeyDurable) = 4 bytes (INT) for every dimension.
2) ETL process will be a little more complicated.
But all we have solved up to this moment was only the beginning, because
we have concerned only process of changing of dimension attributes (Description column)
but the identity of dimension member was solid (unchanged) for all the time:
The dimension member with code 'A' was persisted during changing of its Description column.
But I am interested how to solve potentional process of splitting of one member dimension (like 'A')
into say two dimension members (like A1 and A2) and later splitting A2 into three dimension members
(like A2.1, A2.2 and A2.3) etc. Then I want that my SELECT will sum measures columns from Fact table (N1 + N2 + ...)
for A + A1 + A2 etc., because A1 and A2 "belongs" to A, A1 and A2 are its "sons".
This is more complicated task and I think it can be solved only with help of some sort of hierarchy,
maybe with ragged hierarchy - so I will be forced to add new column into Dim table,
maybe SurrKeyParent (or KeyDurableParent ??). But how to continue with creating of SELECTs?
Have you any next idea?
Thanks in advance.
Mirek
mirek.1- Posts : 3
Join date : 2011-01-10
Location : Prague, Czech Republic, Europe
Similar topics
» Tip #136 Adding a Mini-Dimension to a Bridge Table: What if you can't split the dimension?
» Dimension Attribute that has 13.5million members
» How to model a financial security dimension where we may find out members are actually the same after the fact
» Person Dimension - Split or Consolidate
» rationale behind dimension with Type 0 and missing Type 5
» Dimension Attribute that has 13.5million members
» How to model a financial security dimension where we may find out members are actually the same after the fact
» Person Dimension - Split or Consolidate
» rationale behind dimension with Type 0 and missing Type 5
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum
|
|