Advice on relationships between Type 2 SCDs ?
3 posters
Page 1 of 1
Advice on relationships between Type 2 SCDs ?
Hi All,
I am constructing a new warehouse schema containing a number of dimensions which need to be modeled as type 2 SCDs. This is all fine and I have surrogate IDs which are being used as the FK in the facts. The challenge I have is that a number of the SCDs have relationships between them which need to be maintained to support other workflow aspects. Holding multiple versions over time makes all these relationships M:M which will make the model messy and doesn't really add anything for the functionality they support. I just need to know that two natural keys are related, not two specific versions of those keys.
I would like to employ a hybrid approach as follows:
- All SCDs have a primary key = surrogate id and a unique key = natural key + version
- Relationships to the fact tables use the surrogate id as best practice dictates
- Relationships between SCDs use the natural key + version instead. This allows for the following different joins to be made:
1. "As was" - join on natural key + version (the version will be the one active when the relationship was made)
2. "As is" - join on natural key + latest flag
3. "As at" - join on natural key + point in time between effective from and to
I've used this "unversioned" relationship approach before but I was just wondering if anyone has any thoughts about combining it with the surrogate approach in the same schema ?
Thanks,
Andy.
I am constructing a new warehouse schema containing a number of dimensions which need to be modeled as type 2 SCDs. This is all fine and I have surrogate IDs which are being used as the FK in the facts. The challenge I have is that a number of the SCDs have relationships between them which need to be maintained to support other workflow aspects. Holding multiple versions over time makes all these relationships M:M which will make the model messy and doesn't really add anything for the functionality they support. I just need to know that two natural keys are related, not two specific versions of those keys.
I would like to employ a hybrid approach as follows:
- All SCDs have a primary key = surrogate id and a unique key = natural key + version
- Relationships to the fact tables use the surrogate id as best practice dictates
- Relationships between SCDs use the natural key + version instead. This allows for the following different joins to be made:
1. "As was" - join on natural key + version (the version will be the one active when the relationship was made)
2. "As is" - join on natural key + latest flag
3. "As at" - join on natural key + point in time between effective from and to
I've used this "unversioned" relationship approach before but I was just wondering if anyone has any thoughts about combining it with the surrogate approach in the same schema ?
Thanks,
Andy.
Andy- Posts : 2
Join date : 2011-01-24
Re: Advice on relationships between Type 2 SCDs ?
Why not to use the same structure ( SCD2) for the intersection table ?
There are 2 dimension tables
Adimension table
Key ( Surrogate key)
StartDate
EndDate
(and NaturalKey …..)
Bdimension table
Key ( Surrogate key)
StartDate
EndDate
AB intersection table
A_Key
B_Key
StartDate
EndDate
Of course you have to maintain during the ETL process properly the dates of the intersection rows as well.
An example
Adimension table
1--- 01 May 2010---31 May 2010
2----01 June 2010
Bdimension table
1--- 01 May 2010---10 May 2010
2----11 May 2010
AB table
1---1---01 May 2010---10 May 2010
1---2---11 May 2010---31 May 2010
2---2---01 June 2010
„As is”: EndDate is null
„As was (as at)”: date between StartDate and EndDate
There are 2 dimension tables
Adimension table
Key ( Surrogate key)
StartDate
EndDate
(and NaturalKey …..)
Bdimension table
Key ( Surrogate key)
StartDate
EndDate
AB intersection table
A_Key
B_Key
StartDate
EndDate
Of course you have to maintain during the ETL process properly the dates of the intersection rows as well.
An example
Adimension table
1--- 01 May 2010---31 May 2010
2----01 June 2010
Bdimension table
1--- 01 May 2010---10 May 2010
2----11 May 2010
AB table
1---1---01 May 2010---10 May 2010
1---2---11 May 2010---31 May 2010
2---2---01 June 2010
„As is”: EndDate is null
„As was (as at)”: date between StartDate and EndDate
gvarga- Posts : 43
Join date : 2010-12-15
Re: Advice on relationships between Type 2 SCDs ?
Many thanks for the detailed reply. I did consider that approach but my only concern is the proliferation of rows in cases where the A & B dimensions perhaps aren't quite as slowly changing as I'd like ! The assertion I would like to make is that a natural key of A is related to a natural key of B regardless of the number of versions over time there happen to be in A or B. Your solution will certainly provide this but I am looking at the natural key alternative to simplify things.
I know surrogates are the best practice for fact - dimensions relationships for obvious reasons but I'm keen to explore using naturals elsewhere as the impact of a data migration can be contained within the relatively low volume world of the dimensions, leaving the facts safe in their surrogate FK world.
I know surrogates are the best practice for fact - dimensions relationships for obvious reasons but I'm keen to explore using naturals elsewhere as the impact of a data migration can be contained within the relatively low volume world of the dimensions, leaving the facts safe in their surrogate FK world.
Andy- Posts : 2
Join date : 2011-01-24
Re: Advice on relationships between Type 2 SCDs ?
Normally the relationship between two SCD dimensions is reflected in the fact table, meaning you have FKs from both dimensions in, say daily transaction grain fact. Trying to have referential relationship between two SCDs having their surrogate keys in dimensions is complicated and confusing to users. Let's not say you can't ever snowflake a dimension by another dimension. In some circumstances, you could link an SCD by an outrigger dimension which could be just a static lookup table or a mini-dimension. Outrigger dimensions are normally not full-blown SCDs.
If performance is an issue when joining to a big daily grain fact table, then create a fact-less fact dedicated to keeping the m-m relationship between tow SCDs. The fact-less fact table for this purpose will only be inserted when there is a change in either SCDs, and therefore should be relatively smaller compared to the daily fact.
If performance is an issue when joining to a big daily grain fact table, then create a fact-less fact dedicated to keeping the m-m relationship between tow SCDs. The fact-less fact table for this purpose will only be inserted when there is a change in either SCDs, and therefore should be relatively smaller compared to the daily fact.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Similar topics
» Why do I need type 3 and 6 SCDs when I can implement type 7?
» Factless Fact table to model 1:M relationships between Type 2 SCD
» Type 2 dimension or type 2 column?
» Type-2 Dates as Date Data Type ?
» rationale behind dimension with Type 0 and missing Type 5
» Factless Fact table to model 1:M relationships between Type 2 SCD
» Type 2 dimension or type 2 column?
» Type-2 Dates as Date Data Type ?
» rationale behind dimension with Type 0 and missing Type 5
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum