Conforming Dimensions - Standardising, De-duplicating and Suvivorship
3 posters
Page 1 of 1
Conforming Dimensions - Standardising, De-duplicating and Suvivorship
Hi,
I'm currently reading the DW ETL toolkit, which seems to have some excellent ideas. However, as it doesn't come with any downloads (e.g example SSIS packages) - I'm struggling to understand a few things. If anyone can answer the following it would be most appreciated:
1)2 sources for a dimenion interests me. Am I right in understanding that the issues here are essentially that we could have an attribute (e.g. customer location for the Customer Dim) in both data sources. Therefore, we use surviorship here to favour one data source (for that attribute only) over the other data source?
2)The Kimball surviviroship support meta data (pg 159) - is this something that someone must populate manually, or can it be populated automatically during the ETL?
If anyone does know of any examples of this on the web - that would be most appreciated also
Thanks
John
I'm currently reading the DW ETL toolkit, which seems to have some excellent ideas. However, as it doesn't come with any downloads (e.g example SSIS packages) - I'm struggling to understand a few things. If anyone can answer the following it would be most appreciated:
1)2 sources for a dimenion interests me. Am I right in understanding that the issues here are essentially that we could have an attribute (e.g. customer location for the Customer Dim) in both data sources. Therefore, we use surviorship here to favour one data source (for that attribute only) over the other data source?
2)The Kimball surviviroship support meta data (pg 159) - is this something that someone must populate manually, or can it be populated automatically during the ETL?
If anyone does know of any examples of this on the web - that would be most appreciated also
Thanks
John
johnryan- Posts : 3
Join date : 2009-09-23
Re: Conforming Dimensions - Standardising, De-duplicating and Suvivorship
Depends. If the addresses have the same meaning, i.e. they are both billing addresses then I would apply a survivorship rule. If the addresses represent different meanings, i.e. shipping and billing, I would store both addresses.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Conforming Dimensions - Standardising, De-duplicating and Suvivorship
Thanks for the reply.
So in the case that they are both billing address, is the surviorship rule as simple as just saying always take the address from Data Source A?
Also - I'm still unsure - The Kimball surviviroship support meta data (pg 159) - is this something that someone must populate manually, or can it be populated automatically during the ETL?
So in the case that they are both billing address, is the surviorship rule as simple as just saying always take the address from Data Source A?
Also - I'm still unsure - The Kimball surviviroship support meta data (pg 159) - is this something that someone must populate manually, or can it be populated automatically during the ETL?
johnryan- Posts : 3
Join date : 2009-09-23
Re: Conforming Dimensions - Standardising, De-duplicating and Suvivorship
It could be source A, it could be most recent, it could be composite, it could be the validated address. There are all sorts of survivorship rules you can define.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Conforming Dimensions - Standardising, De-duplicating and Suvivorship
The issues you are bringing up are what Master Data Management (MDM) tools are all about. Things can get very complicated very quicky which is why most MDM solutions use rules based engines to evaluate the sources and figure out what to do. I would only attempt to do such tasks in ETL if they are simple and very clearly defined by the business. Also, if I was to do it in ETL, I would design the process where no data is destroyed so that the inevitable "gee... did we say do that?" issues can be resolved without a lot of pain.
Even with an MDM system, you should expect some level of error as well as ongoing reworking of the rules which may adversly affect the DW content. As a rule, where I have multiple independent sources for a dimension, I maintain rows for each source (to retain source identity) and use attributes to 'integrate' enities across sources.
Even with an MDM system, you should expect some level of error as well as ongoing reworking of the rules which may adversly affect the DW content. As a rule, where I have multiple independent sources for a dimension, I maintain rows for each source (to retain source identity) and use attributes to 'integrate' enities across sources.
Re: Conforming Dimensions - Standardising, De-duplicating and Suvivorship
Thanks for the replies,
I'd presumed that this is all possible without MDM - but I guess then it can get to a point where it just gets too complex without MDM.
As a final request is anyone aware of any downloadable examples of this implemented within the ETL? E.g. an SSIS package?
I'd presumed that this is all possible without MDM - but I guess then it can get to a point where it just gets too complex without MDM.
As a final request is anyone aware of any downloadable examples of this implemented within the ETL? E.g. an SSIS package?
johnryan- Posts : 3
Join date : 2009-09-23
Re: Conforming Dimensions - Standardising, De-duplicating and Suvivorship
It is possible without a MDM application. In fact I read the other day that the majority of MDM/CDI implementations are in-house solutions.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA

» Joining facts through conforming dimensions
» conforming your nonconformed data
» De-duplicating, DW and MDM
» Duplicating Dimension Attributes
» Duplicating a field in more than one fact table.
» conforming your nonconformed data
» De-duplicating, DW and MDM
» Duplicating Dimension Attributes
» Duplicating a field in more than one fact table.
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum
|
|