Match merging
3 posters
Page 1 of 1
Match merging
I have a scenario where on the source system side when any of the attributes of the customer change, a new key is generated but when we bring this information into data warehouse : these multiple customers are merged and assigned one unique customer id(cross reference table), which eventually gets populated onto the dimension table. How is this typically handled in modeling world like best practices or standard design patterns? Is this the standard way of how we implement this?
Source
cust_id Cust_Name City State
100001 A Fremont CA
100002 A SFO CA
DW_MERGE_DS
DW_CUST_ID SRC_CUST_ID
110 100001
110 100002
Customer dimension
Customer_id SRC_CUST_ID MRGD_CUST_ID CUST_NAME CITY STATE
1 100001 110 A FREMONT CA
2 100002 110 A SFO CA
Source
cust_id Cust_Name City State
100001 A Fremont CA
100002 A SFO CA
DW_MERGE_DS
DW_CUST_ID SRC_CUST_ID
110 100001
110 100002
Customer dimension
Customer_id SRC_CUST_ID MRGD_CUST_ID CUST_NAME CITY STATE
1 100001 110 A FREMONT CA
2 100002 110 A SFO CA
rajeshwarr59- Posts : 21
Join date : 2015-06-26
Re: Match merging
You are on the right track. Finding a master record or golden record for a same person from many sources has following steps:
1. Load all records into a table.
2. Use some logic to identify the golden or master record.
3. Flag that golden or master record.
4. Use that golden or master record in a dim.
1. Load all records into a table.
2. Use some logic to identify the golden or master record.
3. Flag that golden or master record.
4. Use that golden or master record in a dim.
zoom- Posts : 97
Join date : 2010-08-23
Location : Chicago
Re: Match merging
It really depends on what kind of application this is supporting. For example, in retail and marketing it is you have a 'purchaser' and a 'customer/consumer'. A purchaser is identified by the raw data, and the customer is derived through other systems based on how they were identified in the transaction. This would come from something much like an MDM type system.
So, you would wind up collecting both kinds of data.
You may also consider this as well for a typical corporate data warehouse. One important consideration is MDM systems change their minds. The association of customers from disparate systems is based on a set of rules established by the business. These rules are subject to change, therefore, the association between real customer and presumed customer may change. If you associate facts with the presumed (MDM, or 'golden' customer) those facts will become historically inaccurate if the association changes. It is better to associate the fact to the true customer, using the natural key from the particular source, and maintain the relationship as a bridge table between the true customer and 'golden' customer. You typically use the cross reference from the MDM system to populate the bridge.
So, you would wind up collecting both kinds of data.
You may also consider this as well for a typical corporate data warehouse. One important consideration is MDM systems change their minds. The association of customers from disparate systems is based on a set of rules established by the business. These rules are subject to change, therefore, the association between real customer and presumed customer may change. If you associate facts with the presumed (MDM, or 'golden' customer) those facts will become historically inaccurate if the association changes. It is better to associate the fact to the true customer, using the natural key from the particular source, and maintain the relationship as a bridge table between the true customer and 'golden' customer. You typically use the cross reference from the MDM system to populate the bridge.
Re: Match merging
Thank you very much.
How does this work when it's not an MDM system we are talking about? Assume I have a dimension table with mixed granularity(master_dim which refers back to another dimension table:child_dim). These are both type 2 dimensions. Does the approach you outlined earlier still holds true ? Or what would be the recommended approach in these scenarios? Store the surrogate key of the child dimension table or natural key of the child dimension table on this other master_dimension table? In the below example I have foreign key(prod_cat_id) which refers back to the child_dim table. And the child_dim table would capture the change history(versioning changes), so do we typically have the surrogate key of the child table in the master dim table or do we store the natural key(of child_dim table) in master_dim table?
Master_dim
ORD_ID PROD_CAT_ID AMT
100 1 24
101 2 48
102 3 24
Child_Dim
PROD_CAT_ID PROD_CAT_CD PROD_CAT_DESC
1 10 Beauty
2 20 Mineral Supplements
3 30 Beauty
How does this work when it's not an MDM system we are talking about? Assume I have a dimension table with mixed granularity(master_dim which refers back to another dimension table:child_dim). These are both type 2 dimensions. Does the approach you outlined earlier still holds true ? Or what would be the recommended approach in these scenarios? Store the surrogate key of the child dimension table or natural key of the child dimension table on this other master_dimension table? In the below example I have foreign key(prod_cat_id) which refers back to the child_dim table. And the child_dim table would capture the change history(versioning changes), so do we typically have the surrogate key of the child table in the master dim table or do we store the natural key(of child_dim table) in master_dim table?
Master_dim
ORD_ID PROD_CAT_ID AMT
100 1 24
101 2 48
102 3 24
Child_Dim
PROD_CAT_ID PROD_CAT_CD PROD_CAT_DESC
1 10 Beauty
2 20 Mineral Supplements
3 30 Beauty
rajeshwarr59- Posts : 21
Join date : 2015-06-26
Similar topics
» Analyzing measures from two fact tables where records do not necessarily match due to SCD2 changes.
» Merging of dimensions?
» Merging customer data from disparate sources to create a master customer dimension
» Merging two Data Mart tables
» Dealing with deduplication, merging records, etc in the warehouse layer
» Merging of dimensions?
» Merging customer data from disparate sources to create a master customer dimension
» Merging two Data Mart tables
» Dealing with deduplication, merging records, etc in the warehouse layer
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum