Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Dealing with deduplication, merging records, etc in the warehouse layer

2 posters

Go down

Dealing with deduplication, merging records, etc in the warehouse layer Empty Dealing with deduplication, merging records, etc in the warehouse layer

Post  Juan Tue Dec 06, 2011 1:32 pm

Hi,

From my experience and Kimball ETL Architecture training I know that there are (not cast in concrete) 4 layers:
1) Staging 2) Transform 3) Conform 4) Warehouse (might be named differently by others)
The dedup/merging/cleaning, etc are all done in the first 8 subsystems ie. the first two layers.

What are the pros and cons of doing client merging, client deduplication, and client key management in the warehouse layer and not in the "kitchen" layer?

Your opinions will be appreciated

Thanks

Juan

Posts : 1
Join date : 2010-07-15
Location : Cape Town, ZA

Back to top Go down

Dealing with deduplication, merging records, etc in the warehouse layer Empty Re: Dealing with deduplication, merging records, etc in the warehouse layer

Post  ngalemmo Tue Dec 06, 2011 8:25 pm

I am not a proponent of merging, de-duping in any layer other than at the reporting level. Key management (if you mean the assignment of dimension keys) occurs in the transformation process prior to loading into the warehouse.

It all has to do with business keys. Clients in the source system are identified by a business key (whatever form that takes). The transactional source, which would contain the business key, needs to be related to the dimension using through the dimension's surrogate PK. It is important that the relationship between a fact and dimension is based on the natural key in the transaction rather than base it on some "merged" picture of the client. This correctly reflects the true nature of the relationship under which the transaction took place.

The "merged" client is simply a representation of a client. You handle this by including additional attributes to reflect merged client. From an end-user standpoint, you can present them with a merged view or an as-is view or both based on the attributes you decide to expose. Also, if the merge associations change (as the merge rules change) you can easily reflect the new merged view by simply updating the merged view attributes.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum