Customer Dimension from multiple systems

felix.c Mon Jan 11, 2016 9:02 pm

Hi,

I need to integrate three live core systems and currently modeling the customer dimension. During the physical design I repeatedly ask myself do I need to create a source system ID (surrogate key to identify source system) column in my customer dimension (unique constraint SourceSystem_ID, Customer_BusinessKey) and outrigger to a source system dimension or should I just not generate the key and go directly to load into customer dimension table? I am not too sure what is the best approach in terms of integrating common data from multiple sources.

If ignore source system ID is preferred before committing the customer data from all three sources should I be running some kind of deduplicate process (only 1 distinct customer) or add them individually (3 customers)? Assumed we only allow 1 active residential customer exist across all core systems however corporate and wholesale customers will most likely to void the assumption.

Customer have static hierarchy: “Customer --> Customer Group --> Business Unit --> Master Business Unit”. Should I denormalised this hierarchy and either add these attributes in customer dimension or break the business unit to a separate dimension and add a new FK in the fact table (1 Customer FK, 1 Business Unit FK).

I was also thinking if using the above source system ID idea I assumed this will appear in all dimensions and facts within the data warehouse?

Thank you in advance.

nick_white Wed Jan 13, 2016 8:57 am

I guess it depends on what your requirements are
1. If you want to be able to report on the source system then you'll need to include it in your model
2. If the natural keys across the three core systems are not unique then you'll need to include the source system in order to create a unique key - to allow your ETL processes to update the correct dimension record
3. If you are building some form of MDM system (de-duplicating your customers) then you'll need the system ID as part of your ETL design, even if you don't include it in your data model.

Whether you try and de-duplicate your customers depends on your requirements and how your business expects to see the data. For example, if each source system has 100 customer records and your business therefore thinks they have 300 customers then building a model that de-duplicates them and will show in reports that you only have 250 (unique) customers will just cause confusion. So the de-duplication process and its impact need to be understood and agreed by everyone impacted by it before it is implemented.

Hierarchies: I normally find that the design of hierarchies is constrained by the needs of the BI tool that is going to traverse them. Given your particular example, I would at least consider the following:
1. If you create aggregates at one of the higher levels in this hierarchy then you're going to need a Dimension at that level anyway e.g. if you aggregate a fact table up from the customer level to the BU level you'll need a BU dimension
2. How often will a hierarchy change and how do you want to handle the change?
a) Only affects new customers
b) Affects all customers but they only report using the new hierarchy
c) Affects all customers and you need to report on the hierarchy that existed at the time the fact occurred
Having all the hierarchical data in the customer dimension may make reporting simple but could make hierarchical updates very complex - in which case it may be simpler to split the hierarchy out into its own table

Source System ID in all tables: I'd only include it if it's needed to load the data (as in the customer dimension). However, I would include it in an Audit dimension and have an audit key on every table

ngalemmo Wed Jan 13, 2016 2:43 pm

From an integration standpoint, I prefer to maintain separate dimension rows for each sourced customer from each system using natural keys appropriate for that particular source system. The dimension would contain 'raw' (as they appear in the source system) and 'standard' (cleaned, conforming) attributes. The standard attributes would be used to consolidate facts across sources, while the existence of multiple dimension rows ensures stability in the relationship maintained by the source systems.

Dimensional maintenance is a bit more challenging as you would apply standard attribute values across all rows for the same customer among the different sources. This is typically done using an alternate natural key to the table that identifies the standardized customer.

felix.c Wed Jan 13, 2016 5:18 pm

Thank you for the replies!

I have decided to denormalised the BU out from Customer dim as this do have more pros than cons in our BI tool environment (Tableau and SSAS Tabular). After having some discussion internally because the 3 systems represent it's own BU it does make sense to do what is described from both response above and treat them as individual customers including the churned-returnings.

The customer natural keys from all 3 systems maybe not unique and I can't guaranteed so probably safer to use system ID as part of the customer natural keys. Somewhere down the road I can see there will be a requirement to report at customer level and their related account number on which system. Do you recommend a mini SystemPlatform dim FK to customer dim or to the fact?

Thanks.

ngalemmo Wed Jan 13, 2016 6:35 pm

Yes, source system id needs to be part of the natural key. As far as a source system dimension goes, that may be overkill. Storing a small code as a degenerate dimension in facts is usually all you need to do. If you need more information having a small reference table keyed by the code with descriptive information (such as source system name) should suffice.

However, I would like to stress that having standardized attributes in the customer dimension is very useful for consolidation. Queries use attributes to group and summarize data, not keys. If you do transformations to create standardized values, its always a good idea to store the original source value as a backup, should the transformation rules change. You need not expose the raw attributes to end-users if they do not need access to them.

felix.c Wed Jan 13, 2016 8:03 pm

I will take your advise and add a degenerate dim in the fact and create single string natural key something like "|" for customer, "|"..etc for other dimensions.

With your standardized attributes it does a lot of sense to have this in place but reality is a complex challenge to implement especially customer base where you can possibly have two or more customers with the same name different address that potentially are the same or different person living in different address over time. Apart from the customer dim I will use this approach as our standard across all dimension tables for the benefit of fact consolidation.

Thanks you very much and this have help a lot.

Sponsored content

Customer Dimension from multiple systems

Customer Dimension from multiple systems

Re: Customer Dimension from multiple systems

Re: Customer Dimension from multiple systems

Re: Customer Dimension from multiple systems

Re: Customer Dimension from multiple systems

Re: Customer Dimension from multiple systems

Re: Customer Dimension from multiple systems