Dimension design
5 posters
Page 1 of 1
Dimension design
Hi
I need to design Customer Confirmed Dimension, we have two sources, One has Customer id, customer Name while other one has only Customer Name and other attributes. There may be same Customer, in that case i don't want to duplicate the Customers, there must be only one customer row. and This Dimension table should connect to multiple Business Fact tables. If i keep Customer id, Customer Name in Customer Dimension table and I use customer id to join with one fact table and Other with Customer Name, is this the good solution? Or Can we use Other system Natural Key(customer id) in both fact tables. I would have used Surrogate Key, but i have attributes tracking so there will be New Surrogate Key everytime the record uupdated in source. I am just keeping surrogate Key in customer dimension as Unique identifier but not joining with any fact table
thanks in advance
I need to design Customer Confirmed Dimension, we have two sources, One has Customer id, customer Name while other one has only Customer Name and other attributes. There may be same Customer, in that case i don't want to duplicate the Customers, there must be only one customer row. and This Dimension table should connect to multiple Business Fact tables. If i keep Customer id, Customer Name in Customer Dimension table and I use customer id to join with one fact table and Other with Customer Name, is this the good solution? Or Can we use Other system Natural Key(customer id) in both fact tables. I would have used Surrogate Key, but i have attributes tracking so there will be New Surrogate Key everytime the record uupdated in source. I am just keeping surrogate Key in customer dimension as Unique identifier but not joining with any fact table
thanks in advance
biqv12- Posts : 3
Join date : 2014-03-21
Dimension design
biqv12 wrote:Hi
I need to design Customer Confirmed Dimension, we have two sources, One has Customer id, customer Name while other one has only Customer Name and other attributes. There may be same Customer, in that case i don't want to duplicate the Customers, there must be only one customer row. and This Dimension table should connect to multiple Business Fact tables. If i keep Customer id, Customer Name in Customer Dimension table and I use customer id to join with one fact table and Other with Customer Name, is this the good solution? Or Can we use Other system Natural Key(customer id) in both fact tables. I would have used Surrogate Key, but i have attributes tracking so there will be New Surrogate Key everytime the record uupdated in source. I am just keeping surrogate Key in customer dimension as Unique identifier but not joining with any fact table
thanks in advance
You can find a very detail explanation on how to create a customer dim in data warehouse toolkit book in chapter 6, page 146.
It is a good practice to use surrogate key as a primary key in the dim table which is used to join with the fact table(s).
If you are tracking history (SCD type 2) on some of customer attributes in the dim, then it is natural that surrogate keys will change. You can keep Natural Key(customer id) in customer dim and in fact table(s). This way you can identify same customer having multiple rows in the fact table, even customer has different surrogate keys because of SCD type 2 you have for customer dim.
zoom- Posts : 97
Join date : 2010-08-23
Location : Chicago
Re: Dimension design
Thanks for your reply
Yes, i had gone through the doc but problem here is i have different source system with same customers but different natural key and same customer name.
So i want to use same customer table to connect to the different fact table. I was thinking to join with customer name but was worried about performance as it string field. I am pretty sure that there wont be any duplicate names in the source.
Could you please advise if i can gi ahead with this design or some other solution
Thanks alot
Yes, i had gone through the doc but problem here is i have different source system with same customers but different natural key and same customer name.
So i want to use same customer table to connect to the different fact table. I was thinking to join with customer name but was worried about performance as it string field. I am pretty sure that there wont be any duplicate names in the source.
Could you please advise if i can gi ahead with this design or some other solution
Thanks alot
biqv12- Posts : 3
Join date : 2014-03-21
Re: Dimension design
The cleanest solution is to have the customer in the dimension table more than once. One row for each natural key. Facts from a particular source will reference the customer from the same source. This ensures the facts are always referencing the true customer.
Then there is the issue of maintaining attributes of the customer. By maintaining separate rows the issue is simplified because the foreign key references are fixed, it just becomes a matter of updating attributes on all rows of the same customer. This can be handled using a cross reference if one is available. If no cross reference exists, you still have a workable system with each customer row containing whatever attributes that is available from its source system.
As far a reporting goes, its all about the attributes, not the number of dimension rows. Customers with the same attribute values will summarize together regardless of the source. It is a matter of conforming attributes such that their values are consistent across sources.
Then there is the issue of maintaining attributes of the customer. By maintaining separate rows the issue is simplified because the foreign key references are fixed, it just becomes a matter of updating attributes on all rows of the same customer. This can be handled using a cross reference if one is available. If no cross reference exists, you still have a workable system with each customer row containing whatever attributes that is available from its source system.
As far a reporting goes, its all about the attributes, not the number of dimension rows. Customers with the same attribute values will summarize together regardless of the source. It is a matter of conforming attributes such that their values are consistent across sources.
Last edited by ngalemmo on Sun Mar 23, 2014 12:40 pm; edited 1 time in total
Dimension design
biqv12 wrote:Thanks for your reply
Yes, i had gone through the doc but problem here is i have different source system with same customers but different natural key and same customer name.
So i want to use same customer table to connect to the different fact table. I was thinking to join with customer name but was worried about performance as it string field. I am pretty sure that there wont be any duplicate names in the source.
Could you please advise if i can gi ahead with this design or some other solution
Thanks alot
I dealt with this situation when I was working with patients data coming from 100 different sources. BTW, first name and last name does not uniquely identify same person/customer or a patient. You need DOB or other attribute i-e SSN or address to determine if it is a same customer or not.
My business user wanted to see a patient's most current names, DOB, but wanted to keep track of location change. So I
created one table to dump source reference data for all the source and created a column to identify which patient's data among those sources I used for my conformed patient dim as a Master/golden record ( it must be only one patient row identified as master record) .
I created a conformed patient dim where I kept only 1 row per patient and it was SCD type 1. I created a mini patient dim where I tracked patient location changes and patient dim id was a foreign key in the location dim.
Without knowing what your business user want and how they want to use this data, you can use cross reference table as suggested by neglemmo. Using a string will slow sql, but if you end up using your approach then create customer full name column on the dim and fact table to join them.
zoom- Posts : 97
Join date : 2010-08-23
Location : Chicago
Customer Count
The notion of a customer ‘golden record’ and separate customer dimension rows for each source has been discussed in a number of previous threads but I’m still a little confused by one aspect.
If the Customer dimension has multiple active rows, one for each source/natural key, and if a cross-reference table or table of ‘matches’ is available; notwithstanding the concerns about the validity of the matching rules used, what is the best way of providing the business with a count of unique customers across all sources constrained by customer dimension attributes?
Does the cross-reference table get used to create a Customer Count Fact table?
If so, how does it join to the Customer Dimension?
If the Customer dimension has multiple active rows, one for each source/natural key, and if a cross-reference table or table of ‘matches’ is available; notwithstanding the concerns about the validity of the matching rules used, what is the best way of providing the business with a count of unique customers across all sources constrained by customer dimension attributes?
Does the cross-reference table get used to create a Customer Count Fact table?
If so, how does it join to the Customer Dimension?
D_Pons- Posts : 16
Join date : 2009-02-10
Location : UK
Re: Dimension design
When you have multiple sources for a dimension and you need to dedupe these into a single golden record (and assuming the sources don't hold a common key) I would design it as follows:
- Create the Dimension with a surrogate key (as normal) and a synthetic business key. Assuming this is an SCD dimension then the synthetic business key doesn't change as you add new versions of a record
- Create a cross reference table (in your staging area or wherever) that has at least 3 columns: Source system Identifier, source system business/primary key, Dimension synthetic business key (plus any other columns you might want e.g. for auditing purposes)
When you load data into your Dimension you first go to your cross reference table and perform a lookup using source system identifier + source system key to find the synthetic business key - which you then use to access the Dimension table. If no record is found you then go through your dedupe process and add records to the cross-reference table and Dimension as necessary.
As an aside, you also need to bear in mind which source system is the master for each attribute in a Dimension and build your update logic around this - this ensures that you always hold the most accurate value for each attribute which is not necessarily the same as the most recent version of an attribute.
- Create the Dimension with a surrogate key (as normal) and a synthetic business key. Assuming this is an SCD dimension then the synthetic business key doesn't change as you add new versions of a record
- Create a cross reference table (in your staging area or wherever) that has at least 3 columns: Source system Identifier, source system business/primary key, Dimension synthetic business key (plus any other columns you might want e.g. for auditing purposes)
When you load data into your Dimension you first go to your cross reference table and perform a lookup using source system identifier + source system key to find the synthetic business key - which you then use to access the Dimension table. If no record is found you then go through your dedupe process and add records to the cross-reference table and Dimension as necessary.
As an aside, you also need to bear in mind which source system is the master for each attribute in a Dimension and build your update logic around this - this ensures that you always hold the most accurate value for each attribute which is not necessarily the same as the most recent version of an attribute.
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Dimension design
D_Pons wrote:The notion of a customer ‘golden record’ and separate customer dimension rows for each source has been discussed in a number of previous threads but I’m still a little confused by one aspect.
If the Customer dimension has multiple active rows, one for each source/natural key, and if a cross-reference table or table of ‘matches’ is available; notwithstanding the concerns about the validity of the matching rules used, what is the best way of providing the business with a count of unique customers across all sources constrained by customer dimension attributes?
Does the cross-reference table get used to create a Customer Count Fact table?
If so, how does it join to the Customer Dimension?
As per our business practice, a golden patient records is defined as only one most current and complete patient data . We had merge/purge or dedup logic to DETERMINE which patient record is the golden record among different sources. Flagging a patient as golden record happened in the reference table. An id also generated in the reference table. If there are different variations of a patient's rows coming from different sources in the reference table, then all those rows get same id.
For example, Angela Smith from source A, Ange Smith from source B, Smith Ange from source C would get same Id. Among those 3 records we determine which 1 record is golden and set the flag as 'Golden" just for that record.
patient dim insert and update happened from the reference table. We used golden flag record Id from reference table to do a look up in patient dim. If that id exists in patient dim do update else insert in the dim
So in the patient dim there was no source (clinic) identifier. Why, because if a patient record is determined 'golden' from one source, next time most current record can come from a different source.
If I wanted to count how many patient went to different clinics then I can do it from the fact table. In my fact table I created patients Dim visit to different clinics Dim relationships.
BTW, we used 3rd party software to dedup patient records for us.
zoom- Posts : 97
Join date : 2010-08-23
Location : Chicago
Re: Dimension design
D_Pons wrote:The notion of a customer ‘golden record’ and separate customer dimension rows for each source has been discussed in a number of previous threads but I’m still a little confused by one aspect.
If the Customer dimension has multiple active rows, one for each source/natural key, and if a cross-reference table or table of ‘matches’ is available; notwithstanding the concerns about the validity of the matching rules used, what is the best way of providing the business with a count of unique customers across all sources constrained by customer dimension attributes?
Does the cross-reference table get used to create a Customer Count Fact table?
If so, how does it join to the Customer Dimension?
You rely on attributes to identify the customer, not the surrogate key. For example, if you have an MDM system in place, it's 'golden record' will have some kind of identifier associated with it. You carry that as an attribute on all customer rows that are associated with it (through the cross-reference). Count distinct customers based on that attribute.
Re: Dimension design
ngalemmo wrote:D_Pons wrote:The notion of a customer ‘golden record’ and separate customer dimension rows for each source has been discussed in a number of previous threads but I’m still a little confused by one aspect.
If the Customer dimension has multiple active rows, one for each source/natural key, and if a cross-reference table or table of ‘matches’ is available; notwithstanding the concerns about the validity of the matching rules used, what is the best way of providing the business with a count of unique customers across all sources constrained by customer dimension attributes?
Does the cross-reference table get used to create a Customer Count Fact table?
If so, how does it join to the Customer Dimension?
You rely on attributes to identify the customer, not the surrogate key. For example, if you have an MDM system in place, it's 'golden record' will have some kind of identifier associated with it. You carry that as an attribute on all customer rows that are associated with it (through the cross-reference). Count distinct customers based on that attribute.
Okay. I think I need to just get my head around the notion that we then have a measure in a dimension and figure out the implications for any filters to the 'distinct customer count' we might want to apply from another dimension.
Thanks.
D_Pons- Posts : 16
Join date : 2009-02-10
Location : UK
Re: Dimension design
What measure? An identifier is an attribute not a measure. Counting distinct attribute values creates a measure, but that doesn't make the attribute a measure.
Similar topics
» FACT Design Question - How to design FACT Table when computation is required on the degenerate dimension?
» Dimension Design with intermediate tables between fact and dimension
» Need help with dimension design
» Dimension Design
» Design of a multivalued dimension,
» Dimension Design with intermediate tables between fact and dimension
» Need help with dimension design
» Dimension Design
» Design of a multivalued dimension,
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum