Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Dimension with different sources (multiple business keys) ?

+5
Mike Honey
LAndrews
ngalemmo
BoxesAndLines
ian.coetzer
9 posters

Page 1 of 2 1, 2  Next

Go down

Dimension with different sources (multiple business keys) ? Empty Dimension with different sources (multiple business keys) ?

Post  ian.coetzer Thu May 31, 2012 6:16 am

Hi

We want to build a conformed dimension to house our products.
So we want to end up with a table named "DimProduct" which we can then 'safely' link to several measure groups (Fact tables).

However, over the years, different departments created their own sql databases, all with their own lists of products.
Which they maintain independently ......

So some may have more products, and some less, since some departments do not require a full product catalog.
Worst is they have their own product codes ...

So my question would something like this work?

DimProduct
=====================================
ProductKey (our new unique surrogate primary key in the DW)

BKProductCreditKey (The primary key used by our credit division to uniquely identify products)
BKProductFinanceKey (The primary key used by our finance division to uniquely identify products)

ProductName
ProductDescription
ProductCategory

So in other words, sometimes a product may have both business keys,
and sometimes only one of the business keys.

Mapping them in order to decide which desription / name is valid where a match is found will be an interresting decision.

Any comments on this? Has anyone had this situation before?




ian.coetzer
ian.coetzer

Posts : 57
Join date : 2010-05-31
Age : 43
Location : South Africa

Back to top Go down

Dimension with different sources (multiple business keys) ? Empty Re: Dimension with different sources (multiple business keys) ?

Post  BoxesAndLines Thu May 31, 2012 8:53 am

That will work. I generally manage cross reference tables outside of the dimension though. That way, if new sources for the same thing appear, no changes are required. Your other problem is that you may have two product codes for the same item. Are you consolidating to a set of distinct products?
BoxesAndLines
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

Back to top Go down

Dimension with different sources (multiple business keys) ? Empty Re: Dimension with different sources (multiple business keys) ?

Post  ian.coetzer Thu May 31, 2012 10:46 am

Hi

I have no idea how to tackle the mapping yet, and how / what to use to decide whether the two data sets match!
ian.coetzer
ian.coetzer

Posts : 57
Join date : 2010-05-31
Age : 43
Location : South Africa

Back to top Go down

Dimension with different sources (multiple business keys) ? Empty Re: Dimension with different sources (multiple business keys) ?

Post  ngalemmo Thu May 31, 2012 11:34 am

I would build an abstract natural key which includes a source system code so that separate dimension rows are created for each source. Facts from that source would reference dimension rows from that source. The dimension would contain both source raw attributes and 'standard' attributes (i.e. the corporate version of the attributes based on some cross reference or other means) as well as a common NK as needed to maintain the standard attributes.

This allows you to accommodate any source and not be bound by a bad decisions or mistakes. If there is an error in standardizing the attributes, it is an easy matter of updating rows. If you only create standard rows (i.e. not source specific rows) you have to rekey facts if an error is found in the standardization (source rows mapped to the wrong standard version).
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Dimension with different sources (multiple business keys) ? Empty Re: Dimension with different sources (multiple business keys) ?

Post  ian.coetzer Thu May 31, 2012 11:42 am

Hi

Thanks for the reply, but I do not understand how one will then be able to drill across measures?

If for example I pick a product I want to see measures from both business areas side by side for that product if possible ... (so i support I must have some business logic that decides whether a product is a conformed product between those two sources)
ian.coetzer
ian.coetzer

Posts : 57
Join date : 2010-05-31
Age : 43
Location : South Africa

Back to top Go down

Dimension with different sources (multiple business keys) ? Empty Re: Dimension with different sources (multiple business keys) ?

Post  LAndrews Thu May 31, 2012 12:57 pm


You would be able to drill across different facts based on the "Standard" attributes.


LAndrews

Posts : 132
Join date : 2010-05-13
Location : British Columbia, Canada

Back to top Go down

Dimension with different sources (multiple business keys) ? Empty Re: Dimension with different sources (multiple business keys) ?

Post  ian.coetzer Thu May 31, 2012 1:42 pm

If I were to create an SSAS Cube with relevant dimensions.
Then if I for example pick ProductName = "Green Shirt" which is linked to two surrogate primary keys 4 and 5, then will it return both records? I must check test this ...

Also the product names are not likely to match as nicely as in my example they are captured by different front - ends

Dimension with different sources (multiple business keys) ? Fixed_10


Last edited by ian.coetzer on Thu May 31, 2012 1:50 pm; edited 1 time in total (Reason for editing : fixed spelling mistake and primary keys in DimProduct screenshot.)
ian.coetzer
ian.coetzer

Posts : 57
Join date : 2010-05-31
Age : 43
Location : South Africa

Back to top Go down

Dimension with different sources (multiple business keys) ? Empty Re: Dimension with different sources (multiple business keys) ?

Post  ngalemmo Thu May 31, 2012 1:56 pm

That is the point of the standard attributes. When a user queries, they don't see keys, do they? They see attributes. If you have a standard product name attribute and a standard product ID attribute which have the same value across both rows, they would query on those and see one consolidated view. It doesn't matter how many dimension rows there are (otherwise how could a type 2 dimension work?). But if the names don't and will never match, how could you expect to create a single consolidated view anyway?
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Dimension with different sources (multiple business keys) ? Empty Re: Dimension with different sources (multiple business keys) ?

Post  ian.coetzer Thu May 31, 2012 2:04 pm

Yes, I understand that a dimension should not show the underlying surrogate keys.
I tried building a simple cube based on that test data and it shows two "Green Shirt" entries one for each key.

Where did my sample design go wrong?

Dimension with different sources (multiple business keys) ? Exampl11

ian.coetzer
ian.coetzer

Posts : 57
Join date : 2010-05-31
Age : 43
Location : South Africa

Back to top Go down

Dimension with different sources (multiple business keys) ? Empty Re: Dimension with different sources (multiple business keys) ?

Post  ngalemmo Thu May 31, 2012 2:26 pm

I am not a SSAS guy. It works great with normal SQL queries. Do you include source specific attributes in your cube, such as product ID or the dimension PK? Or, the problem may be the measures are coming from different fact tables and SSAS may be storing them separately. You may need to consider a union view to source the cube (do not include source specific attributes in the view). Also, check to see if there is an aggregation option that consolidates the display.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Dimension with different sources (multiple business keys) ? Empty Re: Dimension with different sources (multiple business keys) ?

Post  ian.coetzer Thu May 31, 2012 2:39 pm

Thanks for the tips, it will take me a while but I will post a solution when i manage to get this working in SSAS...

Yes, for each dimension one has to give it a primary key (in this case the ProductKey).
ian.coetzer
ian.coetzer

Posts : 57
Join date : 2010-05-31
Age : 43
Location : South Africa

Back to top Go down

Dimension with different sources (multiple business keys) ? Empty Re: Dimension with different sources (multiple business keys) ?

Post  Mike Honey Thu May 31, 2012 7:00 pm

Hi Ian,

It looks like the Key property for your Product Name attribute is set to something besides Product Name - perhaps the Surrogate Key?

I'd normally configure a dimension like the one you described with the Surrogate Key column as the Dimension Key Attribute, but with AttributeHierarchyVisible = False. Then you can freely create other attributes and hierarchies that will aggregate by default.

Good luck!
Mike
Mike Honey
Mike Honey

Posts : 185
Join date : 2010-08-04
Location : Melbourne, Australia

http://www.mangasolutions.com

Back to top Go down

Dimension with different sources (multiple business keys) ? Empty Re: Dimension with different sources (multiple business keys) ?

Post  hang Fri Jun 01, 2012 1:04 am

I think it's important to get the model right first. It seems to me a typical heterogeneous product scenario (pg.210 Kimball's dimensional modeling book), or subtype modeling in other literature.

In essence, you have a core product dimension containing all the common attributes, and a number of custom product dimensions with common attributes and their respective extended attributes. The SK in custom dimension is the same as in core dimension for the same product. You need to do some fuzzy matching or mapping table to consolidate common products in the core product dimension. You may have custom fact tables to cater for analysis specific to certain product type, and a core fact table, or consolidated fact view for your common product analysis.


Last edited by hang on Fri Jun 01, 2012 9:15 am; edited 1 time in total

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

Back to top Go down

Dimension with different sources (multiple business keys) ? Empty Re: Dimension with different sources (multiple business keys) ?

Post  BoxesAndLines Fri Jun 01, 2012 9:07 am

ngalemmo wrote:That is the point of the standard attributes. When a user queries, they don't see keys, do they? They see attributes. If you have a standard product name attribute and a standard product ID attribute which have the same value across both rows, they would query on those and see one consolidated view. It doesn't matter how many dimension rows there are (otherwise how could a type 2 dimension work?). But if the names don't and will never match, how could you expect to create a single consolidated view anyway?

How often does the same thing have the same attribute values from disaparate data sources? One source may say 'green shirt'. The other source may say 'Men's green shirt'. You need to rationalize the redundancy in the data to present a unified view of product data. That's a big value of data warehousing. You do this using data cleansing tools or good old fashioned elbow grease.
BoxesAndLines
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

Back to top Go down

Dimension with different sources (multiple business keys) ? Empty Re: Dimension with different sources (multiple business keys) ?

Post  ngalemmo Fri Jun 01, 2012 12:14 pm

The issue is not that that needs to be done. The issue is it does take a lot of work, as you say. But, not only does it take a lot of work, it is highly prone to error.

This is the very reason why you should keep a row for each source/natural key combination in the dimension. It allows you to construct and load the dimension, and field the data warehouse, without having to wait for the perfect cross reference. The dimension itself becomes the cross reference and it can be easily corrected and adjusted as errors are discovered or new sources are brought in.

In the past I have encountered significant delays and problems trying to integrate such sources before they go into the DW. When you have only one dimension row and facts from different sources all pointing to that single row, it becomes extremely difficult to split things out when someone discovers an error in the cross reference. In systems, large or small, I can guarantee you that any manually constructed cross reference will have errors, or, if you rely on having such a cross reference build before implementation, you can count on retiring before a correct cross reference is constructed.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Dimension with different sources (multiple business keys) ? Empty Re: Dimension with different sources (multiple business keys) ?

Post  hang Fri Jun 01, 2012 6:06 pm

Another typical example is that same employees are entered into different OLTP systems/sources, eg. HR and finance. What do we need to do to load employees into DW? Deduplicate and integrate to have a single version of truth. In my view, product is no much different from customer in terms of integration. If you have to integrate sooner or later, I think it only makes sense to do it in upstream process before presentation (dimensional) layer.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

Back to top Go down

Dimension with different sources (multiple business keys) ? Empty Re: Dimension with different sources (multiple business keys) ?

Post  ngalemmo Fri Jun 01, 2012 6:13 pm

Then we will agree to disagree.

I fail to see where 'integration' means 'one row'. To me, integration means the two or more instances appear the same. I don't believe one should burn bridges and put themselves in a position of no recourse for the sake of presumed integration.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Dimension with different sources (multiple business keys) ? Empty Re: Dimension with different sources (multiple business keys) ?

Post  hang Fri Jun 01, 2012 6:43 pm

Would you have the recourse in respective OLTP system already? Or if OLTP is not convinent enough, you could leave the disparate sources in another upstream layer, say staging, instead of dimensional store. I mean sooner or later, you may still need to consolidate two products into single row in the dimension if they obviously point to the same product. If we can eyeball to pick it up, then in most cases, we should be able to automate the deduplication by system. Maybe a good MDM system will help.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

Back to top Go down

Dimension with different sources (multiple business keys) ? Empty Re: Dimension with different sources (multiple business keys) ?

Post  ngalemmo Fri Jun 01, 2012 8:13 pm

The issue is if you have two or more disparate systems feeding the same dimension, the dimension members and the fact relating to them will have a different set of natural keys than the other system(s). I would argue that you want to retain these natural keys and the relationship between the fact and dimension the same. That is the way the relationship is defined in the source system, so it should also be good enough for the data warehouse.

When it comes to integration, it is a matter of attributes not keys. MDM systems do not reassign keys in OLTP systems, but rather suggest a common set of attributes (the proverbial 'golden record') for the entities in these different systems. A data warehouse should be modeled in the same way. And, if you have an MDM system, you could store the MDM ID as an alternate key in the dimension and use that to update the common attributes in the dimension table provided by the MDM. All rows carrying the same MDM ID would have the same attribute values, essentially integrating the desperate system's data.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Dimension with different sources (multiple business keys) ? Empty Re: Dimension with different sources (multiple business keys) ?

Post  hang Fri Jun 01, 2012 10:58 pm

Ngalemmo, I see your point to simplify the ETL process. However it comes down to delivering conformed dimension, we should at least consider some of the steps, if not all, suggested by Kimball as part of basic ETL process in delivering conformed dimensions, as follows:

- Standardizing
- Matching and deduplication
- Surviving

Take the first step for example. Have we tried to standardize our product with a unique NK across systems. Sometimes, it is as simple as identifieing the product by its SKU number, Sometimes you may want to go a step further, if SKU does not exist, to lookup a few columns to determine your match.

The point is, we should make some effort in ETL to deliver conformed dimensions before taking the simplistic approaches. Maybe the effort is not that overwhelming if we give a second thought.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

Back to top Go down

Dimension with different sources (multiple business keys) ? Empty Multiple sources in dimension.

Post  pitbull mix Mon Jun 11, 2012 11:35 am

Hello ngalemmo,

I'm working on project where I'm faced with a similar problem in this thread. I have multiple source systems and will want to have a dimension that is flexible enough to work with fact data from these sources. I didn't understand the part where you talked of the dimension having both source raw attributes and 'standard' attributes?

Can you provide a sketch / draft of a source system dimension table and explain the abstract natural key? It appears that this would be a conformed dimension to work across various models? Also based on this approach those who are interested or allowed to view data from one source or another could be accomplished as well.

Here's what I am thinking about so far for a source dimension.

D_Source

SK SKCode SourceSystemDesc
1 1ABC ABC



Thanks,

Chad

ngalemmo wrote:I would build an abstract natural key which includes a source system code so that separate dimension rows are created for each source. Facts from that source would reference dimension rows from that source. The dimension would contain both source raw attributes and 'standard' attributes (i.e. the corporate version of the attributes based on some cross reference or other means) as well as a common NK as needed to maintain the standard attributes.

This allows you to accommodate any source and not be bound by a bad decisions or mistakes. If there is an error in standardizing the attributes, it is an easy matter of updating rows. If you only create standard rows (i.e. not source specific rows) you have to rekey facts if an error is found in the standardization (source rows mapped to the wrong standard version).

pitbull mix

Posts : 8
Join date : 2012-06-11

Back to top Go down

Dimension with different sources (multiple business keys) ? Empty Re: Dimension with different sources (multiple business keys) ?

Post  ngalemmo Mon Jun 11, 2012 5:16 pm

Let's use a Product dimension as an example. You have two different ERP systems with a product master. Each uses its' own set of product IDs, some products are in common, others are not.

If system A has product #123 and system B has product #456 and both products are the same, you need to reflect that in the dimension table. One way is to have a cross-reference before the dimension that makes them the same and you create one row in the dimension table. The other way is to have two rows in the dimension with different natural keys: A|123 and B|456 respectively. In either case you need to figure out what goes in the attributes. If you go with one row, which source takes precedence on which attributes? If both have slightly different product names, which one do you use? Also, since you usually run into situations like this after a merger, it is possible, for at least a while, you may need to support both system's hierarchies, product classifications, or vendors. How do you accomplish that if there is only one row?

If you go with individual rows for each source, you can group attributes into three general categories: those that are to be common across sources, those that remain unique to the source, and PK information so you can trace the data back to the source. The attributes in the first two categories will change over time depending on business need. For example, product classification attributes may initially be unique to the source. As the business reviews the situation they may develop rules to reclassify produces to a standard structure. When that happens, the classification attributes transition from source unique to enterprise common following a conversion of the attribute. Doing this with only one row gets very clumsy and difficult to use, particularly if the are many sources.

By storing unique rows, you bind the attributes in the row to the specific product referenced in the source system. Facts coming from system A will only reference dimension rows for products in system A, the system A facts would never reference a system B product dimension row. In the one-row technique facts from both systems will reference the same dimension row, based on a cross reference. But, if that cross reference is wrong, you need to reassign the FKs in facts that were loaded using the erroneous cross reference. The source specific technique eliminates that problem and allows you to adjust the interpretation of the facts by manipulating the dimensional attributes, a much, much simpler undertaking. After all, integration occurs through attributes, not keys.

Maintaining separate rows allows the business to alter and refine it's integration rules over time without loss of information or rebuilding fact tables. It allows data warehouse integration project to proceed (at least get the data in), before the business needs to define how the data is to be integrated. This allows the data warehouse itself to serve as a tool to decide how the data will be integrated, bootstrapping the rules development process. All that needs to be done is map the attributes from the two systems which is a more technical exercise than one that requires significant business decisions prior to implementation. Worst case is to have a wide dimension that covers all attributes in both sources.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Dimension with different sources (multiple business keys) ? Empty Re: Dimension with different sources (multiple business keys) ?

Post  BoxesAndLines Tue Jun 12, 2012 9:41 am

It's not really that hard to dedup disparate data sources to a single unified view. This is especially true if you have a data quality tool. All the MDM/CDI solutions do this as well. To me, the value the warehouse brings is presenting a unified view of product, customer, sales, whatever. Two rows for the same instance of a customer is simply misleading and troublesome when reporting. Where do I send mail? How much product has he ordered? What is his lifetime value?
BoxesAndLines
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

Back to top Go down

Dimension with different sources (multiple business keys) ? Empty Re: Dimension with different sources (multiple business keys) ?

Post  ngalemmo Tue Jun 12, 2012 11:05 am

MDM systems do not 'dedupe', they take guesses at what are the same based on user defined rules. Problem is, as rules change, so do the guesses. Those who use output from MDM to dedupe what they send to the DW are opening themselves to problems when such guesses change.

Also the notion that you can somehow create a consistent unified picture of customer is a fantasy. You do not have 'two rows for the same instance of customer' because they are not the same instance. They are different representations of a customer in different systems. Wither or not they are actually the same is a judgement based on information known at the time. Such a judgement is always subject to change. Storing it in one row effectively locks the decision and does not allow reevaluation.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Dimension with different sources (multiple business keys) ? Empty Re: Dimension with different sources (multiple business keys) ?

Post  nmcdermaid Wed May 15, 2013 7:40 am

I'm facing this issue now and I'm pretty happy with the multi row approach - it does align well with the concept of SCD2. I have also used this approach frequently in a cube-centric solution by placing SK at the leaf (and hiding it) then putting the 'consistent' attribute above that.

The issue I'm considering now though, is if we have a nice consistent attribute that is useful for reporting on (and therefore effectively merging and hiding multiple members), such as 'Green Shirt'. This happens to be the same value in both active source systems so things merge nicely.

If this attribute changes (to Green Singlet) in just one system, and this is an SCD2, we'll find that one source systems dimension member will split, but the other one won't. So at a point in time our consistent reporting against 'Green Shirt' will split.


I guess this is a problem for the MDM tool to sort out afterwards. Or we might need to find a more solid attribute to base our reporting on.


I definitely see the value in this aproach is that we can more quickly arrive at a populated star schema without being held up by a MDM process. Delivering a star schema then lets users see how dirty their data is and lends urgency to the MDM process (with obvious results afterwards as we clean up the dimension and add consistent attributes)

nmcdermaid

Posts : 1
Join date : 2013-05-15

Back to top Go down

Dimension with different sources (multiple business keys) ? Empty Re: Dimension with different sources (multiple business keys) ?

Post  Sponsored content


Sponsored content


Back to top Go down

Page 1 of 2 1, 2  Next

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum