Multiple source systems, some with many Business Key namespaces, and little organisation engagement

View previous topic View next topic Go down

Multiple source systems, some with many Business Key namespaces, and little organisation engagement

Post  Al Wood on Thu Nov 20, 2014 10:02 am

Hi,

I have a problem of poorly defined business keys for some dimensions in the Data Warehouse. Specifically, Staff and Location.

We have about twenty source systems where Staff and Location are recorded. Lets take Staff as an example. I'm in the Healthcare field in the UK, so we have a GMC code for each doctor. We were told to use that. Well, some problems arose.

Not all staff are doctors, so they dont have GMC codes. For doctors, not all the sources have the GMCCodes in them. Doctor's GMC codes are usually entered a few weeks after they start work with us. Occasionally they are entered wrongly, and never checked, or corrected a year later. The organisation does not use GMC code to refer to doctors, they use their names, possibly with an initial if that is needed. Different wards refer to doctors by different versions of their names. And yes, it leads to confusion. Recently we did a data migration to a new system. At the same time a junior doctor joined us with the same surname and initial as a a senior doctor. Day one of the new system - phone call from senior doctor - "Where is all my activity?" Clinic lists? Test results? Panic! Then phone call from junior doctor - "I seem to have a lot of stuff on my screen?"

Luckily, that was not my problem. Data Warehousing seems a quiet, peaceful task by comparison, but we suffer from the same root problem. We recently received a change to the definition of business key for staff. We now concatenate their Role, Title, Full Name, and GMC code. And we combine this with the Source System column in the dimension to create a "Natural Key".

But this would not have prevented the DW from having problems in the scenario described above. Sure, the new doctor would have popped up in the DW as a new staff member. But then the Data Steward would probably have grouped them in with the senior doctor's name, because they had no GMC code to start with.

And that's just one system. We have multiple wards using multiple systems, each with their own naming conventions....

For the Location Business Keys, we have an even greater key collision problem, even within single systems.

How can I get the organisation involved? Even some IT people around me can't see the issues.

Does anyone have any advice?

Many thanks,
Al Wood

Al Wood

Posts : 46
Join date : 2010-12-08

View user profile

Back to top Go down

Re: Multiple source systems, some with many Business Key namespaces, and little organisation engagement

Post  ngalemmo on Thu Nov 20, 2014 2:59 pm

I find it incredible that this would be an issue.  Doesn't the hospital use computers?  How do they keep track of who is on staff and what everybody is doing?

The important thing about a natural key is that it is based on data you receive from the transactional data.  After all, the whole point is to associate a fact with a dimension.  Good luck.

BTW - never use a natural key that is based on unstable information.  Names, roles and titles are simply asking for trouble.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Multiple source systems, some with many Business Key namespaces, and little organisation engagement

Post  Al Wood on Fri Nov 21, 2014 9:13 am

Hi ngalemmo,

Thanks for the reply.

I'm interested in what you say about about the Natural Key. I have tried to use the Business Key + Source System, that is the key used by the wider organisation, for a few reasons. One is that the Data Steward can read this key, recognize it, and so classify the key and enhance the attributes. We shouldn't use too many attributes in a composite key, so that the Data Steward isn't bothered too often without good reason. For example, being asked about changes in the source system that cause the DW to see a new key and highlight it, when it's really just a change to an attribute.

The source system mentioned before has a guaranteed, unique code that we could use, but it's a like a GUID, so it isn't used to identify staff members operationally, in fact it's not displayed on any front end screens. Are you saying we could use that as the business key? I should mention that we have several examples of the same Staff member in the system twice with different GUIDs.
Although the GUID is guaranteed to be unchanging, I'm still wary of using it - the supplier has let us down before.

We plan to flag up new entities to a Data Steward, perhaps by Email. If we use the GUID, does the Email say that there is a new A. Smith with all the same details as the old one, and ask if this is a duplicate version of any existing staff? Maybe if the new A. Smith was entered into the system by the Data Steward, she might know if they are supposed to be different. But if the Data Steward is not the person entering the new staff entity, we risk combining two different people.

Many thanks,
Al Wood

Al Wood

Posts : 46
Join date : 2010-12-08

View user profile

Back to top Go down

Re: Multiple source systems, some with many Business Key namespaces, and little organisation engagement

Post  nick_white on Sun Nov 23, 2014 9:05 am

Just to expand on what ngalemmo said, a DW cannot solve all a business' data issues and is not a solution for poor source system design.

A DW can, for example, cleanse different coding systems used in different source system by mapping to an agreed standard e.g. one system uses A, B, C as priority codes and another system uses 1, 2, 3 then you can agree with the business what codes the DW will use and map the source system code values to the appropriate DW code values.
What a DW can't do is guarantee that "John Smith" in System A is the same person as "John Smith" in System B - unless both systems use a common unique identifier. As DW designers we are understandably tempted to try and be helpful and resolve this issue with the DW as best as we can - however I would strongly recommend that you avoid trying to do this as you can't guarantee to get it right 100% of the time and if you try and fail it becomes a DW problem whereas it is actually a source system problem.

So to re-state basically what ngalemmo said, every employee must have an employee number/id in the staff master system (presumably the HR/payroll system?) which they are given when they join the organisation. If other source systems have been created that do not use this ID then the designers of these systems need to have the errors of their ways gently explained to them

Looks like your organisation has an MDM project on its hands!

Regards,

nick_white

Posts : 366
Join date : 2014-01-06
Location : London

View user profile

Back to top Go down

Re: Multiple source systems, some with many Business Key namespaces, and little organisation engagement

Post  Al Wood on Mon Nov 24, 2014 6:46 am

Hi Nick,

I'm afraid that this is just the situation I have got to deal with. You mention poor source system design, but nearly all of our systems expose a unique key, the problem is the degree of trust we can place in them. How much we trust that it won't change if data has to be restored or reloaded into that system. This is a judgement call that can lead us down a difficult road.

I have worked with some of these systems, and I am more sympathetic to their quirks than you might think. Some of the software is old, and from necessity we use it in ways that were not envisioned by the designers. We are a unique organisation.

You mention using HR staff codes, but for most of our dimensions we can't accept any one source system as the master source. The reason is that a single staff code in HR may have multiple IDs in another system as a workaround, for various purposes, and the same set of logins in a third system. That would need cross mapping between these two secondary systems. The potential number of cross mapping tables for 20 systems is 190. And then, the tables may have conflicting mappings.

We have adopted an alternative approach that allows us to use one mapping table, which is the dimension itself. We use [source system] and [business key] as the unique key, and then another field [conformed name] as the overall wide-scope business key. This is maintained by a Data Steward or by automated business rules. When querying the tables or building cubes we group facts together under one staff member by using [conformed name]. If we can't get engagement from the organisation to put data into [conformed name], they get nothing back. The moment they enter something, they see the results.

This is the most robust design we can find, but it does rely on solid business keys at the very least.

Regards,
Al Wood

Al Wood

Posts : 46
Join date : 2010-12-08

View user profile

Back to top Go down

Re: Multiple source systems, some with many Business Key namespaces, and little organisation engagement

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top


 
Permissions in this forum:
You cannot reply to topics in this forum