Dimension Design with Multiple Data Sources (ORACLE and SQL Server)

Zulfiqar Sun May 31, 2009 10:14 am

Hi All.

We have a client that is currently migrating from a legacy SQL Server (6.5) based ERP to Oracle e-Business Suite. The old legacy system was distributed in both application and database, however the application and database structure was (almost) exactly the same. To identify the source of the GET part of the ETL we have added an identifier called source_system_id. Each one is unique for each separate data source. We have the task of building a data warehouse in SQL Server to provide two consolidated reports; one for the sales and the other for inventory. We have just finished modelling one of the dimensions, item, luckily we can utilise a master source for all legacies, i.e. we only have two data sources for it. In the legacy system the item code was a varchar and in the new Oracle system it is an integer. Obviously we cannot store them in one common field in the item dimension table. We modelled it as follows:

item_sk, inventory_item_id, legacy_item_code, item_name, item_desc

We wanted to have a design in which once the data is brought from each source into the staging area, we could have one neat design onwards from there rather than having multiple branches to deal with each data source separately.

the complex part of it is that there is a mapping table in ORACLE that contains mapping for legacy_item_code to the new inventory_item_id. Currently this is not complete for all items and will not be so until the migration is complete. We check first to see if mapping exists in ORACLE if so we set the inventory_item_id from the mapping table and set the legacy_item_code to default value so that the item_sk updated is for ORACLE item code. We updated the sk as follows after GET in the staging table:

update d
d.item_name = s.item_name,
d.item_desc = s.item_desc
from stg.item_dim s, dwh.item_dim d
where s.inventory_item_id = d.inventory_item_id
and s.legacy_item_code = d.legacy_item_code

and inserted new rows to item_dim based on non existence of the two same columns i.e. inventory_item_id, legacy_item_code

Now what is happening is that as they add new mappings the fact table is picking up the item_sk based on the mapped inventory_item_id and there exist older rows that do have the old generated item_sk when the mapping wasnt there.

Anyone know of a method of how to resolve this so that even if new mappings are added it can pick up one item_sk for both the old and new codes.

Thanks.

Zulfiqar.