Data Problems in Datawarehouse
2 posters
Page 1 of 1
Data Problems in Datawarehouse
Hello,
I'm in a scenario where the datawarehouse has 2 or 3 dimensions with duplicate data, and has multiple intermediate databases (not staging) to do the load into final target database.
Issues started when the data sources which belonged to two different companies merged into one (ex: two sources for sales customer data). Both were separately identified with DB1 and DB2 identifier column. Later DB1 data was moved to DB2 in the source system, and all the DB1 data got loaded to the warehouse again with DB2 identifier.
Now while building the analytical cube we have arrived at a workaround by using composite keys , and hence picking non duplicate values only in the analytical database. My questions are
1. Is the approach right?
2. Will this cube be infallible in future, considering the underlying database already has issues.
3. What are the predictable cons that can occur in such design.
4. Is ther any other solution for this?
5. Is building the entire warehouse from scratch a better alternative?
Thanks in advance for your valuable inputs.
I'm in a scenario where the datawarehouse has 2 or 3 dimensions with duplicate data, and has multiple intermediate databases (not staging) to do the load into final target database.
Issues started when the data sources which belonged to two different companies merged into one (ex: two sources for sales customer data). Both were separately identified with DB1 and DB2 identifier column. Later DB1 data was moved to DB2 in the source system, and all the DB1 data got loaded to the warehouse again with DB2 identifier.
Now while building the analytical cube we have arrived at a workaround by using composite keys , and hence picking non duplicate values only in the analytical database. My questions are
1. Is the approach right?
2. Will this cube be infallible in future, considering the underlying database already has issues.
3. What are the predictable cons that can occur in such design.
4. Is ther any other solution for this?
5. Is building the entire warehouse from scratch a better alternative?
Thanks in advance for your valuable inputs.
alpsdev- Posts : 2
Join date : 2012-02-20
Re: Data Problems in Datawarehouse
Interesting questions.
1. Is the approach right?
Don't know. It would be good to know what the 'composite key' thing is and the general state of the warehouse.
2. Will this cube be infallible in future, considering the underlying database already has issues.
Its the 'issues' comment that causes concern, which is why I hedged the first question. I've never known a cube to be infallible.
3. What are the predictable cons that can occur in such design.
Other than space I would not have a problem having more than one row for a customer if each could be tracked back to the source system they came from. The issue is the attributes you are maintaining for these customers. Keep two sets of attributes for each customer... possibly field a second table using the same primary keys. The original table would contain the actual attributes that were received from an original source system while the companion dimension contains the attributes you would want the business to see... customer attributes that have been standardized to the new system and are common to all rows that represent the same customer. Users will see them as one customer in their queries. Those who need to trace things back to the original source can always use the original dimension version to do forensic analysis.
4. Is ther any other solution for this?
Sure. But it depends on the problem and any background issues, what you are trying obtain, and the value it brings to the business.
5. Is building the entire warehouse from scratch a better alternative?
That is one solution. Better? It depends.
1. Is the approach right?
Don't know. It would be good to know what the 'composite key' thing is and the general state of the warehouse.
2. Will this cube be infallible in future, considering the underlying database already has issues.
Its the 'issues' comment that causes concern, which is why I hedged the first question. I've never known a cube to be infallible.
3. What are the predictable cons that can occur in such design.
Other than space I would not have a problem having more than one row for a customer if each could be tracked back to the source system they came from. The issue is the attributes you are maintaining for these customers. Keep two sets of attributes for each customer... possibly field a second table using the same primary keys. The original table would contain the actual attributes that were received from an original source system while the companion dimension contains the attributes you would want the business to see... customer attributes that have been standardized to the new system and are common to all rows that represent the same customer. Users will see them as one customer in their queries. Those who need to trace things back to the original source can always use the original dimension version to do forensic analysis.
4. Is ther any other solution for this?
Sure. But it depends on the problem and any background issues, what you are trying obtain, and the value it brings to the business.
5. Is building the entire warehouse from scratch a better alternative?
That is one solution. Better? It depends.
Similar topics
» Problems with design to allow Rolling up of Hierarchical Data
» Data extraction from a datawarehouse
» DataWarehouse Data Model - financial Serivces
» Modelling Sales of Products and Product Packs
» ETL Design Problems for Real time
» Data extraction from a datawarehouse
» DataWarehouse Data Model - financial Serivces
» Modelling Sales of Products and Product Packs
» ETL Design Problems for Real time
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum