Test Data Generation
Hi,
I am part of a data warehousing project that has now entered the testing phase. Our client, like most companies, is reluctant to share production data with software vendors for testing purposes, for security reasons.
To test a data warehouse, we need good test data that does not compromise data security or privacy. So how can we generate near-real test data that will improve test quality? The test data should resemble real production data and should also match the production system in proportion and size.
Has this been discussed already?
Udankar- Posts : 1
Join date : 2009-03-21
Re: Test Data Generation
Well, we have faced the same problem. Cooking up data on your own is a time-consuming and resource-intensive task, so we asked our client to give us some fudged sample data, which they agreed to do.
For example, we were dealing with the client’s consumer data, so we asked the client to obscure their customers’ identification and contact details by eliminating 5-6 characters from the important columns. I hope you can arrange something similar.
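To make the idea concrete, here is a rough Python sketch of that kind of fudging. It is only an illustration, not the script our client actually ran: the function names, the column names and the sample row are made up, and dropping the characters from the middle of the value is just one possible choice.

```python
def fudge(value, drop=5):
    """Remove `drop` characters from the middle of a value (as text)."""
    text = str(value)
    if len(text) <= drop:
        return ""  # too short to keep anything
    start = (len(text) - drop) // 2
    return text[:start] + text[start + drop:]


def fudge_rows(rows, sensitive_columns, drop=5):
    """Return copies of the rows with only the sensitive columns fudged."""
    return [
        {
            col: fudge(val, drop) if col in sensitive_columns and val is not None else val
            for col, val in row.items()
        }
        for row in rows
    ]


# Hypothetical sample row; the real column names will differ.
customers = [
    {"customer_id": "CUST-00012345", "phone": "555-010-1234", "segment": "retail"},
]
masked = fudge_rows(customers, {"customer_id", "phone"})
print(masked)
```

One thing to watch: two different original values can become identical once characters are removed, so check fudged identifiers for duplicates before using them as join keys.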
Mohsin- Posts : 4
Join date : 2009-03-03
Test Data
I think it is impossible to launch a data warehouse project without access to real production data. In the early stages of analysis, it is necessary to run at least minimal data profiling on the sources (files or databases) in order to get a clear picture of the formats, patterns, referential integrity and business rules that apply to the real data. What we find in the specifications or in the metadata (if they exist) is often very different from reality. It would be dangerous to underestimate this step: it avoids many surprises that we have, unfortunately, discovered too late. Many data integration projects fail because this analysis phase was not done correctly.
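As a minimal sketch of what such profiling can look like, assuming the source rows have been loaded into Python dictionaries: the table and column names (`customers`, `orders`, `customer_id`, `phone`) and the helper functions are hypothetical, and a real project would more likely run the equivalent queries or a profiling tool directly against the source.

```python
import re
from collections import Counter


def pattern_of(value):
    """Reduce a value to its shape: digits become 9, letters become A."""
    if value is None:
        return "<NULL>"
    shape = re.sub(r"[0-9]", "9", str(value))
    return re.sub(r"[A-Za-z]", "A", shape)


def profile_column(rows, column):
    """Count nulls, distinct values and value shapes for one column."""
    values = [row.get(column) for row in rows]
    return {
        "nulls": sum(v is None for v in values),
        "distinct": len({v for v in values if v is not None}),
        "patterns": Counter(pattern_of(v) for v in values),
    }


def orphan_keys(child_rows, fk_column, parent_rows, pk_column):
    """Return foreign-key values that have no matching parent row."""
    parent_keys = {row[pk_column] for row in parent_rows}
    return sorted(
        {row[fk_column] for row in child_rows
         if row[fk_column] is not None and row[fk_column] not in parent_keys}
    )


# Hypothetical sample data standing in for two source tables.
customers = [
    {"customer_id": 1, "phone": "555-0101"},
    {"customer_id": 2, "phone": "5550102"},
    {"customer_id": 3, "phone": None},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 4},
]

phone_profile = profile_column(customers, "phone")
orphans = orphan_keys(orders, "customer_id", customers, "customer_id")
print(phone_profile)
print(orphans)
```

Even on this toy data, the profile shows two different phone formats plus a null, and one order pointing at a customer that does not exist, which is exactly the kind of thing the specifications rarely mention.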
How do you determine the granularity of a fact table if the fields forming the logical key are not clearly identified? How do you denormalize amounts, for example, if the possible values of the "type of amount" are not fully listed?
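Both questions can only be answered against real rows. A small sketch in Python, assuming a hypothetical extract with `account`, `day`, `amount_type` and `amount` columns (names and values are invented for illustration): test whether a candidate key is unique, and list every value of the type column that actually occurs.

```python
from collections import Counter


def duplicate_keys(rows, key_columns):
    """Return key tuples occurring more than once; empty means the key is unique."""
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


def distinct_values(rows, column):
    """Return the sorted set of values actually present in a column."""
    return sorted({row[column] for row in rows})


# Hypothetical source extract.
rows = [
    {"account": "A1", "day": "2009-03-01", "amount_type": "NET", "amount": 100},
    {"account": "A1", "day": "2009-03-01", "amount_type": "TAX", "amount": 20},
    {"account": "A2", "day": "2009-03-01", "amount_type": "NET", "amount": 50},
]

coarse = duplicate_keys(rows, ["account", "day"])
fine = duplicate_keys(rows, ["account", "day", "amount_type"])
types = distinct_values(rows, "amount_type")
print(coarse, fine, types)
```

Here (`account`, `day`) is not unique, so it cannot be the grain of the fact table unless the amounts are pivoted into one column per type, and that pivot is only safe once the full list of types is known.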
Do not forget that if your data warehouse must take historical data into account, the problem may become more serious. Legacy applications are the source of this data, and the rules that govern the data change over the years. There are many examples where the meaning of a field in the source DB has changed over time.
No, honestly, your job is not easy. Building a data warehouse without access to all the real source data is a bit like walking blindfolded into a minefield.
Good luck!
pbestgen- Posts : 4
Join date : 2009-02-04