Data Obfuscation
Our Information Security department are unhappy about use of production data in test environments.
However, the belief in the ETL team is that we would be extremely uncomfortable to go to production without having tested a sample of live data. ("The Data Warehouse Lifecycle Toolkit", Second Edition, p545 and p548, seems to agree with this view).
Does anybody have experience of obfuscating production data and using this for test purposes? Any hints, tips or experiences which could be shared would be useful.
Thanks
Jim
JimShaw- Posts : 2
Join date : 2009-12-09
Location : Edinburgh, Scotland
Re: Data Obfuscation
It depends entirely on the type of data. The problem with obfuscating data is that you need to change it in such a way that you do not lose its inherent characteristics. For example, if you work with balances and transaction amounts, then your obfuscated data still needs to tie out between the balances and the transactions. Additionally, if there are inter-relationships between datasets, e.g. events, then these must be preserved as well.
In our environment, we ultimately got our IS/Compliance/Audit groups to agree that we use full complements of production data in our DEV/STG environments. This has the advantage that code is developed under real-world data conditions (warts and all), and it also allows realistic performance testing.
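One common way to keep relationships between datasets intact is to replace each identifying value with a deterministic token, so the same input maps to the same output everywhere it appears. The sketch below is only an illustration of that idea, not a description of any tool mentioned in this thread. The key, the `CUST_` prefix, the column names and the sample rows are all made up for the example.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be held outside the test environment.
SECRET_KEY = b"example-key-not-for-real-use"


def pseudonymise(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable token: the same input always yields the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "CUST_" + digest[:12]


# Hypothetical sample rows: a balance and its transactions share customer_name.
balances = [{"customer_name": "Alice Smith", "balance": 150.0}]
transactions = [
    {"customer_name": "Alice Smith", "amount": 100.0},
    {"customer_name": "Alice Smith", "amount": 50.0},
]

# Only the identifying column is replaced; amounts are left untouched,
# so balances still tie out against the transactions.
masked_balances = [
    {**row, "customer_name": pseudonymise(row["customer_name"])} for row in balances
]
masked_transactions = [
    {**row, "customer_name": pseudonymise(row["customer_name"])} for row in transactions
]
```

Because the token is derived from the value with a keyed hash, joins between the masked tables behave as they did on the original names, while the names themselves no longer appear in the test data.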
Sealeopard- Posts : 4
Join date : 2011-05-17
Re: Data Obfuscation
Hi Jim,
I agree that use of production data for testing DW/BI applications is standard practice.
In my experience it is very difficult to establish and maintain a rigorous data obfuscation routine. Consider that as your source systems evolve (especially when schemas change), you will need to completely refresh your test data and re-obfuscate.
So if you have to go down this path, try to minimise the scope of the obfuscation - typically it only really needs to target people or organisation names, which hopefully only occur in a handful of columns in your DW.
Good luck!
Mike
Re: Data Obfuscation
I agree with Mike. Usually security issues only involve the ability to identify persons or organizations from the data. So obfuscation is often a simple matter of blanking or dummying names, addresses, and government identifiers, such as social security numbers. You should not need to obfuscate business keys, amounts, status codes or anything else of importance to testing.
And if they give you a hard time about business keys, you could argue that the only people who could tie a business key to a person or organization are those who have access to the production system (since such identification would not exist in test or QA). Such people should be trusted enough, since they have access to that data anyway.
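As a rough illustration of the "blank or dummy the identifying columns, leave everything else alone" approach, here is a minimal sketch. The column names in `PII_COLUMNS`, the placeholder text and the sample row are assumptions made up for the example. A real warehouse would have its own list.

```python
# Hypothetical set of identifying columns; everything else passes through unchanged.
PII_COLUMNS = {"name", "address", "ssn"}


def mask_row(row: dict, pii_columns=PII_COLUMNS, placeholder: str = "XXXX") -> dict:
    """Replace identifying values with a dummy; keep keys, amounts and codes as-is."""
    masked = {}
    for column, value in row.items():
        if column in pii_columns and value is not None:
            masked[column] = placeholder
        else:
            masked[column] = value
    return masked


# Hypothetical sample row.
row = {
    "customer_key": 1001,
    "name": "Alice Smith",
    "address": "1 High Street",
    "ssn": "000-00-0000",
    "amount": 42.5,
    "status_code": "A",
}
masked = mask_row(row)
```

The business key, amount and status code come out exactly as they went in, which is the point: the data stays useful for testing while the identifying fields are gone.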
Re: Data Obfuscation
Thanks for your replies to my question, which confirm my own thinking on this.
It is very valuable to get this kind of external validation. This will be helpful evidence in future debate.
Jim
JimShaw- Posts : 2
Join date : 2009-12-09
Location : Edinburgh, Scotland
Re: Data Obfuscation
BTW, if your ETL team is using Informatica, they have an option that will subset and obfuscate production data.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Data Obfuscation
I've recently been looking into the same issue. I suppose the only real difference is that I have had experience in this area in building and masking test data subsets for application development. For that particular task I used Data Masker.
However, in the Data Warehouse environment I suggested that we go down the path of data discovery, masking only the critical information and doing that as part of the ETL. For the ETL we used Talend community edition. In the end it was all about managing the risk from an organisational perspective.
If you're inclined, you can check out some more info at http://www.datakitchen.com.au
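For anyone wondering what a first pass at data discovery might look like, one simple option is to scan column names for patterns that suggest personal information and review the hits by hand. This is only a sketch under that assumption, not how Data Masker or Talend do it. The pattern list, function name and sample schema are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical name fragments that often indicate personal data.
SENSITIVE_NAME_PATTERN = re.compile(
    r"(name|address|email|phone|ssn|birth)", re.IGNORECASE
)


def find_candidate_columns(schema: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (table, column) pairs whose column name matches a sensitive pattern."""
    hits = []
    for table, columns in schema.items():
        for column in columns:
            if SENSITIVE_NAME_PATTERN.search(column):
                hits.append((table, column))
    return hits


# Hypothetical schema: table name mapped to its column names.
schema = {
    "dim_customer": ["customer_key", "customer_name", "email_address", "status_code"],
    "fact_sales": ["customer_key", "amount"],
}
candidates = find_candidate_columns(schema)
```

A name-based scan will miss sensitive data hiding in badly named or free-text columns, so the result is a starting list for review rather than a complete inventory.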
data_cook- Posts : 1
Join date : 2013-06-23