Kimball Forum

Data Obfuscation

6 posters


Data Obfuscation

JimShaw, Wed May 18, 2011 6:46 am

Our Information Security department are unhappy about the use of production data in test environments.

However, the ETL team believes we would be extremely uncomfortable going into production without having tested against a sample of live data. ("The Data Warehouse Lifecycle Toolkit", Second Edition, pp. 545 and 548, seems to agree with this view.)

Does anybody have experience of obfuscating production data and using this for test purposes? Any hints, tips or experiences which could be shared would be useful.

Thanks

Jim

JimShaw

Posts : 2
Join date : 2009-12-09
Location : Edinburgh, Scotland


Re: Data Obfuscation

Sealeopard, Wed May 18, 2011 10:02 am

It depends entirely on the type of data. The problem with obfuscating data is that you need to change it in a way that does not lose its inherent characteristics. For example, if you work with balances and transaction amounts, your obfuscated data still needs to tie out between the balances and the transactions. Additionally, any inter-relationships between datasets (e.g. events) must be preserved as well.
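As an illustration of keeping balances and transactions tied out, here is a minimal Python sketch. It assumes amounts are held as `Decimal` and that scaling each account's amounts by a per-account factor is acceptable to your auditors; that technique, the key, and all names below are hypothetical, not something described in this thread. Amounts are scaled by a factor derived from a keyed hash of the account ID, keys are left untouched, and closing balances are recomputed from the masked rows so they tie out by construction even after rounding.

```python
import hashlib
import hmac
from decimal import Decimal, ROUND_HALF_EVEN

# Hypothetical key; it should live outside the test environment.
MASKING_KEY = b"hypothetical-key-held-outside-test"
CENT = Decimal("0.01")


def scale_factor(account_id: str) -> Decimal:
    """Deterministic per-account factor between 0.5000 and 1.4999, from a keyed hash."""
    digest = hmac.new(MASKING_KEY, account_id.encode("utf-8"), hashlib.sha256).digest()
    basis_points = int.from_bytes(digest[:4], "big") % 10_000
    return Decimal(5_000 + basis_points) / Decimal(10_000)


def mask_amount(amount: Decimal, account_id: str) -> Decimal:
    # A positive factor keeps the sign, so debits stay debits and credits stay credits.
    return (amount * scale_factor(account_id)).quantize(CENT, rounding=ROUND_HALF_EVEN)


def obfuscate(opening_balances, transactions):
    """opening_balances: {account_id: Decimal}; transactions: [(txn_id, account_id, amount)]."""
    masked_opening = {acct: mask_amount(bal, acct) for acct, bal in opening_balances.items()}
    # Transaction and account keys are unchanged, so joins to other datasets still work.
    masked_txns = [(txn_id, acct, mask_amount(amt, acct)) for txn_id, acct, amt in transactions]
    # Closing balances are recomputed from the masked rows rather than masked
    # independently, so per-row rounding cannot break the balance/transaction tie-out.
    masked_closing = dict(masked_opening)
    for _, acct, amt in masked_txns:
        masked_closing[acct] = masked_closing.get(acct, Decimal("0.00")) + amt
    return masked_opening, masked_txns, masked_closing
```

Because the factor is deterministic per account, re-running the job after a refresh gives the same results. Ratios between amounts within one account survive the scaling, so whether this is enough protection is a judgement call for your IS group.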

In our environment, we ultimately got our IS/Compliance/Audit groups to agree that we can use full copies of production data in our DEV/STG environments. This has the advantage that code is developed under real-world data conditions (warts and all), and it also makes realistic performance testing possible.

Sealeopard

Posts : 4
Join date : 2011-05-17


Re: Data Obfuscation

Mike Honey, Wed May 18, 2011 8:36 pm

Hi Jim,

I agree that use of production data for testing DW/BI applications is standard practice.

In my experience it is very difficult to establish and maintain a rigorous data obfuscation routine. As your source systems evolve (especially when schemas change), you will need to completely refresh your test data and re-obfuscate it.

So if you have to go down this path, try to minimise the scope of the obfuscation: typically it only needs to target people or organisation names, which hopefully occur in only a handful of columns in your DW.
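One way to keep the scope small and still cope with schema changes on each refresh is to list the few sensitive columns explicitly and refuse to copy any column that nobody has classified yet. Below is a minimal Python sketch, assuming rows arrive as dicts; the table names, column names, and key are all hypothetical. Names are replaced with deterministic keyed-hash pseudonyms, so the same name always maps to the same dummy value.

```python
import hashlib
import hmac
from typing import Dict, Iterable, List, Set

# Hypothetical key; keep it out of the test environment so pseudonyms can't be reversed there.
MASKING_KEY = b"hypothetical-key-held-outside-test"

# Hypothetical classification: the handful of columns that actually hold names.
SENSITIVE_COLUMNS: Dict[str, Set[str]] = {
    "dim_customer": {"customer_name"},
    "dim_supplier": {"supplier_name", "contact_name"},
}

# Columns someone has reviewed and agreed can be copied unchanged.
REVIEWED_COLUMNS: Dict[str, Set[str]] = {
    "dim_customer": {"customer_key", "customer_id", "segment"},
    "dim_supplier": {"supplier_key", "supplier_id", "region"},
}


def unreviewed_columns(table: str, columns: Iterable[str]) -> List[str]:
    """Columns present after a refresh that appear in neither list (e.g. after a schema change)."""
    known = SENSITIVE_COLUMNS.get(table, set()) | REVIEWED_COLUMNS.get(table, set())
    return sorted(set(columns) - known)


def pseudonym(column: str, value) -> str:
    """Same input always yields the same dummy, so re-runs after a refresh are stable."""
    digest = hmac.new(MASKING_KEY, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{column}-{digest[:10]}"


def mask_rows(table: str, rows: List[dict]) -> List[dict]:
    if rows:
        unknown = unreviewed_columns(table, rows[0].keys())
        if unknown:
            # Fail loudly rather than copy an unclassified, possibly sensitive, column into test.
            raise ValueError(f"{table}: unclassified columns {unknown}")
    sensitive = SENSITIVE_COLUMNS.get(table, set())
    return [
        {col: pseudonym(col, val) if col in sensitive and val is not None else val
         for col, val in row.items()}
        for row in rows
    ]
```

Failing on unclassified columns is a deliberate choice: after a source schema change, the refresh stops until someone decides whether the new column needs masking, instead of silently leaking it.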

Good luck!
Mike
Mike Honey

Posts : 185
Join date : 2010-08-04
Location : Melbourne, Australia

http://www.mangasolutions.com


Re: Data Obfuscation

ngalemmo, Wed May 18, 2011 9:39 pm

I agree with Mike. Usually security issues only involve the ability to identify persons or organizations from the data. So obfuscation is often a simple matter of blanking or dummying names, addresses, and government identifiers, such as social security numbers. You should not need to obfuscate business keys, amounts, status codes or anything else of importance to testing.
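A minimal Python sketch of the "blank or dummy" approach, assuming rows arrive as dicts; the column names, key, and helper functions are hypothetical. Names get a deterministic dummy, address lines are blanked, and SSNs are replaced with same-format dummies in the 900-999 area range, which the SSA does not issue as SSNs. Every column without a rule (business keys, amounts, status codes) passes through unchanged.

```python
import hashlib
import hmac

# Hypothetical key held outside the test environment.
MASKING_KEY = b"hypothetical-key-held-outside-test"


def _keyed_int(value: str) -> int:
    """Deterministic integer derived from a keyed hash of the value."""
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")


def dummy_name(value: str) -> str:
    return f"Name {_keyed_int(value) % 1_000_000:06d}"


def blank(_value: str) -> str:
    return ""


def dummy_ssn(value: str) -> str:
    """Same-format dummy SSN; area numbers 900-999 are not issued as SSNs."""
    digits = "".join(ch for ch in value if ch.isdigit())
    n = _keyed_int(digits)
    area = 900 + n % 100                # 900-999
    group = 1 + (n // 100) % 99         # 01-99, never 00
    serial = 1 + (n // 10_000) % 9999   # 0001-9999, never 0000
    return f"{area:03d}-{group:02d}-{serial:04d}"


# Hypothetical column names. Columns not listed here are copied unchanged.
RULES = {
    "customer_name": dummy_name,
    "address_line_1": blank,
    "address_line_2": blank,
    "ssn": dummy_ssn,
}


def obfuscate_row(row: dict) -> dict:
    return {
        col: RULES[col](val) if col in RULES and val is not None else val
        for col, val in row.items()
    }
```

Keeping the dummies deterministic means the same customer gets the same dummy name and SSN on every load, so counts and groupings in test behave as they would in production.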

And if they give you a hard time about business keys, you could argue that the only people who could tie a business key to a person or organization are those with access to the production system (since that identifying information would not exist in test or QA). Such people should be trusted anyway, since they already have access to that data.
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com


Re: Data Obfuscation

JimShaw, Wed May 25, 2011 8:39 am

Thanks for your replies to my question, which confirm my own thinking on this.

It is very valuable to get this kind of external validation. This will be helpful evidence in future debate.

Jim


JimShaw

Posts : 2
Join date : 2009-12-09
Location : Edinburgh, Scotland


Re: Data Obfuscation

BoxesAndLines, Fri Jun 03, 2011 11:29 am

BTW, if your ETL team is using Informatica, it has an option that will subset and obfuscate production data.
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA


Re: Data Obfuscation

data_cook, Sun Jun 23, 2013 4:39 am

I've recently been looking into the same issue. I suppose the only real difference is that I have experience in this area, building and masking test data subsets for application development. For that particular task I used Data Masker.

However, in the Data Warehouse environment I suggested that we take a data discovery approach: identify the critical information, mask only that, and do the masking as part of the ETL. For the ETL we used Talend Community Edition. In the end, it was all about managing the risk from an organisational perspective.
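The data discovery step could be sketched as a simple pattern-based profiler that flags columns whose values mostly look like SSNs or email addresses. This is a hypothetical Python illustration, not how Talend or Data Masker work; the patterns, labels, and the default 80% threshold are assumptions, and a real discovery pass would also look at column names and free-text fields.

```python
import re
from typing import Dict, List, Optional

# Illustrative patterns only; a real discovery pass would use many more rules.
PATTERNS = {
    "us_ssn": re.compile(r"^\d{3}-?\d{2}-?\d{4}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}


def profile_column(values: List[Optional[str]], threshold: float = 0.8) -> List[str]:
    """Return the pattern labels matched by at least `threshold` of the non-blank values."""
    sample = [v.strip() for v in values if v is not None and v.strip()]
    if not sample:
        return []
    labels = []
    for label, pattern in PATTERNS.items():
        share = sum(1 for v in sample if pattern.match(v)) / len(sample)
        if share >= threshold:
            labels.append(label)
    return labels


def discover(table: Dict[str, List[Optional[str]]], threshold: float = 0.8) -> Dict[str, List[str]]:
    """Map each column that looks sensitive to the labels it matched; other columns are omitted."""
    findings = {}
    for column, values in table.items():
        labels = profile_column(values, threshold)
        if labels:
            findings[column] = labels
    return findings
```

Columns flagged this way become candidates for masking in the ETL, while everything else can be copied unchanged, which keeps the masked footprint as small as possible.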

If you're inclined, you can find more information at http://www.datakitchen.com.au

data_cook

Posts : 1
Join date : 2013-06-23
