Tuesday, August 28, 2007

Data Masking: What and why?

Trigger for this post: IBM released its version of a data masking solution: http://www-03.ibm.com/press/us/en/pressrelease/22209.wss

This is not a new thing I am blogging about here. Masking is well known (notoriously so) in IT and is a widely and strictly followed process in software development practice. Security teams scan your systems and keep tabs on support teams, checking what they do with the company's wealth.

It's all about dealing with production data in non-production environments. "Lovely," says my friend in the testing group. OK, but this is a very dangerous thing for any organization to get into. Knowingly or unknowingly playing (so-called organized testing) with production data can land companies in deep trouble with lawsuits, and even put them at risk of going out of business. On top of that, the law in most countries requires customer and financial information to be kept secured and tightly controlled. Hmm... This is good, but most of the development team is going to be unhappy, because they will not get production data in the test environment (either because they love to test on production data, or because they are too lazy to produce good test data).

Wait... there is a solution: data masking. (One of my friends calls such terms 'old wine in a new bottle': people used to do it before, and now it just has a different name.)

So what is it?
Data masking is a systematic process in which real data is masked during the transformation stage, so that when the data is persisted in the test region, the values are scrambled.

Simple?! No, not really. The real challenge is assembling multiple such data elements and relating them. For example, say Kiran has the SSN 123-45-6789 and it is scrambled to 987-65-4321 (scrambling is never this straightforward, though). If there is a requirement to link another data set about Kiran by SSN, and both data sets were not scrambled using the same technique, we would not be able to assemble them and make any sense of the result. This is a real challenge in most data warehousing environments today.
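
To make the linking problem concrete, here is a minimal sketch in Python of one way to scramble consistently: a keyed hash (HMAC) turns the same SSN into the same fake SSN every time, so two data sets masked with the same key can still be joined. This is only an illustration, not how any particular product does it. The key, the `mask_ssn` helper, and the sample records are all made up for the example, and a real setup would also mask the name and would have to deal with the small chance of two SSNs colliding.

```python
import hmac
import hashlib

# Hypothetical secret for illustration; it would be kept out of the test region.
SECRET_KEY = b"example-masking-key"


def mask_ssn(ssn: str, key: bytes = SECRET_KEY) -> str:
    """Deterministically map an SSN to a fake 9-digit value in SSN format."""
    digest = hmac.new(key, ssn.encode("utf-8"), hashlib.sha256).digest()
    # Reduce the first 8 bytes of the digest to a 9-digit number.
    number = int.from_bytes(digest[:8], "big") % 10**9
    digits = f"{number:09d}"
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"


# Two made-up production data sets that share the SSN as a link.
customers = [{"name": "Kiran", "ssn": "123-45-6789"}]
accounts = [{"ssn": "123-45-6789", "balance": 2500}]

# Mask each data set independently, but with the same technique and key.
masked_customers = [{**c, "ssn": mask_ssn(c["ssn"])} for c in customers]
masked_accounts = [{**a, "ssn": mask_ssn(a["ssn"])} for a in accounts]

# The masked SSNs still match, so the records can be linked in the test region.
balances = {a["ssn"]: a["balance"] for a in masked_accounts}
joined = [
    {**c, "balance": balances[c["ssn"]]}
    for c in masked_customers
    if c["ssn"] in balances
]
```

If the two data sets had been masked with different keys (or different techniques), the lookup into `balances` would find nothing, which is exactly the problem described above.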

There are many product companies addressing this problem today.

Personally, I am interested to see how these solutions help companies balance the internal quest for information mining without breaching security.

2 comments:

PrivacyExpert said...
This comment has been removed by the author.
PrivacyExpert said...

You are right about data masking. Data obfuscation, masking, anonymization, sanitization, etc. all mean the same thing. I have worked on a couple of data masking tools, and the process, as you said, is not straightforward.
