Skip to main content
Dave Mason - Mastodon Dave Mason - LinkedIn Dave Mason - GitHub Dave Mason - Counter Social

Data Obfuscation for SQL Server

Dave Mason - Data Obfuscation for SQL Server

In a previous post, I explored an option for generating fake data in sql server using Machine Learning services and the R language. I've expanded on that by creating some stored procedures that can be used for both generating data sets of fake data, and for obfuscating existing SQL Server data with fake data.

The code is available in a Github repository. For now, it consists of ten stored procedures. For data obfuscation, you'd use one of more of these five stored procs:

  • Obfuscator.SetAlphaNumericData
  • Obfuscator.SetEmailAddresses
  • Obfuscator.SetFullNames
  • Obfuscator.SetPhoneNumbers
  • Obfuscator.SetSSNs

Each procedure has input parameters for database name, table schema, table name, and column name(s). There is also an optional parameter to indicate whether triggers should be disabled and re-enabled. (Obfuscator.SetSSNs has an additional optional parameter to specify a formatting character.) Here are some examples from the [AdventureWorks] database, showing the data both before and after running each obfuscation proc:


Alphanumeric Data


EXEC MLtools.Obfuscator.SetAlphaNumericData
@DatabaseName
= 'Adventureworks',
@TableSchema = 'Production',
@TableName = 'Product',
@AlphaNumericColumn = 'ProductNumber',
@DisableTriggers = 1;
Obfuscator.SetAlphaNumericData - Data Obfuscation for SQL Server

BeforeAfter
Obfuscator.SetAlphaNumericData - Data Obfuscation for SQL ServerObfuscator.SetAlphaNumericData - Data Obfuscation for SQL Server


Email Addresses


EXEC MLtools.Obfuscator.SetEmailAddresses
@DatabaseName
= 'Adventureworks',
@TableSchema = 'Person',
@TableName = 'EmailAddress',
@EmailAddressColumn = 'EmailAddress',
@DisableTriggers = 1;
Obfuscator.SetEmailAddresses - Data Obfuscation for SQL Server

BeforeAfter
Obfuscator.SetEmailAddresses - Data Obfuscation for SQL ServerObfuscator.SetEmailAddresses - Data Obfuscation for SQL Server


Full Names


EXEC MLtools.Obfuscator.SetFullNames
@DatabaseName
= 'Adventureworks',
@TableSchema = 'Person',
@TableName = 'Person',
@FirstNameColumn = 'FirstName',
@LastNameColumn = 'LastName',
@MiddleNameColumn = 'MiddleName',
@DisableTriggers = 1;
Obfuscator.SetFullNames - Data Obfuscation for SQL Server

BeforeAfter
Obfuscator.SetFullNames - Data Obfuscation for SQL ServerObfuscator.SetFullNames - Data Obfuscation for SQL Server


Phone Numbers


EXEC MLtools.Obfuscator.SetPhoneNumbers
@DatabaseName
= 'Adventureworks',
@TableSchema = 'Person',
@TableName = 'PersonPhone',
@PhoneNumColumn = 'PhoneNumber',
@DisableTriggers = 1;
Obfuscator.SetPhoneNumbers - Data Obfuscation for SQL Server

BeforeAfter
Obfuscator.SetPhoneNumbers - Data Obfuscation for SQL ServerObfuscator.SetPhoneNumbers - Data Obfuscation for SQL Server


Social Security Numbers


EXEC MLtools.Obfuscator.SetSSNs
@DatabaseName
= 'Adventureworks',
@TableSchema = 'HumanResources',
@TableName = 'Employee',
@SsnColumn = 'NationalIDNumber',
@FormatChar = '-',
@DisableTriggers = 1;
Obfuscator.SetSSNs - Data Obfuscation for SQL Server

BeforeAfter
Obfuscator.SetSSNs - Data Obfuscation for SQL ServerObfuscator.SetSSNs - Data Obfuscation for SQL Server


Design Notes

The "Obfuscator" stored procedures each invoke a "Generator" procedure. These returns result sets of fake data without performing any changes to existing data:

  • Generator.GetAlphaNumericData
  • Generator.GetEmailAddresses
  • Generator.GetFullNames
  • Generator.GetPhoneNumbers
  • Generator.GetSSNs

The "Generator" and "Obfuscator" functionality were separated for a couple of reasons. The first, perhaps more obvious reason, is that each type of procedure has a different purpose. The other reason has to do with script language. My code use the R language and takes advantage of the "generator" package for R (among others). But perhaps R isn't your favorite. The procedures could easily be rewritten using Python (starting with SQL 2017) or Java (starting with SQL 2019). Those languages may offer other packages/libraries with better or different functionality for whatever your needs are. Or you may simply have more experience with (or a preference for) a different scripting language.

All of the stored procedures have a header and code comments--hopefully it is easy to understand and is not too arcane. Happy obfuscating!

Comments

Post comment