Examples of Pattern-Based Anonymization Rules
This section includes examples of commonly used pattern-based anonymization rules.
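Each rule below limits the files it applies to with include_files or exclude_files. The following Python sketch shows one plausible reading of those lists. It assumes shell-style glob patterns, matched case-sensitively against the bare file name, and it assumes that a rule without include_files covers every file that is not excluded. The helper rule_applies and the sample file names are hypothetical, and the tool's actual matching rules may differ.

```python
from fnmatch import fnmatchcase


def rule_applies(rule: dict, filename: str) -> bool:
    """Hypothetical helper: decide whether a rule's file filters select `filename`.

    Assumes include_files/exclude_files hold shell-style globs matched
    case-sensitively against the bare file name, and that a rule without
    include_files covers every file that is not explicitly excluded.
    """
    includes = rule.get("include_files")
    excludes = rule.get("exclude_files", [])
    if includes is not None and not any(fnmatchcase(filename, p) for p in includes):
        return False  # not selected by any include glob
    return not any(fnmatchcase(filename, p) for p in excludes)


# File filters copied from Example 2 and Example 3 (other fields omitted).
log_rule = {"name": "ENC_KEYS", "include_files": ["*.log*"]}
email_rule = {"name": "EMAIL", "exclude_files": ["*.properties", "hdfs-site.xml"]}

for name in ["test.log", "namenode.log.1", "version.txt"]:
    print("ENC_KEYS", name, rule_applies(log_rule, name))
for name in ["version.txt", "hdfs-site.xml", "ams.properties"]:
    print("EMAIL", name, rule_applies(email_rule, name))
```

Under this reading, the trailing * in "*.log*" also selects rotated logs such as namenode.log.1, not only files that end in .log.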
Example 1: Mask by pattern across all log files, without an extract pattern
To mask all email addresses in all log files, use the following rule definition:
{
"name": "EMAIL",
"rule_id": "Pattern",
"patterns": ["(?<![a-z0-9._%+-])[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,6}(?![a-z0-9._%+-])"],
"include_files": ["*.log*"],
"shared": false
}
Example 2: Mask by pattern across all log files, with an extract pattern
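To mask encryption keys that are logged in the format Key=<value>, where the value consists of 64 hexadecimal characters, use the following rule definition. The pattern locates each key entry, and the extract expression selects only the 64-character value inside it for anonymization, leaving the Key= prefix unchanged: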
{
"name": "ENC_KEYS",
"rule_id": "Pattern",
"patterns": ["Key=[a-f\\d]{64}\\s"],
"extract": "=([a-f\\d]{64})",
"include_files": ["*.log*"],
"shared": false
}
Input data, test.log, is:
encryption key=1234567890adc1234567aaabc1234567890adc1234567aaabc12345678901234 for keystore derby.system.home=null
Output data, test.log, with the encryption keys anonymized, is:
encryption key=‡8697685738fnx1736987qigyx7611731027yds0096404hlsph91727138403654‡ for keystore derby.system.home=null
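The following Python sketch shows how the pattern and extract expressions of this rule could work together on the sample line. It is not the tool's implementation. It assumes case-insensitive matching, because the sample input contains lowercase key= while the pattern uses Key=. The function fake_token is a hypothetical placeholder, since the tool's actual replacement scheme (the ‡-delimited value in the sample output) is not described here.

```python
import re

# ENC_KEYS pattern and extract, with JSON escaping removed ("\\d" -> "\d").
# Assumption: matching is case-insensitive, since the sample input has "key=".
PATTERN = re.compile(r"Key=[a-f\d]{64}\s", re.IGNORECASE)
EXTRACT = re.compile(r"=([a-f\d]{64})", re.IGNORECASE)


def fake_token(value: str) -> str:
    """Hypothetical placeholder: keeps the length and the ‡ delimiters only."""
    return "‡" + "x" * len(value) + "‡"


def anonymize(line: str) -> str:
    def replace_match(m: re.Match) -> str:
        matched = m.group(0)                 # e.g. "key=<64 hex chars> "
        e = EXTRACT.search(matched)          # locate the value inside the match
        if e is None:
            return matched
        start, end = e.span(1)               # span of the captured value only
        return matched[:start] + fake_token(e.group(1)) + matched[end:]

    return PATTERN.sub(replace_match, line)


line = ("encryption key=1234567890adc1234567aaabc1234567890adc1234567aaabc12345678901234"
        " for keystore derby.system.home=null")
print(anonymize(line))
```

Because only the extract group is replaced, the key= prefix and the whitespace required by the trailing \s are kept, which matches the shape of the sample output.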
Example 3: Mask by pattern across all files, except specific files
To mask email addresses in all files except hdfs-site.xml and .properties files, use the following rule definition:
{
"name": "EMAIL",
"rule_id": "Pattern",
"patterns": ["(?<![a-z0-9._%+-])[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,6}(?![a-z0-9._%+-])"],
"exclude_files" : ["*.properties", "hdfs-site.xml"],
"shared": false
}
Input data, version.txt, is:
Hadoop 2.7.3.2.5.0.0-1245 Subversion git@github.com :hortonworks/hadoop.git -r cb6e514b14fb60e9995e5ad9543315cd404b4e59 Compiled by jenkins on 2016-08-26T00:55Z
Output file version.txt, with an anonymized email address, is:
Hadoop 2.7.3.2.5.0.0-1245 Subversion ‡qpe@unqfay.mjp‡ :hortonworks/hadoop.git -r cb6e514b14fb60e9995e5ad9543315cd404b4e59 Compiled by jenkins on 2016-08-26T00:55Z
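To check which text the EMAIL pattern matches, the regular expression can be tried directly in Python. This is a sketch only. It assumes case-insensitive matching and Python re semantics, which may differ from the anonymization tool's regular expression engine. The doubled backslash in the JSON rule ("\\.") becomes a single backslash in a raw Python string.

```python
import re

# EMAIL pattern from the rule, with JSON escaping removed ("\\." -> "\.").
# Assumption: matching is case-insensitive.
EMAIL = re.compile(
    r"(?<![a-z0-9._%+-])[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,6}(?![a-z0-9._%+-])",
    re.IGNORECASE,
)

line = ("Hadoop 2.7.3.2.5.0.0-1245 Subversion git@github.com :hortonworks/hadoop.git"
        " -r cb6e514b14fb60e9995e5ad9543315cd404b4e59 Compiled by jenkins on 2016-08-26T00:55Z")

# List every substring the rule would treat as an email address.
print(EMAIL.findall(line))  # ['git@github.com']
```

Only git@github.com is found, which is the single value anonymized in the output above. The lookbehind and lookahead ensure that a match neither starts nor ends in the middle of a longer run of address characters.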

