Presidio Anonymizer package - replaces analyzed text with desired values.
The Presidio anonymizer is a Python-based module for anonymizing detected PII text entities with desired values.
The Presidio-Anonymizer package contains both Anonymizers and Deanonymizers.
Presidio anonymizer comes by default with the following anonymizers:

- Replace: Replaces the PII with the desired value.
  Parameters:
  - new_value: Replaces the existing text with the given value. If new_value is not supplied or is empty, the default behavior is to use <entity_type>, e.g. <PHONE_NUMBER>.
- Redact: Removes the PII completely from the text.
- Hash: Hashes the PII using either sha256, sha512 or md5.
  Parameters:
  - hash_type: Sets the type of hashing. Can be either sha256, sha512 or md5. The default hash type is sha256.
- Mask: Replaces the PII with a sequence of a given character.
  Parameters:
  - chars_to_mask: The number of characters of the PII that should be replaced.
  - masking_char: The character to replace with.
  - from_end: Whether to mask the PII from its end.
- Encrypt: Encrypts the PII entity text and replaces the original with the encrypted string. By default, the anonymizer uses the Advanced Encryption Standard (AES), also known as Rijndael, as the encryption algorithm.
  Parameters:
  - key: A cryptographic key used for the encryption. The key must be 128, 192 or 256 bits long, in string format.
- Custom: Replaces the PII with the result of a function executed on the PII string.
  Parameters:
  - lambda: Lambda function to execute on the PII string. The lambda return type must be a string.

Note: If no default anonymizer is provided, the default anonymizer is "replace" for all entities. The replacement value will be the entity type, e.g. <PHONE_NUMBER>.
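To make the operator parameters concrete, here is a minimal plain-Python sketch of what replace, redact, hash and mask do to a single PII string. It is an illustration only, not Presidio's implementation: the function names are hypothetical, the hash is assumed to be an unsalted hex digest, and chars_to_mask is assumed to be capped at the length of the PII.

```python
import hashlib


def replace(pii: str, entity_type: str, new_value: str = "") -> str:
    # Falls back to "<ENTITY_TYPE>" when new_value is missing or empty.
    return new_value if new_value else f"<{entity_type}>"


def redact(pii: str) -> str:
    # Removes the PII completely.
    return ""


def hash_pii(pii: str, hash_type: str = "sha256") -> str:
    # Assumption: unsalted hex digest of the UTF-8 encoded PII.
    if hash_type not in ("sha256", "sha512", "md5"):
        raise ValueError(f"unsupported hash_type: {hash_type}")
    return hashlib.new(hash_type, pii.encode("utf-8")).hexdigest()


def mask(pii: str, masking_char: str, chars_to_mask: int, from_end: bool = False) -> str:
    # Assumption: never mask more characters than the PII contains.
    n = max(0, min(chars_to_mask, len(pii)))
    if from_end:
        return pii[: len(pii) - n] + masking_char * n
    return masking_char * n + pii[n:]


print(replace("03-232323", "PHONE_NUMBER"))      # <PHONE_NUMBER>
print(mask("03-232323", "*", 4, from_end=True))  # 03-23****
```

In Presidio itself these behaviors are selected by operator name and parameters rather than called as functions; the sketch only mirrors the parameter semantics described in the list above.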
As the input text could potentially have overlapping PII entities, there are different anonymization scenarios. In a partial intersection, each entity is anonymized individually and the results are concatenated. For example, for the text:

I'm George Washington Square Park.

Assuming one entity is George Washington and the other is Washington Square Park, and assuming the default anonymizer, the result would be:

I'm <PERSON><LOCATION>.
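Additional examples for overlapping PII scenarios:

Text:
My name is Inigo Montoya. You Killed my Father. Prepare to die. BTW my number is: 03-232323.

- No overlap: assuming only Inigo is recognized as NAME:
  My name is <NAME> Montoya. You Killed my Father. Prepare to die. BTW my number is: 03-232323.
- Full overlap: assuming the number is recognized as PHONE_NUMBER and, with a lower score, as SSN, the higher score wins:
  My name is Inigo Montoya. You Killed my Father. Prepare to die. BTW my number is: <PHONE_NUMBER>.
- One PII is contained in another: assuming Inigo is recognized as FIRST_NAME and Inigo Montoya as NAME, the larger one is used:
  My name is <NAME>. You Killed my Father. Prepare to die. BTW my number is: 03-232323.
- Partial intersection: assuming 03-2323 is recognized as PHONE_NUMBER and 232323 as SSN:
  My name is Inigo Montoya. You Killed my Father. Prepare to die. BTW my number is: <PHONE_NUMBER><SSN>.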
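The overlap handling can be sketched in plain Python. This is not Presidio's code; it is a small model under these assumptions: for identical spans the higher score is kept (ties go to the earlier span), a span contained in a larger one is dropped regardless of score, and partially intersecting spans are each replaced, with the later span taking the shared characters. Span, resolve, anonymize and span_of are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    entity_type: str
    start: int
    end: int
    score: float


def resolve(spans):
    """Drop spans that lose a full-overlap or containment conflict."""
    kept = []
    for i, s in enumerate(spans):
        loses = False
        for j, o in enumerate(spans):
            if i == j:
                continue
            if (o.start, o.end) == (s.start, s.end):
                # Full overlap: higher score wins; ties go to the earlier span.
                loses = loses or o.score > s.score or (o.score == s.score and j < i)
            elif o.start <= s.start and s.end <= o.end:
                # Contained: the larger span wins regardless of score.
                loses = True
        if not loses:
            kept.append(s)
    return kept


def anonymize(text, spans):
    """Replace each surviving span with <ENTITY_TYPE>, working right to left."""
    out = text
    boundary = len(text)  # everything from here on has already been rewritten
    for s in sorted(resolve(spans), key=lambda s: s.end, reverse=True):
        end = min(s.end, boundary)  # clip a partially intersecting span
        out = out[: s.start] + f"<{s.entity_type}>" + out[end:]
        boundary = s.start
    return out


def span_of(text, fragment, entity_type, score):
    """Build a Span from the first occurrence of fragment in text."""
    start = text.index(fragment)
    return Span(entity_type, start, start + len(fragment), score)


text = "I'm George Washington Square Park."
spans = [
    span_of(text, "George Washington", "PERSON", 0.8),
    span_of(text, "Washington Square Park", "LOCATION", 0.8),
]
print(anonymize(text, spans))  # I'm <PERSON><LOCATION>.
```

Working from the end of the text toward the start keeps the offsets of spans that have not been processed yet valid, which is why no index bookkeeping is needed beyond the single boundary variable.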
Presidio deanonymizer currently contains one operator:

- Decrypt: Replaces the encrypted text with the decrypted text.
  Parameters:
  - key: A cryptographic key used for the encryption. The key must be 128, 192 or 256 bits long, in string format.

Note: You can use "DEFAULT" as an operator key to define an operator for all entities.
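Two details from the parameters above are easy to get wrong: the key length and the "DEFAULT" fallback. The sketch below illustrates both in plain Python. It is not part of the Presidio API: is_valid_key and pick_operator are hypothetical helpers, the key is assumed to be measured in UTF-8 bytes, and operators are modeled as simple (name, params) tuples standing in for OperatorConfig.

```python
def is_valid_key(key: str) -> bool:
    # 128, 192 or 256 bits correspond to 16, 24 or 32 bytes.
    return len(key.encode("utf-8")) * 8 in (128, 192, 256)


def pick_operator(operators: dict, entity_type: str):
    # An entity-specific entry wins; otherwise fall back to "DEFAULT".
    # Returns None when neither is configured.
    if entity_type in operators:
        return operators[entity_type]
    return operators.get("DEFAULT")


key = "WmZq4t7w!z%C&F)J"  # 16 characters, i.e. 128 bits
operators = {
    "DEFAULT": ("decrypt", {"key": key}),
    "PHONE_NUMBER": ("replace", {"new_value": "<REDACTED>"}),
}

print(is_valid_key(key))
print(pick_operator(operators, "PERSON"))
print(pick_operator(operators, "PHONE_NUMBER"))
```

With this configuration, PERSON falls through to the "DEFAULT" entry while PHONE_NUMBER uses its own operator.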
To install Presidio Anonymizer, run the following, preferably in a virtual environment:
pip install presidio-anonymizer
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import RecognizerResult, OperatorConfig
# Initialize the engine.
engine = AnonymizerEngine()
# Invoke the anonymize function with the text,
# analyzer results (potentially coming from presidio-analyzer) and
# operators to get the anonymization output:
result = engine.anonymize(
    text="My name is Bond, James Bond",
    analyzer_results=[
        RecognizerResult(entity_type="PERSON", start=11, end=15, score=0.8),
        RecognizerResult(entity_type="PERSON", start=17, end=27, score=0.8),
    ],
    operators={"PERSON": OperatorConfig("replace", {"new_value": "BIP"})},
)
print(result)
This example takes the output of the AnonymizerEngine with encrypted PII entities and decrypts it back to the original text:
from presidio_anonymizer import DeanonymizeEngine
from presidio_anonymizer.entities import OperatorResult, OperatorConfig
# Initialize the engine.
engine = DeanonymizeEngine()
# Invoke the deanonymize function with the text, anonymizer results and
# operators to define the deanonymization type.
result = engine.deanonymize(
    text="My name is S184CMt9Drj7QaKQ21JTrpYzghnboTF9pn/neN8JME0=",
    entities=[
        OperatorResult(start=11, end=55, entity_type="PERSON"),
    ],
    operators={"DEFAULT": OperatorConfig("decrypt", {"key": "WmZq4t7w!z%C&F)J"})},
)
print(result)
In the presidio/presidio-anonymizer folder, run:
docker-compose up -d
See the API Spec for the Anonymizer REST API reference details.