Updated on June 6, 2024

Identify and mask (Auto-detect)

This method automatically identifies and masks personal or sensitive data in the text you submit, without requiring you to specify which values are sensitive.

Endpoint and Authentication:

To use Protecto.ai’s token-based masking, you need to send a PUT request to the following endpoint:

Endpoint: https://trial.protecto.ai/api/vault/mask

For authentication, include the following token in the request headers:

Headers: {"Authorization": "Bearer <AUTH_TOKEN>"}

Request Payload:

Here’s an example of a request payload to mask sensitive data:

{"mask": [{ "value": "George Williams lives in Washington"}]}
Parameters for the Request Payload:
· value (string): The text to scan. Any sensitive data detected in it is masked.
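The request described above can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_mask_request` and `mask_texts` are hypothetical, the `Content-Type` header is an assumption (this page lists only `Authorization`), and passing several entries in the `mask` array is assumed from the array shape of the payload.

```python
import json
import urllib.request

# Endpoint as documented on this page.
MASK_URL = "https://trial.protecto.ai/api/vault/mask"


def build_mask_request(texts, auth_token):
    """Build (but do not send) a PUT request for auto-detect masking."""
    # One {"value": ...} object per text, matching the documented payload shape.
    payload = {"mask": [{"value": text} for text in texts]}
    return urllib.request.Request(
        MASK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {auth_token}",
            # Assumption: the API expects a JSON body declared as such.
            "Content-Type": "application/json",
        },
        method="PUT",
    )


def mask_texts(texts, auth_token, timeout=30):
    """Send the request and return the decoded JSON response."""
    request = build_mask_request(texts, auth_token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `mask_texts(["George Williams lives in Washington"], "<AUTH_TOKEN>")` with a valid token would send the example payload shown above. Building the request separately from sending it makes the payload and headers easy to inspect before any network call.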
Response:
Upon successful masking, you will receive a response with the masked data. Here’s an example of a response:
{
    "data": [
        {
            "value": "George Williams lives in Washington",
            "token_value": "<PER>hSw8kAEB10 ITItAd8FsN</PER> lives in <ADDRESS>748785848000</ADDRESS>",
            "toxicity_analysis": {
                "toxicity": 0.00088834815,
                "severe_toxicity": 0.000104515464,
                "obscene": 0.00018257574,
                "threat": 0.0001108902,
                "insult": 0.00017547917,
                "identity_attack": 0.00013806517
            },
            "individual_tokens": [
                {
                    "value": "George Williams",
                    "pii_type": "PERSON",
                    "token": "hSw8kAEB10 ITItAd8FsN",
                    "prefix": "<PER>",
                    "suffix": "</PER>",
                    "start_pos": 0,
                    "end_pos": 15
                },
                {
                    "value": "Washington",
                    "pii_type": "GPE",
                    "token": "748785848000",
                    "prefix": "<ADDRESS>",
                    "suffix": "</ADDRESS>",
                    "start_pos": 25,
                    "end_pos": 35
                }
            ]
        }
    ],
    "success": true,
    "error": {
        "message": ""
    }
}
Toxic content is also identified during masking. See ‘Identify Toxic Content’ to learn more.
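As an illustration of how the toxicity scores might be consumed, the sketch below flags any category whose score reaches a cutoff. The helper `flag_toxic` and the 0.5 threshold are hypothetical; this page does not define a score range or a recommended cutoff, so treat the threshold as a placeholder to tune.

```python
def flag_toxic(toxicity_analysis, threshold=0.5):
    """Return the category names whose score meets or exceeds the threshold."""
    # The threshold is an assumed value, not one documented by the API.
    return sorted(
        name for name, score in toxicity_analysis.items() if score >= threshold
    )


# Scores copied from the example response on this page.
sample_scores = {
    "toxicity": 0.00088834815,
    "severe_toxicity": 0.000104515464,
    "obscene": 0.00018257574,
    "threat": 0.0001108902,
    "insult": 0.00017547917,
    "identity_attack": 0.00013806517,
}

flagged = flag_toxic(sample_scores)
print(flagged)
```

Every score in the example response is far below the assumed cutoff, so no category is flagged for that text.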
Response Parameters:
  • data: An array containing the original text, its masked form, the toxicity analysis, and the individual tokens.
    • value: The original (unmasked) text.
    • token_value: The masked text, with sensitive information replaced by tokenized values.
    • toxicity_analysis: Analysis of the toxicity levels in the text.
      • toxicity: Overall toxicity score.
      • severe_toxicity: Score indicating severe toxicity.
      • obscene: Score indicating obscenity.
      • threat: Score indicating threats.
      • insult: Score indicating insults.
      • identity_attack: Score indicating identity attacks.
    • individual_tokens: An array containing the individual sensitive values detected in the original text, along with their tokens, types, and positions.
      • value: The value of the token.
      • pii_type: The type of personally identifiable information (PII) detected (e.g., PERSON, GPE).
      • token: The tokenized representation of the value.
      • prefix: Prefix indicating the start of the token.
      • suffix: Suffix indicating the end of the token.
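      • start_pos: Starting character position of the value in the original text (0 for the first character in the example above).
      • end_pos: Ending character position of the value in the original text. In the example above it is one past the last character of the value.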
  • success: Boolean indicating the success of the request.
  • error: Object containing details of any errors encountered during the request.
    • message: Error message, if any.
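To show how these fields relate, the sketch below checks `success` and then rebuilds `token_value` from `value` and `individual_tokens`. The helper names are hypothetical, and the sample dictionary is a trimmed copy of the example response on this page (the toxicity scores are omitted). It assumes `start_pos` and `end_pos` are character offsets with an exclusive end, which is what the example values suggest.

```python
# Trimmed copy of the example response from this page.
SAMPLE_RESPONSE = {
    "data": [
        {
            "value": "George Williams lives in Washington",
            "token_value": "<PER>hSw8kAEB10 ITItAd8FsN</PER> lives in "
                           "<ADDRESS>748785848000</ADDRESS>",
            "individual_tokens": [
                {"value": "George Williams", "pii_type": "PERSON",
                 "token": "hSw8kAEB10 ITItAd8FsN", "prefix": "<PER>",
                 "suffix": "</PER>", "start_pos": 0, "end_pos": 15},
                {"value": "Washington", "pii_type": "GPE",
                 "token": "748785848000", "prefix": "<ADDRESS>",
                 "suffix": "</ADDRESS>", "start_pos": 25, "end_pos": 35},
            ],
        }
    ],
    "success": True,
    "error": {"message": ""},
}


def get_data(response):
    """Return the data array, or raise if the request was not successful."""
    if not response.get("success"):
        message = response.get("error", {}).get("message") or "masking failed"
        raise RuntimeError(message)
    return response["data"]


def rebuild_masked_text(item):
    """Rebuild the masked text from the original value and its tokens."""
    text = item["value"]
    # Replace from the last span to the first so earlier offsets stay valid.
    tokens = sorted(item["individual_tokens"], key=lambda t: t["start_pos"], reverse=True)
    for tok in tokens:
        replacement = tok["prefix"] + tok["token"] + tok["suffix"]
        text = text[:tok["start_pos"]] + replacement + text[tok["end_pos"]:]
    return text


item = get_data(SAMPLE_RESPONSE)[0]
print(rebuild_masked_text(item) == item["token_value"])
```

For the example response, slicing `value` with each token's `start_pos` and `end_pos` yields that token's own `value`, and the rebuilt string matches `token_value`.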
Advantages:
This approach is particularly useful when users may not be aware of all the sensitive data present in a given text or when dealing with large volumes of data where manual identification is impractical. Auto-detect masking offers the following advantages:
  1. Effortless Sensitive Data Identification: Users do not need to manually identify and specify sensitive data elements within the text. The system automatically detects patterns and formats indicative of sensitive information, such as names, phone numbers, and credit card numbers.
  2. Comprehensive Data Protection: Auto-detect masking is designed to mask every detected instance of sensitive data within a dataset, providing a comprehensive approach to data security. This is especially valuable for scenarios where users may not be aware of all the potential data types that need protection.
  3. Reduced Human Error: Automated detection reduces the risk of human errors that can occur during manual identification and specification of sensitive data. It enhances accuracy and consistency in data protection processes.
  4. Time and Resource Savings: Auto-detect masking saves time and effort, particularly when dealing with large amounts of data. Users do not need to spend time identifying each instance of sensitive data, allowing them to focus on other tasks.
