
Data Scanning

Data Scanning APIs discover, classify, and validate personal data (PII) across structured data sources such as databases and warehouses. Scans run asynchronously.

Data Scanning is not available on trial accounts. A paid subscription is required.

What data scanning does

Data Scanning APIs do not mask data. They help you answer:

  • Which tables contain PII?

  • What type of PII does each column contain?

  • How confident is the system in each detection?

  • Where does ML detection need manual correction?

Execution model

Data scanning runs asynchronously. You submit a scan job, it runs in the background, and you retrieve the results once it completes.

Typical workflow

Submit scan

Use the Data Scan Async API to submit one or more objects for scanning. You receive a tracking_id, which identifies the scan job in later calls.

Track progress

Poll the Scan Status API with the tracking_id until the status is SUCCESS.
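A minimal polling sketch, assuming a hypothetical `get_status(tracking_id)` function that wraps the Scan Status API and returns the job status as a string. The function name, the poll interval, and the `max_attempts` cap are illustrative assumptions, not part of the documented API. Only SUCCESS is treated as terminal because it is the only status named here; the intermediate statuses in the usage lines are made up.

```python
import time
from typing import Callable


def wait_for_scan(
    get_status: Callable[[str], str],
    tracking_id: str,
    interval_s: float = 5.0,
    max_attempts: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll a scan job until it reports SUCCESS; return the number of polls made.

    get_status is a hypothetical wrapper around the Scan Status API that
    returns the job's status string for a tracking_id.
    """
    for attempt in range(1, max_attempts + 1):
        status = get_status(tracking_id)
        if status == "SUCCESS":
            return attempt
        if attempt < max_attempts:
            sleep(interval_s)  # wait before the next poll
    raise TimeoutError(f"scan {tracking_id} not SUCCESS after {max_attempts} polls")


# Usage with a fake status source; "QUEUED" and "RUNNING" are made-up statuses.
fake_statuses = iter(["QUEUED", "RUNNING", "SUCCESS"])
polls = wait_for_scan(lambda tid: next(fake_statuses), "trk-123", sleep=lambda s: None)
print(polls)  # 3
```

Injecting `sleep` keeps the loop testable without real delays, and the attempt cap prevents polling forever if a job never reaches SUCCESS.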

Explore objects

Use List Scan Objects to browse the scanned data source hierarchy.
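The scanned source is described as a hierarchy, so a recursive walk is a natural way to flatten it. This sketch assumes a hypothetical node shape (`name`, `type`, and `children` keys, with tables typed as `"table"`); the real List Scan Objects response may be structured differently.

```python
from typing import Dict, Iterator, Tuple


def iter_table_paths(node: Dict, prefix: Tuple[str, ...] = ()) -> Iterator[str]:
    """Yield a dotted path for every node whose "type" is "table".

    The name/type/children layout is a hypothetical stand-in for
    List Scan Objects results, not the documented response format.
    """
    path = prefix + (node["name"],)
    if node.get("type") == "table":
        yield ".".join(path)
    for child in node.get("children", []):
        yield from iter_table_paths(child, path)


source = {
    "name": "sales_db", "type": "database",
    "children": [
        {"name": "public", "type": "schema", "children": [
            {"name": "customers", "type": "table"},
            {"name": "orders", "type": "table"},
        ]},
    ],
}
print(list(iter_table_paths(source)))
# ['sales_db.public.customers', 'sales_db.public.orders']
```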

Inspect results

Use Scan Details to view column-level PII detection results with confidence percentages.
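Column-level results can be rolled up to answer "which tables contain PII, and of what type". This sketch assumes each result is a hypothetical record with `table`, `column`, `entity_type` (or `None` for non-PII), and `confidence` (0 to 100) fields; the entity type names are also illustrative, not the product's actual labels.

```python
from collections import defaultdict
from typing import Dict, List, Set


def pii_types_by_table(columns: List[Dict]) -> Dict[str, List[str]]:
    """Group column-level detections into the sorted PII types found per table.

    Record shape is hypothetical; columns whose entity_type is None are
    treated as non-PII, so tables with no PII columns are omitted.
    """
    found: Dict[str, Set[str]] = defaultdict(set)
    for rec in columns:
        if rec.get("entity_type"):
            found[rec["table"]].add(rec["entity_type"])
    return {table: sorted(types) for table, types in found.items()}


rows = [
    {"table": "customers", "column": "email", "entity_type": "EMAIL", "confidence": 97},
    {"table": "customers", "column": "phone", "entity_type": "PHONE_NUMBER", "confidence": 88},
    {"table": "orders", "column": "total", "entity_type": None, "confidence": 0},
]
print(pii_types_by_table(rows))  # {'customers': ['EMAIL', 'PHONE_NUMBER']}
```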

Tune detection

Use Update Scan Conclusions to adjust the confidence threshold used to classify columns as PII.
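To see why the threshold matters, here is a local illustration of turning per-column confidence percentages into PII conclusions. It assumes the comparison is inclusive (`>=`) and that confidences are percentages; the product's actual rule is not stated here and may differ.

```python
from typing import Dict


def classify_columns(confidences: Dict[str, float], threshold: float) -> Dict[str, bool]:
    """Mark a column as PII when its confidence meets the threshold.

    Assumes confidences and threshold are percentages (0-100) and an
    inclusive comparison; both are assumptions for illustration.
    """
    if not 0 <= threshold <= 100:
        raise ValueError("threshold must be a percentage between 0 and 100")
    return {col: conf >= threshold for col, conf in confidences.items()}


scores = {"email": 97.0, "notes": 62.5, "zip": 80.0}
print(classify_columns(scores, 80))  # {'email': True, 'notes': False, 'zip': True}
print(classify_columns(scores, 90))  # {'email': True, 'notes': False, 'zip': False}
```

Raising the threshold trades recall for precision: fewer borderline columns (here `zip`) are flagged as PII.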

Correct ML output

Use Update or Delete Detected Entities to manually override incorrect ML results.
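Manual overrides take precedence over ML output. This sketch models that rule locally with plain dictionaries: `detected` maps a column to its ML-predicted entity type, and `overrides` maps a column to a corrected type, or to `None` to delete the detection. The data shapes and entity names are hypothetical and do not describe the Update or Delete Detected Entities request format.

```python
from typing import Dict, Optional


def apply_overrides(
    detected: Dict[str, str], overrides: Dict[str, Optional[str]]
) -> Dict[str, str]:
    """Return ML detections with manual corrections applied (manual wins).

    An override of None removes the column's detection; any other value
    replaces the ML entity type. The input dict is not modified.
    """
    result = dict(detected)
    for column, entity in overrides.items():
        if entity is None:
            result.pop(column, None)  # delete a false-positive detection
        else:
            result[column] = entity  # correct the detected entity type
    return result


ml = {"email": "EMAIL", "order_id": "PHONE_NUMBER", "name": "PERSON"}
fixed = apply_overrides(ml, {"order_id": None, "name": "FULL_NAME"})
print(fixed)  # {'email': 'EMAIL', 'name': 'FULL_NAME'}
```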

API reference