Why Deterministic Detection Matters

317 regex patterns deliver 100% reproducible results. Same input, same output, every time. Perfect for compliance audits.


AI Detection Results Vary

Most PII detection tools use AI/ML models that produce probabilistic results. Run the same document twice and you may get different answers, especially after a model or configuration update. Explain that to an auditor.

When regulators ask "How did you identify this data as personal information?", you need a clear, repeatable answer. Not "the model thought so."
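One way to picture a "clear, repeatable answer" is an audit record that names the exact rule behind each finding. The sketch below is illustrative only: the `RULES` dictionary, the field names, and the idea of fingerprinting the rule set with SHA-256 are assumptions made for this example, not a description of how cloak.business stores its results.

```python
import hashlib
import json
import re

# Hypothetical rule set; the name and regex are illustrative only.
RULES = {
    "EMAIL_ADDRESS": r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
}

def ruleset_fingerprint(rules):
    """Hash the rule set in a canonical form so any rule change is visible."""
    canonical = json.dumps(rules, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

def audit_records(text, rules):
    """Return one record per match, naming the rule that produced it.

    The matched text itself is left out so the log holds no raw PII.
    """
    fingerprint = ruleset_fingerprint(rules)
    records = []
    for name in sorted(rules):
        for match in re.finditer(rules[name], text):
            records.append({
                "entity_type": name,
                "rule": rules[name],
                "start": match.start(),
                "end": match.end(),
                "ruleset_sha256": fingerprint,
            })
    return records

records = audit_records("Mail bob@example.org today", RULES)
```

Each record answers "how was this identified?" with a rule an auditor can rerun, and the fingerprint changes whenever the rule set does.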

✓ Deterministic (Regex)

  • Same input = same output, always
  • Fully auditable pattern rules
  • No model drift over time
  • Explainable to regulators
  • 100% reproducible results

✗ Probabilistic (AI/ML)

  • Results vary between runs
  • Black box decision making
  • Model drift over updates
  • Hard to explain to auditors
  • Confidence scores, not certainty

317 Pattern Recognizers

cloak.business uses 317 deterministic regex patterns for structured data like IDs, tax numbers, credit cards, IBANs, and email addresses. NLP models supplement these patterns for names and locations.
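As a rough illustration of what pattern-based detection looks like, here is a minimal sketch using only Python's standard `re` module. The two patterns and the `detect` helper are simplified assumptions for this example. They are not taken from the product's 317 recognizers, and production patterns would be stricter.

```python
import re

# Hypothetical, simplified patterns for illustration only.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "IBAN_CODE": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def detect(text):
    """Return (entity_type, start, end, matched_text) tuples in a stable order."""
    findings = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((entity_type, match.start(), match.end(), match.group()))
    # Sort by position so the output order never depends on rule order.
    return sorted(findings, key=lambda f: (f[1], f[0]))

sample = "Contact jane.doe@example.com, IBAN DE89370400440532013000."
first = detect(sample)
second = detect(sample)
```

Running `detect` twice on the same text returns identical lists, which is the property the page describes as "same input, same output".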

  • 317 Regex Patterns
  • 390+ Entity Types
  • 70+ Countries
  • 48 Languages

Built on Microsoft Presidio with custom recognizers optimized for global PII formats. ISO 27001:2022 certified servers in Germany. Data never leaves EU jurisdiction.

Benefits for Compliance Teams

  • Audit-Ready Results: Pattern-based detection produces documented, repeatable outcomes that satisfy GDPR Article 30 record-keeping requirements.
  • Regulatory Transparency: Explain exactly why data was classified as PII. No black boxes. Auditors can verify detection rules independently.
  • No Model Drift: Regex patterns don't change unless you update them. AI models drift over time, changing results unpredictably.
  • Higher Accuracy for Structured Data: 317 custom recognizers with checksum validation achieve 82% higher accuracy than generic ML models for IDs and numbers.
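Checksum validation is what separates a real card number from any 16-digit string. The sketch below pairs a loose candidate regex with the Luhn checksum used by payment card numbers. The regex, the function names, and the minimum length of 13 digits are assumptions chosen for this example and may differ from the product's recognizers.

```python
import re

# Loose candidate pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")

def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 13:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_card_numbers(text):
    """Return regex candidates that also pass the checksum."""
    return [m.group() for m in CARD_CANDIDATE.finditer(text) if luhn_valid(m.group())]

hits = find_card_numbers("Paid with 4111 1111 1111 1111, ref 4111 1111 1111 1112.")
```

Here the well-known test number ending in 1111 passes, while the reference ending in 1112 matches the regex but fails the checksum and is dropped. Both steps are deterministic, so the result is still reproducible.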

Regex + NLP Hybrid Approach

Structured data (emails, SSNs, credit cards, IBANs) uses deterministic regex patterns. 100% reproducible. Perfect for compliance.

Unstructured data (names, organizations, locations) uses NLP models (spaCy, Stanza, XLM-RoBERTa) with confidence scores. All processing on German servers—no third-party AI services.
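To make the hybrid idea concrete, the sketch below merges two lists of findings: regex matches, which carry a fixed score of 1.0, and NLP findings, which carry a model confidence. Everything here is an assumption for illustration: the `Finding` type, the `merge_findings` helper, the 0.6 threshold, and the rule that a regex match wins over an overlapping NLP finding. The NLP findings are hard-coded rather than produced by a real model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    entity_type: str
    start: int
    end: int
    score: float  # 1.0 for regex matches, model confidence for NLP
    source: str   # "regex" or "nlp"

def merge_findings(regex_findings, nlp_findings, min_nlp_score=0.6):
    """Keep every regex finding; keep NLP findings that clear the
    threshold and do not overlap any regex finding."""
    kept = list(regex_findings)
    for f in nlp_findings:
        if f.score < min_nlp_score:
            continue
        overlaps = any(f.start < r.end and r.start < f.end for r in regex_findings)
        if not overlaps:
            kept.append(f)
    return sorted(kept, key=lambda f: (f.start, f.end))

# Text: "Jane Doe, jane@example.com, Berlin"
regex_hits = [Finding("EMAIL_ADDRESS", 10, 26, 1.0, "regex")]
nlp_hits = [
    Finding("PERSON", 0, 8, 0.92, "nlp"),
    Finding("PERSON", 10, 14, 0.70, "nlp"),    # overlaps the email match
    Finding("LOCATION", 28, 34, 0.41, "nlp"),  # below the threshold
]
merged = merge_findings(regex_hits, nlp_hits)
```

Keeping the `source` and `score` on each finding lets a report state which results are deterministic and which rest on a model's confidence.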

Five anonymization methods: Replace, Redact, Mask, Hash (SHA-256), or Encrypt (AES-256-GCM).
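Four of these methods can be sketched with the Python standard library alone. The function names, the `<ENTITY_TYPE>` placeholder format, and the choice to keep the last four characters when masking are assumptions for this example. Encrypt is left out because AES-256-GCM is not in the standard library and would need a third-party package.

```python
import hashlib

def replace(value: str, entity_type: str) -> str:
    """Swap the value for a placeholder naming its entity type."""
    return f"<{entity_type}>"

def redact(value: str) -> str:
    """Remove the value entirely."""
    return ""

def mask(value: str, visible: int = 4, char: str = "*") -> str:
    """Hide all but the last `visible` characters."""
    if len(value) <= visible:
        return char * len(value)
    return char * (len(value) - visible) + value[-visible:]

def hash_sha256(value: str) -> str:
    """Return the hex SHA-256 digest of the UTF-8 encoded value."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

masked = mask("4111111111111234")
```

Note that an unsalted hash of a low-entropy value such as a phone number can be reversed by brute force, so whether plain SHA-256 is sufficient depends on the data being hashed.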

Try the PII Website Scanner

Scan any website for exposed personal information. Free tier includes 200 tokens monthly.

Common Questions About Detection

What is deterministic PII detection?
Deterministic detection uses explicit regex patterns to identify PII. The same input always produces the same output—no variation, no surprises. This makes results fully auditable for compliance purposes.
Why is deterministic better than AI/ML for compliance?
AI/ML models produce probabilistic results that can vary between runs. Deterministic patterns give 100% reproducible results that auditors and regulators can verify independently. When a DPA asks "how did you identify this?", you have a documented answer.
How many entity types can be detected?
cloak.business detects 390+ entity types across 70+ countries using 317 deterministic pattern recognizers, supplemented by NLP models for names and locations. Entity types include SSNs, tax IDs, passport numbers, credit cards, IBANs, driver's licenses, and more.
Where is data processed?
All processing occurs on ISO 27001:2022 certified servers in Germany (Hetzner infrastructure). Data never leaves EU jurisdiction. No third-party AI services are used. Original text is processed in-memory and never stored.
What anonymization methods are available?
Five methods: Replace (swap with placeholder), Redact (remove entirely), Mask (partial hiding like ****1234), Hash (SHA-256), or Encrypt (AES-256-GCM reversible encryption for legal discovery scenarios).

Further Reading