A dataset can be called "anonymous" only if its content cannot be related to a person, not by any means and not even ex post or by combination with other information. Free text entries highly impede "factual anonymization" for secondary research.
This research paper examines a critical privacy challenge, the complexity cascade: PII protection requires perfection across all layers simultaneously.
cloak.business addresses this through a zero-storage, in-memory architecture with self-hosted NLP models, which simplifies the stack by eliminating the storage and third-party dependency layers.
PII protection requires perfection across ALL layers simultaneously. One failure anywhere collapses everything. The attacker needs to find ONE weakness; the defender must protect ALL layers with zero failures.
Irreducible truth: Protection = Layer1 × Layer2 × ... × LayerN. Any zero makes the product zero. The attacker gets to choose which layer to attack. The defender must achieve perfection across all of them simultaneously, forever.
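The multiplicative model above can be sketched in a few lines. This is a minimal illustration, assuming each layer is given a protection score between 0.0 (fully broken) and 1.0 (fully effective); the function name `overall_protection` and the example scores are hypothetical and not part of any cloak.business API.

```python
from math import prod


def overall_protection(layer_scores):
    """Multiply per-layer protection scores, each expected in [0.0, 1.0]."""
    for score in layer_scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"layer score out of range: {score}")
    return prod(layer_scores)


# Illustrative scores only (assumption): three layers, one of them failed.
all_layers_hold = overall_protection([1.0, 1.0, 1.0])
one_layer_fails = overall_protection([1.0, 0.0, 1.0])

# For contrast, an average would hide the failed layer.
misleading_average = sum([1.0, 0.0, 1.0]) / 3
```

The product drops to zero as soon as a single layer scores zero, whereas the average of the same scores still looks acceptable. That difference is the point of the model: removing a layer from the stack removes a factor that could become zero.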
cloak.business identifies 390+ entity types, including message content, contact names, conversation metadata, and attachment identifiers. Its dual-layer architecture combines two detection paths. For structured identifiers, 317 custom regex recognizers apply context word analysis and assign confidence scores from 0.0 to 1.0. For contextual references, three self-hosted NLP engines are used: spaCy (25 languages), Stanza (7 languages), and XLM-RoBERTa (16 languages).
Encrypt is the recommended method for this pain point: PII encrypted with AES-256-GCM before backup stays protected even if the backup system itself applies no encryption. Redact is the alternative: removing PII from messages before backup prevents exposure through unencrypted backups regardless of backup encryption status. Redact is also the method for permanent removal, since redacted data cannot be recovered under any circumstances.
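To make the regex layer concrete, the sketch below shows one way a pattern recognizer with context word analysis and confidence scoring could work. It is an illustration under stated assumptions, not the cloak.business implementation: the phone pattern, the base score, the boost, the context words, and the window size are all hypothetical values chosen for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    entity_type: str
    text: str
    start: int
    end: int
    score: float  # confidence in [0.0, 1.0]


# All values below are illustrative assumptions.
PHONE_PATTERN = re.compile(r"\+?\d[\d\s\-]{7,14}\d")
BASE_SCORE = 0.25        # a bare digit run is weak evidence on its own
CONTEXT_BOOST = 0.5      # added when a context word appears nearby
CONTEXT_WORDS = {"phone", "call", "mobile", "tel"}
CONTEXT_WINDOW = 40      # characters inspected before the match


def find_phone_numbers(text):
    """Return regex matches, scored higher when a context word precedes them."""
    findings = []
    for match in PHONE_PATTERN.finditer(text):
        window = text[max(0, match.start() - CONTEXT_WINDOW):match.start()]
        nearby_words = set(re.findall(r"[a-z]+", window.lower()))
        score = BASE_SCORE
        if nearby_words & CONTEXT_WORDS:
            score = min(1.0, BASE_SCORE + CONTEXT_BOOST)
        findings.append(
            Finding("PHONE_NUMBER", match.group(), match.start(), match.end(), score)
        )
    return findings


def above_threshold(findings, threshold=0.5):
    """Keep only findings whose confidence reaches the threshold."""
    return [f for f in findings if f.score >= threshold]
```

With these assumed values, the number in "Call me on +49 30 1234567 tomorrow." is boosted by the word "Call" and passes a 0.5 threshold, while the digit run in "Order 1234567890 shipped" keeps its base score and is filtered out. Context words trade some recall for fewer false positives on generic number sequences.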
Zero-storage microservices process all data in-memory with no disk writes. All NLP models are self-hosted on German servers — no third-party API calls. Data residency is Germany-only.
This pain point intersects with GDPR Article 32 (encryption as a security measure) and Article 5(1)(f) (integrity and confidentiality).
cloak.business's compliance coverage (GDPR Article 25, Privacy by Design; ISO 27001:2022), combined with Germany-only, ISO 27001:2022 certified hosting and no third-party transfers, provides documented technical measures that organizations can reference in their compliance documentation and regulatory submissions.
| Specification | Value |
|---|---|
| Platform Version | Analyzer 6.9.1, Image Redactor 5.3.0 |
| Entity Types | 390+ (519 documented) |
| Detection Layers | 317 custom regex + 3 NLP engines (all self-hosted) |
| Languages | 48 UI languages, 37 OCR language packs |
| Anonymization Methods | Replace, Redact, Mask, Hash (SHA-256), Encrypt (AES-256-GCM) |
| Architecture | Zero-storage microservices (in-memory only) |
| Integration Points | Web App, Desktop, Office Add-in, MCP Server (9 tools), REST API |
| Hosting | Germany only, ISO 27001:2022, no third-party transfers |
| Compliance | GDPR Article 25, ISO 27001:2022 |
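The anonymization methods listed in the table differ mainly in what remains in the output. The sketch below illustrates four of them using only the Python standard library. The function names, the placeholder format, and the masking rule are assumptions based on how such operators are commonly defined, not the product's documented behavior. Encrypt (AES-256-GCM) is left out because it needs a third-party cryptography library.

```python
import hashlib


def replace(value, entity_type):
    """Replace: substitute a typed placeholder for the original value."""
    return f"<{entity_type}>"


def redact(value):
    """Redact: remove the value entirely; nothing recoverable remains."""
    return ""


def mask(value, visible=4, mask_char="*"):
    """Mask: hide all but the last `visible` characters."""
    if visible <= 0:
        return mask_char * len(value)
    hidden = max(0, len(value) - visible)
    return mask_char * hidden + value[-visible:]


def hash_sha256(value):
    """Hash: one-way SHA-256 digest, stable for equal inputs."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def apply_operator(text, start, end, new_value):
    """Splice an anonymized value over the detected span [start, end)."""
    return text[:start] + new_value + text[end:]


# Illustrative usage on a detected span (offsets assumed to come from a detector).
message = "Call Anna now"
anonymized = apply_operator(message, 5, 9, replace("Anna", "PERSON"))
```

Replace and Mask keep the text readable, Hash preserves the ability to tell whether two records hold the same value, and Redact leaves nothing behind. An unsalted hash of a low-entropy value such as a phone number can be reversed by enumeration, so a keyed or salted variant is usually preferable in practice.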