Detecting 68 Technical Secret Patterns: API Keys to Database URIs
Research Source
Developers and DevOps engineers paste code snippets, configuration files, and log outputs into AI chat interfaces and documents. These often contain API keys, database connection strings, cloud credentials, and authentication tokens. Standard PII detection focuses on personal data (names, emails, SSNs) but misses technical secrets, which can be equally or more damaging when exposed.
Executive Summary
Standard PII detection catches names and emails but misses API keys, cloud credentials, and database connection strings. These technical secrets are pasted into AI chats and documents daily.
cloak.business detects 68 technical secret patterns across major platforms: AWS access keys, GCP service account keys, Azure connection strings, OpenAI API keys, Anthropic keys, Stripe keys, GitHub tokens, database URIs, JWT tokens, SSH private keys, and more.
The Problem: Technical Secrets Are PII's Dangerous Cousin
A leaked AWS access key can be exploited within minutes and cost an organization thousands of dollars, for example through crypto mining on hijacked instances. A leaked database URI with embedded credentials exposes every record those credentials can read. A leaked OpenAI API key runs up usage charges billed to the key's owner and can expose data stored in the account. These secrets appear in code snippets pasted into ChatGPT, in configuration files attached to support tickets, in documentation shared with contractors, and in stack traces included in bug reports. Traditional PII detection, which focuses on names, addresses, and government IDs, does not detect these patterns.
Irreducible truth: Any credential that grants access to a system is as sensitive as the data that system protects. An AWS key to a database containing PII is functionally equivalent to possessing all the PII in that database. Secret detection must be part of PII detection.
The Solution: How cloak.business Addresses This
68 Platform-Specific Patterns
cloak.business detects secrets for: AWS (access keys, secret keys, session tokens), GCP (API keys, service account JSON, OAuth tokens), Azure (connection strings, SAS tokens, AD tokens), OpenAI (API keys), Anthropic (API keys), Stripe (publishable/secret keys, webhook secrets), GitHub (personal access tokens, OAuth tokens, app tokens), GitLab, Bitbucket, Docker Hub, npm, PyPI, and many other platforms.
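To make the idea of platform-specific patterns concrete, here is a minimal Python sketch of a scanner that runs a small registry of regexes over pasted text. It covers only four formats, and the entity names (`AWS_ACCESS_KEY_ID`, `GITHUB_TOKEN`, `STRIPE_SECRET_KEY`, `DATABASE_URI`) and the simplified expressions are hypothetical illustrations, not cloak.business's actual rules.

```python
import re
from dataclasses import dataclass

# Hypothetical entity names with simplified regexes for a few platforms.
# A production detector covers many more formats and edge cases.
SECRET_PATTERNS: dict[str, re.Pattern[str]] = {
    # AWS access key IDs: AKIA (long-term) or ASIA (temporary) + 16 characters.
    "AWS_ACCESS_KEY_ID": re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b"),
    # GitHub tokens: ghp_, gho_, ghu_, ghs_, ghr_ followed by 36 characters.
    "GITHUB_TOKEN": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36}\b"),
    # Stripe live secret keys; the minimum length here is an assumption.
    "STRIPE_SECRET_KEY": re.compile(r"\bsk_live_[A-Za-z0-9]{16,}\b"),
    # Connection URIs with a password embedded, matched up to host[:port].
    "DATABASE_URI": re.compile(
        r"\b(?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?|redis)://"
        r"[^\s:/@]*:[^\s/@]+@[^\s/]+"
    ),
}


@dataclass(frozen=True)
class Finding:
    entity_type: str
    start: int
    end: int
    text: str


def scan_for_secrets(text: str) -> list[Finding]:
    """Return every match of every registered pattern, ordered by position."""
    findings = [
        Finding(entity_type, m.start(), m.end(), m.group(0))
        for entity_type, pattern in SECRET_PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(findings, key=lambda f: (f.start, f.end))


if __name__ == "__main__":
    sample = (
        "export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\n"
        "DATABASE_URL=postgres://app:s3cret@db.internal:5432/prod\n"
    )
    for finding in scan_for_secrets(sample):
        print(finding.entity_type, finding.text)
    # AWS_ACCESS_KEY_ID AKIAIOSFODNN7EXAMPLE
    # DATABASE_URI postgres://app:s3cret@db.internal:5432
```

Each pattern in the sketch is anchored on a platform prefix or URI scheme, which keeps the match set small; a generic "long random string" rule would also flag hashes, UUIDs, and commit IDs.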
Pattern Validation
Each secret pattern includes format validation beyond a generic regex match. AWS access key IDs must start with AKIA and be exactly 20 characters. Stripe keys must start with a recognized prefix such as sk_live_ or pk_live_. GitHub tokens must start with one of the prefixes ghp_, gho_, ghu_, ghs_, or ghr_. This validation minimizes false positives: a random string is not flagged unless it matches a platform's expected format.
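The checks described above can be sketched as a second stage that inspects a candidate string after a scanner has found it. This is a minimal Python sketch under stated assumptions: the function and entity names are hypothetical, the ASIA prefix (temporary AWS keys) and Stripe `test` keys are added for completeness, and the JWT header check illustrates validation a regex alone cannot do; it is not a documented cloak.business rule.

```python
import base64
import json
import re
from typing import Callable

# Hypothetical validators; the exact rules cloak.business applies are not published here.
_AWS_KEY_ID = re.compile(r"(?:AKIA|ASIA)[A-Z0-9]{16}")  # exactly 20 characters
_STRIPE_KEY = re.compile(r"(?:sk|pk)_(?:live|test)_[A-Za-z0-9]{16,}")  # min length assumed
_GITHUB_TOKEN = re.compile(r"gh[pousr]_[A-Za-z0-9]{36}")  # ghp_, gho_, ghu_, ghs_, ghr_


def _exact(pattern: re.Pattern[str]) -> Callable[[str], bool]:
    # fullmatch enforces both the prefix and the exact overall length.
    return lambda candidate: pattern.fullmatch(candidate) is not None


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are unpadded base64url; restore padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def looks_like_jwt(candidate: str) -> bool:
    """Structural check: three dot-separated parts whose header decodes to JSON with "alg"."""
    parts = candidate.split(".")
    if len(parts) != 3 or not parts[0] or not parts[1]:
        return False
    try:
        header = json.loads(_b64url_decode(parts[0]))
    except ValueError:  # covers binascii.Error, JSONDecodeError, UnicodeDecodeError
        return False
    return isinstance(header, dict) and "alg" in header


VALIDATORS: dict[str, Callable[[str], bool]] = {
    "AWS_ACCESS_KEY_ID": _exact(_AWS_KEY_ID),
    "STRIPE_KEY": _exact(_STRIPE_KEY),
    "GITHUB_TOKEN": _exact(_GITHUB_TOKEN),
    "JWT": looks_like_jwt,
}


def validate(entity_type: str, candidate: str) -> bool:
    """Return True only if a validator exists for the type and the candidate passes it."""
    check = VALIDATORS.get(entity_type)
    return check is not None and check(candidate)
```

In this sketch, a candidate that fails its validator is dropped rather than reported, so a 19-character string starting with AKIA or a three-part dotted string whose header is not JSON never reaches the output.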
Integration with PII Detection
Secret detection runs alongside standard PII detection in a single API call. The same /api/presidio/analyze endpoint detects both a customer's SSN and a developer's AWS key in the same document. No separate tool or configuration is needed.
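A client call might look like the following minimal Python sketch, which uses only the standard library. The base URL (`cloak.example.com`), the Bearer authorization header, and the request body are assumptions: the body mirrors the `{"text": ..., "language": ...}` shape accepted by open-source Presidio's analyzer REST API, and the actual cloak.business schema may differ.

```python
import json
import urllib.request
from typing import Any

# Hypothetical host; substitute your deployment's base URL.
BASE_URL = "https://cloak.example.com"


def build_analyze_request(text: str, api_key: str, language: str = "en") -> urllib.request.Request:
    """Build (without sending) one POST to the analyze endpoint.

    The JSON body follows open-source Presidio's analyzer request shape;
    the cloak.business schema and auth scheme are assumptions.
    """
    body = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/presidio/analyze",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )


def analyze(text: str, api_key: str) -> Any:
    """Send the request and return the decoded JSON response (performs a network call)."""
    with urllib.request.urlopen(build_analyze_request(text, api_key), timeout=30) as resp:
        return json.load(resp)


# One document containing both personal data and a technical secret.
doc = "Customer SSN: 078-05-1120. Deploy key: AKIAIOSFODNN7EXAMPLE"
req = build_analyze_request(doc, api_key="YOUR_API_KEY")
```

Presidio's analyzer returns a list of findings, each with `entity_type`, `start`, `end`, and `score`; if the cloak.business endpoint follows that shape, the SSN and the AWS key would come back in the same list from one call.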
Compliance Mapping
This feature addresses SOC 2 Type II (credential management controls), PCI-DSS v3.2.1 Requirement 6.5.3 (insecure cryptographic storage), ISO 27001:2013 Annex A.9 (access control: leaked credentials are access control failures), and NIST SP 800-53 (IA-5, Authenticator Management).
cloak.business's compliance coverage for GDPR, HIPAA, PCI-DSS, ISO 27001, and SOC 2, combined with customer-selected hosting, provides documented technical measures that organizations can reference in their compliance documentation.
Product Specifications
| Specification | Value |
|---|---|
| Entity Types | 320+ |
| Detection | 3-layer hybrid: Presidio + NLP + stance classification |
| Test Pass Rate | 100% (419/419 tests) |
| Languages | 48 |
| Anonymization Methods | Replace, Redact, Mask, Hash, Encrypt (AES-256-GCM), RSA-4096 Asymmetric, Keep |
| Platforms | Web App, REST API, SDKs (JavaScript, Python), Cloud Storage Add-ins, Nextcloud |
| Pricing | Enterprise (custom) |
| Hosting | Customer-selected |
| Compliance | GDPR, HIPAA, PCI-DSS, ISO 27001, SOC 2 |
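
To show what four of the listed anonymization methods do to a detected secret, here is a minimal standard-library Python sketch. The function names and parameters are hypothetical, and the behavior is loosely modeled on the operators of the same names in open-source Presidio's anonymizer (replace with a typed placeholder, redact by deletion, mask with a repeated character, hash with SHA-256); cloak.business's own defaults may differ. Encrypt and Keep are omitted: Keep leaves the span unchanged, and Encrypt requires key management beyond a short example.

```python
import hashlib


def replace(text: str, start: int, end: int, entity_type: str) -> str:
    # Swap the span for a typed placeholder such as <AWS_ACCESS_KEY_ID>.
    return f"{text[:start]}<{entity_type}>{text[end:]}"


def redact(text: str, start: int, end: int) -> str:
    # Delete the span entirely.
    return text[:start] + text[end:]


def mask(text: str, start: int, end: int, masking_char: str = "*", keep_last: int = 4) -> str:
    # Overwrite all but the last `keep_last` characters of the span (hypothetical default).
    span = text[start:end]
    keep = max(0, min(keep_last, len(span)))
    visible = span[len(span) - keep:]
    return text[:start] + masking_char * (len(span) - keep) + visible + text[end:]


def hash_span(text: str, start: int, end: int) -> str:
    # Replace the span with its SHA-256 hex digest (64 characters).
    digest = hashlib.sha256(text[start:end].encode("utf-8")).hexdigest()
    return text[:start] + digest + text[end:]


line = "aws_key=AKIAIOSFODNN7EXAMPLE;"
start, end = 8, 28  # span of the detected 20-character key
print(replace(line, start, end, "AWS_ACCESS_KEY_ID"))  # aws_key=<AWS_ACCESS_KEY_ID>;
print(redact(line, start, end))  # aws_key=;
print(mask(line, start, end))  # aws_key=****************MPLE;
```

Masking keeps enough of the key for an engineer to recognize which credential was exposed, while hashing keeps identical secrets linkable across documents without revealing them.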