Writing high-precision secret patterns
Finding leaked secrets — API keys, tokens, private keys — is one of the highest-value things regex does for you. But a sloppy pattern buries one real key under a thousand false positives, and you stop looking. The skill is writing patterns precise enough to trust. This lesson teaches the recipe.
You'll learn to
- Build secret patterns on prefix, length, and character class
- Recognise the major credential formats
- Avoid the false positives that make scanning useless
The recipe: prefix + length + character class
The best secret patterns lean on three things a real key has and random text rarely does: a fixed prefix, a known length, and a specific character class.
AKIA[0-9A-Z]{16}
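That matches an AWS access key ID: the literal prefix AKIA, then sixteen characters that are each an uppercase letter or digit. Random text almost never has AKIA followed by that shape, so false positives are rare. On its own the pattern also matches the first twenty characters of a longer uppercase run, so add a boundary on each side (for example \b) when the length must be exact. This is the template for every good secret pattern.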
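The recipe can be sketched in Python as a small helper that assembles the three ingredients. The function name `build_secret_pattern` and the lookaround boundaries are assumptions of this sketch, not part of the lesson's pattern; they are one way to make "exactly sixteen" hold when the key sits inside a longer run of characters.

```python
import re

def build_secret_pattern(prefix: str, char_class: str, length: int) -> re.Pattern:
    """Compose prefix + character class + exact length into one regex."""
    # The lookbehind and lookahead stop a match from starting or ending
    # inside a longer alphanumeric run, which is what enforces the length.
    return re.compile(
        rf"(?<![0-9A-Za-z]){re.escape(prefix)}[{char_class}]{{{length}}}(?![0-9A-Za-z])"
    )

aws_key = build_secret_pattern("AKIA", "0-9A-Z", 16)

# The first value has the right shape; the second is four characters too long.
sample = "id=AKIAIOSFODNN7EXAMPLE; junk=AKIAIOSFODNN7EXAMPLEXXXX"
matches = aws_key.findall(sample)
print(matches)
```

Without the boundaries, the bare pattern AKIA[0-9A-Z]{16} would report both values in `sample`, because it is satisfied by the first twenty characters of the longer one.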
The major formats
AWS access key       (AKIA|ASIA)[0-9A-Z]{16}
GitHub PAT           ghp_[0-9A-Za-z]{36}
Google API key       AIza[0-9A-Za-z_-]{35}
Stripe secret key    sk_live_[0-9A-Za-z]{24,}
Slack token          xox[baprs]-[0-9A-Za-z-]{10,}
JWT                  eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*
Private key block    -----BEGIN (RSA |EC |OPENSSH )?PRIVATE KEY-----
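Most follow the recipe in full; the JWT and private-key patterns have no fixed length and lean on their distinctive starts instead. The JWT pattern is worth a look: a JWT is three base64url chunks joined by dots, and it almost always starts eyJ, because the header is a JSON object that opens with {" followed by a letter, and those bytes base64-encode to eyJ. That fixed start makes it findable.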
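To see where eyJ comes from, the following sketch base64url-encodes two JSON headers. The header contents are hypothetical examples, and `b64url` is a helper name chosen here; the only assumption that matters is that the header is compact JSON whose first key starts with a letter.

```python
import base64
import json

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, the way JWT segments are encoded."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

# Two hypothetical JWT headers with different first keys.
headers = [
    {"alg": "HS256", "typ": "JWT"},
    {"typ": "JWT", "alg": "RS256", "kid": "example-key-id"},
]

# Compact separators give {"alg":"HS256",...} with no spaces after the brace.
encoded = [b64url(json.dumps(h, separators=(",", ":")).encode()) for h in headers]
for segment in encoded:
    print(segment[:12], segment.startswith("eyJ"))
```

The first two bytes, { and ", fix the characters e and y; the third character is J whenever the next byte is an ASCII letter, which is why the prefix is stable across different headers.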
The public-vs-secret distinction that matters
Not every key you find is a finding. Some are designed to be public.
sk_live_... Stripe SECRET key → a real finding
pk_live_... Stripe PUBLISHABLE key → public by design, usually NOT a finding
AIza... Google API key → often a public browser key, needs verification
ghp_... GitHub PAT → almost always a real finding
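The distinction above can be expressed as a small triage step that runs after a pattern has matched. This is a minimal sketch: the `triage` function, the verdict wording, and the fallback for unknown prefixes are all assumptions added for illustration, and a real workflow would still verify each candidate by hand.

```python
# Verdicts mirror the list above; the exact wording is this sketch's own.
TRIAGE_BY_PREFIX = [
    ("sk_live_", "finding: Stripe secret key"),
    ("pk_live_", "usually not a finding: Stripe publishable key"),
    ("AIza", "verify: Google API key, often a public browser key"),
    ("ghp_", "finding: GitHub PAT"),
]

def triage(candidate: str) -> str:
    """Return a first-pass verdict for a matched string based on its prefix."""
    for prefix, verdict in TRIAGE_BY_PREFIX:
        if candidate.startswith(prefix):
            return verdict
    return "unknown: review manually"

# Placeholder values only; none of these is a real credential.
for value in ["sk_live_" + "x" * 24, "pk_live_abc", "AIza" + "x" * 35]:
    print(value[:12], "->", triage(value))
```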
Running it in practice
# Pull all AWS keys out of a JavaScript bundle:
curl -s https://site.com/app.js | grep -oE "(AKIA|ASIA)[0-9A-Z]{16}"
# Scan a whole repo for several secret types:
grep -rEo "ghp_[0-9A-Za-z]{36}|AKIA[0-9A-Z]{16}|sk_live_[0-9A-Za-z]{24,}" .
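The same repo scan can be sketched in Python when you want the secret type and line number alongside each hit. The names `PATTERNS` and `scan_tree` are hypothetical, and reading every file as UTF-8 with errors ignored is a simplifying assumption; large or binary-heavy repositories would need more care.

```python
import re
from pathlib import Path

# The three patterns from the grep command, labelled by type.
PATTERNS = {
    "aws_access_key": re.compile(r"(?:AKIA|ASIA)[0-9A-Z]{16}"),
    "github_pat": re.compile(r"ghp_[0-9A-Za-z]{36}"),
    "stripe_secret_key": re.compile(r"sk_live_[0-9A-Za-z]{24,}"),
}

def scan_tree(root: str):
    """Yield (path, line_number, kind, match) for every hit under root."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # errors="ignore" lets binary files pass through without raising.
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line_number, line in enumerate(text.splitlines(), start=1):
            for kind, pattern in PATTERNS.items():
                for match in pattern.finditer(line):
                    yield str(path), line_number, kind, match.group(0)
```

Calling `list(scan_tree("."))` from a repository root returns one tuple per match, which is easier to sort, deduplicate, or feed into a triage step than raw grep output.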
Checkpoint
Why is a pattern that starts with the literal AKIA and then matches exactly sixteen uppercase-or-digit characters far more reliable than a pattern that just matches any twenty uppercase-alphanumeric characters?
The AKIA-prefixed pattern anchors on the fixed AWS prefix plus an exact length, so it matches real keys and almost nothing else. A pattern that matches any twenty uppercase-alphanumeric characters hits countless IDs, hashes, and tokens — a flood of false positives.
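The difference is easy to measure on a small sample. In this sketch the input lines are made up for illustration: one AWS-style example key sitting among an order ID and an uppercase hex digest.

```python
import re

precise = re.compile(r"AKIA[0-9A-Z]{16}")
broad = re.compile(r"[0-9A-Z]{20}")

# Made-up lines: a 20-character order ID, a 40-character digest, one key.
text = "\n".join([
    "order " + "ORDER" + "0" * 13 + "17",
    "digest " + "A1B2C3D4E5" * 4,
    "aws AKIAIOSFODNN7EXAMPLE",
])

precise_hits = precise.findall(text)
broad_hits = broad.findall(text)

print(len(precise_hits), "precise hit(s):", precise_hits)
print(len(broad_hits), "broad hit(s), including the order ID and digest halves")
```

The broad pattern reports the order ID, both halves of the digest, and the key, so three of its four hits are noise; the prefixed pattern reports only the key.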
Try it yourself
Take this text and mentally apply the patterns: “config: AKIAIOSFODNN7EXAMPLE, token: ghp_1234567890abcdef1234567890abcdef1234, pubkey: pk_live_abc”. Which two are real secret findings, and which is public by design? (Answer: the AKIA key and the ghp_ token are findings; the pk_live_ key is publishable and usually not.)
Summary
Good secret patterns combine a fixed prefix, a known length, and a specific character class — the recipe behind AKIA[0-9A-Z]{16} and every reliable detector. Learn the major formats (AWS, GitHub, Google, Stripe, Slack, JWT, private keys), anchor on their prefixes, and distinguish secret keys (sk_live_, ghp_, AKIA) from publishable ones (pk_live_, browser keys) that aren’t findings. Precision beats coverage: avoid over-broad patterns that bury real keys in noise.
Key takeaways
- Build patterns on prefix + length + character class.
- Anchor on fixed prefixes (AKIA, ghp_, eyJ) to kill false positives.
- Not every key is a finding: publishable keys are public by design.
- An over-broad pattern is worse than useless; precision is the whole point.
Next module, you’ll wire these patterns into recon pipelines that scan whole sites and codebases automatically.