Regex › Secret Discovery Regex

Writing high-precision secret patterns

5 min read Intermediate 6 sections

Finding leaked secrets — API keys, tokens, private keys — is one of the highest-value things regex does for you. But a sloppy pattern buries one real key under a thousand false positives, and you stop looking. The skill is writing patterns precise enough to trust. This lesson teaches the recipe.

You'll learn to

Build secret patterns on prefix, length, and character class
Recognise the major credential formats
Avoid the false positives that make scanning useless

The recipe: prefix + length + character class

The best secret patterns lean on three things a real key has and random text rarely does: a fixed prefix, a known length, and a specific character class.

AKIA[0-9A-Z]{16}

That matches an AWS access key: the literal prefix AKIA, then exactly sixteen characters that are each an uppercase letter or digit. Random text almost never has AKIA followed by exactly that shape, so false positives are rare. This is the template for every good secret pattern.

The major formats

AWS access key      (AKIA|ASIA)[0-9A-Z]{16}
GitHub PAT          ghp_[0-9A-Za-z]{36}
Google API key      AIza[0-9A-Za-z_-]{35}
Stripe secret key   sk_live_[0-9A-Za-z]{24,}
Slack token         xox[baprs]-[0-9A-Za-z-]{10,}
JWT                  eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*
Private key block    -----BEGIN (RSA |EC |OPENSSH )?PRIVATE KEY-----

Each follows the recipe. The JWT pattern is worth a look: a JWT is three base64url chunks joined by dots, and it always starts eyJ (because that’s what {" base64-encodes to at the start of the header). That fixed start makes it findable.

The public-vs-secret distinction that matters

Not every key you find is a finding. Some are designed to be public.

sk_live_...   Stripe SECRET key  → a real finding
pk_live_...   Stripe PUBLISHABLE key → public by design, usually NOT a finding
AIza...       Google API key → often a public browser key, needs verification
ghp_...       GitHub PAT → almost always a real finding

Running it in practice

# Pull all AWS keys out of a JavaScript bundle:
curl -s https://site.com/app.js | grep -oE "(AKIA|ASIA)[0-9A-Z]{16}"

# Scan a whole repo for several secret types:
grep -rEo "ghp_[0-9A-Za-z]{36}|AKIA[0-9A-Z]{16}|sk_live_[0-9A-Za-z]{24,}" .

Checkpoint

Why is a pattern that starts with the literal AKIA and then matches exactly sixteen uppercase-or-digit characters far more reliable than a pattern that just matches any twenty uppercase-alphanumeric characters?

Try it yourself

Take this text and mentally apply the patterns: “config: AKIAIOSFODNN7EXAMPLE, token: ghp_1234567890abcdef1234567890abcdef1234, pubkey: pk_live_abc”. Which two are real secret findings, and which is public by design? (Answer: the AKIA key and the ghp_ token are findings; the pk_live_ key is publishable and usually not.)

Summary

Good secret patterns combine a fixed prefix, a known length, and a specific character class — the recipe behind AKIA[0-9A-Z]{16} and every reliable detector. Learn the major formats (AWS, GitHub, Google, Stripe, Slack, JWT, private keys), anchor on their prefixes, and distinguish secret keys (sk_live_, ghp_, AKIA) from publishable ones (pk_live_, browser keys) that aren’t findings. Precision beats coverage: avoid over-broad patterns that bury real keys in noise.

Key takeaways

Build patterns on prefix + length + character class.
Anchor on fixed prefixes (AKIA, ghp_, eyJ) to kill false positives.
Not every key is a finding — publishable keys are public by design.
An over-broad pattern is worse than useless; precision is the whole point.

Quick quiz

Next module, you’ll wire these patterns into recon pipelines that scan whole sites and codebases automatically.

Was this lesson helpful?