Web Application Security Regex

Validation patterns and why regex controls fail

Regex is the workhorse of input validation and filtering, from form checks to WAF rules. But regex used as a security control fails in predictable ways. This lesson covers those failure modes: the gaps that turn a blocklist into a bypass.

You'll learn to

Recognise allowlist vs blocklist validation
Name the common regex-control failures
Read a filter to find its gap

Allowlist beats blocklist

There are two ways to validate: define what’s allowed (allowlist) or what’s forbidden (blocklist). Allowlists are far safer.

Allowlist:  /^[a-zA-Z0-9_]{3,20}$/   accept on match: only these chars, this length (safe)
Blocklist:  /<script>/i              reject on match: blocks one thing, misses a thousand variants

An allowlist says ‘only these exact characters in this exact shape are valid’ — anything else is rejected, so attacks have nowhere to hide. A blocklist tries to enumerate bad input and inevitably misses variants.
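The contrast can be sketched in Python. This is a minimal illustration, not a production validator: the function names and sample inputs are made up for the example, and `fullmatch` is used in place of `^...$` because Python's `$` also matches just before a trailing newline.

```python
import re

# Allowlist: the whole string must be 3-20 characters from this ASCII set.
USERNAME_ALLOWLIST = re.compile(r"[a-zA-Z0-9_]{3,20}")

# Blocklist: reject input containing a literal <script> tag, in any case.
SCRIPT_BLOCKLIST = re.compile(r"<script>", re.IGNORECASE)


def allowlist_accepts(value: str) -> bool:
    # fullmatch anchors at both ends, so nothing extra can ride along.
    return USERNAME_ALLOWLIST.fullmatch(value) is not None


def blocklist_accepts(value: str) -> bool:
    # Accept anything the pattern does not find.
    return SCRIPT_BLOCKLIST.search(value) is None


# Hypothetical sample inputs, chosen to show where the two approaches differ.
samples = [
    "alice_01",
    "<script>alert(1)</script>",
    "<img src=x onerror=alert(1)>",
    "<script src=x>",
]
for s in samples:
    print(f"{s!r}: allowlist={allowlist_accepts(s)} blocklist={blocklist_accepts(s)}")
```

The blocklist accepts the last two samples because neither contains the exact text `<script>`; the allowlist rejects all three payloads without knowing anything about them.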

The classic failure modes

Missing anchors:   /[a-z]+/  passes any string CONTAINING lowercase
Case sensitivity:  /<script>/  misses <ScRiPt>
Encoding:          blocks "<" but not "%3C" or "&lt;"
Incomplete list:   blocks <script> but not <img onerror=> or <svg onload=>
Multiline:         in multiline mode, ^/$ match per-line, so a 2nd line sneaks past

Each is a real bypass. A blocklist for <script> misses <img onerror=>. A filter for < misses its URL-encoded form %3C whenever a later layer decodes the input after the check. A case-sensitive pattern misses mixed case. These gaps are where payloads get through.
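Each failure mode above can be reproduced in a few lines. The following is a sketch using Python's `re` module; the test strings are illustrative assumptions, and other regex engines differ in details such as their default anchor behaviour.

```python
import re
from urllib.parse import unquote

# Missing anchors: search() succeeds if ANY lowercase run exists in the input.
missing_anchor = re.search(r"[a-z]+", "<b>123") is not None
anchored_rejects = re.fullmatch(r"[a-z]+", "<b>123") is None

# Case sensitivity: the lowercase pattern does not see mixed case.
case_bypass = re.search(r"<script>", "<ScRiPt>") is None
case_fixed = re.search(r"<script>", "<ScRiPt>", re.IGNORECASE) is not None

# Encoding: the filter looks for "<" but the payload carries "%3C".
encoding_bypass = re.search(r"<", "%3Cscript%3E") is None
encoding_fixed = re.search(r"<", unquote("%3Cscript%3E")) is not None

# Incomplete list: a <script> blocklist says nothing about other vectors.
incomplete_bypass = (
    re.search(r"<script>", "<svg onload=alert(1)>", re.IGNORECASE) is None
)

# Multiline: with re.MULTILINE, ^ and $ match per line, so one clean line
# is enough for the whole input to pass.
multiline_bypass = (
    re.search(r"^[a-z]+$", "hello\n<script>", re.MULTILINE) is not None
)
multiline_fixed = re.fullmatch(r"[a-z]+", "hello\n<script>") is None

for name, value in [
    ("missing anchors", missing_anchor),
    ("case sensitivity", case_bypass),
    ("encoding", encoding_bypass),
    ("incomplete list", incomplete_bypass),
    ("multiline", multiline_bypass),
]:
    print(f"{name:<18} bypass reproduced: {value}")
```

The `*_fixed` and `anchored_rejects` variables show the narrow repair for each case. Patching a blocklist one gap at a time is exactly the pattern the lesson warns against, so treat these as diagnostics and not as a defence.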

Checkpoint

Why is an allowlist regex generally safer than a blocklist regex for security validation?

Try it yourself

Take a blocklist filter that rejects the literal script tag in lowercase. List five ways you might bypass it (think case, encoding, alternative tags, and anchoring). Then rewrite the protection as an allowlist and explain why the bypasses no longer apply.

Key takeaways

Allowlists (define what’s allowed) fail closed; blocklists leak.
Common failures: missing anchors, case, encoding, incomplete lists, multiline.
Hitting a filter? Try case, encodings, and alternative syntaxes.
Defenders: allowlist, anchor, normalise input, don’t rely on regex alone.
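The defender's checklist can be combined into one small routine. This is a sketch under stated assumptions: `clean_username` and `normalise` are hypothetical names, the decoding steps (one pass of percent-decoding, HTML entity decoding, Unicode NFKC) are one plausible choice and should mirror whatever your application actually decodes, and the caller is assumed to use the returned value and discard the raw input.

```python
import html
import re
import unicodedata
from typing import Optional
from urllib.parse import unquote

# Allowlist from the lesson, applied with fullmatch so it is anchored.
ALLOWED_USERNAME = re.compile(r"[a-zA-Z0-9_]{3,20}")


def normalise(raw: str) -> str:
    # Decode percent-encoding once, then HTML entities, then fold Unicode
    # compatibility forms, so validation sees the form that will be used.
    decoded = html.unescape(unquote(raw))
    return unicodedata.normalize("NFKC", decoded)


def clean_username(raw: str) -> Optional[str]:
    """Return the normalised username if it passes the allowlist, else None."""
    candidate = normalise(raw)
    if ALLOWED_USERNAME.fullmatch(candidate) is None:
        return None
    return candidate


# Hypothetical inputs: plain, percent-encoded, encoded payloads, double-encoded.
for raw in ["alice_01", "%61lice", "%3Cscript%3E", "&lt;script&gt;", "%2561lice"]:
    print(f"{raw!r} -> {clean_username(raw)!r}")
```

Decoding only once means a double-encoded value still contains `%` after normalisation and is rejected, so the check fails closed. Output encoding and other context-specific defences are still needed; the regex is one layer, not the whole control.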

Next, source-review regex — finding hardcoded secrets across many languages at once.
