Validation patterns and why regex controls fail
Regex is the workhorse of input validation and filtering, from form checks to WAF rules. But using regex as a security control fails in predictable ways. This lesson is about those failure modes — the gaps that turn a ‘blocklist’ into a bypass.
You'll learn to
- Recognise allowlist vs blocklist validation
- Name the common regex-control failures
- Read a filter to find its gap
Allowlist beats blocklist
There are two ways to validate: define what’s allowed (allowlist) or what’s forbidden (blocklist). Allowlists are far safer.
Allowlist: /^[a-zA-Z0-9_]{3,20}$/ accepts only these characters at this length; everything else is rejected
Blocklist: /<script>/i rejects one known-bad string and misses its many variants
An allowlist says ‘only these exact characters in this exact shape are valid’ — anything else is rejected, so attacks have nowhere to hide. A blocklist tries to enumerate bad input and inevitably misses variants.
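To make the contrast concrete, here is a minimal Python sketch that runs the two patterns above against the same inputs. The function names and sample payloads are illustrative assumptions, not part of any particular framework. It uses re.fullmatch instead of ^ and $ because in Python $ also matches just before a trailing newline.

```python
import re

# Allowlist: 3 to 20 characters, each a letter, digit, or underscore.
# fullmatch() requires the whole string to match, so no anchors are needed.
USERNAME_ALLOWLIST = re.compile(r"[a-zA-Z0-9_]{3,20}")

# Blocklist: flags input containing the literal <script> tag, in any case.
SCRIPT_BLOCKLIST = re.compile(r"<script>", re.IGNORECASE)


def allowlist_accepts(value: str) -> bool:
    """Accept only input that matches the allowed shape in full."""
    return USERNAME_ALLOWLIST.fullmatch(value) is not None


def blocklist_accepts(value: str) -> bool:
    """Accept anything that does not contain the one forbidden string."""
    return SCRIPT_BLOCKLIST.search(value) is None


# Hypothetical sample inputs: one benign value and three XSS-style payloads.
samples = [
    "alice_01",
    "<script>alert(1)</script>",
    "<img src=x onerror=alert(1)>",
    "<svg onload=alert(1)>",
]

for s in samples:
    print(f"{s!r}: allowlist={allowlist_accepts(s)} blocklist={blocklist_accepts(s)}")
```

The blocklist stops only the literal script tag and lets the img and svg payloads through. The allowlist rejects all three payloads without having to know anything about them.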
The classic failure modes
Missing anchors: /[a-z]+/ passes any string that contains a lowercase letter anywhere
Case sensitivity: /<script>/ misses <ScRiPt>
Encoding: blocks "<" but not "%3C" or "&lt;"
Incomplete list: blocks <script> but not <img onerror=> or <svg onload=>
Multiline: in multiline mode ^ and $ match at every line boundary, so a payload on a second line sneaks past
Each is a real bypass. A blocklist for <script> misses <img onerror=>. A filter for < misses its URL-encoded form %3C. A case-sensitive pattern misses mixed case. These gaps are where payloads get through.
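Four of these failure modes can be reproduced in a few lines. The following is a sketch in Python, with made-up payloads chosen for illustration. The decoding step assumes the application later URL-decodes and HTML-unescapes the value, which may not hold for every stack.

```python
import re
from html import unescape
from urllib.parse import unquote

# 1. Missing anchors: search() is satisfied by a lowercase run anywhere.
unanchored = re.compile(r"[a-z]+")
print(bool(unanchored.search("123 <b>x</b>")))     # True: the "b" is enough
print(bool(unanchored.fullmatch("123 <b>x</b>")))  # False: whole string must match

# 2. Case sensitivity: the pattern without a case flag misses mixed case.
case_sensitive = re.compile(r"<script>")
case_insensitive = re.compile(r"<script>", re.IGNORECASE)
print(bool(case_sensitive.search("<ScRiPt>")))     # False
print(bool(case_insensitive.search("<ScRiPt>")))   # True

# 3. Encoding: a filter for "<" sees nothing until the input is decoded.
blocks_lt = re.compile(r"<")
for payload in ("%3Cscript%3E", "&lt;script&gt;"):
    decoded = unescape(unquote(payload))  # assumed decoding layers
    print(payload, bool(blocks_lt.search(payload)), bool(blocks_lt.search(decoded)))

# 4. Multiline: with re.MULTILINE, ^ and $ match around each line,
#    so one clean line is enough for the check to pass.
per_line = re.compile(r"^[a-z]+$", re.MULTILINE)
two_lines = "hello\n<script>"
print(bool(per_line.search(two_lines)))            # True: "hello" matches
print(bool(re.fullmatch(r"[a-z]+", two_lines)))    # False: newline and "<" rejected
```

In each pair, the first check is the leaky filter and the second is the tightened version: match the whole string, ignore case, decode before checking, and avoid per-line anchors.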
Checkpoint
Why is an allowlist regex generally safer than a blocklist regex for security validation?
An allowlist defines exactly what is permitted and rejects everything else, so an attacker has no room to slip through — any input not matching the allowed shape is denied. A blocklist tries to enumerate forbidden patterns, but attackers can always find a variant the author didn't list (different case, encoding, or syntax). Since you can't anticipate every malicious form, blocklists are inherently leaky while allowlists fail closed.
Try it yourself
Take a blocklist filter that rejects the literal script tag in lowercase. List five ways you might bypass it (think case, encoding, alternative tags, and anchoring). Then rewrite the protection as an allowlist and explain why the bypasses no longer apply.
Key takeaways
- Allowlists (define what’s allowed) fail closed; blocklists leak.
- Common failures: missing anchors, case, encoding, incomplete lists, multiline.
- Hitting a filter? Try case, encodings, and alternative syntaxes.
- Defenders: allowlist, anchor, normalise input, don’t rely on regex alone.
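As a rough illustration of the defender's checklist, the sketch below normalises input first and then applies an anchored allowlist. The function name validate_username is hypothetical, and the choice of URL decoding plus Unicode NFKC normalisation is an assumption; the right normalisation depends on how your application handles the value afterwards.

```python
import re
import unicodedata
from typing import Optional
from urllib.parse import unquote

# Anchored allowlist: fullmatch() forces the entire value to fit the shape.
USERNAME = re.compile(r"[a-zA-Z0-9_]{3,20}")


def validate_username(raw: str) -> Optional[str]:
    """Return the normalised username if it is valid, otherwise None.

    Hypothetical helper: normalise first, then validate what will be used.
    """
    decoded = unquote(raw)                                # undo URL encoding
    normalised = unicodedata.normalize("NFKC", decoded)   # fold lookalike forms
    if USERNAME.fullmatch(normalised) is None:
        return None
    return normalised


print(validate_username("alice_01"))        # alice_01
print(validate_username("%3Cscript%3E"))    # None
print(validate_username("alice\n"))         # None
```

Callers should use the returned value, not the raw input, so the string that was validated is the string that gets stored or displayed. This check is one layer only and does not replace the other controls an application needs.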
Next, source-review regex — finding hardcoded secrets across many languages at once.