Bypass techniques and the filter-interpreter gap

Almost every filter bypass comes from one idea: the filter and the thing it’s protecting read input differently. The filter sees a string; the database, browser, or shell sees something else after decoding. Exploiting that gap is the heart of bypass technique. This lesson names the gaps systematically.

You'll learn to

Understand the filter-interpreter gap
Apply the main bypass families
See why normalisation is the defence

The core idea

A filter checks input as text. But the input then passes to an interpreter (a SQL engine, an HTML parser, a shell) that may decode or normalise it first. If the filter and the interpreter disagree about what the input means, a payload the filter judged harmless can still reach the interpreter as an attack.

Filter sees:        %3Cscript%3E   (just a harmless-looking string)
After URL-decoding: <script>       (the actual attack, as it reaches the browser)
-> the filter blocks '<script>' but never saw it, because the input was still encoded
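The gap above can be reproduced in a few lines. This is a minimal sketch, not a real filter: `naive_filter` and `downstream_view` are hypothetical names, and the sketch assumes that a URL-decoding step runs somewhere after the filter and before the interpreter.

```python
from urllib.parse import unquote


def naive_filter(raw: str) -> bool:
    """Return True if the input is allowed. Inspects the raw text only."""
    return "<script>" not in raw


def downstream_view(raw: str) -> str:
    """What a later URL-decoding step (assumed here) hands to the interpreter."""
    return unquote(raw)


payload = "%3Cscript%3E"
allowed = naive_filter(payload)                 # the filter finds no '<script>'
seen_by_interpreter = downstream_view(payload)  # decoding reveals the tag

print(allowed, seen_by_interpreter)
```

The filter approves the payload because it only ever compares against the encoded text, while the decoded form is what actually gets interpreted.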

The bypass families

Encoding:       URL (%3C), HTML entity (&lt;), Unicode escape (\u003c), double encoding (%253C)
Case:           <ScRiPt> when the filter is case-sensitive
Whitespace:     union/**/select, or a tab/newline, where the filter expects a plain space
Nesting:        <scr<script>ipt> when a naive filter strips once and stops
Normalisation:  Unicode forms that collapse to the dangerous character after the filter has run, e.g. fullwidth ＜ (U+FF1C) becoming < under NFKC

Each family is a way the interpreter’s view differs from the filter’s. Encoding is the biggest family: if the filter checks before decoding and the interpreter decodes after, the filter never sees the real payload.
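Three of the non-encoding families can be shown against deliberately naive checks. The sketch below is illustrative only; all three function names are hypothetical, and each models one specific weakness (case-sensitive matching, a single stripping pass, and matching on a literal space).

```python
def case_sensitive_filter(text: str) -> bool:
    """Allow the input unless it contains the exact lowercase tag."""
    return "<script>" not in text


def strip_once(text: str) -> str:
    """Remove every literal '<script>' in one pass, without rescanning the result."""
    return text.replace("<script>", "")


def space_only_filter(text: str) -> bool:
    """Allow the input unless it contains 'union select' with a single space."""
    return "union select" not in text.lower()


# Case: mixed case slips past an exact-match check.
case_bypass = case_sensitive_filter("<ScRiPt>")

# Nesting: removing the inner tag reassembles the outer one.
nesting_result = strip_once("<scr<script>ipt>")

# Whitespace: an inline comment stands in for the space the filter expects.
whitespace_bypass = space_only_filter("union/**/select")

print(case_bypass, nesting_result, whitespace_bypass)
```

In each case the check is correct for the exact text it looks for; the bypass works because the interpreter accepts more spellings of the same thing than the filter does.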

Checkpoint

What single principle underlies almost every filter bypass?

Try it yourself

Take a filter that blocks the literal less-than sign. List how you’d represent that character so the filter misses it but the browser still interprets it — URL encoding, HTML entity, Unicode escape, and double encoding. Then state the one defensive change that defeats all of them.

Key takeaways

Bypasses exploit the gap between how a filter and an interpreter read input.
Families: encoding, case, whitespace, nesting, Unicode normalisation.
Encoding is the biggest family: the filter checks before decoding, the interpreter acts after.
Defence: decode and normalise to final form first, then validate that.
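The defence in the last takeaway can be sketched as a canonicalise-then-validate step. This is one possible shape under stated assumptions, not a complete sanitiser: `canonicalise`, `is_allowed`, and `MAX_ROUNDS` are hypothetical names, the choice of decoders (URL, HTML entity, NFKC) is an assumption, and in practice the decoding steps should mirror what the real interpreter does. It does not handle JavaScript-style escapes such as \u003c.

```python
import html
import unicodedata
from urllib.parse import unquote

MAX_ROUNDS = 5  # assumption: arbitrary cap on how many decode rounds to tolerate


def canonicalise(raw: str) -> str:
    """Decode and normalise repeatedly until the text stops changing."""
    text = raw
    for _ in range(MAX_ROUNDS):
        decoded = unquote(text)                           # URL encoding, e.g. %3C
        decoded = html.unescape(decoded)                  # HTML entities, e.g. &lt;
        decoded = unicodedata.normalize("NFKC", decoded)  # e.g. fullwidth U+FF1C
        if decoded == text:
            return decoded
        text = decoded
    raise ValueError("input did not reach a stable form")


def is_allowed(raw: str) -> bool:
    """Validate the final form, not the raw text. Rejects any '<'."""
    try:
        canonical = canonicalise(raw)
    except ValueError:
        return False  # suspiciously deep encoding: reject rather than guess
    return "<" not in canonical


for sample in ["hello", "%3C", "&lt;", "%253C", "\uff1c"]:
    print(repr(sample), is_allowed(sample))
```

Looping until the text is stable is what handles double encoding: `%253C` becomes `%3C` on the first round and `<` on the second. Rejecting input that never stabilises avoids having to guess what it would finally decode to.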

Next, ReDoS — when a regex pattern itself becomes a denial-of-service vulnerability.
