Bypass techniques and the filter-interpreter gap

Almost every filter bypass comes from one idea: the filter and the thing it’s protecting read input differently. The filter sees a string; the database, browser, or shell sees something else after decoding. Exploiting that gap is the heart of bypass technique. This lesson names the gaps systematically.

You'll learn to

  • Understand the filter-interpreter gap
  • Apply the main bypass families
  • See why normalisation is the defence

The core idea

A filter checks input as text. But the input then passes to an interpreter (a SQL engine, an HTML parser, a shell) that may decode or normalise it first. If the filter and the interpreter disagree about what the input means, a payload can pass the filter and still reach the interpreter in its dangerous form.

Filter sees:          %3Cscript%3E   (a harmless-looking string)
After URL decoding:   <script>       (the actual attack, as the browser receives it)
-> the filter was looking for '<script>' but never saw it, because it was encoded
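
The gap above can be reproduced in a few lines. This is a minimal sketch, not a real filter: `naive_filter` is a hypothetical function invented for illustration, and the later decoding stage is assumed to be plain URL decoding via Python's `urllib.parse.unquote`.

```python
from urllib.parse import unquote


def naive_filter(raw: str) -> bool:
    """Hypothetical filter: accept input unless it contains the literal '<script>'."""
    return "<script>" not in raw


payload = "%3Cscript%3E"

# The filter inspects the raw, still-encoded string and lets it through.
passed = naive_filter(payload)

# A later stage (assumed here) URL-decodes the value before using it.
decoded = unquote(payload)

print(passed)   # True
print(decoded)  # <script>
```

The same filter rejects the decoded value, which is the whole point: the check was sound, but it ran on the wrong representation.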

The bypass families

Encoding:      URL (%3C), HTML entity (&lt;), Unicode escape (\u003c), double encoding (%253C)
Case:          <ScRiPt> when the filter is case-sensitive
Whitespace:    union/**/select (a comment in place of a space), or a tab/newline where the filter expects a space
Nesting:       <scr<script>ipt> when a naive filter strips once and stops
Normalisation: Unicode forms such as fullwidth ＜ (U+FF1C) that collapse to the dangerous character after the filter has run

Each family is a way the interpreter’s view differs from the filter’s. Encoding is the largest family: if the filter checks before decoding and the interpreter decodes after, the filter never sees the real payload.
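
Each family can be checked against a deliberately weak filter. The sketch below uses only the Python standard library; the helper `strip_once` and the pattern `space_only` are hypothetical stand-ins for a naive filter, not taken from any real product.

```python
import html
import re
import unicodedata
from urllib.parse import unquote

# Encoding: four ways to hide '<' from a check on the raw string.
url_form = unquote("%3C")
entity_form = html.unescape("&lt;")
escape_form = "\\u003c".encode().decode("unicode_escape")
double_once = unquote("%253C")       # one decode pass still leaves '%3C'
double_twice = unquote(double_once)  # the second pass yields '<'

# Case: a case-sensitive substring check misses mixed case.
case_missed = "<script>" not in "<ScRiPt>"

# Whitespace: a pattern that expects a literal space misses a SQL comment.
space_only = re.compile(r"union select", re.IGNORECASE)
whitespace_missed = space_only.search("union/**/select") is None


# Nesting: stripping the token once reassembles it from the leftovers.
def strip_once(s: str) -> str:
    return s.replace("<script>", "")


nested_result = strip_once("<scr<script>ipt>")

# Normalisation: fullwidth '＜' (U+FF1C) becomes '<' under NFKC.
normalised = unicodedata.normalize("NFKC", "\uff1c")

print(url_form, entity_form, escape_form, double_once, double_twice)
print(case_missed, whitespace_missed, nested_result, normalised)
```

In every case the weak check passes on the raw input, while the value that eventually reaches the interpreter contains the blocked character or token.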

Checkpoint

What single principle underlies almost every filter bypass?

Try it yourself

Take a filter that blocks the literal less-than sign. List how you’d represent that character so the filter misses it but the browser still interprets it — URL encoding, HTML entity, Unicode escape, and double encoding. Then state the one defensive change that defeats all of them.

Key takeaways

  • Bypasses exploit the gap between how a filter and an interpreter read input.
  • Families: encoding, case, whitespace, nesting, Unicode normalisation.
  • Encoding is the largest family: the filter checks before decoding, the interpreter acts after.
  • Defence: decode and normalise to final form first, then validate that.
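
The defence in the last takeaway might look like the sketch below. It rests on several assumptions: the names `canonicalise`, `is_allowed`, and `MAX_PASSES` are invented for illustration, the downstream interpreter is assumed to apply URL decoding, HTML entity decoding, and NFKC normalisation, and the rule being enforced is simply "no less-than sign". A real application would mirror whatever decoding its own interpreter actually performs.

```python
import html
import unicodedata
from urllib.parse import unquote

MAX_PASSES = 5  # assumed cap on decoding rounds; deeper nesting is rejected


def canonicalise(raw: str) -> str:
    """Decode and normalise repeatedly until the value stops changing."""
    current = raw
    for _ in range(MAX_PASSES):
        decoded = html.unescape(unquote(current))
        decoded = unicodedata.normalize("NFKC", decoded)
        if decoded == current:
            return current
        current = decoded
    raise ValueError("input still changing after repeated decoding")


def is_allowed(raw: str) -> bool:
    """Validate the canonical form rather than the raw input."""
    try:
        canonical = canonicalise(raw)
    except ValueError:
        return False
    return "<" not in canonical


for sample in ["hello", "%3Cscript%3E", "&lt;", "%253C", "\uff1c"]:
    print(repr(sample), is_allowed(sample))
```

Because validation runs on the fully decoded value, URL encoding, entities, double encoding, and fullwidth forms all collapse to the same character before the check sees them.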

Next, ReDoS — when a regex pattern itself becomes a denial-of-service vulnerability.
