
Anchors and boundaries


So far your patterns match anywhere in a string. Anchors change that: they pin a pattern to a position, such as the start of the input, the end, or a word boundary. This sounds small, but it’s the mechanism behind regex-based input validation, and understanding it precisely is how you find validation bypasses.

You'll learn to

  • Pin patterns to the start and end with anchors
  • Match whole words with boundaries
  • See how missing anchors create validation bypasses

The anchors

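^      start of the string (or of each line, depending on mode and engine)
$      end of the string (or of each line, depending on mode and engine)
\b     a word boundary (between a word char and a non-word char, or at a string edge next to a word char)
\B     NOT a word boundary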
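Whether ^ and $ mean "string" or "line" depends on the engine and its flags. As a sketch in Python (chosen here for illustration; the two-line input is made up), ^ and $ refer to the whole string by default and only switch to per-line matching with the re.MULTILINE flag:

```python
import re

text = "alice\n<script>"  # illustrative two-line input

# Default mode: ^ and $ anchor to the whole string, so the second line blocks a match
whole_string = re.search(r"^[a-z]+$", text)

# re.MULTILINE: ^ and $ also match at the start and end of each line,
# so the first line alone satisfies the pattern
per_line = re.search(r"^[a-z]+$", text, re.MULTILINE)

print(whole_string)       # None
print(per_line.group())   # alice
```

The same pattern accepts or rejects this input depending only on the flag, which is why a validator running in line mode can be satisfied by one clean line of an otherwise hostile value.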

Anchors don’t match characters — they match positions. ^ matches the position before the first character; $ matches the position after the last.
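You can see that anchors are zero-width by asking Python’s re module where they match. In this small sketch, span() returns the (start, end) indices of a match; for an anchor the two are equal because nothing is consumed:

```python
import re

text = "the cat"

# An anchor on its own matches an empty string at a position
start_span = re.search(r"^", text).span()
end_span = re.search(r"$", text).span()

# Every word-boundary position: before and after "the", before and after "cat"
boundaries = [m.start() for m in re.finditer(r"\b", text)]

print(start_span)   # (0, 0)
print(end_span)     # (7, 7)
print(boundaries)   # [0, 3, 4, 7]
```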

^cat       matches "cat" only at the start: "cat dog" yes, "a cat" no
cat$       matches "cat" only at the end:   "a cat" yes, "cat dog" no
^cat$      matches ONLY the exact string "cat", nothing more
\bcat\b    matches "cat" as a whole word: "the cat" yes, "category" no
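To check the four example rows, you can run them through Python’s re.search, which looks for a match anywhere in the string, so any anchoring comes from the pattern itself. The helper name matches is hypothetical, chosen for this sketch:

```python
import re

def matches(pattern: str, text: str) -> bool:
    """True if pattern matches anywhere in text (re.search adds no anchoring)."""
    return re.search(pattern, text) is not None

# (pattern, text, expected) triples taken from the example rows
examples = [
    (r"^cat", "cat dog", True),
    (r"^cat", "a cat", False),
    (r"cat$", "a cat", True),
    (r"cat$", "cat dog", False),
    (r"^cat$", "cat", True),
    (r"^cat$", "cat dog", False),
    (r"\bcat\b", "the cat", True),
    (r"\bcat\b", "category", False),
]

for pattern, text, expected in examples:
    result = matches(pattern, text)
    print(f"{pattern:10} {text!r:12} {'yes' if result else 'no'}")
    assert result == expected  # each row agrees with the table
```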

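That last distinction matters constantly. \bcat\b matches the word “cat” but not the “cat” inside “category” or “concatenate”: a boundary needs a word character on one side and a non-word character (or the edge of the string) on the other, so the pattern only matches “cat” as a standalone word.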
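\B is the inverse of \b at a single position. This small Python sketch (the word list is illustrative) contrasts \bcat\b, which needs a boundary on both sides of “cat”, with \Bcat\B, which needs word characters on both sides:

```python
import re

words = ["cat", "category", "concatenate", "bobcat", "cat."]
results = {}

for word in words:
    standalone = re.search(r"\bcat\b", word) is not None  # boundary on both sides
    buried = re.search(r"\Bcat\B", word) is not None      # word chars on both sides
    results[word] = (standalone, buried)
    print(f"{word:12} \\bcat\\b={standalone}  \\Bcat\\B={buried}")
```

“category” and “bobcat” match neither pattern: each has a boundary on one side of “cat” but not the other.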

Why anchors are the heart of validation

When an app validates input (for example, “is this a valid username?”), it uses anchors to mean “the whole input must match, from start to end”:

^[a-zA-Z0-9_]{3,20}$

Read it: from the start (^), three to twenty characters that are letters, digits, or underscore, to the end ($). The anchors are what make it “the entire input is valid,” not “the input contains something valid somewhere.”
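A minimal sketch of this username check in Python, assuming the rule is exactly the pattern above; the names ANCHORED, CHARSET, valid_with_anchors, and valid_with_fullmatch are hypothetical. It also shows one way anchor semantics differ between languages: Python’s $ also matches just before a final newline, so ^...$ with re.search accepts "alice\n", while re.fullmatch does not.

```python
import re

# The username rule: 3 to 20 letters, digits, or underscores
ANCHORED = re.compile(r"^[a-zA-Z0-9_]{3,20}$")
CHARSET = re.compile(r"[a-zA-Z0-9_]{3,20}")

def valid_with_anchors(value: str) -> bool:
    # ^...$ with search: in Python, $ also matches just before a final "\n"
    return ANCHORED.search(value) is not None

def valid_with_fullmatch(value: str) -> bool:
    # fullmatch requires the pattern to consume the entire string, newline included
    return CHARSET.fullmatch(value) is not None

for value in ["alice_01", "al", "alice!", "alice\n"]:
    print(f"{value!r:12} anchors={valid_with_anchors(value)} fullmatch={valid_with_fullmatch(value)}")
```

In JavaScript without the m flag, $ matches only at the very end of the input, so the same pattern text behaves differently there. In Python, ending the pattern with \Z instead of $ gives the same strictness as fullmatch.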

The classic missing-anchor bypass

Intended:  ^[a-z]+$        "the whole input is lowercase letters"
Mistake:   [a-z]+          "the input contains some lowercase letters"

With the un-anchored version, any input that contains at least one lowercase letter passes the check, including a malicious payload such as <script>alert(1)</script>, because the regex only has to find a match somewhere inside it. The validation looks correct in testing (valid inputs pass) but is wide open, because it never required the match to cover the whole string.
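The bypass is easy to reproduce. In this Python sketch the function names and sample inputs are illustrative; the unanchored check uses re.search, which succeeds on any match anywhere, while the intended check uses re.fullmatch, which requires the whole value to match:

```python
import re

def check_unanchored(value: str) -> bool:
    # The mistake: succeeds if ANY run of lowercase letters appears anywhere
    return re.search(r"[a-z]+", value) is not None

def check_anchored(value: str) -> bool:
    # The intent: the entire value must be lowercase letters.
    # fullmatch covers the whole string and, unlike Python's $, rejects a trailing "\n".
    return re.fullmatch(r"[a-z]+", value) is not None

for value in ["admin", "<script>alert(1)</script>", "../../etc/passwd"]:
    print(f"{value!r:30} unanchored={check_unanchored(value)} anchored={check_anchored(value)}")
```

Both hostile inputs pass the unanchored check, because “script” and “etc” are runs of lowercase letters, yet only "admin" passes the anchored one.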

Checkpoint

A username validator uses the pattern [a-z]+ with no anchors. Why might an attacker-controlled value that isn't all lowercase letters still pass this check?

Try it yourself

Mentally test the pattern that requires start, then one or more lowercase letters, then end, against three inputs: “hello”, “Hello”, and “hello world”. Which pass and which fail, and why? Then drop the anchors and re-test — notice how the results change.

Summary

Anchors match positions, not characters: ^ (start), $ (end), \b (word boundary). They’re what make a validation pattern mean “the whole input matches” rather than “something valid appears somewhere.” A pattern missing ^ and $ is the classic validation bypass: it accepts any input that merely contains a valid piece. Anchor behaviour also varies by language and mode (for example, Ruby’s ^ and $ always match at line boundaries, and Python’s $ also matches before a trailing newline), so a ^...$ check can still pass input containing a newline; this is a recurring source of vulnerabilities.

Key takeaways

  • ^ and $ anchor to start and end; \b marks word boundaries.
  • Proper validation anchors both ends: ^...$ means the entire input must match.
  • A missing-anchor check accepts payloads that merely contain a valid fragment.
  • Anchor semantics differ across languages and modes (line vs whole string), a known source of bypasses.


Next, quantifiers in depth — greedy versus lazy matching, and how getting it wrong leads to both bugs and denial-of-service.
