Anchors and boundaries
So far your patterns match anywhere in a string. Anchors change that — they pin a pattern to a position: the start of a line, the end, or a word boundary. This sounds small, but it’s the mechanism behind input validation, and understanding it precisely is how you find validation bypasses.
You'll learn to
- Pin patterns to the start and end with anchors
- Match whole words with boundaries
- See how missing anchors create validation bypasses
The anchors
^ start of the string (or line)
$ end of the string (or line)
\b a word boundary (edge between a word char and a non-word char)
\B NOT a word boundary
Anchors don’t match characters — they match positions. ^ matches the position before the first character; $ matches the position after the last.
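A quick way to see that anchors match positions is to look at the span of the match. The following is a minimal sketch using Python's standard `re` module; the variable names are illustrative only.

```python
import re

text = "cat"

# Anchors consume no characters: each match is an empty string at a position.
start = re.search(r"^", text)
end = re.search(r"$", text)

print(start.span())          # (0, 0): the position before "c"
print(end.span())            # (3, 3): the position after "t"
print(repr(start.group()))   # '': nothing was consumed
```

Both spans have equal start and end offsets, which is what "zero-width" means in practice.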
^cat matches "cat" only at the start: "cat dog" yes, "a cat" no
cat$ matches "cat" only at the end: "a cat" yes, "cat dog" no
^cat$ matches ONLY the exact string "cat", nothing more
\bcat\b matches "cat" as a whole word: "the cat" yes, "category" no
That last distinction matters constantly. \bcat\b matches the word “cat” but not “cat” inside “category” or “concatenate” — the word boundaries pin it to a standalone word.
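The cases above can be checked directly. This sketch assumes Python's `re.search`, which looks for a match anywhere in the string, so any restriction comes from the anchors themselves; `found` is a hypothetical helper, not part of any library.

```python
import re

def found(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

print(found(r"^cat", "cat dog"))         # True
print(found(r"^cat", "a cat"))           # False
print(found(r"cat$", "a cat"))           # True
print(found(r"cat$", "cat dog"))         # False
print(found(r"^cat$", "cat"))            # True
print(found(r"^cat$", "cats"))           # False
print(found(r"\bcat\b", "the cat"))      # True
print(found(r"\bcat\b", "category"))     # False
print(found(r"\bcat\b", "concatenate"))  # False
print(found(r"\Bcat\B", "concatenate"))  # True: "cat" sits inside a word
```

The last line shows \B doing the opposite job of \b: it matches only where "cat" has word characters on both sides.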
Why anchors are the heart of validation
When an app validates input — “is this a valid username?” — it uses anchors to mean “the whole input must match, start to end”:
^[a-zA-Z0-9_]{3,20}$
Read it: from the start (^), three to twenty characters that are letters, digits, or underscore, to the end ($). The anchors are what make it “the entire input is valid,” not “the input contains something valid somewhere.”
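As a rough illustration, here is how that pattern might be wrapped in a validation function in Python. The function name `is_valid_username` is an assumption made for this sketch, not something from a particular framework.

```python
import re

# Pattern taken from the text above; compiled once for reuse.
USERNAME_RE = re.compile(r"^[a-zA-Z0-9_]{3,20}$")

def is_valid_username(value: str) -> bool:
    """Hypothetical helper: True when the value matches the anchored pattern."""
    # Note: in Python, $ also matches just before a single trailing newline,
    # so stricter code may prefer re.fullmatch or \Z.
    return USERNAME_RE.search(value) is not None

print(is_valid_username("alice_01"))   # True
print(is_valid_username("ab"))         # False: shorter than 3 characters
print(is_valid_username("bob!"))       # False: "!" is not in the class
print(is_valid_username("a" * 21))     # False: longer than 20 characters
```

Because of the anchors, one disallowed character anywhere in the value is enough to reject it.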
The classic missing-anchor bypass
Intended: ^[a-z]+$ "the whole input is lowercase letters"
Mistake: [a-z]+ "the input contains some lowercase letters"
With the unanchored version, a malicious payload that merely contains some lowercase letters passes the check, because the regex finds its match somewhere inside the input. The validation looks correct in testing (valid inputs pass) but is wide open: it never requires the match to cover the whole string.
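The difference is easy to reproduce. The sketch below assumes the application uses a search-style check (here Python's `Pattern.search`); the `passes` helper and the payload string are illustrative only.

```python
import re

ANCHORED = re.compile(r"^[a-z]+$")
UNANCHORED = re.compile(r"[a-z]+")

def passes(pattern: re.Pattern, value: str) -> bool:
    """Search-style check: passes if the pattern matches anywhere."""
    return pattern.search(value) is not None

payload = "<script>alert(1)</script>"  # illustrative payload, not all lowercase

print(passes(UNANCHORED, "hello"))   # True
print(passes(ANCHORED, "hello"))     # True
print(passes(UNANCHORED, payload))   # True: the check is bypassed
print(passes(ANCHORED, payload))     # False

# What the unanchored pattern actually matched inside the payload:
print(UNANCHORED.search(payload).group())  # script
```

Both patterns accept the valid input, which is why a test suite containing only valid inputs would not catch the mistake.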
Checkpoint
A username validator uses the pattern [a-z]+ with no anchors. Why might an attacker-controlled value that isn't all lowercase letters still pass this check?
Without ^ and $ anchors, the pattern only requires that some run of lowercase letters appears somewhere in the input — not that the whole input matches. So a value containing other characters (including an attack payload) passes as long as it has at least one lowercase letter in it. The fix is to anchor both ends: ^[a-z]+$.
Try it yourself
Mentally test the pattern ^[a-z]+$ (start, then one or more lowercase letters, then end) against three inputs: “hello”, “Hello”, and “hello world”. Which pass and which fail, and why? Then drop the anchors and re-test, and notice how the results change.
Summary
Anchors match positions, not characters: ^ (start), $ (end), \b (word boundary). They’re what make a validation pattern mean “the whole input matches” rather than “something valid appears somewhere.” A pattern missing ^ and $ is the classic validation bypass: it accepts any input that merely contains a valid piece. Anchor behaviour also varies by language and flags: in some settings ^ and $ match at every line break rather than only at the ends of the whole string, which is a recurring source of vulnerabilities.
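The line versus whole-string difference can be seen within a single language. This sketch uses Python's `re.MULTILINE` flag to stand in for engines where ^ and $ are line-based; the sample value is made up for illustration.

```python
import re

value = "admin\n<script>"  # a valid-looking first line, then a payload line

# Default mode: ^ and $ refer to the whole string, so this is rejected.
print(re.search(r"^[a-z]+$", value) is not None)                   # False

# With re.MULTILINE, ^ and $ also match at line breaks: line 1 alone passes.
print(re.search(r"^[a-z]+$", value, re.MULTILINE) is not None)     # True

# \A and \Z always mean the whole string, regardless of flags.
print(re.search(r"\A[a-z]+\Z", value, re.MULTILINE) is not None)   # False
```

In Python, `\A` and `\Z` (or `re.fullmatch`) are the whole-string forms; other languages spell these differently, so it is worth checking the documentation for the engine in use.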
Key takeaways
- ^ and $ anchor to start and end; \b marks word boundaries.
- Proper validation anchors both ends: ^...$ means the entire input must match.
- A missing-anchor check accepts payloads that merely contain a valid fragment.
- Anchor semantics differ across languages — a known bypass area.
Next, quantifiers in depth — greedy versus lazy matching, and how getting it wrong leads to both bugs and denial-of-service.