Cross-language regex quirks
When you review source code in several languages, you will run into differences between their regex implementations, and some of those differences are security-relevant. This lesson covers the cross-language quirks that matter: dangerous functions, anchor behaviour, and the features that turn a regex into a vulnerability.
You'll learn to
- Know the dangerous regex functions per language
- Watch anchor and multiline differences
- Spot regex-driven vulnerabilities in source
Dangerous regex functions
PHP: preg_replace with the /e modifier (deprecated in 5.5, removed in 7.0) = code execution!
PHP: preg_match without anchors = validation bypass
Perl: regex with (?{ ... }) embedded code = code execution
Java: Pattern with catastrophic backtracking = ReDoS (very common)
Ruby: ^ and $ match line boundaries, NOT string boundaries -> bypass with newlines
.NET: ^ and $ become line anchors under RegexOptions.Multiline; $ also matches before a final newline
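The unanchored-validation entry in the list above can be sketched in Python, whose re.search behaves like PHP's preg_match in that it succeeds when the pattern matches anywhere in the input. This is a minimal illustration, not code from a real application: the function names and the sample payload are assumptions made for the example.

```python
import re

# A "lowercase letters only" pattern written without any anchors.
USERNAME = re.compile(r"[a-z]+")


def is_valid_unanchored(value: str) -> bool:
    # search() succeeds if the pattern matches ANYWHERE in the input,
    # so a single lowercase letter is enough to pass.
    return USERNAME.search(value) is not None


def is_valid_whole_string(value: str) -> bool:
    # fullmatch() requires the ENTIRE input to match the pattern.
    return USERNAME.fullmatch(value) is not None


payload = "abc<script>"
print(is_valid_unanchored(payload))    # True: "abc" matched, the rest was ignored
print(is_valid_whole_string(payload))  # False: "<script>" is not [a-z]+
```

In review, the thing to look for is a validation call whose pattern has no start and end anchors and whose API searches rather than matches the whole input.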
Some languages have regex features that execute code — PHP’s old preg_replace /e modifier and Perl’s embedded-code constructs are direct code-execution sinks when fed attacker input. These are high-severity finds in source review.
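The catastrophic-backtracking entry in the list above is easiest to recognise by its shape: a quantifier nested inside another quantifier, followed by something that can fail. The sketch below uses Python's backtracking re engine as a stand-in for Java's Pattern; the pattern, the helper name, and the input sizes are assumptions chosen to keep the run short, and the timings will vary by machine.

```python
import re
import time

# Nested quantifiers: the inner a+ and the outer (...)+ can split a run
# of "a" characters in a very large number of different ways.
EVIL = re.compile(r"^(a+)+$")


def time_failed_match(n: int) -> float:
    """Return the seconds taken to reject n 'a' characters followed by '!'."""
    subject = "a" * n + "!"
    start = time.perf_counter()
    result = EVIL.match(subject)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing '!' can never satisfy $
    return elapsed


# Each extra character gives the engine more splits to try before it
# can give up, so the time is expected to climb steeply with n.
for n in (10, 14, 18):
    print(f"n={n:2d}  {time_failed_match(n):.4f}s")
```

A matching input such as "aaaa" returns immediately; only the near-miss input triggers the slow path, which is why this class of bug tends to survive ordinary testing.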
The Ruby/multiline anchor trap
In Ruby, ^ and $ anchor to LINE boundaries, not the whole string.
A validator: /^[a-z]+$/ in Ruby
Bypass: "safe\nmalicious" -> ^ and $ match the 'safe' line,
the malicious second line passes validation!
The string-anchoring equivalents are \A and \z (lowercase z; uppercase \Z still allows one trailing newline).
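The same trap can be reproduced in Python for comparison. This is a sketch under one stated assumption: Python's re.MULTILINE flag is used to stand in for Ruby's default behaviour, because it makes ^ and $ match at every line boundary. Python spells the strict end-of-string anchor \Z, which corresponds to Ruby's \z. The payload is an illustrative example.

```python
import re

payload = "safe\n<script>"

# Ruby-style behaviour: with re.MULTILINE, ^ and $ match at each line
# boundary, so one clean line is enough for the pattern to succeed.
line_anchored = re.compile(r"^[a-z]+$", re.MULTILINE)

# String-anchored: \A and \Z match only at the very start and very end
# of the input (Python's \Z plays the role of Ruby's \z).
string_anchored = re.compile(r"\A[a-z]+\Z")

print(bool(line_anchored.search(payload)))    # True: the "safe" line matched
print(bool(string_anchored.search(payload)))  # False: the newline and payload are rejected
print(bool(string_anchored.search("safe")))   # True: clean input still passes
```

The only difference between the two validators is the choice of anchors, which is why this bug is easy to miss when reading the pattern quickly.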
Checkpoint
Why is a /^[a-z]+$/ validator a potential bypass in Ruby specifically, and what's the fix?
In Ruby, the ^ and $ anchors match the start and end of each line, not the whole string. So /^[a-z]+$/ only requires that some line of the input consists of lowercase letters — a multiline input like 'safe' followed by a newline and a malicious payload passes, because the anchors match the 'safe' line while the malicious line slips through. The fix is to use \A and \z, which anchor to the start and end of the entire string, forcing the whole input (including any newlines) to match the intended pattern.
Try it yourself
Explain why the same caret-dollar validation pattern anchors to the whole string by default in Python (apart from one optional trailing newline, which $ tolerates) but is line-anchored in Ruby, and what a multiline bypass against the Ruby version looks like. Then name the two anchors Ruby uses for true string boundaries.
Key takeaways
- Some languages’ regex can execute code (PHP /e modifier, Perl embedded code).
- Ruby’s ^ and $ anchor to lines, not the string — a multiline bypass risk.
- Use \A and \z for true string anchoring in Ruby.
- Don’t assume a pattern safe in one language behaves the same in another.
Next, regex engine internals — how NFA, DFA, and backtracking actually work under the hood.