Regex › Regex Engine Internals
Regex engine internals: NFA, DFA, and backtracking
To truly understand regex performance and ReDoS, you need to know how engines work underneath. There are two fundamental approaches — backtracking and automaton-based — and the choice explains everything from why some features exist to why some patterns explode. This lesson is the theory that makes the rest click.
You'll learn to
- Distinguish backtracking from automaton engines
- See why each has its trade-offs
- Connect internals to ReDoS and RE2
Two kinds of engine
Backtracking (NFA-based): PCRE, Python re, JS, Java, .NET, Perl
- tries one path, backs up on failure, tries another
- SUPPORTS backreferences and lookarounds
- CAN explode exponentially (catastrophic backtracking / ReDoS)
Automaton (DFA-based): RE2 (Go), Rust regex, grep (often)
- tracks all possible matches simultaneously
- LINEAR time always, no catastrophic backtracking
- CANNOT do backreferences or lookarounds
The split is fundamental. Backtracking engines explore match paths one at a time and retreat when stuck — flexible (they support backreferences and lookarounds) but vulnerable to exponential blowup. Automaton engines track all possibilities at once — always linear-time, but unable to support features that need backtracking.
Why this explains everything
Checkpoint
Why can a backtracking engine suffer catastrophic backtracking while an automaton engine like RE2 cannot?
A backtracking engine explores possible match paths one at a time, retreating and trying another whenever it hits a dead end. With certain patterns (like nested quantifiers) the number of distinct paths grows exponentially with input length, so on a near-miss input the engine tries them all — catastrophic backtracking. An automaton (DFA-based) engine like RE2 instead tracks all possible match states simultaneously as it scans the input once, so its work is bounded by the input length (linear time) and there are no alternative paths to explode. The trade is that the automaton approach can't support features like backreferences that fundamentally require backtracking.
Try it yourself
Explain in your own words why backreferences exist only in backtracking engines and not in RE2, and why ReDoS is impossible in an automaton engine. Then state, for a server-side validator over untrusted input, which engine type you’d choose and what you’d give up.
Key takeaways
- Backtracking (NFA) engines try paths and retreat — flexible but can explode.
- Automaton (DFA) engines track all states at once — linear time, fewer features.
- Backreferences and lookarounds need backtracking; RE2 lacks them by design.
- Engine choice is a security decision: guaranteed performance vs full features.
Quick quiz
Finally, the expert offensive synthesis — how elite practitioners wield regex across every security role.