Discovering DOM XSS with regex
DOM XSS happens when attacker-controllable data (a source) reaches a dangerous sink, an API that turns a string into HTML or code, without being sanitised. Regex can find both ends across a whole bundle, narrowing thousands of lines to the few worth tracing by hand. This lesson covers that regex-assisted approach to DOM XSS discovery.
You'll learn to
- Match DOM XSS sources and sinks
- Narrow a bundle to candidate flows
- Know regex's limit and what comes next
Patterns for sources and sinks
SOURCES (attacker-controllable input):
location\.(hash|search|href|pathname)
document\.(URL|documentURI|referrer|cookie)
window\.name
\bonmessage\b|addEventListener\(\s*["']message
SINKS (where data becomes code/HTML):
\.innerHTML\s*=
\.outerHTML\s*=
document\.write\(
\beval\(
\.insertAdjacentHTML\(
(?:setTimeout|setInterval)\(\s*["'`a-zA-Z_$]
One set of patterns finds where untrusted data enters (the URL, postMessage, cookies); another finds where data is rendered as HTML or executed as code (innerHTML, eval, document.write). Run both over a bundle and you have the two ends of each potential flow these patterns cover. The lists are a starting set, not an exhaustive one.
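As a rough illustration, the scan can be sketched in Python with the standard `re` module. The names `SOURCE_RE`, `SINK_RE`, `find_hits` and `SAMPLE_BUNDLE` are made up for this example, the patterns are a subset of the lists above, and the sample JavaScript is hypothetical.

```python
import re

# A subset of the lesson's source patterns, joined into one alternation.
SOURCE_RE = re.compile(
    r"location\.(?:hash|search|href|pathname)"
    r"|document\.(?:URL|documentURI|referrer|cookie)"
    r"|window\.name"
)

# A subset of the lesson's sink patterns.
SINK_RE = re.compile(
    r"\.innerHTML\s*="
    r"|\.outerHTML\s*="
    r"|document\.write\("
    r"|\beval\("
    r"|\.insertAdjacentHTML\("
)


def find_hits(bundle: str, pattern: re.Pattern) -> list[tuple[int, str]]:
    """Return (1-based line number, matched text) for every match."""
    hits = []
    for line_no, line in enumerate(bundle.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((line_no, match.group(0)))
    return hits


# Hypothetical four-line bundle, used only to exercise the patterns.
SAMPLE_BUNDLE = """\
var q = location.hash.slice(1);
var el = document.getElementById("out");
el.innerHTML = q;
console.log(document.title);
"""

source_hits = find_hits(SAMPLE_BUNDLE, SOURCE_RE)
sink_hits = find_hits(SAMPLE_BUNDLE, SINK_RE)
print("sources:", source_hits)
print("sinks:  ", sink_hits)
```

On this sample the scan reports one source on line 1 and one sink on line 3. `document.title` on line 4 is ignored because it is not in the source list.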
The workflow
1. grep the bundle for SINKS -> the dangerous write points
2. grep for SOURCES -> where attacker data enters
3. For each sink, read nearby code: does a source reach it
without sanitisation? That's a candidate DOM XSS.
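The steps above can be sketched as a small Python helper that starts from each sink and lists the source matches within a few lines of it. This is only a triage aid under stated assumptions: the function name `candidate_flows`, the `window` of 5 lines, and the sample code are all invented for the example, and line proximity is a heuristic, not proof of a flow.

```python
import re

SOURCE_RE = re.compile(r"location\.(?:hash|search|href|pathname)|window\.name")
SINK_RE = re.compile(r"\.innerHTML\s*=|document\.write\(|\beval\(")


def candidate_flows(bundle: str, window: int = 5) -> list[dict]:
    """Pair every sink line with the source lines found within `window` lines."""
    lines = bundle.splitlines()
    # Step 2: record every line where attacker data may enter.
    source_lines = [n for n, line in enumerate(lines, start=1)
                    if SOURCE_RE.search(line)]
    candidates = []
    for n, line in enumerate(lines, start=1):
        # Step 1: only sink lines are starting points.
        if not SINK_RE.search(line):
            continue
        # Step 3 (partial): note which sources sit close enough to read together.
        nearby = [s for s in source_lines if abs(s - n) <= window]
        candidates.append({
            "sink_line": n,
            "sink_code": line.strip(),
            "nearby_sources": nearby,
        })
    return candidates


# Hypothetical bundle: a source next to a sink, then a distant, static sink.
SAMPLE = "\n".join(
    ["var name = window.name;", 'document.write("<b>" + name + "</b>");']
    + ["// filler"] * 10
    + ['box.innerHTML = "<p>static</p>";']
)

flows = candidate_flows(SAMPLE)
for flow in flows:
    print(flow)
```

The sink on line 2 is listed with the source on line 1, so it is the one to read first. The sink on line 13 has no nearby source, but that alone does not make it safe: a source can be many lines away or arrive through a function argument, so an empty list only lowers its priority.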
Checkpoint
Regex finds DOM XSS sources and sinks but can't confirm the vulnerability. What's the missing step, and why can't regex do it?
The missing step is data-flow analysis: confirming that a specific source's data actually reaches a specific sink without being sanitised along the way. Regex only locates where sources and sinks appear in the text — it can't follow how a value moves through variables, function calls, and conditionals between them, or whether sanitisation happens in between. That tracing requires reading the code's logic (or using an AST-based tool that understands structure), which is why regex narrows the candidates but a human or structural analysis confirms the bug.
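A small Python sketch makes the limit concrete. The three JavaScript snippets are hypothetical, and `escapeHtml` and `track` are invented helper names. Each snippet contains both a source and a sink, so a purely textual check gives all three the same verdict even though only the first is an obvious unsanitised flow.

```python
import re

SOURCE_RE = re.compile(r"location\.hash")
SINK_RE = re.compile(r"\.innerHTML\s*=")

SNIPPETS = {
    # Source value reaches the sink unchanged.
    "unsafe": 'var h = location.hash; out.innerHTML = h;',
    # Source value passes through a (hypothetical) escaping helper first.
    "sanitised": 'var h = escapeHtml(location.hash); out.innerHTML = h;',
    # Source and sink both appear, but the sink receives a constant.
    "unrelated": 'var h = location.hash; out.innerHTML = "<p>hi</p>"; track(h);',
}


def regex_verdict(snippet: str) -> bool:
    """True when both a source and a sink appear anywhere in the text."""
    return bool(SOURCE_RE.search(snippet)) and bool(SINK_RE.search(snippet))


verdicts = {name: regex_verdict(code) for name, code in SNIPPETS.items()}
print(verdicts)
```

All three snippets are flagged. Telling them apart means following `h` from its assignment to its use, which is the data-flow step that regex cannot perform.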
Try it yourself
List three DOM XSS source patterns and three sink patterns. Then describe the workflow: which do you search for first and why, and what you check in the surrounding code once you have a sink and a nearby source.
Key takeaways
- Source patterns find where attacker data enters; sink patterns find where it executes.
- Run both over a bundle to get the two ends of every potential flow.
- Regex narrows thousands of lines to a few candidates — it doesn’t confirm.
- Start from sinks, trace back to sources; confirmation is data-flow work.
Next, analysing WAF rules — reading the filters that block you, to find their gaps.