
The re module for pentesters


Regex is the extraction engine of security automation, and Python’s re module is how you drive it from code. You already know patterns from the regex course; this lesson is about the Python side — which method to call, how matches come back, and the one rule that prevents most bugs.

You'll learn to

  • Use the core re methods correctly
  • Capture groups and extract just what you need
  • Avoid the backslash trap with raw strings

The methods you’ll use

Python’s re module gives you a handful of functions. The ones that matter for security work:

import re

text = "Authorization: Bearer eyJhbGci.payload.sig and id=42"

re.search(r"Bearer\s+(\S+)", text).group(1)   # 'eyJhbGci.payload.sig' — first match, group 1
re.findall(r"id=(\d+)", text)                  # ['42'] — every match of the group
re.sub(r"\d", "X", text)                       # replace every digit with X

# finditer keeps groups AND positions — best for structured extraction:
for m in re.finditer(r"(\w+)=(\w+)", text):
    print(m.group(1), m.group(2))               # key, value

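search finds the first match anywhere and returns None when there is none, so check the result before calling .group(). findall returns every match as a list: whole matches if the pattern has no group, just the group if it has one, and tuples if it has several. finditer yields match objects with groups and positions, and sub returns a new string with the matches replaced. For extracting many things with groups, prefer finditer: it always yields match objects, so its return type is predictable.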
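The return-type differences are easiest to see side by side. The following is a minimal sketch; the sample query string is made up for illustration.

```python
import re

sample = "user=alice&role=admin&id=42"  # hypothetical input for illustration

# No groups: findall returns the whole matches as strings.
no_groups = re.findall(r"\w+=\w+", sample)

# One group: findall returns just that group.
one_group = re.findall(r"id=(\d+)", sample)

# Two groups: findall switches to a list of tuples.
two_groups = re.findall(r"(\w+)=(\w+)", sample)

# finditer always yields match objects, whatever the group count,
# so groups and positions are both available.
spans = [(m.group(1), m.span()) for m in re.finditer(r"(\w+)=(\w+)", sample)]

# search returns None when nothing matches, so check before calling .group().
m = re.search(r"token=(\S+)", sample)
token = m.group(1) if m else None

print(no_groups)
print(one_group)
print(two_groups)
print(spans)
print(token)
```

The guard on the last search is the habit worth keeping: calling .group() directly on a failed search raises AttributeError.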

The one rule: always use raw strings

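Write every pattern as a raw string with the r prefix: r"\d+", not "\d+". Without it, Python's string parser processes backslash escapes before the regex engine ever sees them. "\b" becomes a backspace character instead of a word boundary, and "\1" becomes a control character instead of a backreference. Sequences Python does not recognize, such as "\d", happen to pass through unchanged, but recent Python versions warn about them, so relying on that is fragile.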
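A short sketch of the trap, using \b because it is an escape Python does recognize. The sample line is made up for illustration.

```python
import re

# In a normal string, "\b" is the backspace character (0x08), not a word boundary.
plain = "\bAKIA\b"
raw = r"\bAKIA\b"

# The plain string is shorter: each \b collapsed into a single character.
print(len(plain), len(raw))

line = "key: AKIA in config"          # hypothetical sample line

plain_match = re.search(plain, line)  # looks for literal backspace characters
raw_match = re.search(raw, line)      # looks for AKIA at word boundaries

print(plain_match)
print(raw_match.group(0))
```

The non-raw pattern fails silently: no error is raised, the search simply never matches, which is why this bug is easy to miss in a scanner.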

Compile patterns you reuse

import re

# Compile once, use many times in the loop:
AWS = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")
for filename in files:                        # files: an iterable of paths you supply
    with open(filename, errors="ignore") as fh:   # close each file after reading
        text = fh.read()
    for hit in AWS.findall(text):
        print(filename, hit)

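re.compile turns a pattern into a reusable object with the same methods (search, findall, finditer, sub). The module-level functions cache recently compiled patterns, so they do not truly recompile on every call, but compiling once still skips that cache lookup each time and gives the pattern a single named definition. Across thousands of files or lines, that makes the loop somewhat faster and easier to maintain.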
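A compiled pattern can be exercised without touching disk. The sketch below assumes a hypothetical in-memory dictionary standing in for files; the key shown is the placeholder value AWS uses in its own documentation, not a real credential.

```python
import re

AWS = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")

# Hypothetical in-memory "files" so the sketch runs anywhere.
samples = {
    "app.env": "AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\nDEBUG=1",
    "notes.txt": "nothing sensitive here",
}

hits = []
for name, text in samples.items():
    # The compiled object exposes finditer directly; no pattern argument needed.
    for m in AWS.finditer(text):
        hits.append((name, m.group(0), m.start()))

# The same object also redacts via sub.
redacted = AWS.sub("[REDACTED]", samples["app.env"])

print(hits)
print(redacted)
```

Using one compiled object for both the scan and the redaction keeps the two steps from drifting apart if the pattern is later tightened.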

Checkpoint

Why must you write regex patterns as raw strings (r'...') in Python?

Try it yourself

Take a block of text containing a few fake tokens. Use re.findall with a raw-string pattern to extract them, then re.sub to redact them (replace with asterisks). Notice how findall returns the group when your pattern has one.

Key takeaways

  • search finds the first match, findall returns all matches, finditer gives groups and positions, sub rewrites.
  • Always write patterns as raw strings: r"...".
  • re.compile once and reuse for speed across many inputs.
  • The same patterns from the regex course become automated scanners here.


Next, building custom web scanners that put these extraction skills to work against live targets.
