Python regex for security automation
Python is where you turn regex into automation, and the re module is the tool. This lesson overlaps with the Python course if you've taken it, but the focus here is the regex side: getting the most out of re for secret discovery, recon, and log parsing.
You'll learn to
- Use the re methods for extraction
- Apply named groups for readable parsing
- Compile patterns for repeated use
The methods, regex-focused
import re
re.findall(r"(\w+)=(\w+)", text) # list of group tuples
re.finditer(r"AKIA[0-9A-Z]{16}", text) # match objects with positions
re.sub(r"\d{16}", "[REDACTED]", text) # redact card-like numbers
re.split(r"[,;\s]+", text) # split on any separator run
For security work, finditer is often the best choice: it yields each match as a match object carrying its position and groups, which is ideal for structured extraction. Always write patterns as raw strings (r"...") so backslashes reach the regex engine intact.
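As a minimal sketch of position-aware extraction with finditer, the snippet below scans a made-up config string for AWS-style access key IDs. The SAMPLE text and the find_key_ids helper are hypothetical names introduced for illustration; the key shown is the placeholder value AWS uses in its own documentation, not a live credential.

```python
import re

# Hypothetical sample text. The key is AWS's documented placeholder, not a real secret.
SAMPLE = "config: aws_key=AKIAIOSFODNN7EXAMPLE region=us-east-1"

def find_key_ids(text):
    """Return (start, end, value) for each AWS-style access key ID found in text."""
    return [
        (m.start(), m.end(), m.group(0))
        for m in re.finditer(r"AKIA[0-9A-Z]{16}", text)
    ]

hits = find_key_ids(SAMPLE)
for start, end, value in hits:
    # The offsets let you report exactly where in the input the secret sits.
    print(f"offsets {start}-{end}: {value}")
```

Keeping the offsets alongside the value is what makes finditer handy for reporting: a scanner can point at the exact location of a finding rather than only saying that one exists.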
Named groups make parsing readable
# Parse a log line into named fields:
m = re.match(r"(?P<ip>\S+) .* \[(?P<time>[^\]]+)\] \"(?P<method>\S+) (?P<path>\S+)", line)
if m:
    print(m.group("ip"), m.group("method"), m.group("path"))
Named groups, written with the (?P<name>…) syntax, let you pull fields by name instead of number — far more readable when parsing structured text like logs, and the standard way to turn a messy line into clean fields.
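When the same pattern runs over many lines, one reasonable approach is to compile it once with re.compile and reuse the compiled object. The sketch below does that with the log pattern from this section and collects each match as a dictionary via groupdict(). The sample lines, LOG_LINE, and parse_lines are hypothetical names and data made up for illustration.

```python
import re

# Compiled once, then reused for every line (same pattern as in this section).
LOG_LINE = re.compile(
    r"(?P<ip>\S+) .* \[(?P<time>[^\]]+)\] \"(?P<method>\S+) (?P<path>\S+)"
)

# Hypothetical access-log lines, including one that should not parse.
SAMPLE_LINES = [
    '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /admin/login HTTP/1.1" 401 512',
    'not a log line',
    '198.51.100.23 - alice [10/Oct/2024:13:55:40 +0000] "POST /api/upload HTTP/1.1" 200 87',
]

def parse_lines(lines):
    """Return one dict of named fields per line that matches the log format."""
    records = []
    for line in lines:
        m = LOG_LINE.match(line)  # the log format starts at position zero
        if m:
            records.append(m.groupdict())
    return records

records = parse_lines(SAMPLE_LINES)
for rec in records:
    print(rec["ip"], rec["method"], rec["path"])
```

groupdict() returns every named group keyed by its name, so each parsed line is ready to be filtered, counted, or written out as JSON without tracking group numbers.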
Checkpoint
What's the difference between re.match and re.search in Python, and which finds a pattern anywhere in a string?
re.search scans the whole string and finds the pattern anywhere in it. re.match only matches at the very beginning of the string — if the pattern doesn't start at position zero, match returns None even if it appears later. So re.search is what you use to find a pattern anywhere; re.match is only for checking whether a string starts with the pattern. Confusing the two is a common cause of patterns silently not matching.
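The difference is easy to see on a single line. In this small sketch the log line is invented for illustration, and the variable names are hypothetical:

```python
import re

# Hypothetical log line: the keyword ERROR appears mid-line, not at the start.
line = "2024-10-10 13:55:36 sshd[812]: ERROR failed password for root"

at_start = re.match(r"ERROR", line)    # anchored at position zero, so no match here
anywhere = re.search(r"ERROR", line)   # scans forward until the pattern is found

# re.match does succeed when the pattern describes the start of the string.
starts_with_date = re.match(r"\d{4}-\d{2}-\d{2}", line)

print(at_start is None)
print(anywhere.start() if anywhere else None)
print(starts_with_date.group(0) if starts_with_date else None)
```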
Try it yourself
Write a pattern with named groups that captures the IP and the request method from a web log line, and show how you’d read each by name. Then note whether you’d use re.match or re.search to find an error keyword anywhere in a line, and why.
Key takeaways
- finditer gives matches with positions and groups — best for extraction.
- Always use raw strings so backslashes reach the regex engine.
- Named groups (the (?P<name>…) syntax) parse structured text readably.
- re.search finds anywhere; re.match only at the start — don’t confuse them.
Next, BRE, ERE, and PCRE — the regex dialects of grep, sed, and awk on the command line.