Log analysis with regex
Logs record what happened on a system as semi-structured text. Regex is how you parse those lines into fields and find the ones that signal an attack. This lesson covers parsing the common formats and matching attack signatures: the regex side of detection and incident response.
You'll learn to
- Parse log lines into fields with regex
- Match attack signatures in logs
- Extract indicators for hunting
Parsing a log line
Apache/Nginx combined log:
192.0.2.5 - - [10/Jan/2025:13:55:36 +0000] "GET /admin HTTP/1.1" 403 1234 "-" "curl/7"
Pattern with named groups:
(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d+)
A regex with named groups turns each messy log line into clean fields — IP, time, method, path, status. Once parsed, you can filter and aggregate them. This is the structured foundation for everything else.
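As a minimal sketch of that workflow in Python, the pattern above can be compiled once and applied line by line. The helper name `parse_line` and the sample lines are hypothetical, made up for illustration; only the pattern itself comes from this lesson.

```python
import re
from collections import Counter

# The named-group pattern from this lesson, compiled once for reuse.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d+)'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Hypothetical sample lines in the combined format (illustration only).
sample_lines = [
    '192.0.2.5 - - [10/Jan/2025:13:55:36 +0000] "GET /admin HTTP/1.1" 403 1234 "-" "curl/7"',
    '192.0.2.5 - - [10/Jan/2025:13:55:37 +0000] "GET /login HTTP/1.1" 200 512 "-" "curl/7"',
    '198.51.100.7 - - [10/Jan/2025:13:55:40 +0000] "POST /admin HTTP/1.1" 403 1234 "-" "curl/7"',
    'not a log line',
]

# Keep only the lines that parsed; unparseable lines are skipped.
parsed = [fields for fields in map(parse_line, sample_lines) if fields]

# Filter by status, then count requests per IP.
forbidden = [f for f in parsed if f["status"] == "403"]
hits_per_ip = Counter(f["ip"] for f in parsed)

for fields in forbidden:
    print(fields["ip"], fields["method"], fields["path"])
print(hits_per_ip.most_common())
```

Every captured field is a string, so the status is compared against "403" rather than the integer 403. Convert with int() first if you need numeric comparisons such as status >= 400.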
Attack signatures
Path traversal: (?i)(\.\./|%2e%2e|\.\.\\)
SQLi attempt: (?i)(union\s+select|'\s+or\s+'1'\s*=\s*'1|sleep\()
XSS attempt: (?i)(<script|onerror\s*=|javascript:)
Scanner UAs: (?i)(sqlmap|nikto|nmap|masscan|acunetix)
Each signature matches a category of malicious request. Running them over an access log surfaces the attack attempts among millions of normal lines — the regex equivalent of the grep-based hunting from the Bash course, with more precise patterns.
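A rough sketch of running these signatures over log lines in Python might look like the following. The function name `match_signatures` and the sample lines are hypothetical. The sketch also assumes that request URLs are logged in URL-encoded form, so it checks each pattern against both the raw line and a decoded copy; without that step, a payload such as union%20select would not match the \s+ in the SQLi pattern.

```python
import re
from urllib.parse import unquote_plus

# Signature patterns from this lesson, keyed by category.
# re.IGNORECASE is set on the traversal pattern so upper-case %2E%2E also matches.
SIGNATURES = {
    "path_traversal": re.compile(r"(\.\./|%2e%2e|\.\.\\)", re.IGNORECASE),
    "sqli": re.compile(r"(?i)(union\s+select|'\s+or\s+'1'\s*=\s*'1|sleep\()"),
    "xss": re.compile(r"(?i)(<script|onerror\s*=|javascript:)"),
    "scanner_ua": re.compile(r"(?i)(sqlmap|nikto|nmap|masscan|acunetix)"),
}


def match_signatures(line):
    """Return the sorted names of every signature that matches the line."""
    # Assumption: payloads may be URL-encoded in the log, so test a decoded copy too.
    decoded = unquote_plus(line)
    return sorted(
        name
        for name, pattern in SIGNATURES.items()
        if pattern.search(line) or pattern.search(decoded)
    )


# Hypothetical access-log lines (illustration only).
sample_lines = [
    '203.0.113.9 - - [10/Jan/2025:14:00:01 +0000] "GET /../../etc/passwd HTTP/1.1" 404 0 "-" "curl/7"',
    '203.0.113.9 - - [10/Jan/2025:14:00:02 +0000] "GET /item?id=1%20UNION%20SELECT%20pass HTTP/1.1" 200 99 "-" "sqlmap/1.7"',
    '203.0.113.9 - - [10/Jan/2025:14:00:03 +0000] "GET /search?q=%3Cscript%3Ealert(1)%3C/script%3E HTTP/1.1" 200 99 "-" "curl/7"',
    '192.0.2.5 - - [10/Jan/2025:14:00:04 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

for line in sample_lines:
    hits = match_signatures(line)
    if hits:
        print(hits, line)
```

Short signatures like these produce false positives on legitimate traffic, so treat a match as a lead to investigate rather than proof of an attack.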
Checkpoint
Why are named groups particularly useful when parsing log lines with regex?
A log line is semi-structured text with several fields (IP, timestamp, method, path, status) in fixed positions. Named groups let you capture each field with a meaningful label and then read it by name rather than counting group numbers, which makes the parsing code clear and maintainable. Once each line is parsed into named fields, you can filter, aggregate, and rank them — for example grouping by IP or filtering by status — turning raw log text into structured, queryable data.
Try it yourself
Write a regex with named groups that captures the IP, method, path, and status from a combined-format log line. Then write a signature pattern that matches a path-traversal attempt in either its raw (../) or URL-encoded (%2e%2e) form.
Key takeaways
- Named-group patterns parse log lines into clean fields.
- Signature patterns match traversal, SQLi, XSS, and scanner user-agents.
- Regex over raw logs can surface attacks without a SIEM, which is useful during an incident.
- Extract IOCs (IPs, domains, hashes); handle defanged forms too.
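The last takeaway can be sketched in Python as follows. This is an illustrative example, not a complete extractor: the function names `refang` and `extract_iocs`, the set of defanging conventions handled (hxxp, [.], (.), [:]), and the sample text are all assumptions. The patterns are also deliberately simple. The IPv4 pattern does not check that each octet is 255 or less, and the domain pattern will also match file names such as index.html.

```python
import re

# Deliberately simple indicator patterns (see the caveats in the text above).
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
DOMAIN = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)
SHA256 = re.compile(r"\b[a-fA-F0-9]{64}\b")


def refang(text):
    """Undo a few common defanging conventions before matching."""
    text = re.sub(r"hxxp", "http", text, flags=re.IGNORECASE)
    text = re.sub(r"\[\.\]|\(\.\)", ".", text)
    return text.replace("[:]", ":")


def extract_iocs(text):
    """Return IPs, domains, and SHA-256 hashes found in the refanged text."""
    clean = refang(text)
    return {
        "ips": IPV4.findall(clean),
        "domains": DOMAIN.findall(clean),
        "sha256": SHA256.findall(clean),
    }


# Hypothetical analyst note containing defanged indicators (illustration only).
note = (
    "Beacon to hxxp://evil[.]example[.]com from 198[.]51[.]100[.]7 "
    "with payload hash "
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
)

print(extract_iocs(note))
```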
Next, detection engineering — how regex underpins Sigma, YARA, Suricata, and Snort rules.