
Threat hunting: extracting IOCs


Threat hunting often starts with extracting indicators of compromise (IOCs) from logs, reports, and samples: IPs, domains, URLs, file hashes, and email addresses. Each type has a recognisable shape, so a small set of regex patterns can pull candidates out of large volumes of text. This lesson covers the IOC patterns and the defanged forms that analysts use.

You'll learn to

  • Match the common IOC types
  • Handle defanged indicators
  • Extract IOCs from any dataset

The IOC patterns

IPv4:    \b(?:\d{1,3}\.){3}\d{1,3}\b
Domain:  \b(?:[a-z0-9-]+\.)+[a-z]{2,}\b
URL:     https?://[^\s"'<>]+
MD5:     \b[a-f0-9]{32}\b
SHA256:  \b[a-f0-9]{64}\b
Email:   \b[\w.+-]+@[\w-]+\.[\w.-]+\b

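Each indicator has a fixed structure: an IPv4 address is four dot-separated groups of one to three digits, a SHA-256 is 64 hex characters, and a URL starts with a scheme. The patterns match shape only, so treat the hits as candidates: the IPv4 pattern also accepts 999.999.999.999, and the domain pattern also matches file names such as payload.bin. The domain and hash patterns are written in lower case, so apply a case-insensitive flag when the text may contain upper-case hex or hostnames. Run the whole set over the text and collect the matches.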
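As a rough illustration of running the set over a piece of text, here is a minimal Python sketch. The names IOC_PATTERNS and extract_iocs and the sample string are assumptions made for this example, not part of the lesson; the patterns are the six listed above, compiled with re.IGNORECASE.

```python
import re

# The six patterns from the list above. The URL pattern escapes the double
# quote only because it sits inside a double-quoted Python string.
IOC_PATTERNS = {
    "ipv4": r"\b(?:\d{1,3}\.){3}\d{1,3}\b",
    "domain": r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b",
    "url": r"https?://[^\s\"'<>]+",
    "md5": r"\b[a-f0-9]{32}\b",
    "sha256": r"\b[a-f0-9]{64}\b",
    "email": r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b",
}

# Case-insensitive so upper-case hashes and hostnames are matched too.
COMPILED = {name: re.compile(p, re.IGNORECASE) for name, p in IOC_PATTERNS.items()}


def extract_iocs(text):
    """Return {ioc_type: sorted unique matches} for every pattern."""
    return {name: sorted(set(rx.findall(text))) for name, rx in COMPILED.items()}


# Hypothetical sample text using documentation-reserved addresses and names.
sample = (
    "Beacon to 203.0.113.7 via http://evil.example/payload.bin and "
    "dropper md5 d41d8cd98f00b204e9800998ecf8427e, contact admin@evil.example"
)

for kind, hits in extract_iocs(sample).items():
    print(kind, hits)
```

On this sample the domain pattern returns payload.bin alongside evil.example, which is a reminder that shape-only matches still need validation before they are treated as indicators.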

Defanged indicators

Analysts 'defang' IOCs so they're not accidentally clicked or auto-blocked:
  hxxp://evil[.]com/path     1.2.3[.]4     evil(dot)com     user[at]evil.com

Your patterns must handle both live and defanged forms: match hxxps? as well as https?, the bracketed dot [.], and the (dot), (at), and [at] variants.
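One way to approach this, sketched in Python, is to either widen a pattern so it accepts the defanged spelling directly, or rewrite ("re-fang") the text first and then run the ordinary patterns. REFANG_RULES, refang, and DEFANGED_URL are hypothetical names for this example, and the substitution list is illustrative only; real reports use more variants than these.

```python
import re

# Defanged spellings mapped back to their live form (not exhaustive).
REFANG_RULES = [
    (re.compile(r"hxxp(s?)://", re.IGNORECASE), r"http\1://"),
    (re.compile(r"\[\.\]|\(\.\)|\[dot\]|\(dot\)", re.IGNORECASE), "."),
    (re.compile(r"\[at\]|\(at\)|\[@\]", re.IGNORECASE), "@"),
]


def refang(text):
    """Rewrite defanged indicators into their live form."""
    for rx, replacement in REFANG_RULES:
        text = rx.sub(replacement, text)
    return text


# URL pattern widened to accept both http(s) and hxxp(s) schemes.
# The character class already allows '[' and ']', so [.] stays in the match.
DEFANGED_URL = re.compile(r"h(?:tt|xx)ps?://[^\s\"'<>]+", re.IGNORECASE)

report = "C2 at hxxp://evil[.]com/path and 1.2.3[.]4, sender user[at]evil(dot)com"

print(DEFANGED_URL.findall(report))
# ['hxxp://evil[.]com/path']
print(refang(report))
# C2 at http://evil.com/path and 1.2.3.4, sender user@evil.com
```

Re-fanging first keeps the extraction patterns simple, at the cost of producing live indicators in memory; widening each pattern keeps the text inert but makes every pattern harder to read.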

Checkpoint

Why must IOC-extraction patterns handle 'defanged' indicators like hxxp://evil[.]com?

Try it yourself

Write patterns to extract IPv4 addresses and SHA-256 hashes from text, noting the exact length that makes the hash pattern reliable. Then describe how you’d modify a URL pattern to also catch the defanged hxxp:// and [.] forms used in threat reports.

Key takeaways

  • IOCs (IPs, domains, URLs, hashes, emails) have fixed, matchable shapes.
  • Hashes are reliable by exact length: MD5 is 32 hex, SHA-256 is 64.
  • Handle defanged forms (hxxp, [.], (dot)) from reports, then re-fang to search.
  • Validate matches (octet range, real TLD, exact length) to cut false IOCs.
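The validation step in the last takeaway could look something like the following Python sketch. It assumes candidates have already been extracted by shape; valid_ipv4, hash_type, and has_known_tld are hypothetical helper names, and KNOWN_TLDS is a tiny stand-in for a real TLD list such as the one published by IANA.

```python
import ipaddress
import re

IPV4_SHAPE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
HEX_ONLY = re.compile(r"[a-fA-F0-9]+")

# Exact lengths from the lesson: MD5 is 32 hex characters, SHA-256 is 64.
HASH_LENGTHS = {32: "md5", 64: "sha256"}

# Assumption: a deliberately small allow-list, for illustration only.
KNOWN_TLDS = {"com", "net", "org", "io"}


def valid_ipv4(candidate):
    """True when every octet is in range, as checked by the stdlib parser."""
    try:
        ipaddress.IPv4Address(candidate)
        return True
    except ValueError:
        return False


def hash_type(candidate):
    """Classify a hex string by exact length, or return None."""
    if not HEX_ONLY.fullmatch(candidate):
        return None
    return HASH_LENGTHS.get(len(candidate))


def has_known_tld(domain):
    """True when the last label is in the (stand-in) TLD list."""
    return domain.rsplit(".", 1)[-1].lower() in KNOWN_TLDS


candidates = IPV4_SHAPE.findall("seen 10.0.0.5 and 999.1.1.1")
print([ip for ip in candidates if valid_ipv4(ip)])    # ['10.0.0.5']
print(hash_type("d41d8cd98f00b204e9800998ecf8427e"))  # md5
print(has_known_tld("payload.bin"))                   # False
```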


Next, language-specific regex implementations — the quirks across PHP, Java, C#, Ruby, Rust, and Perl.
