sed and awk: reshaping text
grep finds the lines you want. sed and awk reshape them: sed rewrites text, and awk pulls out columns. Together with grep they form the core of most recon pipelines and log-analysis one-liners. You don’t need to master their full languages; a handful of patterns covers almost everything you’ll do.
You'll learn to
- Rewrite text on the fly with sed
- Extract and reorder columns with awk
- Chain grep, sed, and awk into a clean pipeline
sed — the stream editor
sed edits text as it flows past. The one command you’ll use constantly is substitution:
# Replace the first match on each line:
echo "https://example.com" | sed 's/https/http/'
# → http://example.com
# Replace ALL matches (the g flag = global):
echo "a.b.c" | sed 's/\./-/g'
# → a-b-c
# Delete lines matching a pattern:
sed '/^#/d' config.txt # remove comment lines (starting with #)
# Print only a specific line range:
sed -n '10,20p' file # lines 10 to 20
The substitution syntax is s/find/replace/, with an optional g to replace every match on the line rather than just the first. That s///g is the workhorse — cleaning, rewriting, normalising.
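To make the effect of the g flag and of escaping concrete, here is a minimal sketch. The sample strings are made up for illustration; only the sed commands themselves come from the lesson.

```shell
# Hypothetical sample line containing several dots.
line="10.0.0.1 10.0.0.2"

# Without g: only the first dot on the line is replaced.
first=$(printf '%s\n' "$line" | sed 's/\./-/')

# With g: every dot on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/\./-/g')

# Unescaped, . matches any character, so every character is replaced.
any=$(printf '%s\n' "abc" | sed 's/./-/g')

printf '%s\n' "$first" "$all" "$any"
# → 10-0.0.1 10.0.0.2
# → 10-0-0-1 10-0-0-2
# → ---
```

The last case is the usual surprise: forgetting the backslash turns a literal-dot replacement into a replace-everything.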
awk — the column processor
awk splits each line into fields (by whitespace by default) and lets you work with them by number. $1 is the first field, $2 the second, $0 the whole line.
# Print just the first field of each line:
awk '{print $1}' access.log
# Print the first and seventh fields (client IP and request path in a common/combined-format access log):
awk '{print $1, $7}' access.log
# Use a different separator (-F) — here, colon for /etc/passwd:
awk -F: '{print $1}' /etc/passwd # just the usernames
# Filter AND extract — print field 7 only when status (field 9) is 404:
awk '$9 == 404 {print $7}' access.log
# Count requests per IP (field 1) and print each total:
awk '{count[$1]++} END {for (ip in count) print count[ip], ip}' access.log
awk shines on structured, column-based text — which is exactly what logs are. The pattern awk '$9 == 404 {print $7}' reads “when field 9 equals 404, print field 7” — filtering and extracting in one step.
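The following sketch runs the filter and the per-IP count against a few made-up log lines, so you can try them without a real access.log. The sample lines are an assumption: they imitate the combined log format, where whitespace splitting puts the request path in field 7 and the status in field 9.

```shell
# Hypothetical access-log lines in combined-log style (not real data).
log='203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET /admin HTTP/1.1" 404 153
198.51.100.7 - - [10/Oct/2024:13:55:40 +0000] "GET /index.html HTTP/1.1" 200 1024
203.0.113.5 - - [10/Oct/2024:13:56:01 +0000] "GET /backup.zip HTTP/1.1" 404 153'

# Filter and extract: print the path (field 7) only where status (field 9) is 404.
missing=$(printf '%s\n' "$log" | awk '$9 == 404 {print $7}')

# Count requests per IP. awk visits array keys in no fixed order,
# so sort -rn puts the busiest IP first.
counts=$(printf '%s\n' "$log" \
  | awk '{count[$1]++} END {for (ip in count) print count[ip], ip}' \
  | sort -rn)

printf '%s\n' "$missing"
# → /admin
# → /backup.zip
printf '%s\n' "$counts"
# → 2 203.0.113.5
# → 1 198.51.100.7
```

If your logs use a different layout, the field numbers will differ; printing one line with awk '{print NF; print $0}' is a quick way to check before relying on $7 or $9.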
Chaining them together
# From a log, find failed logins, extract the IP, rank by count:
grep "Failed password" /var/log/auth.log \
| awk '{print $(NF-3)}' \
| sort | uniq -c | sort -rn | head
# Extract all unique domains from a list of URLs:
sed -E 's|^https?://||' urls.txt | awk -F/ '{print $1}' | sort -u
Read the first pipeline left to right: grep keeps only failed-login lines, awk pulls the IP field, then the sort/uniq/sort chain counts and ranks the IPs. NF is the number of fields, so $(NF-3) is the fourth field from the end; that is the IP because these sshd lines end in "from <IP> port <port> ssh2". Each tool does one job; the pipe connects them.
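Both pipelines can be tried on inline sample data, as in the sketch below. The log lines and URLs are invented for illustration and assume the usual OpenSSH message layout. A trailing awk '{print $1, $2}' is added only to strip the padding that uniq -c puts in front of each count.

```shell
# Hypothetical auth.log lines (not real data).
auth='Oct 10 13:55:36 host sshd[1234]: Failed password for root from 203.0.113.5 port 52222 ssh2
Oct 10 13:55:40 host sshd[1235]: Failed password for invalid user admin from 198.51.100.7 port 40022 ssh2
Oct 10 13:56:01 host sshd[1236]: Accepted password for alice from 192.0.2.10 port 50000 ssh2
Oct 10 13:56:05 host sshd[1237]: Failed password for root from 203.0.113.5 port 52230 ssh2'

# Failed logins ranked by source IP, most frequent first.
ranked=$(printf '%s\n' "$auth" \
  | grep "Failed password" \
  | awk '{print $(NF-3)}' \
  | sort | uniq -c | sort -rn | head \
  | awk '{print $1, $2}')

# Hypothetical URL list (not real data).
urls='https://example.com/login
http://example.com/about
https://api.example.org/v1/users'

# Strip the scheme, keep everything before the first slash, de-duplicate.
domains=$(printf '%s\n' "$urls" \
  | sed -E 's|^https?://||' \
  | awk -F/ '{print $1}' \
  | sort -u)

printf '%s\n' "$ranked"
# → 2 203.0.113.5
# → 1 198.51.100.7
printf '%s\n' "$domains"
# → api.example.org
# → example.com
```

Counting from the end with $(NF-3) is what lets the same command handle both the "for root" and the "for invalid user admin" lines, which have different field counts.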
Checkpoint
You have an access log and want the IP address (field 1) of every request that returned a 500 status (field 9). What awk command does this?
awk '$9 == 500 {print $1}' access.log — awk checks whether field 9 equals 500 and, when it does, prints field 1 (the IP). This filter-and-extract in one step is awk's core strength on column-based data like logs.
Try it yourself
Take /etc/passwd (colon-separated). Use awk -F: '{print $1}' to list usernames, then add the home directory (field 6) so each line shows username and home. Then use sed to strip the protocol from a URL by replacing the leading https:// with nothing.
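One possible solution is sketched below. It uses two made-up passwd-style lines and a made-up URL rather than your real /etc/passwd, so the output is predictable. Swap in the real file once the commands make sense.

```shell
# Hypothetical passwd-format lines (not read from the real /etc/passwd).
passwd_sample='root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/zsh'

# Username (field 1) and home directory (field 6), colon-separated input.
users=$(printf '%s\n' "$passwd_sample" | awk -F: '{print $1, $6}')

# Strip a leading https:// by replacing it with nothing.
# Using | as the delimiter avoids escaping the slashes.
stripped=$(printf '%s\n' "https://example.com/path" | sed 's|^https://||')

printf '%s\n' "$users"
# → root /root
# → alice /home/alice
printf '%s\n' "$stripped"
# → example.com/path
```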
Summary
sed rewrites text as it streams — s/find/replace/g for substitution, /pattern/d to delete lines — and treats . as any character (escape it for literals). awk splits lines into fields ($1, $2, … $0), works with columns, and filters-and-extracts in one step ($9 == 404 {print $7}). Chained with grep and sort/uniq, they turn raw logs and tool output into ranked, clean data — the backbone of recon and log analysis.
Key takeaways
- sed 's/find/replace/g' rewrites text; escape . for a literal dot.
- awk '{print $1}' extracts columns; -F sets the field separator.
- awk '$9 == 404 {print $7}' filters and extracts together.
- grep + awk + sort | uniq -c | sort -rn is the classic log-ranking pipeline.
Next module, you put grep, sed, and awk to work building real recon pipelines that chain security tools together.