
sed and awk: reshaping text


grep finds the lines you want. sed and awk reshape them — rewriting text and pulling out columns. Together with grep they’re the core of every recon pipeline and log-analysis one-liner. You don’t need to master their full languages; a handful of patterns covers almost everything you’ll do.

You'll learn to

  • Rewrite text on the fly with sed
  • Extract and reorder columns with awk
  • Chain grep, sed, and awk into a clean pipeline

sed — the stream editor

sed edits text as it flows past. The one command you’ll use constantly is substitution:

# Replace the first match on each line:
echo "https://example.com" | sed 's/https/http/'
# → http://example.com

# Replace ALL matches (the g flag = global):
echo "a.b.c" | sed 's/\./-/g'
# → a-b-c

# Delete lines matching a pattern:
sed '/^#/d' config.txt        # remove comment lines (starting with #)

# Print only a specific line range:
sed -n '10,20p' file          # lines 10 to 20

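The substitution syntax is s/find/replace/, with an optional g to replace every match on the line rather than just the first. In the find pattern, . matches any single character, which is why the example above escapes it as \. to match a literal dot. s/find/replace/g is the workhorse for cleaning, rewriting, and normalising text.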
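The difference between the first-match and global forms, and the effect of an unescaped dot, is easy to check on throwaway strings. The following is a minimal sketch: every sample value is made up for illustration, and | is used as the delimiter in two of the commands only because the pattern itself contains slashes.

```shell
# Hypothetical sample text, made up for this sketch.
paths='/var/www/a /var/www/b'

# No g flag: only the first match on the line is replaced.
# The delimiter is | instead of / so the slashes need no escaping.
first_only=$(printf '%s\n' "$paths" | sed 's|/var/www|/srv|')
echo "$first_only"     # → /srv/a /var/www/b

# With g: every match on the line is replaced.
all_matches=$(printf '%s\n' "$paths" | sed 's|/var/www|/srv|g')
echo "$all_matches"    # → /srv/a /srv/b

# An unescaped . matches ANY character, so every character is replaced.
unescaped=$(echo "a.b.c" | sed 's/./-/g')
echo "$unescaped"      # → -----

# Deleting comment lines from a small inline config (no file needed).
kept=$(printf '# comment\nport=22\n# another\nuser=root\n' | sed '/^#/d')
echo "$kept"
# → port=22
# → user=root
```

Capturing each result in a variable is only there to make the outputs easy to compare; in day-to-day use the sed command would normally sit directly in a pipeline.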

awk — the column processor

awk splits each line into fields (by whitespace by default) and lets you work with them by number. $1 is the first field, $2 the second, $0 the whole line.
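To make the field numbering concrete, here is a small sketch that feeds awk a single made-up line. The line's contents are hypothetical; NF, the number of fields on the current line, is a built-in awk variable.

```shell
# Hypothetical whitespace-separated line, made up for this sketch.
line='203.0.113.7 alice GET /index.html'

first=$(printf '%s\n' "$line" | awk '{print $1}')
echo "$first"        # → 203.0.113.7

# Fields can be printed in any order; the comma inserts a space.
reordered=$(printf '%s\n' "$line" | awk '{print $2, $1}')
echo "$reordered"    # → alice 203.0.113.7

# $0 is the whole line, NF the number of fields on it.
summary=$(printf '%s\n' "$line" | awk '{print NF ": " $0}')
echo "$summary"      # → 4: 203.0.113.7 alice GET /index.html
```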

# Print just the first field of each line:
awk '{print $1}' access.log

# Print the first and seventh fields (IP and URL in a log):
awk '{print $1, $7}' access.log

# Use a different separator (-F) — here, colon for /etc/passwd:
awk -F: '{print $1}' /etc/passwd      # just the usernames

# Filter AND extract — print field 7 only when status (field 9) is 404:
awk '$9 == 404 {print $7}' access.log

# Count requests per IP (awk's for-in order is not guaranteed; pipe to sort -rn to rank):
awk '{count[$1]++} END {for (ip in count) print count[ip], ip}' access.log

awk shines on structured, column-based text — which is exactly what logs are. The pattern awk '$9 == 404 {print $7}' reads “when field 9 equals 404, print field 7” — filtering and extracting in one step.
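The field numbers used here ($1 for the IP, $7 for the URL, $9 for the status) match the common web-server access-log layout. As a sketch that can be run without a real log, the example below uses three made-up lines in that layout; the addresses, paths, and timestamps are all hypothetical.

```shell
# Hypothetical access-log lines in Common Log Format, made up for this sketch.
log=$(cat <<'EOF'
203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /admin HTTP/1.1" 404 153
198.51.100.2 - - [10/Oct/2024:13:55:40 +0000] "GET /index.html HTTP/1.1" 200 1024
203.0.113.7 - - [10/Oct/2024:13:56:01 +0000] "GET /backup.zip HTTP/1.1" 404 153
EOF
)

# Filter and extract: URLs (field 7) of requests whose status (field 9) is 404.
not_found=$(printf '%s\n' "$log" | awk '$9 == 404 {print $7}')
echo "$not_found"
# → /admin
# → /backup.zip

# Count requests per IP. The trailing sort makes the order predictable,
# because awk's for-in loop visits array keys in no guaranteed order.
per_ip=$(printf '%s\n' "$log" \
  | awk '{count[$1]++} END {for (ip in count) print count[ip], ip}' \
  | sort -rn)
echo "$per_ip"
# → 2 203.0.113.7
# → 1 198.51.100.2
```

Logs in a different format will put the status and URL in other fields, so it is worth printing a line or two with awk '{print $7, $9}' before trusting the numbers.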

Chaining them together

# From a log, find failed logins, extract the IP, rank by count:
grep "Failed password" /var/log/auth.log \
  | awk '{print $(NF-3)}' \
  | sort | uniq -c | sort -rn | head

# Extract all unique domains from a list of URLs
# (-E enables extended regex, so s? means "optional s"; ^ anchors to the line start):
sed -E 's|^https?://||' urls.txt | awk -F/ '{print $1}' | sort -u

Read the first pipeline left to right: grep keeps only failed-login lines, awk pulls the IP field (NF is the number of fields, so $(NF-3) is the fourth field from the end, which is where the usual sshd message puts the source address), then sort | uniq -c | sort -rn counts and ranks the IPs and head keeps the top ten. Each tool does one job; the pipe connects them.
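Running the pipelines in stages on a few sample lines shows what each step contributes. This is a sketch under assumptions: the auth.log lines and URLs below are made up, the log lines follow the usual OpenSSH message layout, and the final awk in the ranking stage is an addition that only trims the padding uniq -c puts in front of each count.

```shell
# Hypothetical auth.log lines in the usual sshd format, made up for this sketch.
auth_log=$(cat <<'EOF'
Oct 10 13:55:36 host sshd[1234]: Failed password for root from 203.0.113.7 port 52144 ssh2
Oct 10 13:55:40 host sshd[1235]: Accepted password for alice from 198.51.100.2 port 40022 ssh2
Oct 10 13:56:01 host sshd[1236]: Failed password for invalid user admin from 203.0.113.7 port 52150 ssh2
Oct 10 13:56:09 host sshd[1237]: Failed password for root from 192.0.2.44 port 33010 ssh2
EOF
)

# Stage 1: keep only failed logins, then pull the fourth field from the end.
ips=$(printf '%s\n' "$auth_log" | grep "Failed password" | awk '{print $(NF-3)}')
echo "$ips"
# → 203.0.113.7
# → 203.0.113.7
# → 192.0.2.44

# Stage 2: count and rank; the last awk normalises uniq -c's leading spaces.
ranked=$(printf '%s\n' "$ips" | sort | uniq -c | sort -rn | head | awk '{print $1, $2}')
echo "$ranked"
# → 2 203.0.113.7
# → 1 192.0.2.44

# Unique domains from made-up URLs; -E lets s? mean "optional s".
domains=$(printf '%s\n' 'https://example.com/login' 'http://example.org/' 'https://example.com/about' \
  | sed -E 's|^https?://||' | awk -F/ '{print $1}' | sort -u)
echo "$domains"
# → example.com
# → example.org
```

Counting from the end with NF is what lets the same command handle both the "for root" and the longer "for invalid user admin" lines, since the fields before the IP vary in number but the ones after it do not.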

Checkpoint

You have an access log and want the IP address (field 1) of every request that returned a 500 status (field 9). What awk command does this?

Try it yourself

Take /etc/passwd (colon-separated). Use awk -F: '{print $1}' to list usernames, then add the home directory (field 6) so each line shows username and home. Then use sed to strip the protocol from a URL by replacing the leading https:// with nothing.

Summary

sed rewrites text as it streams — s/find/replace/g for substitution, /pattern/d to delete lines — and treats . as any character (escape it for literals). awk splits lines into fields ($1, $2, … $0), works with columns, and filters-and-extracts in one step ($9 == 404 {print $7}). Chained with grep and sort/uniq, they turn raw logs and tool output into ranked, clean data — the backbone of recon and log analysis.

Key takeaways

  • sed 's/find/replace/g' rewrites text; escape . for a literal dot.
  • awk '{print $1}' extracts columns; -F sets the field separator.
  • awk '$9 == 404 {print $7}' filters and extracts together.
  • grep | awk | sort | uniq -c | sort -rn is the classic log-ranking pipeline.


In the next module, you'll put grep, sed, and awk to work building real recon pipelines that chain security tools together.
