Automating source code review

When you have source access (a repo, a decompiled app, a grey-box target), Python lets you triage thousands of files in seconds. This lesson builds a reusable scanner that flags hardcoded credentials, dangerous function calls, and common vulnerability patterns, reporting each as a file:line hit you can verify.

You'll learn to

  • Walk a codebase efficiently
  • Match the patterns that signal real bugs
  • Report findings as file:line you can jump to

Walking the tree

import os

SKIP = {"node_modules", ".git", "vendor", "dist", "__pycache__"}
EXT = {".py", ".js", ".php", ".java", ".rb", ".go", ".env"}

def code_files(root):
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in SKIP]  # prune noise dirs
        for name in filenames:
            # splitext(".env") returns (".env", ""), so also match bare dotfile names
            if os.path.splitext(name)[1] in EXT or name in EXT:
                yield os.path.join(dirpath, name)

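os.walk yields a (dirpath, dirnames, filenames) tuple for every directory under root. Because it walks top-down by default, pruning dirnames in place stops it descending into node_modules and .git, which saves time and cuts noise. Assigning a new list to dirnames would not work: os.walk only sees changes made to the list object it handed you. yield makes code_files a generator, so it scales to huge trees without building a giant list.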
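To see why the slice assignment matters, here is a minimal sketch that builds a throwaway tree and walks it twice, once pruning in place and once rebinding the name. The helpers visited_dirs and demo are hypothetical names introduced for illustration; they are not part of the lesson's scanner.

```python
import os
import tempfile

SKIP = {"node_modules", ".git"}

def visited_dirs(root, in_place=True):
    """Return the names of the directories os.walk actually enters under root."""
    seen = []
    for dirpath, dirnames, _filenames in os.walk(root):
        if dirpath != root:
            seen.append(os.path.basename(dirpath))
        kept = [d for d in dirnames if d not in SKIP]
        if in_place:
            dirnames[:] = kept   # mutates the list os.walk holds, so pruning works
        else:
            dirnames = kept      # rebinds a local name only; os.walk never sees it
    return sorted(seen)

def demo():
    """Build a throwaway tree and compare the two pruning styles."""
    with tempfile.TemporaryDirectory() as root:
        for rel in ("app", "node_modules/lib", ".git/objects"):
            os.makedirs(os.path.join(root, *rel.split("/")))
        return visited_dirs(root, True), visited_dirs(root, False)
```

With in-place pruning only app is entered; with rebinding, the walk still descends into node_modules and .git and everything beneath them.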

The patterns that find bugs

import re

RULES = {
    "hardcoded-cred": re.compile(r"(?i)(password|secret|api[_-]?key|token)\s*[:=]\s*['\"][^'\"]{4,}['\"]"),
    "code-exec":      re.compile(r"\b(eval|exec|system|popen)\s*\("),
    "py-deser":       re.compile(r"\b(pickle\.loads|yaml\.load)\s*\("),
    "sql-fstring":    re.compile(r"(?i)(execute|cursor\.execute)\s*\(\s*f['\"]"),
}

def scan(path):
    with open(path, encoding="utf-8", errors="ignore") as f:  # close the file promptly
        src = f.read()
    for rule, rx in RULES.items():
        for m in rx.finditer(src):
            line = src[:m.start()].count("\n") + 1     # line number of the hit
            print(f"{path}:{line}: [{rule}] {m.group(0)[:60]}")

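The rules cover three families: hardcoded credentials, dangerous sinks (code execution, unsafe deserialization), and vulnerability patterns like SQL built with an f-string. Each match is a candidate, not a confirmed bug: yaml.load called with a safe loader, for example, still trips py-deser. The line-number trick, counting newlines before the match and adding one, turns each hit into a file:line reference you can jump straight to.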
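The line-number arithmetic is easy to check on an in-memory string. The sketch below reuses two of the lesson's rules unchanged; SAMPLE and find_hits are hypothetical names made up for this illustration, and the key in SAMPLE is a dummy value.

```python
import re

RULES = {
    "hardcoded-cred": re.compile(r"(?i)(password|secret|api[_-]?key|token)\s*[:=]\s*['\"][^'\"]{4,}['\"]"),
    "code-exec":      re.compile(r"\b(eval|exec|system|popen)\s*\("),
}

# Four lines of made-up source: the credential is on line 2, the sink on line 4.
SAMPLE = 'import os\nAPI_KEY = "abcd1234"\n\nos.system("ls")\n'

def find_hits(src):
    """Return sorted (line, rule, matched_text) tuples for every rule match in src."""
    hits = []
    for rule, rx in RULES.items():
        for m in rx.finditer(src):
            # Newlines before the match start, plus one, is the 1-based line number.
            line = src.count("\n", 0, m.start()) + 1
            hits.append((line, rule, m.group(0)))
    return sorted(hits)
```

Calling find_hits(SAMPLE) reports the credential on line 2 and the system( call on line 4. Using str.count with start and end arguments gives the same result as src[:m.start()].count("\n") without copying a slice of the file for each hit.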

Checkpoint

Why does the scanner prune directories like node_modules and .git during the os.walk traversal, and how?

Try it yourself

Point the file walker at a small project directory and list the source files it finds. Then run one rule — say the code-exec pattern — over each file and print any file:line hits. Verify a hit by opening that line in context.
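For the verification step, a small helper that shows the lines around a hit can save a trip to the editor. This is one possible sketch, not part of the lesson's scanner: context_lines and its radius parameter are hypothetical, and it takes the file's text (the same src string that scan reads) rather than a path.

```python
def context_lines(src, line, radius=2):
    """Return the lines around a 1-based line number, marking the hit with '>'."""
    lines = src.split("\n")   # same newline convention the scanner counts
    start = max(line - 1 - radius, 0)          # clamp at the top of the file
    end = min(line + radius, len(lines))       # clamp at the bottom of the file
    return [
        f"{'>' if i + 1 == line else ' '} {i + 1}: {lines[i]}"
        for i in range(start, end)
    ]
```

Printing "\n".join(context_lines(src, line)) for each hit shows the flagged line with two lines on either side, which is often enough to tell a test fixture from a real secret.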

Key takeaways

  • os.walk with in-place dirname pruning scans big trees fast.
  • Three rule families: credentials, dangerous sinks, vulnerability patterns.
  • Count newlines before a match to get its line number.
  • Scanner output is triage: verify each candidate in context, and scan git history too, since a secret deleted from the current tree can still sit in old commits.
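The lesson does not show how to scan git history, so the following is only a rough sketch of one approach: run the credential rule over the added lines of patch text, such as the output of git log -p. The function added_line_hits and the sample patch (including its commit id) are made up for illustration.

```python
import re

CRED = re.compile(r"(?i)(password|secret|api[_-]?key|token)\s*[:=]\s*['\"][^'\"]{4,}['\"]")

def added_line_hits(patch_text):
    """Return (commit, text) pairs for added lines in a patch that match CRED."""
    commit = None
    hits = []
    for raw in patch_text.split("\n"):
        if raw.startswith("commit "):
            commit = raw.split()[1]            # remember which commit we are in
        elif raw.startswith("+") and not raw.startswith("+++"):
            # "+" marks an added line; "+++" is the file header, not content
            if CRED.search(raw):
                hits.append((commit, raw[1:].strip()))
    return hits

# Made-up patch text in the shape `git log -p` prints.
PATCH = (
    "commit abc123\n"
    "\n"
    "    add config\n"
    "\n"
    "+++ b/config.py\n"
    "@@ -0,0 +1,2 @@\n"
    '+API_KEY = "abcd1234"\n'
    "+DEBUG = True\n"
)
```

Every line that ever existed in the repo was added in some commit, so checking only the "+" lines covers secrets that were later removed. As with the file scanner, each hit is a candidate to verify by hand.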

Next, automating Active Directory enumeration with LDAP and the impacket toolkit.
