
Analysing JavaScript files for endpoints and secrets


JavaScript bundles are one of the richest recon sources for a web app, because they ship the client’s whole view of the API. This lesson builds a Python tool that fetches an app’s JavaScript and extracts the endpoints, routes, and secrets hidden inside. It combines the requests, regex, and parsing skills you’ve built into one practical script.

You'll learn to

Harvest the JavaScript files a page loads
Extract endpoints and secrets with regex
Combine it into a reusable analysis function

Step one: find the script files

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

base = "https://example.com"
html = requests.get(base, timeout=10).text
soup = BeautifulSoup(html, "html.parser")

# Every script the page references, as absolute URLs:
scripts = set()
for tag in soup.find_all("script", src=True):
    scripts.add(urljoin(base, tag["src"]))

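BeautifulSoup (install with pip install beautifulsoup4) parses the HTML so you can pull every <script src="...">. Passing src=True limits the search to tags that have a src attribute, so inline <script> blocks are skipped. urljoin resolves relative paths against the base URL to produce full URLs. The result is the set of JavaScript files the page references, which is where the app’s API knowledge lives.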
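To make the urljoin step concrete, here is a small sketch of how different src styles resolve. The page URL and the src values are made up for illustration, and resolve_all is a hypothetical helper name, not part of the lesson’s tool.

```python
from urllib.parse import urljoin

def resolve_all(page_url, srcs):
    """Resolve each script src against the URL of the page that referenced it."""
    return [urljoin(page_url, src) for src in srcs]

# Hypothetical page and src values of the kinds commonly seen in <script> tags.
page_url = "https://example.com/app/index.html"
srcs = [
    "/static/main.js",              # root-relative
    "chunk.js",                     # relative to the page's directory
    "../vendor.js",                 # parent directory
    "//cdn.example.net/lib.js",     # protocol-relative, inherits https
    "https://other.example/x.js",   # already absolute, left unchanged
]

for src, resolved in zip(srcs, resolve_all(page_url, srcs)):
    print(f"{src:30} -> {resolved}")
```

Note that the last two resolve to other hosts. Third-party scripts end up in the set too, so you may want to compare hostnames (for example with urllib.parse.urlparse) before downloading anything outside your authorised scope.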

Step two: scan each file with regex

import re

ENDPOINT = re.compile(r"""["'`](/(?:api|v\d+|graphql)/[A-Za-z0-9_./{}-]+)["'`]""")
SECRETS = {
    "AWS":    re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "GitHub": re.compile(r"\bghp_[0-9A-Za-z]{36}\b"),
    "JWT":    re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*"),
}

def scan(js_text):
    endpoints = set(ENDPOINT.findall(js_text))
    secrets = []
    for label, rx in SECRETS.items():
        for hit in set(rx.findall(js_text)):
            secrets.append((label, hit))
    return endpoints, secrets

The ENDPOINT pattern looks for quoted strings that look like API paths: those starting with /api/, a version prefix such as /v1/, or /graphql/. At least one character must follow that prefix, so a bare "/graphql" is not matched. The secret patterns are the high-precision ones from the regex course. Note the triple-quoted raw string for the endpoint pattern: it lets the pattern contain both single and double quotes without escaping, which matters because bundle code mixes quote styles (the pattern also accepts backticks for template literals).
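A quick way to check the patterns is to run them over a small hand-written fragment. The sketch below copies the endpoint pattern and the AWS pattern from above so it runs on its own. The sample JavaScript is invented, and the key is the placeholder value used in AWS documentation, not a real credential.

```python
import re

# Patterns copied from the lesson so this block is self-contained.
ENDPOINT = re.compile(r"""["'`](/(?:api|v\d+|graphql)/[A-Za-z0-9_./{}-]+)["'`]""")
AWS_KEY = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")

# Made-up bundle fragment for illustration only.
sample_js = '''
fetch("/api/users/{id}/orders");
axios.get('/v2/reports/export.csv');
const again = "/api/users/{id}/orders";
const cfg = { accessKeyId: "AKIAIOSFODNN7EXAMPLE" };
const home = "/about/team";
'''

# Sets remove the duplicate path; sorting gives a stable order for printing.
endpoints = sorted(set(ENDPOINT.findall(sample_js)))
keys = sorted(set(AWS_KEY.findall(sample_js)))

print(endpoints)
print(keys)
```

The repeated path is reported once, and "/about/team" is ignored because it does not start with one of the API-looking prefixes.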

Step three: tie it together

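def analyse(base):
    # One Session reuses the connection (and any cookies) across requests.
    session = requests.Session()
    html = session.get(base, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    scripts = {urljoin(base, t["src"]) for t in soup.find_all("script", src=True)}

    # Sets dedupe findings that appear in more than one script.
    all_endpoints, all_secrets = set(), set()
    for js_url in scripts:
        try:
            resp = session.get(js_url, timeout=10)
            resp.raise_for_status()  # treat 4xx/5xx responses as failed downloads
            js = resp.text
        except requests.RequestException:
            continue  # skip this script and carry on with the rest
        endpoints, secrets = scan(js)
        all_endpoints.update(endpoints)
        all_secrets.update(secrets)

    return sorted(all_endpoints), sorted(all_secrets)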

This fetches the page, finds every script, downloads each, and accumulates endpoints and secrets across all of them, with a try/except so one unreachable script doesn’t crash the run. One function call turns a URL into a map of the API surface written into the app’s scripts, plus any leaked secrets the patterns recognise.
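The accumulate-and-skip behaviour can be tried without touching the network. The sketch below is an illustration, not the lesson’s function: collect, fake_fetch, FAKE_SITE, and the simplified PATH pattern are all hypothetical stand-ins, and the fetcher is passed in as a parameter so a fake one can be swapped in. Unlike analyse, it also records which scripts were skipped.

```python
import re

# Simplified stand-in for the lesson's endpoint pattern (assumption for this demo).
PATH = re.compile(r"""["'](/api/[A-Za-z0-9_./-]+)["']""")

# Hypothetical script URLs mapped to hypothetical contents.
FAKE_SITE = {
    "https://example.com/a.js": 'get("/api/users"); get("/api/orders");',
    "https://example.com/b.js": 'post("/api/orders");',
}

def fake_fetch(url):
    """Stand-in for a real download; fails for URLs that are not in FAKE_SITE."""
    if url not in FAKE_SITE:
        raise ConnectionError(f"cannot reach {url}")
    return FAKE_SITE[url]

def collect(script_urls, fetch, find):
    """Accumulate findings across scripts, recording any that fail to download."""
    found, skipped = set(), []
    for url in script_urls:
        try:
            text = fetch(url)
        except OSError:  # the built-in ConnectionError is a subclass of OSError
            skipped.append(url)
            continue
        found.update(find(text))
    return sorted(found), skipped

urls = [
    "https://example.com/a.js",
    "https://example.com/missing.js",  # not in FAKE_SITE, so the fetch fails
    "https://example.com/b.js",
]
endpoints, skipped = collect(urls, fake_fetch, PATH.findall)
print(endpoints)
print(skipped)
```

The failing script in the middle does not stop b.js from being scanned, and "/api/orders" appears once even though two scripts mention it. Returning the skipped list is a design choice worth considering for the real tool, since a silently skipped bundle can hide exactly the endpoints you were looking for.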

Checkpoint

Why does fetching and scanning an app's JavaScript files reveal endpoints that the visible UI never shows?

Try it yourself

On a site you’re authorised to test, write a script that fetches the page, uses BeautifulSoup to list every script src as an absolute URL, then downloads one of those scripts and runs your endpoint regex over it. Print the unique endpoints found. You’ve built the core of a JS recon tool.

Summary

JavaScript bundles describe an app’s backend. The workflow is: fetch the page, use BeautifulSoup to harvest every script URL, download each, and scan with regex for endpoints (quoted API-looking paths) and secrets (the high-precision prefix patterns). Accumulate into sets to dedupe, with try/except so one bad script doesn’t crash the run. Static analysis finds what’s written; pair it with a headless browser to catch runtime-built URLs.
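The limit of static scanning is easy to demonstrate. In this sketch the endpoint pattern is copied from the lesson and the three JavaScript fragments are invented. Only the path that is written out in full is found; the concatenated and templated versions never appear in the source as a single quoted string.

```python
import re

# Endpoint pattern copied from the lesson so this block is self-contained.
ENDPOINT = re.compile(r"""["'`](/(?:api|v\d+|graphql)/[A-Za-z0-9_./{}-]+)["'`]""")

# Three made-up ways a bundle might reference a similar URL.
written_out = 'fetch("/api/v1/users/profile");'
concatenated = 'const base = "/api/"; fetch(base + "v1/" + resource);'
templated = 'fetch(`/api/v1/users/${userId}/avatar`);'

found_written = ENDPOINT.findall(written_out)
found_concat = ENDPOINT.findall(concatenated)
found_template = ENDPOINT.findall(templated)

print("written out: ", found_written)
print("concatenated:", found_concat)
print("templated:   ", found_template)
```

The last two come back empty: "/api/" alone has nothing after the prefix, and the $ of the template placeholder is outside the pattern’s character class. A headless browser sees those requests because it observes the URLs after the code has built them.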

Key takeaways

BeautifulSoup harvests every <script src>; urljoin makes them absolute.
Regex extracts endpoints (quoted API paths) and secrets from the JS text.
Accumulate into sets to dedupe; use try/except per script for robustness.
Static scanning misses runtime-built URLs — pair with a headless browser.

Next, the secret-discovery deep dive — turning these patterns into a thorough, low-false-positive scanner across whole sites and codebases.
