Analysing JavaScript files for endpoints and secrets
JavaScript bundles are among the richest recon sources available: they ship the client's entire view of the API. This lesson builds a Python tool that fetches an app's JavaScript and extracts the endpoints, routes, and secrets hidden inside. It combines the requests, regex, and parsing skills you've built into one genuinely useful script.
You'll learn to
- Harvest the JavaScript files a page loads
- Extract endpoints and secrets with regex
- Combine it into a reusable analysis function
Step one: find the script files
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

base = "https://example.com"
html = requests.get(base, timeout=10).text
soup = BeautifulSoup(html, "html.parser")

# Every script the page references, as absolute URLs:
scripts = set()
for tag in soup.find_all("script", src=True):
    scripts.add(urljoin(base, tag["src"]))
BeautifulSoup (install with pip install beautifulsoup4) parses the HTML so you can pull every <script src="...">. urljoin turns relative paths into full URLs. The result is the set of JavaScript files the app loads — the files that contain its API knowledge.
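To make the urljoin step concrete, here is a small sketch of how it resolves the kinds of src values a page might contain. The page URL and the sample paths are made up for illustration.

```python
from urllib.parse import urljoin

# Assumed page URL; note that it includes a directory and a file name.
base = "https://example.com/app/index.html"

# Hypothetical src values of the kinds commonly seen in <script> tags.
samples = [
    "static/main.js",                # relative to the page's directory
    "/assets/vendor.js",             # root-relative
    "//cdn.example.net/lib.js",      # protocol-relative, inherits the scheme
    "https://cdn.example.net/x.js",  # already absolute, left unchanged
]

resolved = [urljoin(base, src) for src in samples]
for src, url in zip(samples, resolved):
    print(f"{src!r:35} -> {url}")
```

Because already-absolute URLs pass through unchanged, the same call is safe to apply to every src without checking its form first.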
Step two: scan each file with regex
import re

ENDPOINT = re.compile(r"""["'`](/(?:api|v\d+|graphql)/[A-Za-z0-9_./{}-]+)["'`]""")

SECRETS = {
    "AWS": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "GitHub": re.compile(r"\bghp_[0-9A-Za-z]{36}\b"),
    "JWT": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*"),
}

def scan(js_text):
    endpoints = set(ENDPOINT.findall(js_text))
    secrets = []
    for label, rx in SECRETS.items():
        for hit in set(rx.findall(js_text)):
            secrets.append((label, hit))
    return endpoints, secrets
The ENDPOINT pattern looks for quoted strings that look like API paths: those starting with /api/, a version prefix such as /v1/, or /graphql/. It requires at least one path segment after that prefix, so a bare "/graphql" string will not match. The secret patterns are the high-precision ones from the regex course. Note the triple-quoted raw string for the endpoint pattern: it lets the pattern contain both single and double quotes without escaping headaches, which matters because bundle code mixes quote styles.
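As a quick check of what these patterns pick up, the sketch below runs the endpoint pattern and the AWS pattern over a made-up snippet that mixes all three quote styles. The snippet is invented for illustration, and the key in it is the placeholder value AWS uses in its own documentation, not a real credential.

```python
import re

# Same endpoint pattern as in the lesson, repeated so the example runs on its own.
ENDPOINT = re.compile(r"""["'`](/(?:api|v\d+|graphql)/[A-Za-z0-9_./{}-]+)["'`]""")
AWS_KEY = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")

# Invented bundle fragment: double quotes, single quotes, and a backtick string.
sample_js = '''
fetch("/api/users/{id}/profile");
axios.get('/v2/orders');
const q = `/graphql/query`;
const img = "/static/logo.png";
const key = "AKIAIOSFODNN7EXAMPLE";
'''

endpoints = sorted(set(ENDPOINT.findall(sample_js)))
keys = AWS_KEY.findall(sample_js)

print(endpoints)  # the three API-looking paths; /static/logo.png is ignored
print(keys)
```

The static asset path is skipped because it does not begin with one of the three prefixes, which is what keeps the endpoint list focused on API routes.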
Step three: tie it together
def analyse(base):
    session = requests.Session()
    html = session.get(base, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    scripts = {urljoin(base, t["src"]) for t in soup.find_all("script", src=True)}

    # Sets dedupe hits that appear in more than one script.
    all_endpoints, all_secrets = set(), set()
    for js_url in scripts:
        try:
            js = session.get(js_url, timeout=10).text
        except requests.RequestException:
            continue  # skip scripts that cannot be fetched
        endpoints, secrets = scan(js)
        all_endpoints.update(endpoints)
        all_secrets.update(secrets)
    return sorted(all_endpoints), sorted(all_secrets)
This fetches the page, finds every script, downloads each, and accumulates endpoints and secrets across all of them — with a try/except so one unreachable script doesn’t crash the run. One function call turns a URL into a map of the app’s API surface plus any leaked secrets.
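One way to present the result is sketched below. The helper names mask and format_report are hypothetical (they are not part of the lesson's tool), and the sample data simply stands in for the pair of values analyse returns. Masking the secret values is an assumption about how you might want to handle findings in logs or reports, not a requirement.

```python
def mask(value, keep=4):
    """Show only the first `keep` characters of a secret (hypothetical helper)."""
    return value[:keep] + "*" * max(len(value) - keep, 0)

def format_report(endpoints, secrets):
    """Build a plain-text summary from an (endpoints, secrets) pair."""
    lines = [f"{len(endpoints)} endpoint(s):"]
    lines += [f"  {path}" for path in endpoints]
    lines.append(f"{len(secrets)} possible secret(s):")
    lines += [f"  {label}: {mask(hit)}" for label, hit in secrets]
    return "\n".join(lines)

# Stand-in data shaped like the return value of analyse(); the key is the
# placeholder from AWS documentation, not a real credential.
endpoints = ["/api/users", "/v1/orders"]
secrets = [("AWS", "AKIAIOSFODNN7EXAMPLE")]

report = format_report(endpoints, secrets)
print(report)
```

In real use you would replace the stand-in data with endpoints, secrets = analyse(url) for a target you are authorised to test.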
Checkpoint
Why does fetching and scanning an app's JavaScript files reveal endpoints that the visible UI never shows?
The JavaScript bundle contains all the code the client might run, including calls to endpoints that are only reached under certain conditions, behind feature flags, or in admin/unreleased areas. The UI only exercises some of them, but the code references all of them — so scanning the bundle surfaces the full set the frontend knows about, including hidden or unused ones.
Try it yourself
On a site you’re authorised to test, write a script that fetches the page, uses BeautifulSoup to list every script src as an absolute URL, then downloads one of those scripts and runs your endpoint regex over it. Print the unique endpoints found. You’ve built the core of a JS recon tool.
Summary
JavaScript bundles describe an app’s backend. The workflow is: fetch the page, use BeautifulSoup to harvest every script URL, download each, and scan with regex for endpoints (quoted API-looking paths) and secrets (the high-precision prefix patterns). Accumulate into sets to dedupe, with try/except so one bad script doesn’t crash the run. Static analysis finds what’s written; pair it with a headless browser to catch runtime-built URLs.
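The limit of static scanning can be seen in a small sketch. The JavaScript fragment is invented for illustration: one endpoint is written as a full string literal, while the others are assembled at runtime by concatenation or template interpolation.

```python
import re

# Same endpoint pattern as in the lesson, repeated so the example runs on its own.
ENDPOINT = re.compile(r"""["'`](/(?:api|v\d+|graphql)/[A-Za-z0-9_./{}-]+)["'`]""")

# Invented fragment: one literal path, two paths built at runtime.
sample_js = '''
fetch("/api/reports/summary");
const base = "/api/";
fetch(base + "admin" + "/" + "users");
const url = `/api/users/${id}`;
'''

found = ENDPOINT.findall(sample_js)
print(found)  # only the literal path is present
```

Neither /api/admin/users nor the interpolated user path ever appears as one complete quoted string in the source, so the regex cannot see them; observing the requests a headless browser actually makes is what fills that gap.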
Key takeaways
- BeautifulSoup harvests every <script src>; urljoin makes them absolute.
- Regex extracts endpoints (quoted API paths) and secrets from the JS text.
- Accumulate into sets to dedupe; use try/except per script for robustness.
- Static scanning misses runtime-built URLs, so pair it with a headless browser.
Next, the secret-discovery deep dive — turning these patterns into a thorough, low-false-positive scanner across whole sites and codebases.