b2d3/SrcApe

SrcApe [WIP]

[ comparison between percentage of source maps recovered by tool and which programs other OSS might miss ]

Authenticated web bundle scraper, JavaScript unpacker, and deobfuscator.

SrcApe drives a real Chromium against a target, captures every JavaScript bundle the SPA loads (including async / lazy / code-split chunks), and recovers the original source tree from any source maps that ship with the deployment. Works authenticated via Playwright storage state, cookies, or custom headers — so it can reach the parts of an app that exist behind login, where the interesting code actually lives. Built on top of existing libraries (Playwright, source-map-js) plus custom code for the crawl + recovery orchestration.

The gap it tries to fill

Existing source-map tools assume you have a .map URL in hand and walk you through extracting that one. Real SPAs ship dozens of chunks across routes — many lazy-loaded only after the app initializes for an authenticated user. None of the existing OSS tools cross the gap from "I have a domain" to "I have the entire source tree."

The closest existing tool, denandz/sourcemapper, is excellent for one map at a time and added basic --header support in early 2024. SrcApe is what happens when you push the use case all the way: drive a real browser, log in, capture everything.

Use cases

  • Bug bounty recon. Recover the original source tree of a target's deployed JS so you can read what's actually there, not the minified blob.
  • AppSec / threat modeling. Pull the source of your own SaaS to feed into other analysis tools (incl. StoXSS — its sibling, which consumes SrcApe output to look for stored-XSS-shaped patterns).
  • Reverse engineering / learning. Read how production React/Vue/Svelte apps are actually structured.
  • CI / build verification. Confirm your prod builds aren't accidentally shipping source maps to the public.
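For the CI check in the last bullet, one possible building block is spotting the source-map reference that a bundle carries. The sketch below looks for the conventional trailing `//# sourceMappingURL=` comment and resolves it against the bundle URL. `findSourceMapUrl` is a hypothetical helper, not a SrcApe export, and it ignores maps announced only through a `SourceMap` response header.

```javascript
// Hypothetical helper, not part of SrcApe: find the source-map reference in a bundle.
function findSourceMapUrl(bundleText, bundleUrl) {
  // Convention: a comment of the form //# sourceMappingURL=<url> (older tools wrote //@).
  const pattern = /\/\/[#@]\s*sourceMappingURL=(\S+)/g;
  let last = null;
  for (const match of bundleText.matchAll(pattern)) {
    last = match[1]; // if several are present, keep the last one
  }
  if (last === null) return null;
  // Inline maps are embedded as data: URLs and need no second request.
  if (last.startsWith("data:")) return { kind: "inline", url: last };
  // Relative references resolve against the URL the bundle was served from.
  return { kind: "external", url: new URL(last, bundleUrl).href };
}
```

A CI job could fail the build when this returns a non-null result for any publicly served bundle.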

Status

Early development. CLI shape and output schema may change.

Quick start

git clone https://github.com/b2d3/SrcApe
cd SrcApe
npm install
node srcape.mjs https://your-target.com

Output lands in out/<host>/:

out/your-target.com/
  bundles/           Raw JS responses + their .map files
  sources/           Recovered original sources (reconstructed tree)
  recovery-report.md Human summary
  recovery.json      Machine-readable summary
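As a rough illustration of how a `sources/` tree can be rebuilt: a version 3 source map lists original file names in `sources` and, when the build embedded them, the file bodies in the parallel `sourcesContent` array. The sketch below pairs the two and sanitizes each path so that nothing can be written outside the output directory. The function names are hypothetical, and it works on plain parsed JSON rather than through source-map-js, so treat it as a sketch and not as SrcApe's actual recovery code.

```javascript
// Hypothetical sketch: turn a parsed v3 source map into { relativePath: content }.
function expandSourceMapSketch(map) {
  const files = {};
  const contents = map.sourcesContent || [];
  map.sources.forEach((source, index) => {
    const content = contents[index];
    if (typeof content !== "string") return; // no embedded source for this entry
    files[toSafeRelativePath(source)] = content;
  });
  return files;
}

// Strip scheme prefixes such as webpack:/// and drop "." and ".." segments.
function toSafeRelativePath(source) {
  const withoutScheme = source.replace(/^[a-z][a-z0-9+.-]*:\/+/i, "");
  const parts = [];
  for (const part of withoutScheme.split(/[\\/]+/)) {
    if (part === "" || part === "." || part === "..") continue;
    parts.push(part.replace(/[<>:"|?*]/g, "_")); // characters some filesystems reject
  }
  return parts.join("/") || "unnamed";
}
```

Dropping `..` segments can make two distinct sources collide on one path; a real implementation would need a policy for that, such as suffixing duplicates.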

Authenticated scraping

The marketing pages an unauthenticated crawl sees load only a small subset of an SPA's codebase. The interesting code (profile editors, dashboards, messaging, admin panels) lives behind login on app.<target>.com-style deploys. SrcApe takes auth state in any of three ways. Pick whichever matches how you already have a logged-in session:

Option 1: paste cookies from your already-logged-in browser (easiest)

If you're already signed in to the target in Chrome (or any browser):

  1. Open DevTools on a logged-in page → Network tab → click any authenticated request
  2. Copy the value of the Cookie: request header (or right-click → Copy → Copy value)
  3. Paste into a file, then:
node srcape.mjs https://app.example.com --cookies-file cookies.txt

Or inline if it's short:

node srcape.mjs https://app.example.com --cookies "session=abc123; user=42"

--cookies and --cookies-file auto-detect the format. Three accepted shapes:

  • Raw Cookie: header line (with or without the Cookie: prefix)
  • Netscape cookies.txt (the format exported by "Get cookies.txt LOCALLY" or similar extensions)
  • JSON (a Playwright cookie array, or a full storage-state file's {cookies: [...]})
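To make the three shapes concrete, here is a minimal sketch of how such auto-detection could work. `parseCookieInput` and its `defaultDomain` parameter are hypothetical; SrcApe's real parser may use different rules. The output objects follow the cookie fields Playwright accepts (`name`, `value`, `domain`, `path`, `secure`, `expires`).

```javascript
// Hypothetical sketch of cookie-format auto-detection; not SrcApe's actual parser.
function parseCookieInput(text, defaultDomain) {
  const trimmed = text.trim();

  // 1. JSON: a cookie array, or a storage-state object with a `cookies` array.
  if (trimmed.startsWith("[") || trimmed.startsWith("{")) {
    const parsed = JSON.parse(trimmed);
    return Array.isArray(parsed) ? parsed : parsed.cookies || [];
  }

  // 2. Netscape cookies.txt: seven tab-separated fields per non-comment line.
  const rows = trimmed
    .split(/\r?\n/)
    .map((line) => line.replace(/^#HttpOnly_/, "")) // curl-style HttpOnly marker
    .filter((line) => line && !line.startsWith("#"))
    .map((line) => line.split("\t"));
  if (rows.length > 0 && rows.every((fields) => fields.length === 7)) {
    return rows.map(([domain, , path, secure, expires, name, value]) => ({
      name, value, domain, path,
      secure: secure === "TRUE",
      expires: Number(expires) || -1, // 0 means session cookie; -1 is Playwright's marker
    }));
  }

  // 3. Raw header: "Cookie: a=1; b=2" or just "a=1; b=2".
  return trimmed
    .replace(/^cookie:\s*/i, "")
    .split(/;\s*/)
    .filter((pair) => pair.includes("="))
    .map((pair) => {
      const eq = pair.indexOf("="); // split on the first "=" only
      return { name: pair.slice(0, eq).trim(), value: pair.slice(eq + 1), domain: defaultDomain, path: "/" };
    });
}
```

A raw header carries no domain information, which is why this sketch needs the target host passed in for that case.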

Option 2: bearer tokens and API keys

For SaaS APIs that authenticate via Authorization: Bearer … rather than cookies:

node srcape.mjs https://api.example.com \
  --header "Authorization=Bearer eyJ..." \
  --header "X-API-Key=..."

Headers are repeatable and apply to every request the browser makes — page load, bundle fetches, and .map fetches.
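The `Name=Value` flag syntax shown above can be folded into a plain headers object with a few lines. This is a sketch under assumptions: `parseHeaderFlags` is a hypothetical name, and whether SrcApe passes the result to Playwright's `extraHTTPHeaders` context option is not confirmed by this README, although that option accepts an object of this shape.

```javascript
// Hypothetical sketch: fold repeated --header "Name=Value" flags into one object.
function parseHeaderFlags(flags) {
  const headers = {};
  for (const flag of flags) {
    // Split on the first "=" only, since values (for example base64) may contain "=".
    const eq = flag.indexOf("=");
    if (eq <= 0) throw new Error(`Expected Name=Value, got: ${flag}`);
    headers[flag.slice(0, eq).trim()] = flag.slice(eq + 1).trim();
  }
  return headers;
}
```

If the same header name is given twice, the later flag wins in this sketch.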

Option 3: Playwright storage state

For modern SPAs that put auth tokens in localStorage rather than cookies, you need a full storage-state JSON, which captures cookies and per-origin localStorage in one file. Playwright's storage state does not include sessionStorage, so tokens kept only there will not survive replay:

node srcape.mjs https://app.example.com --storage-state auth.json

To export storage state from a real session, use the Playwright CLI:

npx playwright codegen --save-storage auth.json https://app.example.com
# log in manually in the window that opens, then close the browser
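Before a long crawl it can help to confirm that the exported file actually holds the session you expect. Playwright writes storage state as JSON with a top-level `cookies` array and an `origins` array whose entries carry a `localStorage` list of name/value pairs. The helper below is a hypothetical sketch that summarizes such an object; it is not part of SrcApe.

```javascript
// Hypothetical helper: summarize what a storage-state object would replay.
function summarizeStorageState(state) {
  const cookies = state.cookies || [];
  const origins = state.origins || [];
  return {
    cookieCount: cookies.length,
    // Distinct cookie domains, sorted for stable output.
    cookieDomains: [...new Set(cookies.map((c) => c.domain))].sort(),
    // Map each origin to the localStorage key names saved for it.
    localStorageKeys: Object.fromEntries(
      origins.map((o) => [o.origin, (o.localStorage || []).map((e) => e.name)])
    ),
  };
}
```

Feeding it `JSON.parse` of the contents of auth.json shows at a glance whether the expected token key was saved for the target origin.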

URL discovery

Beyond --storage-state, getting the full surface usually means seeding multiple URLs. Four modes:

# Multiple positional args (same host crawls in one browser session)
node srcape.mjs https://app.example.com/dashboard \
                https://app.example.com/profile \
                https://app.example.com/messages

# Auto-discover via /sitemap.xml + /robots.txt
node srcape.mjs example.com --sitemap --sitemap-limit 30

# BFS link-follow from the seed (same-site, capped depth + page count)
node srcape.mjs https://app.example.com --bfs 2 --bfs-limit 30

# BFS through an authenticated SPA — the dashboard, every route it links to,
# and every route those link to. This is how you get to the lazy-loaded
# admin / messaging / billing chunks that the seed page alone never touches.
node srcape.mjs https://app.example.com/dashboard \
  --storage-state auth.json --bfs 2 --bfs-limit 40

# Read from a file
node srcape.mjs example.com --urls hunting-targets.txt
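The BFS mode above can be pictured as a breadth-first walk with two caps, one on depth and one on total pages. The sketch below is an illustration under assumptions, not SrcApe's crawler: `bfsDiscover` is a hypothetical name, `getLinks(url)` is an assumed callback that returns the hrefs found on a page (in practice it would be backed by the browser), and "same-site" is simplified to "same host".

```javascript
// Hypothetical sketch of a depth- and count-capped, same-host BFS.
// `getLinks(url)` is an assumed async callback returning the hrefs found on a page.
async function bfsDiscover(seed, getLinks, { maxDepth = 2, maxPages = 30 } = {}) {
  const seedUrl = new URL(seed);
  const visited = new Set([seedUrl.href]);
  const order = []; // pages in the order they would be crawled
  let frontier = [seedUrl.href];

  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const url of frontier) {
      if (order.length >= maxPages) return order; // page cap reached
      order.push(url);
      if (depth === maxDepth) continue; // visit, but do not expand past the depth cap
      for (const href of await getLinks(url)) {
        let target;
        try { target = new URL(href, url); } catch { continue; } // skip unparsable hrefs
        target.hash = ""; // treat fragment variants as the same page
        if (target.host !== seedUrl.host) continue;
        if (!/^https?:$/.test(target.protocol)) continue;
        if (visited.has(target.href)) continue;
        visited.add(target.href);
        next.push(target.href);
      }
    }
    frontier = next;
  }
  return order;
}
```

Stripping fragments is a simplification that would hide routes in apps using hash-based routing, so a real crawler may need to keep them.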

The --urls file also pairs well with recon tools. Generate a URL list elsewhere, feed it in:

# Pair with katana (ProjectDiscovery's modern crawler)
katana -u https://app.example.com -d 3 -silent > urls.txt
node srcape.mjs app.example.com --urls urls.txt --storage-state auth.json

# Pair with waybackurls for historical pages
waybackurls example.com | grep -v static | sort -u > urls.txt
node srcape.mjs example.com --urls urls.txt

# Pair with gau, hakrawler, gospider, etc. — any tool that outputs URLs.
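Output from recon tools tends to contain duplicates, off-host assets, and non-HTTP schemes, so it can be worth cleaning the list before feeding it in. A minimal sketch, assuming one URL per line; `cleanUrlList` is a hypothetical helper, and how SrcApe itself filters a --urls file is not specified here.

```javascript
// Hypothetical helper: dedupe a URL list and keep only http(s) URLs on one host.
function cleanUrlList(text, host) {
  const seen = new Set();
  for (const line of text.split(/\r?\n/)) {
    const candidate = line.trim();
    if (!candidate) continue; // skip blank lines
    let url;
    try { url = new URL(candidate); } catch { continue; } // skip non-URLs
    if (url.protocol !== "http:" && url.protocol !== "https:") continue;
    if (host && url.host !== host) continue; // optional same-host filter
    seen.add(url.href); // Set preserves first-seen order and removes duplicates
  }
  return [...seen];
}
```

Calling it with the target host as the second argument keeps only that host's URLs; omitting the host keeps every valid http(s) URL.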

Library API

SrcApe also exposes a programmatic API for tools that want to build on the recovery pipeline:

import { recover, renderRecoveryReport } from "srcape";

const result = await recover({
	urls: ["https://app.example.com/dashboard"],
	outDir: "./out/app.example.com",
	auth: {
		storageStateFile: "./auth.json",
		headers: { Authorization: "Bearer …" },
	},
	useSitemap: true,
	log: console.log,
});

console.log(`Wrote ${result.stats.sourcesWritten} source files`);
const md = renderRecoveryReport("app.example.com", result);

Sibling exports:

  • import { crawl } from 'srcape/crawl' — just the Playwright crawl
  • import { expandSourceMap, loadConsumer } from 'srcape/sourcemaps' — just the source-map work
  • import { discoverSitemap } from 'srcape/sitemap' — URL discovery via sitemap.xml

Responsible use

Use SrcApe only against targets you are authorized to test:

  • Your own applications and infrastructure
  • Bug-bounty programs whose scope explicitly includes the target
  • CTFs and lab environments

Only use auth state from a session you own. Never use cookies or storage state obtained from anywhere else. Most bug-bounty programs explicitly allow authenticated scanning against your own test session; using someone else's session is a different and much more serious thing.

Prior art & inspiration

SrcApe stands on a lot of work by other people. It's meant to complement existing tools, not replace them.

  • denandz/sourcemapper — the canonical single-map extractor. Go binary, BSD-3-licensed, 1.3k+ stars. Use it when you have one specific .map URL. Use SrcApe when you want to recover everything an authenticated SPA loads.
  • jsmap — alternative single-map extractor in Go.
  • unwebpack-sourcemap — Python tool with similar single-map scope.
  • Burp Source Mapper — Burp Pro extension that injects sourceMappingURL pragmas so DevTools loads originals. Requires Burp Pro + manual browsing.
  • BitMapper — browser-side equivalent of Burp Source Mapper.
  • Playwright — the underlying browser automation. SrcApe is basically a focused frontend on top of Playwright's network-interception + storage-state APIs.
  • source-map-js — the pure-JS source-map parser SrcApe uses for VLQ decoding and position mapping.

Contributing

Issues and PRs welcome once the surface stabilizes. The most useful contribution right now is trying it on a target you have permission to test and opening an issue with edge cases — odd source-map formats, auth flows that don't survive storage-state replay, SPAs whose chunks load only after specific user interactions.

License

MIT. See LICENSE.

About

[Work In Progress] Web scraper that outputs unpacked and deobfuscated src files.
