Reproducible experiments for the paper:
LLM Exposure Monitoring: A Security Framework for Recording AI Agent Data Access Across Heterogeneous Platforms
- `src/`: Core modules (zero external dependencies)
  - `engine.js`: Dual-stream corroboration engine (exact + temporal matching)
  - `generator.js`: Synthetic event stream generator with ET5 injection
  - `metrics.js`: Detection metrics (precision, recall, F1, CI)
  - `prng.js`: Seeded PRNG (xoshiro128**) for reproducibility
- `experiments/`: Experiment scripts
  - `01-reproduce.js`: Single-run reproduction of paper results
  - `02-sweep.js`: Parameter sensitivity sweep (ET5 rate, temporal window, drop rate)
  - `03-baseline.js`: Baseline comparison: full vs temporal-only vs exact-only
  - `04-multirun.js`: 100-run statistical analysis with 95% CIs
  - `05-webhook-probe.js`: GitHub webhook delivery measurement
  - `06-real-world-pilot.js`: Real-world pilot with live GitHub webhooks
- `worker/`: Cloudflare Worker for webhook endpoint (KV-backed)
- `results/`: Pre-computed results (JSON + CSV)
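To illustrate why a seeded PRNG makes the simulation runs repeatable, here is a minimal sketch of a xoshiro128** generator in plain JavaScript. It is not the code in `src/prng.js`: the factory name `makeXoshiro128ss` and the seed-expansion step (an MT19937-style recurrence) are assumptions chosen for illustration, and only the state-update step follows the published xoshiro128** algorithm.

```javascript
// Rotate a 32-bit integer left by k bits.
function rotl(x, k) {
  return (x << k) | (x >>> (32 - k));
}

// Hypothetical factory (not the repository's API): returns a function
// that yields floats in [0, 1) from a single integer seed.
function makeXoshiro128ss(seed) {
  // Expand the seed into four 32-bit state words. The recurrence below is
  // an illustrative choice; the real module may seed differently.
  const s = new Uint32Array(4);
  s[0] = seed >>> 0;
  for (let i = 1; i < 4; i++) {
    const prev = s[i - 1];
    s[i] = Math.imul(1812433253, prev ^ (prev >>> 30)) + i;
  }
  // xoshiro state must never be all zero.
  if ((s[0] | s[1] | s[2] | s[3]) === 0) s[0] = 1;

  return function next() {
    // Output scrambler: rotl(s[1] * 5, 7) * 9, in 32-bit arithmetic.
    const result = Math.imul(rotl(Math.imul(s[1], 5), 7), 9) >>> 0;
    // State transition.
    const t = s[1] << 9;
    s[2] ^= s[0];
    s[3] ^= s[1];
    s[1] ^= s[2];
    s[0] ^= s[3];
    s[2] ^= t;
    s[3] = rotl(s[3], 11);
    // Scale the 32-bit result into [0, 1).
    return result / 4294967296;
  };
}
```

Two generators created with the same seed produce the same sequence, which is the property the experiment scripts rely on for reproducibility.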
# Run all simulation experiments (no network, no API keys needed)
npm run all-sim
# Run individually
npm run reproduce # Experiment 01
npm run sweep # Experiment 02
npm run baseline # Experiment 03
npm run multirun # Experiment 04

Experiments 05-06 require infrastructure setup:
- Deploy the Cloudflare Worker in `worker/` (see `wrangler.toml`)
- Set environment variables:

      export GITHUB_TOKEN="..." # PAT with repo + admin:repo_hook scope
      export GITHUB_TEST_REPO="user/repo" # Test repository
      export WEBHOOK_ENDPOINT="https://your-worker.example.com"

- Run:

      npm run pilot
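Before starting the pilot, it can help to confirm that the three environment variables listed above are set. The following is a small sketch of such a check; the helper name `missingPilotEnv` is hypothetical and is not part of the repository.

```javascript
// Variable names taken from the setup steps above.
const REQUIRED_VARS = ['GITHUB_TOKEN', 'GITHUB_TEST_REPO', 'WEBHOOK_ENDPOINT'];

// Hypothetical helper: returns the names of required variables that are
// unset or blank in the given environment object (defaults to process.env).
function missingPilotEnv(env = process.env) {
  return REQUIRED_VARS.filter((name) => {
    const value = env[name];
    return typeof value !== 'string' || value.trim() === '';
  });
}
```

Calling `missingPilotEnv()` returns an empty array when everything is configured, so a script could print the returned names and stop early otherwise.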
The pre-computed results in results/06-real-world-pilot.json were generated against this repository's own issue tracker. The 152 closed issues are experimental artifacts: each issue was created via the GitHub API during the pilot run, with timestamps and webhook delivery records serving as independently verifiable evidence of real webhook traffic.
| Metric | Simulation (N=100) | Real-World Pilot (N=50) |
|---|---|---|
| Recall | 0.983 [0.978, 0.989] | 1.000 |
| Precision | 0.223 (at ~10% rate) | 1.000 (at 20% rate) |
| F1 | 0.363 | 1.000 |
| Webhook drop | 5.0% (modeled) | 0.0% (measured) |
| Webhook duplicates | 2.0% (modeled) | 0 (measured) |
| Latency p50 / p99 | modeled | 3,608 ms / 4,423 ms |
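For readers who want to sanity-check the table, the sketch below shows the standard definitions of precision, recall, and F1, plus a normal-approximation 95% confidence interval over per-run values. The function names are hypothetical, and the interval method is an assumption: the source does not say how `src/metrics.js` computes its CIs.

```javascript
// F1 is the harmonic mean of precision and recall.
function f1Score(precision, recall) {
  const sum = precision + recall;
  return sum === 0 ? 0 : (2 * precision * recall) / sum;
}

// Precision, recall, and F1 from true-positive, false-positive,
// and false-negative counts.
function detectionMetrics(tp, fp, fn) {
  const precision = tp + fp === 0 ? 0 : tp / (tp + fp);
  const recall = tp + fn === 0 ? 0 : tp / (tp + fn);
  return { precision, recall, f1: f1Score(precision, recall) };
}

// Mean and 95% CI using mean +/- 1.96 * s / sqrt(n), where s is the
// sample standard deviation. Assumed method, for illustration only.
function meanWithCI95(values) {
  const n = values.length;
  if (n < 2) throw new Error('need at least two values');
  const mean = values.reduce((acc, v) => acc + v, 0) / n;
  const variance =
    values.reduce((acc, v) => acc + (v - mean) ** 2, 0) / (n - 1);
  const half = 1.96 * Math.sqrt(variance / n);
  return { mean, lower: mean - half, upper: mean + half };
}
```

Plugging the simulation column's precision (0.223) and recall (0.983) into `f1Score` gives roughly 0.36, in line with the reported F1. A small difference in the third decimal is plausible if the reported figure is averaged over runs rather than computed from the averaged precision and recall.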
- Node.js >= 14
- No external dependencies for simulation experiments (01-04)
- GitHub API access + Cloudflare Workers account for experiments 05-06
If you use this code or data, please cite:
@misc{lem-experiments-2026,
author = {Li, Tianyu},
title = {LEM Experiments: Reproducible Validation for LLM Exposure Monitoring},
year = {2026},
publisher = {Zenodo},
doi = {10.5281/zenodo.19220582},
url = {https://github.com/crabsatellite/lem-experiments}
}
MIT