LEM Experiments

Reproducible experiments for the paper:

LLM Exposure Monitoring: A Security Framework for Recording AI Agent Data Access Across Heterogeneous Platforms

Structure

src/                 Core modules (zero external dependencies)
  engine.js            Dual-stream corroboration engine (exact + temporal matching)
  generator.js         Synthetic event stream generator with ET5 injection
  metrics.js           Detection metrics (precision, recall, F1, CI)
  prng.js              Seeded PRNG (xoshiro128**) for reproducibility

experiments/         Experiment scripts
  01-reproduce.js      Single-run reproduction of paper results
  02-sweep.js          Parameter sensitivity sweep (ET5 rate, temporal window, drop rate)
  03-baseline.js       Baseline comparison: full vs temporal-only vs exact-only
  04-multirun.js       100-run statistical analysis with 95% CIs
  05-webhook-probe.js  GitHub webhook delivery measurement
  06-real-world-pilot.js  Real-world pilot with live GitHub webhooks

worker/              Cloudflare Worker for webhook endpoint (KV-backed)
results/             Pre-computed results (JSON + CSV)
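
The seeded PRNG is what makes the simulation runs repeatable: the same seed yields the same event stream. The block below is a minimal sketch of a xoshiro128** generator, not a copy of src/prng.js. The function names (`splitmix32`, `xoshiro128ss`) and the splitmix-style seeding step are assumptions made for illustration; the repository may seed and expose its generator differently.

```javascript
// Hypothetical seeding helper (an assumption, not taken from the repository):
// expands one 32-bit seed into a stream of well-mixed 32-bit words.
function splitmix32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x9e3779b9) >>> 0;
    let t = a ^ (a >>> 16);
    t = Math.imul(t, 0x21f0aaad);
    t ^= t >>> 15;
    t = Math.imul(t, 0x735a2d97);
    t ^= t >>> 15;
    return t >>> 0;
  };
}

// 32-bit left rotation, returned as an unsigned value.
function rotl(x, k) {
  return ((x << k) | (x >>> (32 - k))) >>> 0;
}

// xoshiro128**: 128 bits of state held in four 32-bit words.
// Returns a function that yields floats in [0, 1).
function xoshiro128ss(seed) {
  const next = splitmix32(seed);
  let s0 = next(), s1 = next(), s2 = next(), s3 = next();
  return function () {
    // Output scrambler: rotl(s1 * 5, 7) * 9, all modulo 2^32.
    const result = Math.imul(rotl(Math.imul(s1, 5), 7), 9) >>> 0;
    // State transition.
    const t = s1 << 9;
    s2 ^= s0;
    s3 ^= s1;
    s1 ^= s2;
    s0 ^= s3;
    s2 ^= t;
    s3 = rotl(s3, 11);
    return result / 4294967296;
  };
}

// Two generators built from the same seed produce identical sequences.
const runA = xoshiro128ss(42);
const runB = xoshiro128ss(42);
console.log(runA() === runB()); // true
```

Using `Math.imul` keeps every multiplication in 32-bit integer arithmetic, which avoids the precision loss that ordinary JavaScript number multiplication would introduce for large operands.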

Quick Start

# Run all simulation experiments (no network, no API keys needed)
npm run all-sim

# Run individually
npm run reproduce    # Experiment 01
npm run sweep        # Experiment 02
npm run baseline     # Experiment 03
npm run multirun     # Experiment 04

Real-World Pilot (Experiment 06)

Requires infrastructure setup:

1. Deploy the Cloudflare Worker in worker/ (see wrangler.toml).

2. Set environment variables:

export GITHUB_TOKEN="..."           # PAT with repo + admin:repo_hook scopes
export GITHUB_TEST_REPO="user/repo" # Test repository
export WEBHOOK_ENDPOINT="https://your-worker.example.com"

3. Run: npm run pilot
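
Because the pilot creates real issues and webhooks, it can help to check the three environment variables listed above before starting a run. The following is a small sketch of such a pre-flight check. The helper name `checkPilotEnv` and the "user/repo" format check are assumptions for illustration and are not part of the repository's scripts.

```javascript
// Variable names are taken from the setup steps above.
const REQUIRED_ENV = ['GITHUB_TOKEN', 'GITHUB_TEST_REPO', 'WEBHOOK_ENDPOINT'];

// Hypothetical helper: returns a list of human-readable problems,
// or an empty list when the environment looks usable.
function checkPilotEnv(env) {
  const problems = [];
  for (const name of REQUIRED_ENV) {
    if (typeof env[name] !== 'string' || env[name].trim() === '') {
      problems.push(`${name} is not set`);
    }
  }
  // Assumed format check: the test repository is given as "user/repo".
  const repo = env.GITHUB_TEST_REPO;
  if (typeof repo === 'string' && repo.trim() !== '' &&
      !/^[^/\s]+\/[^/\s]+$/.test(repo)) {
    problems.push('GITHUB_TEST_REPO should look like "user/repo"');
  }
  return problems;
}

// Example with placeholder values; pass process.env in a real script.
const sample = {
  GITHUB_TOKEN: 'placeholder',
  GITHUB_TEST_REPO: 'user/repo',
  WEBHOOK_ENDPOINT: 'https://your-worker.example.com',
};
console.log(checkPilotEnv(sample)); // prints []
```

Returning a list of problems rather than throwing on the first one lets a caller report every missing setting in a single pass.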

The pre-computed results in results/06-real-world-pilot.json were generated against this repository's own issue tracker. The 152 closed issues are experimental artifacts: each issue was created via the GitHub API during the pilot run, with timestamps and webhook delivery records serving as independently verifiable evidence of real webhook traffic.

Key Results

Metric               Simulation (N=100)       Real-World Pilot (N=50)
Recall               0.983 [0.978, 0.989]     1.000
Precision            0.223 (at ~10% rate)     1.000 (at 20% rate)
F1                   0.363                    1.000
Webhook drop         5.0% (modeled)           0.0% (measured)
Webhook duplicates   2.0% (modeled)           0 (measured)
Latency p50 / p99    modeled                  3,608 ms / 4,423 ms
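
For readers checking the table, F1 is the harmonic mean of precision and recall. The sketch below shows the standard definitions. The function names are illustrative and are not taken from src/metrics.js. Plugging in the rounded simulation values gives roughly 0.36, in line with the reported 0.363; the last digit can differ because the table inputs are themselves rounded.

```javascript
// Standard detection metrics from confusion counts:
// tp = true positives, fp = false positives, fn = false negatives.
function detectionMetrics(tp, fp, fn) {
  const precision = tp + fp === 0 ? 0 : tp / (tp + fp);
  const recall = tp + fn === 0 ? 0 : tp / (tp + fn);
  return { precision, recall, f1: f1Score(precision, recall) };
}

// F1 is the harmonic mean of precision and recall.
function f1Score(precision, recall) {
  const sum = precision + recall;
  return sum === 0 ? 0 : (2 * precision * recall) / sum;
}

// Rounded simulation values from the table above.
console.log(f1Score(0.223, 0.983).toFixed(2)); // prints 0.36

// Pilot values: perfect precision and recall give F1 = 1.
console.log(f1Score(1, 1)); // prints 1
```

The harmonic mean is dominated by the smaller of its two inputs, which is why the simulation F1 sits much closer to the precision of 0.223 than to the recall of 0.983.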

Requirements

Node.js >= 14
No external dependencies for simulation experiments (01-04)
GitHub API access + Cloudflare Workers account for experiments 05-06

Citation

If you use this code or data, please cite:

@misc{lem-experiments-2026,
  author       = {Li, Tianyu},
  title        = {LEM Experiments: Reproducible Validation for LLM Exposure Monitoring},
  year         = {2026},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.19220582},
  url          = {https://github.com/crabsatellite/lem-experiments}
}

License

MIT
