crabsatellite/lem-experiments

LEM Experiments

Reproducible experiments for the paper:

LLM Exposure Monitoring: A Security Framework for Recording AI Agent Data Access Across Heterogeneous Platforms

Structure

src/                 Core modules (zero external dependencies)
  engine.js            Dual-stream corroboration engine (exact + temporal matching)
  generator.js         Synthetic event stream generator with ET5 injection
  metrics.js           Detection metrics (precision, recall, F1, CI)
  prng.js              Seeded PRNG (xoshiro128**) for reproducibility

experiments/         Experiment scripts
  01-reproduce.js      Single-run reproduction of paper results
  02-sweep.js          Parameter sensitivity sweep (ET5 rate, temporal window, drop rate)
  03-baseline.js       Baseline comparison: full vs temporal-only vs exact-only
  04-multirun.js       100-run statistical analysis with 95% CIs
  05-webhook-probe.js  GitHub webhook delivery measurement
  06-real-world-pilot.js  Real-world pilot with live GitHub webhooks

worker/              Cloudflare Worker for webhook endpoint (KV-backed)
results/             Pre-computed results (JSON + CSV)
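The listing above names a seeded xoshiro128** generator as the basis for reproducible runs. The following is a minimal sketch of that technique, not the contents of src/prng.js: the function names (`createRng`, `nextUint32`, `nextFloat`) and the splitmix32-style seeding step are assumptions made for illustration.

```javascript
// Sketch only: NOT the repository's src/prng.js.
// createRng, nextUint32, nextFloat and the seeding mixer are hypothetical names.

// splitmix32-style mixer, used here (as an assumption) to expand one
// integer seed into the four 32-bit words of xoshiro128** state.
function splitmix32(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x9e3779b9) >>> 0;
    let t = a ^ (a >>> 16);
    t = Math.imul(t, 0x21f0aaad);
    t = t ^ (t >>> 15);
    t = Math.imul(t, 0x735a2d97);
    return (t ^ (t >>> 15)) >>> 0;
  };
}

// 32-bit left rotation, returned as an unsigned integer.
function rotl(x, k) {
  return ((x << k) | (x >>> (32 - k))) >>> 0;
}

function createRng(seed) {
  const mix = splitmix32(seed);
  let s0 = mix();
  let s1 = mix();
  let s2 = mix();
  let s3 = mix();
  // xoshiro must never run from an all-zero state.
  if ((s0 | s1 | s2 | s3) === 0) s0 = 1;

  // One xoshiro128** step: scramble s1, then advance the state.
  function nextUint32() {
    const result = Math.imul(rotl(Math.imul(s1, 5), 7), 9) >>> 0;
    const t = (s1 << 9) >>> 0;
    s2 ^= s0;
    s3 ^= s1;
    s1 ^= s2;
    s0 ^= s3;
    s2 ^= t;
    s3 = rotl(s3, 11);
    return result;
  }

  return {
    nextUint32,
    // Uniform float in [0, 1).
    nextFloat: () => nextUint32() / 4294967296,
  };
}
```

Two generators created with the same seed yield identical sequences, which is the property that lets a simulation run be repeated exactly.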

Quick Start

# Run all simulation experiments (no network, no API keys needed)
npm run all-sim

# Run individually
npm run reproduce    # Experiment 01
npm run sweep        # Experiment 02
npm run baseline     # Experiment 03
npm run multirun     # Experiment 04

Real-World Pilot (Experiment 06)

Requires infrastructure setup:

  1. Deploy the Cloudflare Worker in worker/ (see wrangler.toml)
  2. Set environment variables:
    export GITHUB_TOKEN="..."           # PAT with repo + admin:repo_hook scope
    export GITHUB_TEST_REPO="user/repo" # Test repository
    export WEBHOOK_ENDPOINT="https://your-worker.example.com"
  3. Run: npm run pilot
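Before running the pilot, it can help to confirm that the three variables from step 2 are present and plausibly formed. The sketch below is one way to do that; `checkPilotEnv` is a hypothetical helper that is not part of this repository, and the `owner/repo` pattern and the HTTPS-only rule are assumptions rather than documented requirements.

```javascript
// Hypothetical pre-flight check; not part of the repository.
const REQUIRED_VARS = ["GITHUB_TOKEN", "GITHUB_TEST_REPO", "WEBHOOK_ENDPOINT"];

// Returns a list of human-readable problems; an empty list means the
// environment looks usable.
function checkPilotEnv(env) {
  const problems = [];
  const value = (name) => (env[name] || "").trim();

  for (const name of REQUIRED_VARS) {
    if (value(name) === "") problems.push(`${name} is not set`);
  }

  // Assumption: the test repository is given as "owner/repo".
  const repo = value("GITHUB_TEST_REPO");
  if (repo !== "" && !/^[^/\s]+\/[^/\s]+$/.test(repo)) {
    problems.push("GITHUB_TEST_REPO should look like owner/repo");
  }

  // Assumption: the webhook endpoint is an absolute HTTPS URL.
  const endpoint = value("WEBHOOK_ENDPOINT");
  if (endpoint !== "") {
    try {
      if (new URL(endpoint).protocol !== "https:") {
        problems.push("WEBHOOK_ENDPOINT should use https");
      }
    } catch {
      problems.push("WEBHOOK_ENDPOINT is not a valid URL");
    }
  }

  return problems;
}
```

Calling `checkPilotEnv(process.env)` and printing any returned messages before `npm run pilot` surfaces a missing token or a malformed endpoint without making a network request.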

The pre-computed results in results/06-real-world-pilot.json were generated against this repository's own issue tracker. The 152 closed issues are experimental artifacts: each issue was created via the GitHub API during the pilot run, with timestamps and webhook delivery records serving as independently verifiable evidence of real webhook traffic.

Key Results

Metric               Simulation (N=100)      Real-World Pilot (N=50)
Recall               0.983 [0.978, 0.989]    1.000
Precision            0.223 (at ~10% rate)    1.000 (at 20% rate)
F1                   0.363                   1.000
Webhook drop         5.0% (modeled)          0.0% (measured)
Webhook duplicates   2.0% (modeled)          0 (measured)
Latency p50 / p99    modeled                 3,608 ms / 4,423 ms
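
To make the relationship between these figures concrete, the sketch below computes F1 as the harmonic mean of precision and recall, and a 95% confidence interval over per-run values. It is not the code in src/metrics.js: `f1Score` and `meanWithCi95` are hypothetical names, and the normal-approximation interval (mean ± 1.96 standard errors) is an assumption, since the README does not state which interval method the experiments use.

```javascript
// Sketch only: NOT the repository's src/metrics.js.

// F1 is the harmonic mean of precision and recall.
function f1Score(precision, recall) {
  if (precision + recall === 0) return 0;
  return (2 * precision * recall) / (precision + recall);
}

// Mean and 95% interval of per-run values, assuming a normal
// approximation: mean +/- 1.96 * (sample standard deviation / sqrt(n)).
function meanWithCi95(values) {
  const n = values.length;
  if (n < 2) throw new Error("need at least two runs");
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  const variance =
    values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (n - 1);
  const halfWidth = 1.96 * Math.sqrt(variance / n);
  return { mean, lower: mean - halfWidth, upper: mean + halfWidth };
}

// Simulation column: precision 0.223 and recall 0.983 give F1 of about 0.36.
console.log(f1Score(0.223, 0.983).toFixed(2));
```

With precision far below recall, the harmonic mean stays close to the smaller value, which is why the simulated F1 sits near 0.36 despite recall above 0.98.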

Requirements

  • Node.js >= 14
  • No external dependencies for simulation experiments (01-04)
  • GitHub API access + Cloudflare Workers account for experiments 05-06

Citation

If you use this code or data, please cite:

@misc{lem-experiments-2026,
  author       = {Li, Tianyu},
  title        = {LEM Experiments: Reproducible Validation for LLM Exposure Monitoring},
  year         = {2026},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.19220582},
  url          = {https://github.com/crabsatellite/lem-experiments}
}

License

MIT
