A pure-Rust, semantics-first NanoAOD analysis framework for high-energy physics, built for the agentic-coding era.
📖 API docs + notes: https://dickychant.github.io/nano.rust/
Agentic tools can write analysis code faster than anyone can review it. The bottleneck is no longer writing code but guaranteeing it is correct. Soft guardrails — prompts, skills, harnesses, human review — can steer an agent but cannot guarantee the absence of silent analysis bugs (wrong branch, dropped systematic, mixed units, stale outputs). A hard guarantee needs a mechanical enforcer:
Physicists define and review physics semantics. Agents generate implementation. The Rust compiler and a validation layer reject inconsistent states.
So the analysis is modelled as a typed state machine that the compiler checks
(making invalid analysis states unrepresentable), with Rust's strengths layered on:
performance, FFI to legacy libraries, SIMD per-event execution, and TUI-friendly
orchestration. Full rationale: docs/vision.md.
- Validated on a real analysis: reproduces ROOT's Higgs→ZZ→4ℓ (df103, three channels: 4μ/4e/2e2μ) on CMS Open Data, read remotely on-demand in pure Rust. The full stacked discovery plot (signal + ZZ background + 2012 data, 11.6 fb⁻¹) is bit-identical to ROOT's, as is the df102 dimuon spectrum. Plots are in docs/site/plots/; see the blog.
- Owned, pure-Rust ROOT I/O (`nano-rootio`, no ROOT/C++ dependency):
  - reads real CMS NanoAODv9: scalars, jagged collections, windowed reads, bounded-memory streaming (~3 MB to stream a skim of any-size file);
  - reads locally and remotely on-demand over HTTPS byte-range (the first 10 events of a 2 GB open-data file fetch ~1.3 MB, only the baskets touched);
  - writes ROOT/uproot-readable skims (scalars and jagged);
  - validated A/B against the upstream reader and cross-checked against `uproot` in CI, both read and write.
- Typed event model (`nano-core`): collections, attributes, the `Prefix_attr` grouping rule, `Arc`-shared per-event columns (`Send + Sync`).
- Compile-enforced state machine (`nano-analysis`): `Ev<Raw> → Baseline → InRegion<R> → Weighted<R>`; filling a histogram requires a `Weighted<R>`, so wrong-stage / wrong-region / unweighted fills are compile errors (proven by compile-fail tests). Unit newtypes, exhaustive `Systematic`.
- Semantic IR (`nano-spec`): a physics-facing YAML spec is parsed, statically validated (missing branch / wrong type / missing unit / undefined object are rejected with precise errors), and used to derive the exact `read_branches` for the reader.
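
The typestate idea behind the `Ev<Raw> → Baseline → InRegion<R> → Weighted<R>` chain can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the `nano-analysis` API: only the stage names come from this README, while the fields, method names (`baseline`, `in_muon_control`, `weighted`), the `MuonControl` region, the `Hist` type, and the 25 GeV cut are hypothetical.

```rust
use std::marker::PhantomData;

// Zero-sized stage markers. Stage names follow the README; everything else
// in this sketch is hypothetical.
pub struct Raw;
pub struct Baseline;
pub struct InRegion<R>(PhantomData<R>);
pub struct Weighted<R>(PhantomData<R>);

// Hypothetical region marker.
pub struct MuonControl;

// An event tagged with its analysis stage `S` at the type level.
pub struct Ev<S> {
    muon_pt_gev: Vec<f32>,
    weight: f64,
    _stage: PhantomData<S>,
}

impl<S> Ev<S> {
    // Private: only the transition methods below may change the stage tag.
    fn retag<T>(self) -> Ev<T> {
        Ev { muon_pt_gev: self.muon_pt_gev, weight: self.weight, _stage: PhantomData }
    }
}

impl Ev<Raw> {
    pub fn new(muon_pt_gev: Vec<f32>) -> Self {
        Ev { muon_pt_gev, weight: 1.0, _stage: PhantomData }
    }
    // Hypothetical baseline selection: at least one muon.
    pub fn baseline(self) -> Option<Ev<Baseline>> {
        if self.muon_pt_gev.is_empty() { None } else { Some(self.retag()) }
    }
}

impl Ev<Baseline> {
    // Hypothetical region cut: leading muon pT above 25 GeV.
    pub fn in_muon_control(self) -> Option<Ev<InRegion<MuonControl>>> {
        let leading = self.muon_pt_gev.iter().cloned().fold(f32::MIN, f32::max);
        if leading > 25.0 { Some(self.retag()) } else { None }
    }
}

impl<R> Ev<InRegion<R>> {
    // Attaching a weight is the only way to reach the `Weighted<R>` stage.
    pub fn weighted(self, w: f64) -> Ev<Weighted<R>> {
        let mut ev: Ev<Weighted<R>> = self.retag();
        ev.weight = w;
        ev
    }
}

// A histogram bound to one region; here it only accumulates the sum of weights.
pub struct Hist<R> {
    pub sum_w: f64,
    _region: PhantomData<R>,
}

impl<R> Hist<R> {
    pub fn new() -> Self {
        Hist { sum_w: 0.0, _region: PhantomData }
    }
    // Accepts only a weighted event from the same region `R`.
    pub fn fill(&mut self, ev: &Ev<Weighted<R>>) {
        self.sum_w += ev.weight;
    }
}
```

In this sketch, calling `Hist::<MuonControl>::new().fill(&ev)` with an `Ev<Baseline>` or an `Ev<InRegion<MuonControl>>` is a type error, which is the kind of mistake a compile-fail test can pin down.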
crates/
nano-rootio owned ROOT TTree read + write (NanoAOD subset; pure Rust)
nano-core event model (Event / Collection / ObjectView, branch schema)
nano-io streaming reader + skim writer over nano-rootio
nano-producers analysis channels (muon control region)
nano-analysis compile-enforced analysis state machine (typestate)
nano-spec semantic compiler: spec -> validate -> derive read_branches -> codegen
nano-corrections native correctionlib evaluator (typed SF inputs)
nano-inference backend-agnostic ML inference protocol (mock/ONNX/remote/managed)
nano-cli the `nano` CLI: validate / branches / inspect / codegen
nano-mcp MCP server exposing the same ops as agent tools
nano-gen-demo, nano-gen-tagger-demo codegen == hand-written equivalence proofs
root-io vendored upstream reader, retained only as a dev/A-B oracle
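
As an illustration of the `Prefix_attr` grouping rule that the event model relies on, the sketch below groups flat NanoAOD-style branch names into collections by splitting at the first underscore. The function name `group_branches` is hypothetical, and treating names without an underscore as event-level scalars that are left out of the result is an assumption of this sketch, not a statement about `nano-core`.

```rust
use std::collections::BTreeMap;

/// Group flat branch names such as "Muon_pt" into collection -> attributes.
/// Names without an underscore (e.g. "run", "nMuon") are skipped here; this
/// sketch assumes they are event-level scalars or counters.
pub fn group_branches(names: &[&str]) -> BTreeMap<String, Vec<String>> {
    let mut groups: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for name in names {
        // Split at the FIRST underscore only, so "Muon_jetIdx" -> ("Muon", "jetIdx").
        if let Some((prefix, attr)) = name.split_once('_') {
            groups
                .entry(prefix.to_string())
                .or_default()
                .push(attr.to_string());
        }
    }
    groups
}
```

For example, `group_branches(&["Muon_pt", "Muon_eta", "Jet_pt"])` yields a `Muon` group with `pt` and `eta` and a `Jet` group with `pt`.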
The architecture, layer by layer:
physics spec (TOML/YAML) -> semantic IR (typed, validated) -> Rust codegen
-> Rust execution kernels (typed state machine)
-> Rust-native workflow DAG (planned)
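
The "validate, then derive `read_branches`" step of this pipeline can be sketched as follows. It is a simplified model, assuming the spec has already been parsed into plain structs; the types `AttrUse`, `BranchType`, and `SpecError`, the function `derive_read_branches`, and the rule that every `F32` attribute must declare a unit are hypothetical and are not taken from `nano-spec`.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BranchType { F32, I32, Bool }

/// One attribute the spec uses, e.g. Muon.pt as F32 in GeV (hypothetical shape).
pub struct AttrUse {
    pub collection: String,
    pub attr: String,
    pub ty: BranchType,
    pub unit: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SpecError {
    MissingBranch(String),
    WrongType { branch: String, expected: BranchType, found: BranchType },
    MissingUnit(String),
}

/// Check every used attribute against the file schema, collecting all errors.
/// Only an error-free spec yields the sorted, de-duplicated set of branches to read.
pub fn derive_read_branches(
    uses: &[AttrUse],
    schema: &BTreeMap<String, BranchType>,
) -> Result<BTreeSet<String>, Vec<SpecError>> {
    let mut errors = Vec::new();
    let mut branches = BTreeSet::new();
    for u in uses {
        let name = format!("{}_{}", u.collection, u.attr);
        match schema.get(&name) {
            None => errors.push(SpecError::MissingBranch(name.clone())),
            Some(&found) if found != u.ty => errors.push(SpecError::WrongType {
                branch: name.clone(),
                expected: u.ty,
                found,
            }),
            Some(_) => {}
        }
        // Assumed rule for this sketch: floating-point quantities must carry a unit.
        if u.ty == BranchType::F32 && u.unit.is_none() {
            errors.push(SpecError::MissingUnit(name.clone()));
        }
        branches.insert(name);
    }
    if errors.is_empty() { Ok(branches) } else { Err(errors) }
}
```

Collecting every error instead of stopping at the first one is a design choice of the sketch: it lets a reviewer or an agent fix a spec in one pass.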
cargo build
cargo test # whole workspace
cargo test --features http # also exercise remote (HTTPS byte-range) reads
# write a small NanoAOD-like file and inspect it (e.g. with uproot)
cargo run -p nano-rootio --example write_demo -- /tmp/demo.root

Real-data tests read a local NanoAOD file from
tests/data/muon_validation/inputs/ if present (gitignored) and skip otherwise.
The uproot interop + benchmark runs in CI (scripts/bench_vs_uproot.py)
against CMS Open Data over HTTPS — no checked-in data files.
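
The "use the local file if present, skip otherwise" behaviour of the real-data tests can be approximated with a small helper like the one below. This is a sketch only: the helper name `first_root_file` is hypothetical, and the workspace's tests may locate their inputs differently.

```rust
use std::path::{Path, PathBuf};

/// Return the first `.root` file in `dir` (sorted by path), or `None` if the
/// directory is missing, unreadable, or holds no `.root` files.
pub fn first_root_file(dir: &Path) -> Option<PathBuf> {
    let entries = std::fs::read_dir(dir).ok()?;
    let mut files: Vec<PathBuf> = entries
        .filter_map(|entry| entry.ok())
        .map(|entry| entry.path())
        .filter(|path| path.extension().and_then(|ext| ext.to_str()) == Some("root"))
        .collect();
    files.sort();
    files.into_iter().next()
}
```

A test body could then start with a `match` on `first_root_file(Path::new("tests/data/muon_validation/inputs"))` and return early, after printing a short "skipped" note, when it gets `None`.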
Built: owned ROOT I/O (read + write, local + remote), the event model, the
compile-enforced state machine, the semantic compiler including codegen
(proven equal to a hand-written producer), a native correctionlib
evaluator, an ML inference protocol, and an agent action space (nano CLI +
MCP server). Next: golden tests against the frozen .root references, wiring
real corrections/JME systematics into the channel, and a Rust-native workflow
DAG orchestrator (the LAW backend is descoped). See docs/ —
architecture, compiler roadmap, ADL front-end, orchestrator,
vision, versioning,
state machine, semantic layer,
inference protocol,
agent interface,
reader rewrite, remote source,
migration.
nano.rust grew out of, and is inspired by, prior work:
- Origins: it began as a C++ port (`nano.cpp` / NanoAODToolsCpp) of selected NanoAOD-tools / NanoHRT-tools workflows, preserved on the `cpp-snapshot` branch.
- root-io (cbourjau / alice-rs): the pure-Rust ROOT reader we vendored and grew the owned `nano-rootio` I/O core (read + write) from; still a differential A/B oracle in tests (MPL-2.0).
- uproot (with awkward-array): for showing that ROOT can be treated as a storage format readable outside ROOT; it is also our independent read/write oracle in CI.
- ROOT: the on-disk format and reference implementation; our correctness and performance baseline.
- correctionlib: the corrections JSON schema and evaluation model that `nano-corrections` re-implements natively in Rust.
MPL-2.0. crates/root-io is vendored from
cbourjau/alice-rs (MPL-2.0); its license
and attribution are retained.