Explore: component-level deterministic simulation testing (DST) harness #187

Description

@joshua-temple

Summary

Explore a component-level deterministic simulation testing (DST) layer over the state module. The expensive prerequisites for DST are already in place; this issue is to scope the thin scheduling/selection layer on top so a simulator can own message interleaving and replay scenarios deterministically.

What already exists (the hard 80%)

The kernel and host components are pure and synchronous, with no goroutines in the core fire/deliver/step path:

  • Pure event loop: Fire runs to completion off a macrostep-local internal queue, single-threaded (state/fire.go).
  • Slice-backed mailbox, not a channel — runningActor.mailbox []envelope (state/actor_system.go:85).
  • Synchronous advance verbs, all returning []FireResult[S] for trace capture:
    • ActorSystem.Tick(ctx, id) (state/actor_system.go:486)
    • ServiceRunner.Tick(ctx, id) — runs the service fn inline and settles synchronously, no goroutine (state/runner.go:231)
    • Scheduler.Tick(ctx) + FakeClock.Advance(d) for virtual time (state/driver.go:116, state/driver.go:156)
  • Deterministic ordering — timer/service maps are sorted (due-time then id) before iteration (state/driver.go:136, state/runner.go:262); parallel-region determinism is property-tested (state/parallel_determinism_property_test.go).
  • Replay framework: serializable Scenario/Trace + RunAgainst (state/conformance/scenario.go).
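
To make the deterministic-ordering point concrete, here is a minimal sketch of the "sort before iterating" technique. The `timer` type and `sortedTimers` helper are hypothetical stand-ins invented for illustration; the real code in state/driver.go and state/runner.go may be shaped differently.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// timer is a hypothetical stand-in for a scheduled entry; it is not the
// type used in state/driver.go.
type timer struct {
	ID  string
	Due time.Duration // offset from the start of virtual time
}

// sortedTimers returns the map's values ordered by due time, then by id,
// so iteration no longer depends on Go's randomized map order.
func sortedTimers(m map[string]timer) []timer {
	out := make([]timer, 0, len(m))
	for _, t := range m {
		out = append(out, t)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Due != out[j].Due {
			return out[i].Due < out[j].Due
		}
		return out[i].ID < out[j].ID
	})
	return out
}

func main() {
	timers := map[string]timer{
		"b": {ID: "b", Due: 10 * time.Millisecond},
		"a": {ID: "a", Due: 10 * time.Millisecond},
		"c": {ID: "c", Due: 5 * time.Millisecond},
	}
	// Prints c first (earliest due), then a and b (same due, ordered by id).
	for _, t := range sortedTimers(timers) {
		fmt.Println(t.ID, t.Due)
	}
}
```

Because ids are unique, the comparison is a total order and the result is the same on every run regardless of map iteration order.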

The only sync/chan usage in state/ is mutexes guarding maps for production-host concurrency (uncontended under a sequential driver) and the Clock.After channel the deterministic driver ignores.

The gaps to close for component-level DST

  1. Tick drains; it doesn't step a single message. ActorSystem.Tick(id) loops popping mailbox[0] until the actor quiesces (state/actor_system.go:486-535), so the unit of control is the component, not the individual message.
  2. Deliver couples enqueue + drain. It appends to the mailbox then immediately Ticks (state/actor_system.go:459-478); there is no public enqueue-only seam, so messages can't accumulate for the simulator to order.
  3. No unified ready-set. Pending work is queried per-component (HasPending(id), Scheduler.Pending(), ServiceRunner.Pending()); there's no single "what is runnable right now" view to choose from.
  4. No RNG seam. No injectable *rand.Rand in Instance/registry, so probabilistic guards/actions can't be seeded — and the simulator itself needs a seeded source for weighted choice.
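
Gaps 1 and 2 can be illustrated with a toy model. This is a sketch only: `toySystem` and its methods are hypothetical names invented here, not the state module's API. It shows that when enqueue and drain are coupled, the trace is fixed by call order, whereas an enqueue-only seam plus a single-step verb lets the caller choose the interleaving.

```go
package main

import "fmt"

// toySystem is a hypothetical, much-simplified model of an actor system:
// one slice-backed mailbox per actor id and a trace of fired messages.
type toySystem struct {
	mailbox map[string][]string
	trace   []string
}

func newToySystem() *toySystem {
	return &toySystem{mailbox: map[string][]string{}}
}

// enqueue appends without firing (the seam the issue says is missing).
func (s *toySystem) enqueue(id, msg string) {
	s.mailbox[id] = append(s.mailbox[id], msg)
}

// stepOne pops and fires exactly one message; it returns false if the
// mailbox is empty.
func (s *toySystem) stepOne(id string) bool {
	q := s.mailbox[id]
	if len(q) == 0 {
		return false
	}
	s.mailbox[id] = q[1:]
	s.trace = append(s.trace, id+":"+q[0])
	return true
}

// tick drains the actor until it quiesces, mirroring today's Tick.
func (s *toySystem) tick(id string) {
	for s.stepOne(id) {
	}
}

// deliver couples enqueue and drain, mirroring today's Deliver.
func (s *toySystem) deliver(id, msg string) {
	s.enqueue(id, msg)
	s.tick(id)
}

func main() {
	// Coupled: the trace follows call order, nothing accumulates.
	a := newToySystem()
	a.deliver("x", "m1")
	a.deliver("x", "m2")
	a.deliver("y", "n1")
	fmt.Println(a.trace) // [x:m1 x:m2 y:n1]

	// Decoupled: messages accumulate, and the caller interleaves x and y.
	b := newToySystem()
	b.enqueue("x", "m1")
	b.enqueue("x", "m2")
	b.enqueue("y", "n1")
	b.stepOne("x")
	b.stepOne("y")
	b.stepOne("x")
	fmt.Println(b.trace) // [x:m1 y:n1 x:m2]
}
```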

Direction to explore (not committed)

  • StepOne(ctx, id) (FireResult, bool) on ActorSystem: pop one envelope, fire it, and return; Tick then reduces to a loop that calls StepOne until it reports no more work.
  • Decouple Send/Post (enqueue-only) from Deliver (enqueue+drain).
  • A unified Ready() []StepRef + Step(ref) across actors / timers / services so a simulator picks the next step by weight.
  • A Simulator harness owning {ActorSystem, Scheduler, ServiceRunner, FakeClock} plus a seeded RNG that drives both probabilistic guards/actions and the simulator's own weighted selection (one knob for reproducibility).
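
One possible shape for the Ready/Step/Simulator idea, as a self-contained toy. Every name here (`stepRef`, `simulator`, `ready`, `pick`, `step`) is a hypothetical illustration, and the "work" is just a counter per source rather than real actors, timers, or services. The point is the selection loop: collect the ready set in a stable order, then pick by weight from a single seeded RNG.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// stepRef is a hypothetical handle for one runnable source of work.
type stepRef struct {
	Kind   string // "actor", "timer", or "service"
	ID     string
	Weight int // relative likelihood of being picked; must be > 0
}

// simulator is a toy harness: remaining steps per ref plus one seeded RNG.
type simulator struct {
	rng     *rand.Rand
	pending map[stepRef]int
	trace   []string
}

func newSimulator(seed int64, work map[stepRef]int) *simulator {
	p := make(map[stepRef]int, len(work))
	for k, v := range work {
		p[k] = v
	}
	return &simulator{rng: rand.New(rand.NewSource(seed)), pending: p}
}

// ready returns every runnable ref in a stable order (kind, then id).
// Sorting matters: map iteration order is random, and the weighted pick
// below must see the same slice on every run for a given state.
func (s *simulator) ready() []stepRef {
	var out []stepRef
	for ref, n := range s.pending {
		if n > 0 {
			out = append(out, ref)
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Kind != out[j].Kind {
			return out[i].Kind < out[j].Kind
		}
		return out[i].ID < out[j].ID
	})
	return out
}

// pick selects one ref with probability proportional to its weight.
func (s *simulator) pick(refs []stepRef) stepRef {
	total := 0
	for _, r := range refs {
		total += r.Weight
	}
	n := s.rng.Intn(total)
	for _, r := range refs {
		if n < r.Weight {
			return r
		}
		n -= r.Weight
	}
	return refs[len(refs)-1] // not reached when all weights are positive
}

// step performs one unit of work for ref and records it in the trace.
func (s *simulator) step(ref stepRef) {
	s.pending[ref]--
	s.trace = append(s.trace, ref.Kind+"/"+ref.ID)
}

// run steps until nothing is ready and returns the trace.
func (s *simulator) run() []string {
	for {
		refs := s.ready()
		if len(refs) == 0 {
			return s.trace
		}
		s.step(s.pick(refs))
	}
}

func main() {
	work := map[stepRef]int{
		{Kind: "actor", ID: "a", Weight: 3}:   3,
		{Kind: "timer", ID: "t", Weight: 1}:   2,
		{Kind: "service", ID: "s", Weight: 1}: 1,
	}
	fmt.Println(newSimulator(42, work).run())
}
```

In this sketch the seed is the only source of nondeterminism, which is the "one knob for reproducibility" property: the same seed and the same initial work yield the same interleaving.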

Notes

  • This is a design/exploration issue — start with a brief design pass before any kernel API changes (the Step/Ready/Simulator seam touches the public surface).
  • Determinism contract to document: host-side services/actions must stay pure (no real goroutines) for replay guarantees to hold.
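
The determinism contract could be checked mechanically with something like the sketch below. `checkReplay`, `firstDivergence`, and both scenarios are hypothetical helpers written for this illustration; they are not part of state/conformance. The "leaky" scenario uses a package-level counter as a stand-in for any state outside the simulator's control, such as wall-clock time or goroutine scheduling.

```go
package main

import (
	"fmt"
	"math/rand"
)

// firstDivergence returns the index of the first step at which two traces
// differ, or -1 if they are identical.
func firstDivergence(a, b []string) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			return i
		}
	}
	if len(a) != len(b) {
		return n
	}
	return -1
}

// checkReplay runs the scenario twice with the same seed and reports an
// error if the two traces diverge.
func checkReplay(run func(rng *rand.Rand) []string, seed int64) error {
	first := run(rand.New(rand.NewSource(seed)))
	second := run(rand.New(rand.NewSource(seed)))
	if i := firstDivergence(first, second); i >= 0 {
		return fmt.Errorf("replay diverged at step %d (seed %d)", i, seed)
	}
	return nil
}

// pureScenario draws only from the injected rng, so it replays.
func pureScenario(rng *rand.Rand) []string {
	trace := make([]string, 0, 5)
	for i := 0; i < 5; i++ {
		trace = append(trace, fmt.Sprintf("step-%d:%d", i, rng.Intn(100)))
	}
	return trace
}

// ambient stands in for state the simulator does not own.
var ambient int

// leakyScenario mixes ambient state into its output, so it does not replay.
func leakyScenario(rng *rand.Rand) []string {
	ambient++
	return []string{fmt.Sprintf("step-0:%d", rng.Intn(100)+ambient)}
}

func main() {
	fmt.Println(checkReplay(pureScenario, 7))  // <nil>
	fmt.Println(checkReplay(leakyScenario, 7)) // replay diverged at step 0 (seed 7)
}
```

Reporting the first divergent step, rather than only pass/fail, tends to make it easier to locate the impure service or action.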

Labels: enhancement (New feature or request), roadmap (Planned enhancement / roadmap item), spike (Time-boxed investigation or design exploration)
