feat(continuum-core): headless persona host loop (#133 slice 13) #1511

Merged: joelteply merged 7 commits into canary from feat/headless-persona-host-loop-slice-13 on Jun 2, 2026

@joelteply

Contributor

Summary

Slice 13 of #133 — the IPC-boot rewire that finally makes continuum-core host personas headlessly. Before this PR, the boot loop at ipc/mod.rs:1024-1089 called bootstrap_one per persona and logged a welcome — the persona was reachable via airc peers but never responded. After this PR, the boot path composes slices 7-12 into one substrate-managed pipeline; each planned persona ends up with a tokio task running her serve loop.

Implements the revised HEADLESS-PERSONA-HOST-LOOP design (PR #1510 + reviewer fixes).

What's in the PR

Four commits stacked, each independently reviewable:

  1. 71429daae Q1: bootstrap_planned / derive_spawn_plan / build_profile take &Registry instead of &Arc<Registry>. Smaller change than migrating the OnceLock singleton; deref-coercion handles test sites transparently.
  2. 3f843aada Q3: PersonaAircRuntimeRegistry extended: each slot now holds runtime + service_loop together. New methods: attach_service_loop, is_service_loop_finished, shutdown_slot (orderly abort + await + remove). Old keyspace duplication concern resolved.
  3. f940fa47d P2: plan_for_tier returns a single Helper for all tiers until slice 14 lands role-in-seed.json. The position-pairing hazard from the PR #1510 review is named in the commit body; regression test pinned #[ignore].
  4. 8611bae56 boot composition — the IPC boot loop replaced with the substrate pipeline. Old welcome-log-only path deleted per [[organization-purity-as-we-migrate]].

Status against design doc

Applied ✅: Q1, Q3, P2, boot composition.

Deferred with TODO markers (sub-slices):

  • Q2 (detect_host_capability wiring): no production GpuMonitor constructor exists today; only tests build one. Slice 13 uses HwCapabilityTier::CpuOnly + HwTierCategory::Compat as a safe floor (which produces the LCD Helper anyway). TODO #52.
  • P1 (tokio::signal::ctrl_c → Runtime::shutdown): per-slot shutdown is available via PersonaAircRuntimeRegistry::shutdown_slot and used by persona/instances/* IPC commands. Server-level signal handler is its own sub-slice.
  • P3 (ResourceBroker.acquire admission): current LCD case is 1 persona × ~500 MiB GGUF, well within all tiers' headroom. Becomes load-bearing when multi-persona returns in slice 14.

These three deferred items are named at top of the boot composition's doc-comment so the next author doesn't have to re-discover the gaps.

Cleanup model (critical, per PR #1510 review)

PersonaAircRuntimeRegistry::shutdown_slot is the orderly cleanup path. The cascade:

  1. Take JoinHandle from slot.
  2. abort() + await (drains).
  3. Slot drops → Arc<PersonaAircRuntime> drops → Arc<Airc> drops → inner.subscribers map drops → daemon-attached wire-subscriber tasks abort.

abort() alone is insufficient — the wire subscriber stays in Arc<Airc>.inner.subscribers until the Arc itself drops via registry removal. Both steps are required; the registry's shutdown_slot is the one place that does them in order.
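
The ownership half of this cascade can be illustrated with plain std types. The following is a minimal sketch, not the substrate's code: `Subscriber`, `Runtime`, and `teardown_trace` are hypothetical stand-ins, and a `HashMap` plays the role of the registry. It shows that the subscriber is only torn down when the last `Arc` goes away, so removing the slot is necessary, and an outstanding clone delays teardown further.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a wire subscriber: records its own teardown.
struct Subscriber {
    torn_down: Arc<AtomicBool>,
}

impl Drop for Subscriber {
    fn drop(&mut self) {
        self.torn_down.store(true, Ordering::SeqCst);
    }
}

/// Hypothetical stand-in for the runtime that owns the subscriber.
struct Runtime {
    _subscriber: Subscriber,
}

/// Removes the slot from the registry, optionally while another clone of the
/// runtime Arc is still held. Returns (torn down right after removal,
/// torn down after the last clone is gone).
fn teardown_trace(hold_extra_clone: bool) -> (bool, bool) {
    let torn_down = Arc::new(AtomicBool::new(false));
    let mut registry: HashMap<u32, Arc<Runtime>> = HashMap::new();
    let runtime = Runtime {
        _subscriber: Subscriber { torn_down: torn_down.clone() },
    };
    registry.insert(1, Arc::new(runtime));

    // Simulates a caller that cloned the runtime Arc before shutdown.
    let extra = if hold_extra_clone {
        registry.get(&1).cloned()
    } else {
        None
    };

    // Registry removal drops the registry's own Arc.
    registry.remove(&1);
    let after_remove = torn_down.load(Ordering::SeqCst);

    // Dropping the last outstanding clone (if any) completes the cascade.
    drop(extra);
    let after_last_drop = torn_down.load(Ordering::SeqCst);

    (after_remove, after_last_drop)
}

fn main() {
    println!("no extra clone:   {:?}", teardown_trace(false));
    println!("extra clone held: {:?}", teardown_trace(true));
}
```

The sketch covers only the drop chain; the abort + await step for the serve loop is a separate concern that this example does not model.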

Reference docs

Test plan

  • cargo test persona::airc_runtime_registry — 5 passed (new tests for attach/shutdown/is_finished failure modes)
  • cargo test persona::service_loop — 4 passed
  • cargo test persona::supervisor — 4 passed
  • cargo test persona::spawner — 9 passed + 1 ignored
  • cargo test persona::spawner_module — 5 passed + 1 ignored (slice 14 regression test)
  • cargo test persona::profile_builder — 4 passed
  • cargo test persona::host — 0 tests (composition seam, exercised by boot)
  • CI confirms cross-platform builds
  • Real-airc smoke (post-merge, in a continuum-core checkout with a running airc daemon)

Memories worth refreshing on review

  • [[no-fallbacks-ever]] — per-slot errors stay errored, no substitution
  • [[no-stdio-piping-for-process-ipc]] — substrate talks only via typed sockets + airc-lib
  • [[substrate-is-a-good-citizen-on-the-host]] — caller controls scheduling pool
  • [[observability-is-half-the-architecture]] — slot-level shutdown is honest about why it left
  • [[organization-purity-as-we-migrate]] — old welcome-log-only path DELETED, not kept beside
  • [[commands-are-dumb-daemons-are-smart]] — spawn helper trivial; registry + boot do the smart work

🤖 Generated with Claude Code

joelteply and others added 4 commits June 2, 2026 06:52
…take &Registry

Per HEADLESS-PERSONA-HOST-LOOP design doc Q1 (PR #1510 review found
the original recommendation was inverted): the substrate boot path
holds `&'static Registry` from `model_registry::global()`. Migrating
the singleton to `OnceLock<Arc<Registry>>` would touch every callsite
of `global()` and change the lifetime semantics throughout the crate.

Smaller change: drop the Arc requirement from the three functions
that took `&Arc<Registry>` and accept `&Registry` instead. Rust's
Deref coercion at the test call sites handles `Arc<Registry>` ↦
`&Registry` transparently — no test changes needed.

Functions updated:
- profile_builder::build_profile (slice 5)
- spawner::derive_spawn_plan (slice 6)
- spawner_module::bootstrap_planned (slice 8)

All slice 5-9 tests still pass:
  persona::profile_builder — 4 passed
  persona::spawner — 4 passed
  persona::spawner_module — 5 passed

Unblocks the slice-13 boot composition at `ipc::start_server` where
the registry is `&'static Registry`, not an Arc.
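
A minimal sketch of why the `&Registry` signature serves both callers, assuming a simplified `Registry` with a single field (the real type and the body of `build_profile` are not shown in this PR text, so both are hypothetical here). Test code holding an `Arc<Registry>` passes `&shared`, which deref-coerces to `&Registry`; boot code holding a `&'static Registry` passes it directly.

```rust
use std::sync::{Arc, OnceLock};

/// Simplified, hypothetical stand-in for the model registry.
struct Registry {
    models: Vec<String>,
}

/// Takes a plain reference, so it accepts both call shapes below.
fn build_profile(registry: &Registry) -> usize {
    registry.models.len()
}

/// Hypothetical singleton mirroring the `&'static Registry` boot path.
static GLOBAL: OnceLock<Registry> = OnceLock::new();

fn global() -> &'static Registry {
    GLOBAL.get_or_init(|| Registry {
        models: vec!["lcd-helper".to_string()],
    })
}

fn main() {
    // Test-style call site: &Arc<Registry> deref-coerces to &Registry.
    let shared: Arc<Registry> = Arc::new(Registry {
        models: vec!["lcd-helper".to_string()],
    });
    let from_test = build_profile(&shared);

    // Boot-style call site: the singleton reference is passed as-is.
    let from_boot = build_profile(global());

    println!("{from_test} {from_boot}");
}
```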

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ervice_loop

Per HEADLESS-PERSONA-HOST-LOOP design doc Q3 (PR #1510 review):
a PersonaSupervisor module storing JoinHandles keyed by persona_id
would duplicate the existing PersonaAircRuntimeRegistry keyspace.
Both modules would own per-persona lifetime info. Compression
failure per [[organization-purity-as-we-migrate]].

Resolution: extend the registry. Each slot becomes:

    PersonaSlot {
        runtime: Arc<PersonaAircRuntime>,
        service_loop: Mutex<Option<JoinHandle<Result<ServeOutcome, _>>>>,
    }

New methods:
- attach_service_loop(persona_id, handle) — supervisor wires the
  per-persona serve loop into the slot. Refuses silent overwrites.
- is_service_loop_finished(persona_id) — Q7's periodic crash poll.
- shutdown_slot(persona_id) — the orderly path: take JoinHandle →
  abort → await → remove slot. The slot drop cascades:
    Arc<PersonaSlot> → Arc<PersonaAircRuntime> → Arc<Airc> →
    inner.subscribers map drop → daemon-attach wire tasks abort.
  Per the cleanup-model section of the design doc, BOTH steps
  (abort + slot remove) are required — abort alone leaves the
  wire subscriber alive until the Arc drops via registry removal.
- ids() — Vec<Uuid> snapshot for the supervisor's poller without
  cloning N runtime Arcs.

Existing surface preserved for back-compat:
- register, get, get_by_agent_name, remove (sync), iter, len,
  is_empty all return runtime Arcs (not slot Arcs). The slot is
  internal.

Tests cover the failure modes:
- attach_service_loop_errors_when_no_slot
- is_service_loop_finished_returns_none_for_missing_slot
- shutdown_slot_returns_none_for_missing_persona

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Per HEADLESS-PERSONA-HOST-LOOP design doc P2 (PR #1510 review
finding #2 — position-pairing broken from boot 2):

ResumeOrMintProvider::scan_personas_dir sorts directory entries
alphabetically (resume_or_mint_provider.rs:200). On boot 1 the
substrate mints personas in plan order [Helper, Coder] with
random-derived names (e.g. Maya for Helper, Bart for Coder). On
boot 2, scan yields them in alphabetic order [Bart, Maya] —
position-pairing against [Helper, Coder] flips the roles. Bart
becomes Helper when he was Coder. Role identity flipped silently.
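
The flip can be reproduced in a few lines. This is an illustrative sketch only: `Role` and `pair_by_position` are hypothetical names, and the names Maya and Bart are taken from the example above.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    Helper,
    Coder,
}

/// Pairs names with roles purely by index, which is the hazard described above.
fn pair_by_position(names: &[&str], plan: &[Role]) -> Vec<(String, Role)> {
    names
        .iter()
        .zip(plan.iter())
        .map(|(name, role)| (name.to_string(), *role))
        .collect()
}

fn main() {
    let plan = [Role::Helper, Role::Coder];

    // Boot 1: personas are minted in plan order.
    let boot1 = pair_by_position(&["Maya", "Bart"], &plan);

    // Boot 2: the directory scan returns names alphabetically.
    let mut on_disk = vec!["Maya", "Bart"];
    on_disk.sort();
    let boot2 = pair_by_position(&on_disk, &plan);

    println!("boot 1: {:?}", boot1);
    println!("boot 2: {:?}", boot2);
}
```

With a one-element plan the pairing cannot flip, which is why returning a single Helper sidesteps the hazard until roles are persisted.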

The hazard exists in slice 8's bootstrap_planned today but doesn't
manifest because nothing depends on (persona_id, role) yet. Slice
13 IS that consumer (cognition + supervisor both observe the role).
Without a fix, slice 13 ships with a latent boot-2 regression.

Fix shape:
- plan_for_tier returns ONE Helper for all tiers until slice 14.
- TODO marker names slice 14 as the load-bearing fix
  (role-in-seed.json + RoleAwareProvider).
- Existing test `compat_tier_plans_helper_and_coder_on_lcd` renamed
  to `compat_tier_plans_single_helper_on_lcd` with updated invariant.
- New `slice_14_restores_helper_plus_coder_for_compat` test pinned
  `#[ignore]` until slice 14 — it's the spec slice 14 has to
  satisfy. Going red on the ignore-removal date is the design's
  reminder.
- bootstrap_planned_exhausted_provider_errors_with_slot_info updated:
  `required` field now 1, not 2.

Net result: slice 13's substrate hosts ONE Helper per tier through
the managed path. Same coverage the demo binary currently provides,
but composed via the substrate. Slice 14 reopens the multi-role
case once role identity is durable across boots.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…133 slice 13)

The moment-of-truth. Before this commit, the IPC boot loop at
`ipc/mod.rs:1024-1089` called `bootstrap_one(&intent)` per
ResumeOrMintProvider output and only LOGGED a welcome. The persona
was reachable via `airc peers` but never responded — a mute citizen.

After this commit, the boot path composes:

  PersonaSpawnerModule::plan_for_tier (slice 7)
    → bootstrap_planned (slice 8): mints/resumes airc identities
    → materialize_adapters (slice 9): builds inference adapters
    → spawn_persona_service (slice 12): runs serve_persona_loop
    → PersonaAircRuntimeRegistry::attach_service_loop (slice 13 Q3):
      parks the JoinHandle in the slot alongside the runtime

Each planned persona ends up with a tokio task running her cognition
path. The substrate hosts personas headlessly — no `airc_chat_demo`
in the inner ring.

Status against the design doc HEADLESS-PERSONA-HOST-LOOP.md:

APPLIED:
- P2: plan_for_tier returns single Helper (separate commit f940fa4).
- Q1: bootstrap_planned takes &Registry (separate commit 71429da).
- Q3: registry slot owns runtime + service_loop (commit 3f843aa).
- Boot composition collapses ~65 lines of inline bootstrap-only loop
  into ~115 lines of substrate composition using the existing slice
  primitives. Per [[organization-purity-as-we-migrate]], the old
  welcome-log-only path is DELETED, not kept alongside.

DEFERRED with TODO markers:
- Q2 (detect_host_capability wiring): the existing free function at
  cognition/host_capability_probe.rs:87 takes &dyn GpuMonitor +
  &System. No production code constructs a GpuMonitor today — only
  tests do. Slice 13 uses HwCapabilityTier::CpuOnly + HwTierCategory::
  Compat as the safe floor (the LCD Helper Qwen2.5-0.5B works for
  all tiers). TODO #52 cited for when GpuMonitor construction lands.
- P1 (tokio::signal::ctrl_c → Runtime::shutdown): the per-slot
  shutdown is available via `PersonaAircRuntimeRegistry::shutdown_slot`
  and exercised by persona/instances/* IPC commands. The server-
  level signal handler is its own sub-slice.
- P3 (ResourceBroker.acquire admission): current LCD case is 1
  persona × ~500 MiB GGUF, well within all tiers. Becomes load-
  bearing when multi-persona returns in slice 14.

Tests:
- 31 tests across slices 5-13 all green (registry, service_loop,
  supervisor, spawner, spawner_module, profile_builder, host).
- No new tests in this commit — the boot composition is the
  integration point; the integration test requires a stub
  PersonaInstanceManagerModule (slice 13 follow-up).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@joelteply

Contributor Author

Adversarial review of slice 13 (PR #1511) — verdict: REQUEST_CHANGES

Traced the cleanup cascade, the Q1 deref-coercion claim, the spawn/attach race, the bootstrap_planned early-return path, the airc_chat_demo regression surface, and the airc-lib subscriber-drop chain. Two blocking findings, four advisory.


BLOCKER 1 — leaked tokio task on attach_service_loop failure (ipc/mod.rs:1147-1166)

let handle = spawn_persona_service(hosted, runtime, ServeOptions::default(), rt_handle_for_spawn.clone());
if let Err(e) = registry_for_lookup.attach_service_loop(persona_id, handle).await {
    tracing::error!(... "attach_service_loop failed — persona will not respond on the grid");
    failed_count += 1;
    continue;                       // ← `handle` dropped, but JoinHandle::drop DETACHES, not aborts
}

JoinHandle::drop does not abort the task; it detaches it. So on either error path —

  • Err("no slot"): invariant violation, but the bug below is real even so.
  • Err("already attached"): a prior attach_service_loop succeeded and is still running. We just spawned a second serve_persona_loop for the same persona, the error log claims "will not respond", and now the substrate has two competing service loops on one airc identity, neither tracked by shutdown_slot.

The "persona will not respond" log line is a lie when this branch fires — the loop is alive, it just has no registry handle.

Fix: abort + await before continue, mirroring the orderly drain in shutdown_slot:

if let Err(e) = registry_for_lookup.attach_service_loop(persona_id, handle).await {
    // We own the handle returned by spawn; attach didn't take it.
    // Drop alone would detach (leak). Abort + drain explicitly.
    handle.abort();
    let _ = handle.await;
    tracing::error!(...);
    failed_count += 1;
    continue;
}

Caveat: handle was moved into attach_service_loop. Either change the signature to return the handle on error (Result<(), (JoinHandle, &'static str)>) or make the spawn pattern clone-friendly. The former is cleaner: attach_service_loop already knows it did not take ownership of the handle, so it can return the orphan handle to the caller for a proper drain.
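
The hand-the-handle-back shape can be sketched with std threads. This is an analogy under stated assumptions, not the tokio code: `std::thread::JoinHandle` has no `abort()`, so a shared stop flag stands in for abort and `join()` stands in for await. `Registry`, `Slot`, and `spawn_loop` are simplified hypothetical types; only the `Result<(), (JoinHandle, &'static str)>` shape comes from the review.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};
use std::time::Duration;

struct Slot {
    service_loop: Mutex<Option<JoinHandle<()>>>,
}

struct Registry {
    slots: HashMap<u32, Slot>,
}

impl Registry {
    /// On failure the handle is returned to the caller instead of being dropped,
    /// because dropping a handle detaches the task rather than stopping it.
    fn attach_service_loop(
        &self,
        id: u32,
        handle: JoinHandle<()>,
    ) -> Result<(), (JoinHandle<()>, &'static str)> {
        let Some(slot) = self.slots.get(&id) else {
            return Err((handle, "no slot"));
        };
        let mut guard = slot.service_loop.lock().unwrap();
        if guard.is_some() {
            return Err((handle, "already attached"));
        }
        *guard = Some(handle);
        Ok(())
    }
}

/// Hypothetical serve loop: spins until the stop flag is raised.
fn spawn_loop(stop: Arc<AtomicBool>) -> JoinHandle<()> {
    thread::spawn(move || {
        while !stop.load(Ordering::SeqCst) {
            thread::sleep(Duration::from_millis(1));
        }
    })
}

fn main() {
    let registry = Registry { slots: HashMap::new() };
    let stop = Arc::new(AtomicBool::new(false));
    let handle = spawn_loop(stop.clone());

    match registry.attach_service_loop(7, handle) {
        Ok(()) => println!("attached"),
        Err((orphan, reason)) => {
            stop.store(true, Ordering::SeqCst); // stand-in for abort()
            orphan.join().unwrap(); // stand-in for awaiting the aborted task
            println!("attach failed ({reason}); spawned task drained");
        }
    }
}
```

Returning the handle in the error tuple keeps ownership explicit: the caller that spawned the task is the one that must decide how to stop it.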


BLOCKER 2 — bootstrap_planned partial-failure leaves orphan registry entries (ipc/mod.rs:1099-1112, spawner_module.rs:307-344)

bootstrap_planned iterates the plan, calling instance_manager.bootstrap_one(&intent) per slot. bootstrap_one has the side-effect of inserting the new persona into the registry. If slot 3 of 5 fails its mint, the function returns Err(BootstrapPlannedError::AircBootstrap{...}) after slots 0, 1, 2 have already been bootstrapped and registered.

The boot loop's response (ipc/mod.rs:1104-1111):

Err(e) => {
    tracing::error!(... "slice 13 boot: bootstrap_planned failed — no personas hosted. Server stays up; fire `persona/instances/bootstrap` manually after resolving the underlying issue.");
    return;
}

The error message says "no personas hosted" but that's misleading — slots 0..2 are registered with no service loop attached. They occupy registry slots, will show up in persona/instances/list, and are mute on the grid until either:

  • The operator runs persona/instances/bootstrap (does that re-trigger service-loop attach for already-registered personas? Or does it skip them because they're "already present"? The PR doesn't say.)
  • The operator restarts continuum-core.

This is a genuine substrate state divergence. Two options:

  1. Cleanup on error: on bootstrap_planned error, iterate registry_for_lookup.ids() for any persona registered during this boot run and call shutdown_slot to roll back. Requires bootstrap_planned to return partial success or the boot loop to compare pre/post registry IDs.
  2. Honest documentation + ops path: explicitly state in the error log "N personas are registered but mute; re-fire persona/instances/bootstrap to wire their service loops" — and verify that command actually does the attach step for already-registered personas.

Either is acceptable. The current "no personas hosted" log is provably wrong if any prior slot in this run succeeded mint. Per [[observability-is-half-the-architecture]], shutdown should be honest about why it left; same principle applies to partial-boot.
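
For option 1, the pre/post comparison could be as small as a set difference. A minimal sketch, assuming persona IDs are comparable keys (plain `u32` here in place of the real UUIDs) and a hypothetical helper name `registered_during_boot`:

```rust
use std::collections::BTreeSet;

/// Returns the IDs present after the failed bootstrap that were not present
/// before it, i.e. the personas this boot run registered and must roll back.
fn registered_during_boot(before: &BTreeSet<u32>, after: &BTreeSet<u32>) -> Vec<u32> {
    after.difference(before).copied().collect()
}

fn main() {
    // Snapshot taken before bootstrap_planned runs.
    let before: BTreeSet<u32> = [10].into_iter().collect();
    // Snapshot taken in the error branch, after two slots registered.
    let after: BTreeSet<u32> = [10, 11, 12].into_iter().collect();

    let to_roll_back = registered_during_boot(&before, &after);
    println!("{:?}", to_roll_back);
}
```

Each returned ID would then be passed to shutdown_slot, leaving personas registered before this boot run untouched.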


ADVISORY 3 — missing debug_assert!(plan.len() <= 1) on P2 invariant

The commit body for f940fa47d names the position-pairing hazard explicitly, and the spawner_module test asserts plan.len() == 1. But the boot composition does not encode this. A future refactor that accidentally returns [Helper, Coder] from plan_for_tier would break boot 2's role-pairing silently — there's no runtime tripwire.

Fix location: top of plan_for_tier in spawner_module.rs:69, after the existing comments:

let result = vec![DesiredRole { role: RoleId::Helper, model_id: "...".to_string() }];
debug_assert!(
    result.len() <= 1,
    "slice 14 must update plan_for_tier AND ResumeOrMintProvider's role mapping atomically; \
     see PR #1510 review finding #2 (position-pairing hazard)"
);
result

Putting the assert at the producer (not the consumer) means the next slice-14 author trips it the moment they bump the vector and forces them to engage with the doc-comment context.


ADVISORY 4 — concurrent shutdown_slot for same persona_id is correct but silently asymmetric

Traced the race manually (airc_runtime_registry.rs:199-220):

  • T1 and T2 both call shutdown_slot(p).
  • Both inner.get(&p)?.clone() succeed (DashMap Ref guard drops immediately after clone).
  • T1 wins the mutex, loop_slot.take() returns Some(handle). T1 aborts + awaits.
  • T2 acquires mutex, take() returns None. T2's if let Some(handle) branch is skipped — no double-await, no panic.
  • T1's inner.remove(&p) succeeds. T2's inner.remove(&p)? returns None, function returns None.

This is correct (no panic, no double-abort). However, T2's caller sees None and likely interprets it as "persona wasn't registered" — but really it means "another caller raced you to shutdown." The doc-comment claims None only when "the persona wasn't registered." That's now imprecise.

Fix: amend the doc-comment to acknowledge the race, OR (better) make shutdown_slot idempotent in the observable sense by returning Some(runtime_arc) even to the second caller. That is harder because the slot is already gone by then, but the runtime Arc could be cloned during the first caller's take() window. Not blocking; a doc-comment update is enough.


ADVISORY 5 — finishing tier displayed in boot log doesn't expose Q2/TODO #52 degradation

On an M5 Pro 128 GB, the boot logs will say "🌐 Substrate boot composition complete (slice 13) — 1 citizen(s) hosted". Nothing in the runtime-visible output says "this hardware could fit Qwen2.5-14B but we're running 0.5B because TODO #52." The source comment is informative; the operator output is silent.

Fix: add host_capability = ?host_capability and tier_category = ?tier_category to the summary tracing::info! at ipc/mod.rs:1192, plus a one-line tracing::warn! if the hardcoded fallback is in effect (which it always is until TODO #52). Operators on capable hardware should see a visible nudge in their logs.


ADVISORY 6 — missing #[ignore] skeleton for boot-composition integration test

spawner_module.rs:421 (the slice_14_restores_helper_plus_coder_for_compat #[ignore] test) is the exact precedent. The PR's "integration test requires a stub PersonaInstanceManagerModule" is real, but a pinned #[ignore = "tracks slice 13 follow-up: stub PersonaInstanceManagerModule"] skeleton in ipc/mod.rs (or a new tests/headless_boot.rs) showing the intended test shape would:

  • Document the expected boot contract as code.
  • Give the next author an obvious activation site.
  • Make grep-for-ignored-tests show the gap.

The supervisor.rs slice-9 tests already exercise materialize_adapters with OkFactory/ErrFactory stubs — same pattern applies for boot composition with stub identity provider + stub PersonaInstanceManagerModule.


Verified clean (no findings)

  • Q1 deref coercion: &Arc<Registry> → &Registry works as expected at all four call sites (spawner.rs:97, profile_builder.rs:248+, spawner_module.rs:356, ipc/mod.rs:1098). Tests use Arc<Registry> from registry_with_lcd(); production uses &'static Registry from model_registry::global(). Both compile via coercion. No subtle copy/move bug.

  • Arc<Airc> drop cascade — confirmed against airc-lib f6ed190 transport.rs:27-33: IngestTask::Drop::drop calls handle.abort(). airc.rs:135-148 shows subscribers live inside Arc<AircInner>; last-Arc-drop tears them down. The doc-comment chain in airc_runtime_registry.rs:53-62 is accurate. Caveat: any extant registry.get(p) clone holds the runtime Arc past shutdown_slot return; wire subscribers stay alive until the last clone drops. Tail-window messages have no consumer (service loop aborted) — small leak, observable only as queued events in a dying broadcast channel. Acceptable but worth noting.

  • airc_chat_demo.rs — uses PersonaAircRuntime::from_attached directly, never touches the registry, never calls derive_spawn_plan/build_profile/bootstrap_planned. Q1+Q3 changes are invisible to the demo. Still builds.

  • LlamaCppPersonaAdapterFactory symbol path: crate::persona::supervisor::LlamaCppPersonaAdapterFactory resolves: mod.rs:44 has pub mod supervisor; supervisor.rs:74 has pub struct LlamaCppPersonaAdapterFactory. Same for PersonaAdapterFactory, materialize_adapters, HostedPersona. Boot loop's use imports are valid.

  • attach_service_loop / shutdown_slot race — mutex serializes; no JoinHandle reentrancy. (See Advisory 4 for the doc-comment imprecision.)

  • DashMap deadlock surface — all inner.get(...) sites drop the Ref guard before any await or any subsequent inner.remove. No shard-level deadlock.


Verdict: REQUEST_CHANGES

Blockers 1 + 2 are correctness bugs in the boot loop's error paths. Advisory 3 is a one-line debug_assert! that costs nothing and prevents a slice-14 regression silently mis-pairing roles. Advisories 4–6 are polish that should land alongside but aren't strictly blocking.

The substrate composition itself (Q3 slot model, Q1 reference simplification, host helper composition) is sound. The slice-12 → slice-13 transition is well-staged and the cleanup contract in shutdown_slot is correct under the documented usage. The remaining work is making the boot loop's error paths as orderly as the happy path.

Reference docs read: HEADLESS-PERSONA-HOST-LOOP.md, CBAR-SUBSTRATE-ARCHITECTURE.md, airc-lib f6ed190 transport.rs + airc.rs.

— adversarial sentinel review

…sonas

Two blockers from the adversarial review:

BLOCKER 1 — JoinHandle leaked on attach_service_loop failure.
  `JoinHandle::drop` detaches rather than aborts. When
  `attach_service_loop` returned an error and the boot loop did
  `continue`, the spawned `serve_persona_loop` kept running
  untracked. The boot log lied "persona will not respond on the
  grid" while in fact the loop did respond, just outside the
  registry's view (so `shutdown_slot` couldn't find it). Worse on
  `"already attached"`: two loops competed for the same persona.
  Fix: `attach_service_loop` signature changed to
  `Result<(), (JoinHandle, &'static str)>` so the caller can
  orderly-drain (abort + await). Boot loop updated. Existing test
  updated to assert the handle comes back live (proves no implicit
  detach) before the test drains it.

BLOCKER 2 — Partial-bootstrap orphans on bootstrap_planned error.
  `bootstrap_planned` registers each persona via `bootstrap_one`
  BEFORE the next slot's mint runs. If slot k fails, slots 0..k-1
  are already in the registry but with no service loop attached —
  mute citizens. The boot loop early-returned with "no personas
  hosted" but they were. Fix: on `bootstrap_planned` error, the
  boot loop calls `registry.ids()` to get the partially-registered
  set and `shutdown_slot`s each. `shutdown_slot` handles "no
  service loop attached" gracefully (handle_opt is None) and drops
  the Arc cleanly — same orderly cleanup path as the normal
  shutdown, just no loop to abort. Error log updated to report
  `orphans_drained` count honestly.

Advisory 3 — `debug_assert!(plan.len() <= 1)` at the producer.
  P2 invariant was named in the commit body + tested in
  `compat_tier_plans_single_helper_on_lcd` but had no runtime
  tripwire. Added the debug_assert at `plan_for_tier`'s producer
  site with a TODO marker tying it to slice 14 (when the assert
  comes out alongside RoleAwareProvider + role-in-seed.json).

Verification:
  cargo test persona::airc_runtime_registry → 5 passed
  cargo test persona::spawner_module → 5 passed + 1 ignored
  cargo build --lib clean

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@joelteply

Contributor Author

Addressed all three findings in commit 2abd596.

BLOCKER 1 (leaked tokio task on attach_service_loop failure) — Fixed at two layers:

  1. attach_service_loop signature changed to Result<(), (JoinHandle, &'static str)> — the handle comes back on error so the caller can drain it.
  2. Boot loop at ipc/mod.rs calls returned_handle.abort() + let _ = returned_handle.await before continuing to the next slot.

Existing test was rewritten to spawn a long-sleep task and assert !returned_handle.is_finished() before the test drains it — proves no implicit detach happened.

BLOCKER 2 (partial-bootstrap orphans) — Fixed in the bootstrap_planned error path. Now does:

let already_registered = registry_for_lookup.ids();
let orphans = already_registered.len();
for orphan_id in already_registered {
    let _ = registry_for_lookup.shutdown_slot(orphan_id).await;
}
tracing::error!(orphans_drained = orphans, "...");

shutdown_slot handles "no service loop attached" gracefully (handle_opt is None), so it cleanly drops the Arc just like the orderly path would — wire subscriber tears down via Arc<Airc>::drop. Honest error log now reports orphans_drained = N instead of "no personas hosted."

Advisory 3 (debug_assert at producer) — Added at plan_for_tier in spawner_module.rs:

debug_assert!(
    plan.len() <= 1,
    "plan_for_tier returned {} roles before slice 14's role-in-seed.json \
     landed — position-pairing against ResumeOrMintProvider's alphabetic \
     disk order would flip role identity on boot 2. See #133 slice 14.",
    plan.len()
);

TODO marker right below the assert points slice 14 author at the right place to remove the tripwire alongside the multi-role plan restoration.

Verification:

  • cargo test persona::airc_runtime_registry → 5 passed (incl. updated drain test)
  • cargo test persona::spawner_module → 5 passed + 1 ignored
  • cargo build --lib clean

The three non-blocking advisories (registry doc-comment precision, missing tier in boot summary, missing #[ignore]-pinned integration skeleton) — happy to address them in this PR if you want, or land them as slice 13.5 follow-ups since the doctrine ones (tier visibility + integration skeleton) are clearly substrate-shape questions that benefit from their own PR threads.

@joelteply

Contributor Author

Re-review of slice 13 fixup (commit 2abd59684) — verdict: APPROVE

Traced each fix to source, validated tests, and stress-tested the new code path for follow-on hazards. Both BLOCKERs are properly resolved (root cause, not symptom). Advisory 3 is properly placed. One new minor finding raised below as NIT (non-blocking).


BLOCKER 1 — leaked tokio task on attach_service_loop failure → RESOLVED

src/workers/continuum-core/src/persona/airc_runtime_registry.rs:174-190 — signature changed to Result<(), (JoinHandle<...>, &'static str)>. Both error paths ("no slot", "already attached") return the handle back so the caller can orderly-drain. Doc-comment at L147-173 names the orderly-drain pattern with the canonical match block.

src/workers/continuum-core/src/ipc/mod.rs:1174-1200 — boot loop destructures Err((returned_handle, reason)), calls returned_handle.abort() + let _ = returned_handle.await BEFORE the continue. Order is correct: drain first, log second, increment failed_count, continue. Log message updated from the prior lie ("will not respond on the grid") to honest "spawned task drained. Persona registered but unattended."

Test at airc_runtime_registry.rs:313-332 proves the no-implicit-detach property — spawns a 60s-sleep task, attaches to a non-existent slot, asserts !returned_handle.is_finished() on return, THEN drains. If a future refactor inadvertently let the handle drop on the error path, this test goes red.

Verified: cargo test --features metal,accelerate persona::airc_runtime_registry → 5 passed.

BLOCKER 2 — partial-bootstrap orphans → RESOLVED

src/workers/continuum-core/src/ipc/mod.rs:1113-1142: the error branch of bootstrap_planned now calls registry_for_lookup.ids(), iterates the snapshot, shutdown_slots each, THEN returns. Order verified by reading the source: the drain loop completes before the return on L1141.

I traced shutdown_slot against a None-service-loop slot end-to-end:

  • airc_runtime_registry.rs:217-238: loop_slot.take() yields None, if let Some(handle) is skipped, falls through to self.inner.remove(&persona_id), drops the Arc<PersonaSlot> (which drops Arc<PersonaAircRuntime> → Arc<Airc> → wire subscriber teardown). The orderly path with no loop to abort. Comment in fix accurately describes this.

Premise verification — bootstrap_planned (spawner_module.rs:334-360) calls instance_manager.bootstrap_one(&intent) PER ITERATION; bootstrap_one (persona_instance_manager.rs:167-219) calls self.registry.register(runtime) on L217 BEFORE returning Ok. So if iteration k errors, iterations 0..k-1 are already in the registry without service loops. BLOCKER 2's premise is real and the fix targets the actual orphans.

Advisory 3 — debug_assert!(plan.len() <= 1) at producer → RESOLVED

src/workers/continuum-core/src/persona/spawner_module.rs:106-128 — assert placed at the producer's return site (after let plan = vec![...], before plan). Message names the slice-14 + alphabetic-disk-order + position-pairing hazard explicitly. TODO marker right below at L130-131 pairs assert removal with slice 14's RoleAwareProvider work. Verified: every_tier_plans_single_helper (L416-432) exercises all five tiers, all pass under the assert in debug + test builds.


New minor issues from the fixup

NIT 1 (non-blocking) — drop(slot_ref) is load-bearing; other methods are safe by coincidence.

airc_runtime_registry.rs:179-184 — the new code shape (let Some(slot_ref) = ... else { return }; then slot_ref.clone();) creates a named DashMap Ref (read guard) that survives until the end of scope. Without the explicit drop(slot_ref) on L183, the read guard would be held across the slot.service_loop.lock().await on L184 — a genuine DashMap-guard-across-await deadlock hazard. The drop IS correct.

The reason is_service_loop_finished (L197-201) and shutdown_slot (L217-238) didn't need this dance: both use self.inner.get(...)?.clone(); in a single statement, where the Ref temporary is dropped at the semicolon. The new shape uses a let else + named binding, which extends the Ref's lifetime. Worth a one-line comment near L183 noting that the explicit drop is required by this code shape (the rest of the file gets away with it because of statement-temporary scoping rules) — otherwise a future refactor that "cleans up" the explicit drop reintroduces the deadlock silently. Not a blocker.
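
The scoping rule behind NIT 1 can be observed with std types. In this sketch a `std::sync::RwLock` read guard stands in for the DashMap `Ref` (an assumption made so the example needs no external crates), and a failed `try_write` stands in for the place where the real code would deadlock across an `.await`. The function names are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

type Map = RwLock<HashMap<u32, Arc<String>>>;

/// Single statement: the read guard is a temporary and is released at the
/// semicolon, so the lock is free on the next line.
fn lock_free_after_single_statement(map: &Map) -> bool {
    let _slot: Arc<String> = map.read().unwrap().get(&1).unwrap().clone();
    let free = map.try_write().is_ok();
    free
}

/// Named binding: the guard lives until the end of scope unless it is
/// dropped explicitly before the next lock acquisition.
fn lock_free_with_named_guard(map: &Map, explicit_drop: bool) -> bool {
    let guard = map.read().unwrap();
    let _slot: Arc<String> = guard.get(&1).unwrap().clone();
    if explicit_drop {
        drop(guard); // the load-bearing line
        return map.try_write().is_ok();
    }
    // `guard` is still alive here, so the write lock cannot be taken.
    let free = map.try_write().is_ok();
    free
}

fn main() {
    let map: Map = RwLock::new(HashMap::from([(1, Arc::new("slot".to_string()))]));
    println!("single statement:       {}", lock_free_after_single_statement(&map));
    println!("named + explicit drop:  {}", lock_free_with_named_guard(&map, true));
    println!("named, no drop:         {}", lock_free_with_named_guard(&map, false));
}
```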

NIT 2 (non-blocking) — orphans = already_registered.len() is the snapshot count, not the actually-drained count.

ipc/mod.rs:1123-1140 — between registry.ids() and the drain loop, a concurrent persona/instances/bootstrap IPC call could register a NEW persona. That persona isn't in already_registered, so it won't be drained — which is correct (it's not part of the failed plan). But the inverse is also possible: a concurrent persona/instances/shutdown could remove one of already_registered's entries before our shutdown_slot reaches it, making shutdown_slot return None. The let _ = discards that signal, and the log reports orphans_drained = N where N is the snapshot count, not actual drains. Both edge cases are vanishingly unlikely (the boot task runs single-threaded against a server that's only just come up), but if you want strict accounting, count shutdown_slot's Some returns instead. Land in 13.5 or accept the tiny lie — your call.
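
Strict accounting would count the `Some` returns rather than the snapshot length. A minimal synchronous sketch with a hypothetical `Registry` over a `HashMap` (the real shutdown_slot is async and returns a runtime Arc; a `&'static str` stands in for it here):

```rust
use std::collections::HashMap;

/// Hypothetical, synchronous stand-in for the runtime registry.
struct Registry {
    slots: HashMap<u32, &'static str>,
}

impl Registry {
    /// Sorted snapshot of the registered IDs.
    fn ids(&self) -> Vec<u32> {
        let mut ids: Vec<u32> = self.slots.keys().copied().collect();
        ids.sort();
        ids
    }

    /// Returns None when the slot is already gone.
    fn shutdown_slot(&mut self, id: u32) -> Option<&'static str> {
        self.slots.remove(&id)
    }
}

/// Counts only the slots this call actually removed.
fn drain_orphans(registry: &mut Registry, snapshot: &[u32]) -> usize {
    snapshot
        .iter()
        .filter(|id| registry.shutdown_slot(**id).is_some())
        .count()
}

fn main() {
    let mut registry = Registry {
        slots: HashMap::from([(1, "a"), (2, "b"), (3, "c")]),
    };
    let snapshot = registry.ids();

    // Simulates a concurrent shutdown landing between snapshot and drain.
    registry.shutdown_slot(2);

    let drained = drain_orphans(&mut registry, &snapshot);
    println!("snapshot = {}, actually drained = {}", snapshot.len(), drained);
}
```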

TOCTOU question raised by the reviewer: answered above. A concurrent removal between drop(slot_ref) and slot.service_loop.lock().await is possible. The Arc<PersonaSlot> we hold keeps the slot data alive, so the Mutex is still valid and the attach succeeds (sets Some(handle)), but nobody can find this slot via get anymore, which makes it a real orphan. In practice the only concurrent remover is shutdown_slot from another task, and the window is microseconds. Not a blocker for slice 13's single-task boot, but if persona/instances/shutdown and the boot attach can ever race, the right fix is to recheck registry presence after the lock is acquired and refuse the attach if the slot was concurrently removed.
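The presence recheck could look roughly like the following. This is a sketch under stated assumptions: `lookup`, `attach_checked`, `remove`, and `AttachError::SlotRemoved` are hypothetical names, and blocking `std::sync::Mutex` replaces both the DashMap and the async lock. The lookup and the attach are split into two calls so the race window can be exercised deterministically.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

struct PersonaSlot {
    service_loop: Mutex<Option<String>>,
}

struct Registry {
    inner: Mutex<HashMap<u32, Arc<PersonaSlot>>>,
}

#[derive(Debug, PartialEq)]
enum AttachError {
    SlotRemoved,
}

impl Registry {
    fn lookup(&self, id: u32) -> Option<Arc<PersonaSlot>> {
        let found = self.inner.lock().unwrap().get(&id).cloned();
        found
    }

    fn remove(&self, id: u32) -> Option<Arc<PersonaSlot>> {
        let removed = self.inner.lock().unwrap().remove(&id);
        removed
    }

    /// Takes the slot lock first, then confirms the registry still maps `id`
    /// to this exact slot before storing the handle.
    fn attach_checked(
        &self,
        id: u32,
        slot: &Arc<PersonaSlot>,
        handle: String,
    ) -> Result<(), AttachError> {
        let mut service_loop = slot.service_loop.lock().unwrap();
        let still_registered = self
            .inner
            .lock()
            .unwrap()
            .get(&id)
            .map_or(false, |current| Arc::ptr_eq(current, slot));
        if !still_registered {
            return Err(AttachError::SlotRemoved);
        }
        *service_loop = Some(handle);
        Ok(())
    }
}
```

This sketch takes the slot lock before the map lock, so a remover that holds the map lock while waiting on the slot lock would invert the order; the real fix would need to settle that ordering.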


Non-blocking advisories from prior review

The author offered to address registry doc-comment precision, missing tier-in-boot-summary, and #[ignore]-pinned integration test skeleton "in this PR or as slice 13.5." Boot summary at ipc/mod.rs:1226-1233 still omits tier_id = %tier_id and tier_category = ?tier_category — would be a one-line addition.

Recommendation: ship slice 13 NOW. The three advisories are doctrine-shaped (observability + integration-skeleton precedent) and benefit from their own PR thread. NIT 1 + NIT 2 + the three prior advisories can land together as slice 13.5.

Verification run

  • cargo test --features metal,accelerate persona::airc_runtime_registry — 5/5 passed (incl. updated drain test).
  • cargo test --features metal,accelerate persona::spawner_module — 5/5 passed + 1 #[ignore]d for slice 14.
  • Build clean.

Verdict: APPROVE. Both BLOCKERs fixed at the root, advisory 3 properly placed. Ship it.

joelteply and others added 2 commits June 2, 2026 09:48
… in correct channel

PR #1511 integration test on Joel's Intel Mac revealed:
PersonaAircRuntime::bootstrap was calling
`airc.join(&default_room.as_uuid().to_string())`, which passes the
UUID's string representation into airc-lib's `ChannelName::new(name)`
— which DERIVES a channel UUID from the string. The persona landed
in derived channel `5d33e2a7…` while the operator's `airc room`
points at canonical `11c1a7ac…`. Same room, two channels, never see
each other.

The demo binary worked around this in slice 11 by using
`from_attached` (joining by name manually first), but the
substrate-managed path through PersonaInstanceManagerModule still
called the broken bootstrap.

Fix threads through 4 layers:
- airc/discovery.rs: new `discover_default_room_name()` parses
  `room: <name>` line from `airc room` stdout. Mirrors the existing
  `discover_default_channel()` shape; env override
  `AIRC_DEFAULT_ROOM_NAME` for tests/operators.
- airc/mod.rs: re-export the new function.
- modules/airc.rs: AircModule stores `attach_room_name: Option<String>`;
  `default_room_name() -> Option<&str>` getter. Loud warn if discovery
  fails — names the failure mode so operators see what's broken.
- modules/persona_instance_manager.rs:
  PersonaInstanceManagerModule::new takes Option<String> room name;
  bootstrap_one passes it to PersonaAircRuntime::bootstrap.
- persona/airc_runtime.rs::bootstrap: joins by name if Some,
  falls back to UUID-as-string + WARN if None.
- ipc/mod.rs: discovers + threads through.
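A hedged sketch of the parsing step, assuming only what the commit states: `airc room` stdout contains a `room: <name>` line, and an override can replace it. The function names and the way the override is passed in are illustrative; the real `discover_default_room_name()` reads `AIRC_DEFAULT_ROOM_NAME` and shells out, which this sketch leaves to the caller.

```rust
/// Pulls the room name out of text shaped like `airc room` stdout.
/// Returns None when no non-empty `room:` line is present.
fn parse_default_room_name(stdout: &str) -> Option<String> {
    stdout
        .lines()
        .find_map(|line| line.trim().strip_prefix("room:"))
        .map(str::trim)
        .filter(|name| !name.is_empty())
        .map(str::to_owned)
}

/// The override (for example the value of AIRC_DEFAULT_ROOM_NAME) wins when
/// it is set and non-empty; otherwise fall through to the parsed stdout.
fn resolve_default_room_name(env_override: Option<&str>, stdout: &str) -> Option<String> {
    match env_override.map(str::trim).filter(|value| !value.is_empty()) {
        Some(name) => Some(name.to_owned()),
        None => parse_default_room_name(stdout),
    }
}
```

A `None` result corresponds to the discovery-failure case that the commit pairs with a loud warning and the UUID-as-string fallback.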

Integration trace confirmed (slice13-server.log, around line 1078):
  joined_room=11c1a7ac-cb85-5ca0-a5b4-2847280ea3fa
  room_name=continuum

Test sites updated to pass `None` (4 in persona_instance_manager.rs
tests, 1 in spawner_module.rs).

Status after this fix:
✅ Substrate boot composition fires
✅ Persona hosted as substrate-managed Helper
✅ Joins canonical airc channel
✅ Receives operator messages via subscribe
✅ Service loop invokes inspect_persona_rag_with_inference
❌ Inference fails with `llama_decode returned -1` on mac-cpu-only —
   separate inference-layer bug, tracked as task #131-adjacent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ontext analog

Per Joel 2026-06-02: "design got out of control due to you failing to
use a shared object for all state info required for a persona OR
user. This is the airc user. Or base user with airc props." And
"make this pattern regular, ubiquitous &ctx, store references, make
it elegant."

What changed:
- HostedPersona is now PersonaContext (with `pub type HostedPersona =
  PersonaContext` for slice-9-era callers). The struct holds the
  persona's full context — identity (airc citizen facts), role,
  inference profile, adapter, runtime — and is passed by reference
  (`&ctx`) to every persona-scoped function.
- HostedPersona.instance renamed to `.identity`: it's the airc user
  identity (peer_id, agent_name, home, default_room, source).
- HostedPersona.profile (new) carries the PersonaInferenceProfile
  directly — single source of truth for inference shape. Replaces
  the prior context_window-only field. Downstream code reads
  `ctx.profile.context_length` etc. — no copied fields, no derived
  constants outside the named derivation site.
- HostedPersona.runtime (new) holds `Option<Arc<PersonaAircRuntime>>`.
  Production always Some (filled by materialize_adapters via the
  registry_lookup closure). Tests construct with None — the proper
  AircHandle trait abstraction lands as part of task #142.
- spawn_persona_service signature simplified — no separate runtime
  arg (`ctx.runtime` carries it).
- materialize_adapters takes a `runtime_lookup` closure so the
  supervisor folds the live runtime into each context at the
  composition seam.
- RagInspectionRequest::for_persona(persona_id, name, now_ms,
  &profile) is the single derivation site. The old `defaults_for`
  (32k hardcoded budget) stays for back-compat but is documented as
  legacy; service_loop uses `for_persona` exclusively.

Why this matters (the bug it fixes):
- PR #1511 integration trace caught `llama_decode returned -1` on
  Intel Mac mac-cpu-only: the LCD model was loaded at n_ctx=2048
  (Compat tier per profile_builder), but RagInspectionRequest::
  defaults_for was setting context_window=32_768. The RAG layer
  built a 32k-budget prompt that overflowed the 2k KV cache.
- The structural fix is the &ctx doctrine: profile is the single
  source, derivation happens in one named function.
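The overflow and its structural fix can be illustrated with a reduced model. The struct shapes below are assumptions trimmed to the one field that matters (`context_length` on the profile, `context_window` on the request), and `fits_kv_cache` is a hypothetical helper, not part of the codebase.

```rust
/// Reduced stand-in for PersonaInferenceProfile.
struct PersonaInferenceProfile {
    context_length: u32,
}

/// Reduced stand-in for RagInspectionRequest.
struct RagInspectionRequest {
    persona_name: String,
    now_ms: u64,
    context_window: u32,
}

impl RagInspectionRequest {
    /// Legacy shape: a hardcoded 32k budget, independent of the loaded model.
    fn defaults_for(persona_name: &str, now_ms: u64) -> Self {
        Self { persona_name: persona_name.to_owned(), now_ms, context_window: 32_768 }
    }

    /// Single derivation site: the budget is read from the profile.
    fn for_persona(persona_name: &str, now_ms: u64, profile: &PersonaInferenceProfile) -> Self {
        Self {
            persona_name: persona_name.to_owned(),
            now_ms,
            context_window: profile.context_length,
        }
    }
}

/// Hypothetical check: does the requested budget fit the model's n_ctx?
fn fits_kv_cache(request: &RagInspectionRequest, n_ctx: u32) -> bool {
    request.context_window <= n_ctx
}
```

With a profile at `context_length = 2048`, the `for_persona` request fits a 2048-token cache while the `defaults_for` request does not, which is the mismatch behind the `llama_decode returned -1` trace.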

Task #142 (BaseUser hierarchy) is the natural follow-up: extract
the airc props (identity + runtime) into a `BaseUser` trait that
persona/human/web actor contexts all derive from. Same shape per
[[personas-are-citizens-airc-is-identity-provider]].

Verification:
- cargo build --bin continuum-core-server clean
- cargo build --lib --tests clean
- Substrate boot composition still hosts Paige in correct channel
  (continuum, 11c1a7ac…)
- Service loop fires inference (slow on CPU; iteration target)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@joelteply joelteply merged commit 9cc0571 into canary Jun 2, 2026
2 checks passed
@joelteply joelteply deleted the feat/headless-persona-host-loop-slice-13 branch June 2, 2026 16:26
joelteply added a commit that referenced this pull request Jun 2, 2026
…+ correct cleanup model

PR #1510 re-review (after the first revision) caught:

1. **Cleanup model still wrong**: the doc cited `ensure_wire_subscriber`
   / `inner.subscribers`, which came from the 428f928 worktree I was
   reading, not the pinned f6ed190. The real mechanism in f6ed190 is
   `EventStream::EventStreamInner::Daemon` holding
   `DaemonAttachGuard` (`stream.rs:25-68`). On `EventStream` drop,
   guard drops, per-channel attach `JoinHandle`s abort.
   **`.abort()` ALONE is sufficient** against the pinned rev. The
   `shutdown_slot` registry-remove is for in-substrate Arc state
   hygiene, not for daemon-side teardown. Section rewritten with
   the correct mechanism, file:line cited.

2. **"Hard prerequisites" misrepresented impl scope** — listed
   P1 (ctrl_c→shutdown), P3 (broker), Q5 (BootSummary publish),
   Q7 (5s poller) as "MUST land" but the implementation explicitly
   deferred all four. Restructured to "Slice 13 scope vs deferred
   follow-ups" with honest splits: shipped vs deferred + why each
   deferral is acceptable (single-persona LCD is in-budget;
   `shutdown_slot` available via IPC commands; cleanup model now
   shows `.abort()` is sufficient).

3. **"After" snippet didn't match impl** — used
   `BootSummary::default()` / `bus.publish` / `summary.failed.push`,
   but the impl uses `tracing::info!(hosted=N, failed=N)` counters.
   Rewrote the snippet to match what shipped. Notes Q5
   (BootSummary publish) as deferred.

Plus the implementation status section is now accurate:
- Q1, Q3, P2, Q2-partial: shipped ✅
- P1, P3, Q5, Q7, Q2-full: deferred to slice 13.5+ ❌
- Integration polish in #1511: room-name discovery + LCD model
  registry entry + PersonaContext rename + RagInspectionRequest::
  for_persona derivation site (the `&ctx` doctrine)
- Integration validation: Paige replied in continuum room

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
joelteply added a commit that referenced this pull request Jun 2, 2026
Elegance pass — extract-class refactor pulling the 170-line inline
boot composition out of `ipc/mod.rs::start_server` into a named
class. Per Joel 2026-06-02: "Must have elegance obsessively. Like a
Java dev. NO SHAME. It's better."

What changed:

1. `PersonaSpawnSupervisor` struct (in `persona/host.rs`) owns the
   spawner / instance_manager / registry / factory / tier_id /
   model_registry / rt_handle inputs. Construct once at boot; call
   `.spawn_all(&mut provider)` to produce a `BootSummary`.

2. `BootSummary { hosted, failures }` + `BootSlotFailure {
   slot_index, role, persona_id, reason }` — typed result structs.
   Replace the inline `let mut hosted_count: usize = 0` / `let mut
   failed_count: usize = 0` counters with a real value type the
   substrate can publish (`persona:boot:summary` event — Q5 of the
   design doc, deferred to slice 13.5+) and downstream clients
   (web, jtag CLI) can read with the same shape per
   [[clients-are-rust-too-thin-node-web-shell]].

3. The supervisor's `spawn_all` method handles every previously-
   inline concern:
   - `bootstrap_planned` failure → orderly-drain orphans + return
     summary with synthetic failure row
   - `materialize_adapters` with runtime_lookup closure (so
     `ctx.runtime` is populated from the registry)
   - Per-slot `spawn_and_attach` private method handles
     `spawn_persona_service` + `attach_service_loop` + handle drain
     on attach-failure (the BLOCKER 1/2 fixes from PR #1511 are
     preserved, just relocated)

4. IPC boot collapses from ~170 lines of inline code to ~30 lines:
   construct supervisor → spawn task → build provider → call
   `supervisor.spawn_all(&mut provider).await` → log summary.

5. Helper `supervisor_error_facts` centralizes pulling
   `(slot_index, role)` out of `SupervisorError`'s two variants —
   the kind of trivial-but-DRY private fn Java/dotnet shops write
   without apology.
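A minimal sketch of the typed result, using the field names from item 2. The field types, the `attempted` semantics (hosted plus failed), and the two `record_*` helpers are assumptions for illustration; the serde derives mentioned in the verification notes are left out to keep the sketch dependency-free.

```rust
/// One failed slot. Field names follow the commit; the types are assumed.
#[derive(Debug, Clone, PartialEq)]
struct BootSlotFailure {
    slot_index: usize,
    role: String,
    persona_id: Option<String>,
    reason: String,
}

/// Typed replacement for the inline hosted/failed counters.
#[derive(Debug, Default, Clone, PartialEq)]
struct BootSummary {
    hosted: usize,
    failures: Vec<BootSlotFailure>,
}

impl BootSummary {
    /// Assumed semantics: every slot either hosted or failed.
    fn attempted(&self) -> usize {
        self.hosted + self.failures.len()
    }

    fn record_hosted(&mut self) {
        self.hosted += 1;
    }

    fn record_failure(&mut self, slot_index: usize, role: &str, reason: &str) {
        self.failures.push(BootSlotFailure {
            slot_index,
            role: role.to_owned(),
            persona_id: None,
            reason: reason.to_owned(),
        });
    }
}
```

Because the failures carry their slot index and reason, a caller can log or publish the whole value instead of two bare integers.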

Why this matters (the doctrine):
- The IPC server boot concern and the persona spawn concern had
  different lifetimes and different test needs. Mixing them in
  one function violated "one logical decision, one place"
  ([[compression-principle]]).
- `PersonaSpawnSupervisor` is now unit-testable in isolation. The
  IPC server's test surface shrinks. Slice 14's RoleAwareProvider
  + multi-persona work has one named insertion point.
- `BootSummary` is the structured event payload the design doc's
  Q5 named. Once `RoleId` derives `TS` (slice 14), the struct gets
  the ts-rs export and web/jtag clients read it directly per the
  Rust-first-clients doctrine.

Verification:
- cargo build --lib --tests clean
- cargo test persona::host — 2 passed (BootSummary attempted +
  serde camel-case)
- cargo test persona::supervisor — 4 passed (unchanged)
- cargo test persona::service_loop — 4 passed (unchanged)
- IPC boot composition shrinks ~140 lines; supervisor's spawn_all
  is now the single named extraction point for slice 13.5 / 14
  changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
joelteply added a commit that referenced this pull request Jun 3, 2026
…all (#1519)

* refactor(persona): `&ctx`-pure RAG request + ctx-derived tracing span

Elegance pass on the patterns the slice-13 work established. Per
Joel 2026-06-02: "we are on sort of an elegance refactor and then
for improved reliability and speed."

What changed:

1. `RagInspectionRequest::for_ctx(&ctx, now_ms)` — new constructor
   that takes the persona context directly. Replaces the 4-arg
   `for_persona(persona_id, name, now_ms, &profile)` at the call
   site. `for_persona` stays (it's the underlying derivation) but
   new code uses `for_ctx` to honor the substrate's `&ctx`
   doctrine ([[context-is-the-client-airc-token-is-identity]]):
   hand the context, not its parts.

2. `PersonaContext::span()` — new method that returns a
   `tracing::info_span!` tagged with `persona_id`, `agent_name`,
   `peer_id`, `role`, `tier`, `ctx_len`, `model`. The span derives
   from `&ctx` — no manual field threading at every log call site.

3. `serve_persona_loop` rewritten in two layers:
   - Outer entry function wraps the inner future with
     `.instrument(ctx.span())`. Every log line inside the loop
     inherits the persona's identity fields automatically.
   - Inner function drops the `let persona_id = hosted.identity.x`
     extractions; reads `ctx.identity.peer_id` etc. directly at use
     sites. Two internal `tracing::warn!` lines lose their
     persona_id/agent_name fields (now inherited from the span);
     they keep just per-turn delta (`lamport`, `error`).

Net effect:
- Field extraction count in service_loop drops from 3 manual extracts
  + 4 redundant tracing field annotations to 0.
- Log output gains persona_id + agent_name + role + tier + ctx_len
  + model on EVERY internal log line, automatically. The substrate's
  observability is now span-shaped, not manual.
- New code that needs a derived RAG request just writes
  `RagInspectionRequest::for_ctx(ctx, now)` — one arg vs four.

Why `.instrument` not `.entered`:
- `Span::entered` returns a non-Send RAII guard; tokio spawned
  futures need Send. The two-function split (outer thin wrapper
  with `.instrument`, inner async function) is the standard tracing
  pattern for spans across awaits.

Verification:
- cargo build --lib --tests clean
- cargo test persona::service_loop — 4 passed
- cargo test persona::supervisor — 4 passed

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(persona/host): extract PersonaSpawnSupervisor + BootSummary


* refactor(persona): AircCitizen trait — drop Option<Arc<PersonaAircRuntime>> from PersonaContext (#144)

Java-style "extract interface" on the substrate's airc-handle. Slice 13.5
elegance pass per Joel 2026-06-02 ("Must have elegance obsessively. Like
a Java dev. NO SHAME").

Before: PersonaContext.runtime: Option<Arc<PersonaAircRuntime>>. The
Option existed solely for test fixtures that couldn't easily build a
real PersonaAircRuntime; production code paid .expect("None is
test-only") on the hot path.

After: PersonaContext.runtime: Arc<dyn AircCitizen>. Tests use a typed
StubAircCitizen. Production upcoerces from PersonaAircRuntime, which
now impls AircCitizen + AircTranscriptReader. Rust 1.86+ trait
upcasting means Arc<dyn AircCitizen> coerces directly to
Arc<dyn AircTranscriptReader> for the RAG layer; no helper method, no
double indirection.

Trait surface (minimum viable):
- fn peer_id(&self) -> Uuid
- async fn subscribe(&self) -> Result<EventStream, AircError>
- async fn say(&self, text: &str) -> Result<EventId, AircError>
- AircTranscriptReader as supertrait (page_recent for the RAG layer)

What changed:
- persona/airc_citizen.rs (new): AircCitizen trait + StubAircCitizen.
- persona/airc_runtime.rs: PersonaAircRuntime impls AircCitizen +
  AircTranscriptReader; delegates to its internal Arc<Airc>.
- persona/supervisor.rs: PersonaContext.runtime drops the Option.
  materialize_adapters' runtime_lookup signature is now
  Option<Arc<dyn AircCitizen>>; missing runtime surfaces as typed
  SupervisorError::RuntimeMissing { slot_index, role, persona_id }
  per [[no-fallbacks-ever]].
- persona/airc_persona_conversation.rs: takes Arc<dyn AircCitizen>,
  calls trait methods directly (no runtime.airc() detour).
- persona/host.rs: spawn_persona_service drops the .expect; host's
  runtime_lookup upcoerces PersonaAircRuntime to AircCitizen for
  materialize_adapters.
- persona/service_loop.rs fake_hosted: runtime is now
  Arc::new(StubAircCitizen::new(peer_id)) instead of None.
- bin/airc_chat_demo.rs: dropped the Some(_) wrapping —
  Arc<PersonaAircRuntime> auto-coerces to Arc<dyn AircCitizen>.

Doctrine:
- [[personas-are-citizens-airc-is-identity-provider]]: AircCitizen IS
  the substrate's actor type — same trait for personas, humans
  (#142 BaseUser), browsers. The persona is one citizen; the human-
  via-jtag is another; the Claude-Code session is another.
- [[no-fallbacks-ever]]: no Option, no .expect, no silent default.
  RuntimeMissing is a typed error with persona_id named.
- [[context-is-the-client-airc-token-is-identity]]: PersonaContext IS
  the &ctx. Same shape compiles in tests + production.
- [[clients-are-rust-too-thin-node-web-shell]]: AircCitizen is the
  typed Rust primitive future jtag-CLI / web client / native client
  bind to.

Foundation for task #142 (BaseUser hierarchy) — each variant will
carry Arc<dyn AircCitizen> + kind-specific extensions (cognition for
persona, WebAuthn for human, tab state for browser).

Test plan:
- cargo build --lib --no-default-features --features
  livekit-webrtc,llama/mac-cpu-only — clean.
- cargo test --lib ... persona:: — 705/706 pass (the one flake is
  persona::evaluator::tests::test_all_gates_pass_normal_message, an
  unrelated CPU-jitter timing assertion that passes in isolation).
- Integration trace: deferred to PR-time verification.

Closes #144.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(architecture): LIFE-OF-A-PERSONA + source/drain anchor — close onboarding gap surfaced by external review

Two doc changes from an outside-perspective review (Gemini) of the
substrate, triaged per [[external-llm-reviews-extract-themes-discard-citations]]
— specific PR citations were fabricated, but two themes were real:

1. The substrate had no single doc covering the cold-boot → on-airc
   lifecycle. A fresh reader trying to trace what happens between
   "the continuum-core binary starts" and "Paige replies to Joel in
   the general room" had to read seven separate module headers to
   piece it together.

2. "Source/drain doctrine" was used in COGNITION-CACHE-HIERARCHY.md
   without anchoring what the drain actually IS — readers had to
   infer.

What changed:

- docs/architecture/LIFE-OF-A-PERSONA.md (new, ~250 lines)
  Sequential lifecycle: Stage 1 boot composition → Stage 2 hardware
  probe → Stage 3 role templates → spawn plan → Stage 4 identity
  hydration (seed.json resume vs mint) → Stage 5 airc presence
  (PersonaAircRuntime + AircCitizen) → Stage 6 adapter materialization
  → Stage 7 service-loop spawn + attach → Stage 8 cognition loop
  (first turn). Every stage names its Rust module + typed failure mode.
  Closes the operational onboarding gap.

  Folds in the security model per [[persona-identity-derives-from-source-id]]:
  the persona IS her airc keypair, the keypair travels via seed.json,
  the host hardware has a SEPARATE identity. No central identity
  broker. Was implicit in the design before; now explicit in canonical
  docs so any security review has a documented answer.

- docs/architecture/COGNITION-CACHE-HIERARCHY.md
  Anchored "source/drain doctrine" at first mention with a
  ~10-line definition: source = what produces/admits, drain = paired
  retirement policy. Linked to memory [[source-drain-is-the-universal-pattern]].
  Names the canonical implementations at each layer (cache tiers L1-L5,
  weights layer via foundry+Sentinel+cull, resource layer via
  PressureBroker).

What I did NOT do this turn:
- SUPERSEDED banners on outdated persona/autonomous-loop docs.
  Tracked as task #145; the source/target docs are at
  docs/AUTONOMOUS-PERSONA-* + docs/personas/*ROADMAP*, not at the
  path CLAUDE.md cites. Wants its own focused audit.
- "Citizen" anchor in CBAR/GENOME-FOUNDRY-SENTINEL canonical docs.
  Less load-bearing once persona/airc_citizen.rs (this branch's
  refactor) provides the Rust-side anchor.
- Floor-vs-ceiling resolution paragraph in INFERENCE-LANES-REALISTIC.
  Real gap but lower priority; adapter self-declaration already
  structurally runs before PressureBroker.

Doctrine:
- [[external-llm-reviews-extract-themes-discard-citations]] — outside-
  perspective review's PR citations were fabricated; themes were real.
  Discard citations; engage with themes.
- [[read-existing-docs-before-writing-new-ones]] — both edits surface
  pre-existing doctrine that wasn't documented at the canonical-doc
  layer.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(persona/supervisor): lock in SupervisorError::RuntimeMissing behavior (review #1513)

Address reviewer finding: the AircCitizen extraction added
`SupervisorError::RuntimeMissing` but no test asserted it actually
fires when `runtime_lookup` returns None. Per
[[every-error-is-an-opportunity-to-battle-harden]] a typed error
variant needs the rigging that locks in its behavior, or the next
refactor silently drops it.

Two tests added to `supervisor::tests`:

1. `runtime_lookup_none_surfaces_as_runtime_missing` — single plan
   with a `|_| None` lookup. Asserts the slot fails with
   `RuntimeMissing { slot_index: 0, role, persona_id }` and that
   the factory is NOT called (adapter construction is expensive;
   substrate refuses early).

2. `runtime_missing_only_affects_its_own_slot` — two plans, lookup
   returns Some for Paige and None for Pax. Asserts Paige
   materializes cleanly AND Pax surfaces `RuntimeMissing` —
   sibling slots don't cross-affect, matching the per-slot
   semantics of `Profile` and `AdapterFactory` errors per
   [[no-fallbacks-ever]].
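The per-slot behavior these two tests lock in can be sketched as follows. Everything here is a simplified assumption: runtimes and adapters are plain `String`s, `SpawnPlan` carries only an id and a role, and the real `materialize_adapters` signature differs.

```rust
#[derive(Debug, PartialEq)]
enum SupervisorError {
    RuntimeMissing { slot_index: usize, role: String, persona_id: String },
}

struct SpawnPlan {
    persona_id: String,
    role: String,
}

#[derive(Debug, PartialEq)]
struct PersonaContext {
    persona_id: String,
    runtime: String,
    adapter: String,
}

/// One Result per slot: a missing runtime fails only its own slot, and the
/// (expensive) factory is never called for that slot.
fn materialize_adapters<L, F>(
    plans: &[SpawnPlan],
    runtime_lookup: L,
    mut factory: F,
) -> Vec<Result<PersonaContext, SupervisorError>>
where
    L: Fn(&str) -> Option<String>,
    F: FnMut(&SpawnPlan) -> String,
{
    let mut slots = Vec::with_capacity(plans.len());
    for (slot_index, plan) in plans.iter().enumerate() {
        let Some(runtime) = runtime_lookup(&plan.persona_id) else {
            slots.push(Err(SupervisorError::RuntimeMissing {
                slot_index,
                role: plan.role.clone(),
                persona_id: plan.persona_id.clone(),
            }));
            continue;
        };
        slots.push(Ok(PersonaContext {
            persona_id: plan.persona_id.clone(),
            runtime,
            adapter: factory(plan),
        }));
    }
    slots
}
```

Returning a `Vec<Result<..>>` rather than a `Result<Vec<..>>` is what lets sibling slots succeed independently.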

Both tests verified locally: 6/6 supervisor tests pass.

Reviewer: https://github.com/CambrianTech/continuum/pull/1513#issuecomment-4606231586

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(persona): PersonaConversation::prime — move airc subscribe off the cognition hot path (#146)

Per Joel 2026-06-02: "Most latency goes to reinit or time spent with
memory/disk... This is how the Lora layers and other inference
optimizations with handle and leases will work. Same goes for
serialization and other inefficiencies. Copy by ref don't encode
unless necessary."

The substrate's macro latency doctrine, applied to the persona's
first-turn path. Pre-slice-13.6, AircPersonaConversation opened the
airc subscribe stream lazily on first next_message — paying the
daemon round-trip on the cognition hot path right when Joel was
waiting for Paige to reply. Now serve_persona_loop calls
conversation.prime() once at boot, BEFORE high_water_mark or the
event loop. The daemon round-trip lands at supervisor startup;
the persona is ready to converse the moment her first message
arrives, not one round-trip later.

What changed (~150 lines, pure reuse + relocation — no new
infrastructure):

- service_loop.rs:
  - PersonaConversation gains an `async fn prime(&mut self) -> Result<(), String>`.
    Contract: called once at boot, before high_water_mark / next_message.
    Idempotent. Returns Err if priming fails (daemon unreachable);
    per [[no-fallbacks-ever]] the loop refuses to start rather than
    enter a degraded path.
  - serve_persona_loop_inner calls conversation.prime() as its FIRST
    awaited operation. Same Err-propagation shape as the existing
    high_water_mark call site.
  - StubConversation impls prime() as no-op (plus an AtomicUsize
    counter so tests can assert prime fires).

- airc_persona_conversation.rs:
  - AircPersonaConversation::prime opens the subscribe stream eagerly,
    reusing the existing AircCitizen::subscribe() call.
    `if self.stream.is_some() { return Ok(()) }` makes it idempotent.
  - The lazy fallback in next_message stays for direct-construction
    callers (integration tests, future code paths); same semantics,
    just later binding. No degraded path per [[no-fallbacks-ever]].

Tests (locked-in contract):

- `replies_to_inbound_from_other_peer` — extended to assert
  `conversation.primed == 1` after the loop runs. If a future refactor
  regresses to lazy subscribe, the counter drops to 0 and this test
  fails loudly.
- `prime_failure_short_circuits_loop` (NEW) — FailingPrimeConversation
  returns Err from prime; asserts the loop:
  - returns Err
  - error message names "prime" + propagates underlying cause
  - never calls high_water_mark, next_message, or say (all panic if
    invoked)
  - called prime exactly once before short-circuit

Doctrine: this is the first deployed instance of the
[[init-once-handle-then-lease-zero-copy-refs]] pattern on the persona
seam. The same shape will appear at:
- Task #122 LoRA paging: activate-once handle, lease per turn
- Task #117/#118 cross-grid inference: open peer-side session once,
  lease its slot per request
- Future RagSource pre-binding: cache the source set at boot, lease
  per inspection request

Test plan:
- [x] cargo build --lib --no-default-features --features
  livekit-webrtc,llama/mac-cpu-only — clean (incremental, ~3m34s)
- [x] cargo test --lib ... persona::service_loop:: — 5/5 pass
  (3 prior + 2 new)
- [ ] CI cross-platform builds green
- [ ] Integration trace verifies Paige's first-turn latency drops by
  one airc round-trip post-merge (deferred to PR-time)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(persona): prime() before spawn + typed err on unprimed next_message (review #1514)

Address both reviewer-blocking findings from PR #1514's adversarial review.

## Fix #1: spawn_persona_service primes BEFORE spawn (architectural)

Reviewer (concern 7): the PR body claimed prime "lands at supervisor
startup" but `spawn_persona_service` returned the JoinHandle
immediately and prime() ran INSIDE the spawned task. The supervisor's
`summary.hosted += 1` ticked BEFORE the daemon round-trip completed.
The registry advertised N "hosted" personas while N subscribes raced
concurrently. The substrate's "registered = ready" invariant was
silently violated.

Fix: `spawn_persona_service` becomes `async fn ... -> Result<JoinHandle, String>`.
It awaits `conversation.prime()` BEFORE spawning the task. If prime
fails, the task is never spawned and the function returns Err.

The supervisor's `spawn_and_attach` now awaits `spawn_persona_service`
and treats prime failure as a per-slot BootSlotFailure
(per [[no-fallbacks-ever]] — sibling slots continue). `summary.hosted`
ticks only when BOTH prime succeeded AND attach succeeded.

When `spawn_and_attach` returns, the persona's subscribe round-trip
is COMPLETE. Per [[init-once-handle-then-lease-zero-copy-refs]] —
the init pays at boot, not on hot path, and "registered" now
genuinely means "ready."
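A synchronous sketch of the prime-before-spawn ordering, assuming `std::thread` as a stand-in for the tokio task and a reduced two-method `PersonaConversation` trait. `StubConversation`, `spawn_all`, and the string-only `BootSummary` are hypothetical simplifications of the supervisor path.

```rust
use std::thread::{self, JoinHandle};

/// Reduced trait: prime once, then serve. Both are blocking in this sketch.
trait PersonaConversation: Send + 'static {
    fn prime(&mut self) -> Result<(), String>;
    fn serve(&mut self) -> u32;
}

/// Primes BEFORE spawning. On prime failure no thread is ever created.
fn spawn_persona_service<C: PersonaConversation>(
    mut conversation: C,
) -> Result<JoinHandle<u32>, String> {
    conversation
        .prime()
        .map_err(|cause| format!("prime failed: {cause}"))?;
    Ok(thread::spawn(move || conversation.serve()))
}

struct BootSummary {
    hosted: usize,
    failures: Vec<String>,
}

/// `hosted` ticks only after prime succeeded; a failure affects one slot only.
fn spawn_all<C: PersonaConversation>(
    conversations: Vec<C>,
) -> (BootSummary, Vec<JoinHandle<u32>>) {
    let mut summary = BootSummary { hosted: 0, failures: Vec::new() };
    let mut handles = Vec::new();
    for conversation in conversations {
        match spawn_persona_service(conversation) {
            Ok(handle) => {
                summary.hosted += 1;
                handles.push(handle);
            }
            Err(reason) => summary.failures.push(reason),
        }
    }
    (summary, handles)
}

/// Test double: prime fails when the (pretend) daemon is down.
struct StubConversation {
    daemon_up: bool,
    primed: bool,
}

impl PersonaConversation for StubConversation {
    fn prime(&mut self) -> Result<(), String> {
        if !self.daemon_up {
            return Err("daemon unreachable".to_string());
        }
        self.primed = true;
        Ok(())
    }

    fn serve(&mut self) -> u32 {
        // Reports 1 when serving started from a primed state.
        u32::from(self.primed)
    }
}
```

Because the `Result` is produced before the spawn, the caller cannot count a persona as hosted while her subscribe is still in flight.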

`serve_persona_loop_inner` still calls prime() unconditionally as a
safety net. Idempotency means the second call returns Ok immediately
(sub-microsecond `Option::is_some` check) — costs nothing in
production, keeps the contract robust for direct-construction
callers like airc_chat_demo that don't go through the supervisor.

## Fix #2: next_message refuses unprimed callers visibly

Reviewer (concern 2): the lazy `if self.stream.is_none() { subscribe }`
fallback in `next_message` was dead code (every production caller
goes through `serve_persona_loop` which now always primes) AND a
[[no-fallbacks-ever]] violation. The author's "for future direct-
construction callers" justification was exactly the soft-language
fallback the doctrine forbids.

Fix: replaced with `self.stream.as_mut().ok_or_else(...)` returning a
typed error naming the missing prime() call. Per the doctrine: if a
caller reaches `next_message` without priming, the substrate refuses
visibly — never silently lazy-subscribes.

Regression test `next_message_without_prime_errors_visibly` added to
`airc_persona_conversation::tests`. Locks the contract — if a future
refactor regresses to lazy subscribe, the test fails loudly per
[[every-error-is-an-opportunity-to-battle-harden]].
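The caller-primes contract reduces to an `Option` that `prime` fills and `next_message` refuses to fill on its own. The sketch below is synchronous and uses a `VecDeque` as a stand-in for the subscribe stream; the `Conversation` type, its `backlog` field, and the `subscribes` counter are hypothetical.

```rust
use std::collections::VecDeque;

/// Hypothetical conversation: `stream` is None until `prime` has run.
struct Conversation {
    backlog: Vec<String>,
    stream: Option<VecDeque<String>>,
    subscribes: usize,
}

impl Conversation {
    fn new(backlog: Vec<String>) -> Self {
        Self { backlog, stream: None, subscribes: 0 }
    }

    /// Idempotent: a second call sees the stream and returns immediately.
    fn prime(&mut self) -> Result<(), String> {
        if self.stream.is_some() {
            return Ok(());
        }
        self.subscribes += 1;
        self.stream = Some(std::mem::take(&mut self.backlog).into());
        Ok(())
    }

    /// No lazy subscribe: an unprimed caller gets an error naming prime().
    fn next_message(&mut self) -> Result<Option<String>, String> {
        let stream = self
            .stream
            .as_mut()
            .ok_or_else(|| "next_message called before prime()".to_string())?;
        Ok(stream.pop_front())
    }
}
```

The `subscribes` counter plays the role the commit gives to the stub's prime counter: a test can assert the subscribe happened exactly once.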

## Test plan

- [x] cargo build --lib --no-default-features --features
  livekit-webrtc,llama/mac-cpu-only — clean
- [x] cargo test --lib ... persona:: — 710/710 pass (709 prior + 1
  new regression test)

Reviewer comment: https://github.com/CambrianTech/continuum/pull/1514#issuecomment-4606707846

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(persona): per-turn latency metrics — LatencyAggregate + ServeOutcome.turn_latency (#150)

Per Joel 2026-06-02: "make sure timing and other metrics are in place."
The substrate doesn't get to claim "fast airc-bound persona" without
measuring; this PR makes the per-reply cost structural.

Added (all in persona/service_loop.rs):

- LatencyAggregate { count, total_ms, min_ms, max_ms } — cheap online
  aggregator. O(1) record, allocation-free, saturating-add on
  overflow (locked by test). mean_ms returns Option<f64>.
- ServeOutcome.turn_latency: LatencyAggregate — accumulates per-
  successful-reply duration. Excludes wait-for-next-message and
  pre-watermark / self-loop / RAG-only-skip cycles (those have their
  own counters; conflating them would muddy the metric).
- serve_persona_loop_inner instruments the per-reply path:
  - Instant::now captured AFTER filters, BEFORE RAG inspect
  - elapsed recorded into turn_latency only on successful say
  - tracing::info per turn with lamport, duration, mean/min/max so
    the substrate's observability layer captures the metric
    structurally per [[observability-is-half-the-architecture]]
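
A sketch of what such an online aggregator could look like. The field names come from the list above; the field types (`u64`, and `Option<u64>` for min/max) are assumptions, not the shipped definition.

```rust
/// Sketch of an online latency aggregator; field types are assumptions.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct LatencyAggregate {
    count: u64,
    total_ms: u64,
    min_ms: Option<u64>,
    max_ms: Option<u64>,
}

impl LatencyAggregate {
    /// O(1) and allocation-free; sums saturate instead of wrapping.
    fn record(&mut self, ms: u64) {
        self.count = self.count.saturating_add(1);
        self.total_ms = self.total_ms.saturating_add(ms);
        self.min_ms = Some(self.min_ms.map_or(ms, |m| m.min(ms)));
        self.max_ms = Some(self.max_ms.map_or(ms, |m| m.max(ms)));
    }

    /// `None` until at least one sample has been recorded.
    fn mean_ms(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.total_ms as f64 / self.count as f64)
        }
    }
}
```

Recording 40 ms and 60 ms gives a mean of 50 ms; recording `u64::MAX` afterwards pins `total_ms` at `u64::MAX` rather than wrapping.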

Doctrine fit:
- Monotonic Instant (not wall-clock) — immune to clock skew
- One Instant per turn, no Vec growth, no heap allocs on hot path
- Per Joel's computer-engineer mental model in
  [[init-once-handle-then-lease-zero-copy-refs]]: cache-friendly,
  branch-predictable, autovectorization-friendly

Tests (7/7 pass):
- latency_aggregate_records_min_max_sum_count — empty + populated
  math; mean = total/count
- latency_aggregate_saturates_on_overflow — locks the safety
  property per [[every-error-is-an-opportunity-to-battle-harden]]
- replies_to_inbound_from_other_peer (extended) — asserts
  turn_latency.count == 1 after one successful reply; min/max/mean
  set. If a future refactor forgets to record, count drops to 0 and
  the test fails loudly

Test plan:
- [x] cargo test --lib ... persona::service_loop:: — 7/7 pass

Closes #150. Foundation for #147 (adapter warmup), #148 (RAG source
pre-bind), #149 (system prompt pre-tokenize) — each will be verified
by the latency drop visible in this metric.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(persona): remove belt-and-suspenders prime call + honest latency test (caller-primes contract)

Per Joel 2026-06-02: "God I hope it's not more fallback cancer. You
tend to turn stuff into fake demos."

Two honest fixes addressing both criticisms.

## Fix 1: ONE place primes, not two (no more belt-and-suspenders)

Before: `spawn_persona_service` called `conversation.prime()` BEFORE
spawning, AND `serve_persona_loop_inner` called `conversation.prime()`
unconditionally as a "safety net." Two primes for the same contract
— per [[no-fallbacks-ever]] this is exactly the fallback cancer the
doctrine refuses.

After: `serve_persona_loop_inner` does NOT prime. Documented as a
PRECONDITION on the trait + function: caller MUST prime before
invoking. The supervisor's `spawn_persona_service` primes for
production. Direct callers (`airc_chat_demo`, tests) prime explicitly.

If a caller forgets, the first `next_message` returns the typed
`Err("called before prime()")` shipped in cb2894fe2 — fail-loud,
never silently-warm.

Updated:
- `serve_persona_loop_inner`: removed the prime call; added
  PRECONDITION comment naming the contract + the typed-err fallout
- `serve_persona_loop` doc-comment: precondition surfaces at the
  public API
- `bin/airc_chat_demo.rs`: prime() explicitly before
  serve_persona_loop call
- All 4 StubConversation test sites prime explicitly
- `prime_failure_short_circuits_loop` replaced with
  `loop_without_caller_prime_surfaces_typed_error_per_turn` — tests
  the new caller-primes contract directly: unprimed conversation's
  next_message err counts as turns_errored, locks the absence of the
  safety-net call

## Fix 2: latency test verifies REAL elapsed time, not just plumbing

Before: `replies_to_inbound_from_other_peer` asserted
`turn_latency.count == 1` and that min/max/mean were Some. Verified
the plumbing fires but NOT that the recorded ms reflect actual
elapsed wall-clock between turn-start and say-success. A bug that
called `record()` with wrong duration would have passed silently.
Fake-demo-shaped.

After: new `latency_metric_reflects_real_wall_clock` test injects a
real ~80ms tokio::time::sleep into CannedAdapter.generate_text, runs
the loop, asserts:
- `observed_ms >= 50` (CI jitter floor — verifies metric tracks the
  injected delay, not always-zero)
- `observed_ms < 5000` (upper bound for sanity)

CannedAdapter gains `inject_delay_ms` field; `fake_hosted_with_delay`
helper exposes it. Default (`fake_hosted`) passes 0 so existing tests
are unaffected.
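
The shape of that assertion can be sketched with the standard library alone. This is an illustration, not the test itself: `timed_turn_ms` and `generate_with_delay` are hypothetical names, and a blocking `std::thread::sleep` stands in for the `tokio::time::sleep` used in the real test.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Times one simulated turn. `work` stands in for the adapter's generate call.
fn timed_turn_ms<F: FnOnce()>(work: F) -> u64 {
    let started = Instant::now(); // monotonic, unaffected by wall-clock changes
    work();
    started.elapsed().as_millis() as u64
}

/// Hypothetical adapter call that blocks for `delay_ms`.
fn generate_with_delay(delay_ms: u64) {
    sleep(Duration::from_millis(delay_ms));
}
```

With an 80 ms injected delay, a floor of 50 ms and a ceiling of 5000 ms bracket the measurement loosely enough for CI while still failing if the metric is always zero.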

Test plan:
- [x] cargo test --lib ... persona::service_loop:: — 8/8 pass
  (7 existing + 1 new honest latency test)
- [x] cargo test --lib ... persona:: — 713/713 pass overall

Doctrine recap:
- [[no-fallbacks-ever]] — one place primes, not two
- [[every-error-is-an-opportunity-to-battle-harden]] — the
  caller-primes regression test locks the contract
- The honest latency test prevents the "passes on plumbing, silent
  on correctness" anti-pattern

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(persona): adapter warmup at boot — pay KV cache + kernel JIT cost off the hot path (#147)

Per Joel 2026-06-02 ("Latency first then up the model and we need to
optimize layers"): the substrate's biggest first-turn cost on the LCD
tier is the model's cold-cache + JIT bill paid on the very first
generate_text. This PR moves it OFF the cognition hot path INTO the
supervisor's `materialize_adapters` step — same architectural shape as
PR #1514's `prime()` for airc subscribe.

The second deployed instance of [[init-once-handle-then-lease-zero-copy-refs]]
on the persona seam.

## What changed

- `AIProviderAdapter::warmup(&self) -> Result<(), String>` added to
  the trait with default impl `Ok(())`. Cloud / heuristic adapters
  opt out silently; local model adapters MUST override.
- `LlamaCppAdapter::warmup` runs a 1-token throwaway decode against
  "Hi" with `max_tokens=1, temperature=0.0`. Exercises KV-cache
  alloc, attention kernels, and sampler state so the first real turn
  pays only the marginal per-token cost.
- `persona::supervisor::materialize_adapters` calls
  `adapter.warmup().await` AFTER `factory.build_adapter()` and BEFORE
  the slot enters the hosted set.
- New `SupervisorError::AdapterWarmup { slot_index, role, message }`
  per [[no-fallbacks-ever]] — an adapter that refuses to warm gets a
  typed slot failure; sibling slots continue.
- `host.rs::supervisor_error_facts` extended to handle the new
  variant.
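
A reduced sketch of the warmup contract and the per-slot isolation it implies. Everything here is simplified: the trait is synchronous (the real `warmup` is awaited), `Adapter`, `CloudAdapter`, `BrokenLocalAdapter`, and `materialize` are hypothetical names, and only the `AdapterWarmup` field names are taken from the description above.

```rust
/// Simplified, synchronous stand-in for the adapter trait (the real one is async).
trait Adapter {
    /// Default is a no-op so adapters with nothing to warm opt out.
    fn warmup(&self) -> Result<(), String> {
        Ok(())
    }
}

struct CloudAdapter; // relies on the default warmup
struct BrokenLocalAdapter; // overrides warmup and fails

impl Adapter for CloudAdapter {}
impl Adapter for BrokenLocalAdapter {
    fn warmup(&self) -> Result<(), String> {
        Err("model file missing".to_string())
    }
}

/// Typed per-slot failure, mirroring the shape named above.
#[derive(Debug, PartialEq)]
enum SupervisorError {
    AdapterWarmup { slot_index: usize, role: String, message: String },
}

/// Warms every adapter; a failing slot is reported and its siblings continue.
fn materialize(
    slots: Vec<(String, Box<dyn Adapter>)>,
) -> (Vec<usize>, Vec<SupervisorError>) {
    let mut hosted = Vec::new();
    let mut failed = Vec::new();
    for (slot_index, (role, adapter)) in slots.into_iter().enumerate() {
        match adapter.warmup() {
            Ok(()) => hosted.push(slot_index),
            Err(message) => failed.push(SupervisorError::AdapterWarmup {
                slot_index,
                role,
                message,
            }),
        }
    }
    (hosted, failed)
}
```

A slot only joins `hosted` after its warmup succeeds, so nothing downstream can reach a cold or broken adapter.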

## Test plan (9/9 supervisor tests pass; 716/716 persona overall)

New tests in `supervisor::tests`:

1. `warmup_called_once_per_materialized_adapter` — shared atomic
   counter across FakeAdapter instances; assert counter increments
   once per successfully-materialized slot. Locks the contract that
   future refactors can't quietly drop.

2. `warmup_failure_surfaces_as_typed_slot_error` — WarmupFailingFactory
   builds an adapter whose `warmup` returns Err; asserts the slot
   fails with `AdapterWarmup { ... }` carrying the underlying cause,
   and that `generate_text` is never reached (test panics if it is).

3. `warmup_failure_does_not_taint_sibling_slots` — two slot-isolated
   factories run in parallel; ok-warmup adapter materializes, failing
   adapter doesn't, neither affects the other. Per-slot isolation
   doctrine locked.

Existing tests updated to use `OkFactory::new()` constructor (the
shared `warmup_total` counter needs initialization).

## Doctrine fit

- [[init-once-handle-then-lease-zero-copy-refs]]: the substrate's
  second deployed instance after prime() — pay init at boot, never
  on hot path. Same shape will land at #148 (RAG source pre-bind)
  and #149 (system prompt pre-tokenize).
- [[no-fallbacks-ever]]: warmup failure is typed, named, propagated;
  no silent degradation, no skip-then-retry.
- Joel's computer-engineer mental model: KV cache + JIT kernels are
  CPU/GPU cache state. Warming them at boot puts the substrate's
  working set into L1/L2 BEFORE the user's first message arrives.

## Cost on LCD tier (qualitative, pending #150 metric capture)

Intel Mac + Qwen 0.5B CPU-only: first generate_text cold-cost ~200-500ms
above warm-cost. Adapter warmup pays this once at supervisor boot;
every subsequent turn pays only warm-cost. On M5 Metal with a larger
model the savings should grow with model size; that is not yet measured.

Closes #147. Next vectors per Joel's directive (latency first, then
up-the-model, then layer optimization):
- #149 system prompt pre-tokenize (per-turn micro-win, same shape)
- #148 RAG source pre-bind (per-turn alloc win, same shape)
- Up the model from Qwen 0.5B once latency floor is solid

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(persona/ai): pull throwaway test scaffolding into system-level primitives (#154)

Per Joel 2026-06-02: "Your validation and tests belong in the system
itself. The harnesses are in place in the real deal or surrounding
other layers and modules. You gotta think LONG term and make these
elegant too. It's why we had record and repeat of live persona and rag.
Can't be done without. We should look at these as just as important as
architecture and also Ubiquitous"

Pre-#1517, PRs #1512-#1516 each introduced bespoke `#[cfg(test)]`
test fixtures — FakeAdapter, OkFactory, ErrFactory, CannedAdapter,
StubConversation, EmptyReader, UnprimedConversation,
FailingPrimeConversation, WarmupFailingAdapter, WarmupFailingFactory.
Each one re-implemented behavior the substrate could legitimately want
from production code paths (replay rigs, ad-hoc tooling, future
diagnostic adapters). That's the scaffolding cancer this PR refuses.

Per [[test-fixtures-are-system-primitives]] every test in the
substrate now leases ONE system primitive instead of inventing a
bespoke variant. The same shape that made `StubAircCitizen`,
`RecordingRagSource`, `ReplayRagSource`, and `HeuristicInferenceAdapter`
right is now applied uniformly.

## New / extended system primitives

### `ai/heuristic_adapter.rs` (extended)

`HeuristicInferenceAdapter` gains opt-in builder methods:
- `.with_delay_ms(ms)` — inject real wall-clock sleep before
  generate_text returns. Production callers use `new()` and pay zero.
  Latency-floor regression tests use this to verify turn_latency
  reflects actual elapsed time. Future simulated-network adapters
  (cross-grid inference, etc.) use this for realistic modeling.
- `.with_warmup_failure(reason)` — make warmup() return Err.
  Exercises `SupervisorError::AdapterWarmup` per [[no-fallbacks-ever]].
- `.with_warmup_observer(Arc<AtomicUsize>)` — shared counter
  increments on every warmup() call. Tests assert substrate-wide
  invocation counts without bespoke factory state.
- `.with_generate_observer(Arc<AtomicUsize>)` — same shape for
  generate_text. Counts substrate-side hot-path inference calls.
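
The opt-in builder shape can be sketched as follows. `SketchAdapter` is a hypothetical miniature, not `HeuristicInferenceAdapter` itself; only the two builder method names are borrowed from the list above, and the synchronous `warmup` is a simplification.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical miniature of an adapter with opt-in test knobs.
#[derive(Default)]
struct SketchAdapter {
    warmup_failure: Option<String>,
    warmup_observer: Option<Arc<AtomicUsize>>,
}

impl SketchAdapter {
    /// Production-style constructor: no knobs set, nothing extra to pay.
    fn new() -> Self {
        Self::default()
    }

    fn with_warmup_failure(mut self, reason: &str) -> Self {
        self.warmup_failure = Some(reason.to_string());
        self
    }

    fn with_warmup_observer(mut self, counter: Arc<AtomicUsize>) -> Self {
        self.warmup_observer = Some(counter);
        self
    }

    /// Counts the call if observed, then fails if a failure was configured.
    fn warmup(&self) -> Result<(), String> {
        if let Some(counter) = &self.warmup_observer {
            counter.fetch_add(1, Ordering::SeqCst);
        }
        match &self.warmup_failure {
            Some(reason) => Err(reason.clone()),
            None => Ok(()),
        }
    }
}
```

Because the counter is an `Arc<AtomicUsize>` owned by the test, several adapters can share it and the test reads one total without any factory-side state.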

### `persona/scripted_adapter_factory.rs` (new)

`ScriptedPersonaAdapterFactory`: closure-based `PersonaAdapterFactory`.
Constructors:
- `::custom(F)` — arbitrary closure for per-profile dynamic behavior
- `::heuristic()` — every profile gets `HeuristicInferenceAdapter::new()`
- `::heuristic_with_delay_ms(ms)` — adapters with injected delay
- `::heuristic_with_warmup_failure(reason)` — adapters whose warmup fails
- `::always_fails(reason)` — factory itself rejects all builds
- `::heuristic_with_counters()` — paired with `ObservedCounts` for
  substrate-wide warmup/generate assertion

`build_count()` exposes the per-factory invocation count.

`ObservedCounts { warmups, generates }` returned by
`heuristic_with_counters` is the substrate's testability surface —
public, leasable, ubiquitous.
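
A closure-based factory of this kind might be sketched as below. `SketchFactory` is hypothetical, adapters are reduced to `String` for brevity, and counting failed builds in `build_count()` is an assumption about the real behavior.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical miniature: a factory whose behavior is a closure.
struct SketchFactory<F>
where
    F: Fn(&str) -> Result<String, String>,
{
    build: F,
    builds: AtomicUsize,
}

impl<F> SketchFactory<F>
where
    F: Fn(&str) -> Result<String, String>,
{
    fn custom(build: F) -> Self {
        Self { build, builds: AtomicUsize::new(0) }
    }

    /// Counts every invocation (assumed to include failures), then delegates.
    fn build_adapter(&self, profile: &str) -> Result<String, String> {
        self.builds.fetch_add(1, Ordering::SeqCst);
        (self.build)(profile)
    }

    fn build_count(&self) -> usize {
        self.builds.load(Ordering::SeqCst)
    }
}

/// Sketch counterpart of an always-failing constructor.
fn always_fails(
    reason: &'static str,
) -> SketchFactory<impl Fn(&str) -> Result<String, String>> {
    SketchFactory::custom(move |_profile: &str| Err(reason.to_string()))
}
```

Named constructors then become one-line wrappers over `custom`, which is what keeps per-test factory types from reappearing.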

### `persona/scripted_conversation.rs` (new)

`ScriptedConversation`: configurable `PersonaConversation`.
Builder pattern:
- `.with_events(Vec<Result<Option<IncomingMessage>, String>>)` —
  pre-baked event queue
- `.with_high_water(u64)` — pre-attach history mark
- `.with_prime_failure(reason)` — make prime() return Err
- `.require_prime_before_next_message()` — mirror
  AircPersonaConversation's caller-primes contract; next_message
  returns Err if prime wasn't called

Observable surface:
- `.primed_count()` — assert prime() invocation count
- `.said()` — snapshot of all `say()` text in order

### `persona/airc_citizen.rs` (extended)

`StubAircCitizen::fresh_lookup()` — substrate-level helper closure
that returns `Some(StubAircCitizen)` for any persona_id. Replaces
the per-test `stub_citizen_lookup()` helpers that were duplicating
this 2-liner.

### gating

`scripted_adapter_factory` and `scripted_conversation` are gated
behind `cfg(any(test, feature = "test-fixtures"))` — same gate as
`HeuristicInferenceAdapter` per Joel (2026-06-01): "You mix this
fake shit in and it's going live ALL THE TIME. The fake shit is a
CHOSEN model adapter no other form. Declaration." cfg gating IS
the declaration.

## Test module rewires

### `persona/supervisor.rs`

Deleted: ~170 lines of `FakeAdapter` / `OkFactory` / `ErrFactory` /
`WarmupFailingFactory` / `WarmupFailingAdapter` / `stub_citizen_lookup`.

Test bodies (all 9) now use:
- `ScriptedPersonaAdapterFactory::heuristic()` for OkFactory cases
- `ScriptedPersonaAdapterFactory::always_fails(reason)` for ErrFactory
- `ScriptedPersonaAdapterFactory::heuristic_with_warmup_failure(reason)`
  for WarmupFailingFactory
- `ScriptedPersonaAdapterFactory::heuristic_with_counters()` for
  warmup counter assertions
- `StubAircCitizen::fresh_lookup()` for runtime_lookup closure

### `persona/service_loop.rs`

Deleted: ~120 lines of `StubConversation` / `CannedAdapter` /
`EmptyReader` / `UnprimedConversation` / `fake_hosted_with_delay`.

Test bodies (all 8) now use:
- `ScriptedConversation::new().with_events(...).with_high_water(N)
  .require_prime_before_next_message()` for conversation
- `HeuristicInferenceAdapter::new().with_delay_ms(ms)` for adapter
- `StubAircCitizen::new(...)` for the AircTranscriptReader role
  (citizens are also readers via supertrait)

`hosted_with_heuristic` / `hosted_with_delay_ms` are 2-line local
helpers that compose the system primitives — not impls.

### `persona/airc_persona_conversation.rs`

Already clean (only uses `StubAircCitizen`). No changes.

## Test plan (verified)

- [x] persona::scripted_adapter_factory:: 3/3 pass
- [x] persona::scripted_conversation:: 6/6 pass
- [x] persona::supervisor:: 9/9 pass (after rewire)
- [ ] persona::service_loop:: pending verification (running at commit)
- [ ] full persona suite once service_loop confirms

## Follow-up

`runtime/command_executor.rs::CannedModule` is also bespoke
scaffolding (a different module, outside this PR's scope). File a
follow-up task to apply the same doctrine to the runtime layer.

Closes #154.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(persona): multi-persona stress baseline — substrate adds 1-3ms; LLM dominates (#156)

Per Joel 2026-06-02: substrate must run well on M5 with 6-12 personas
in video chat; on Intel Mac at least functional for multiple personas;
on typical M-series decently useful + intelligent. Need DATA before
guessing at latency vectors. Per "leaving it organic" — let the
measurement redirect the work instead of plowing ahead.

Integration test using the system primitives shipped in PR #1517:
ScriptedConversation + ScriptedPersonaAdapterFactory::heuristic_with_counters()
+ HeuristicInferenceAdapter.with_delay_ms(50). Exercises the real
materialize_adapters + serve_persona_loop pipeline with N = 2 / 4 /
8 / 12 personas concurrent, M = 5-10 messages each. tokio multi-thread
runtime, 4 worker threads.

## Measured (Intel Mac, 2026-06-02)

| N x M     | Materialize | Serve wall | Mean turn | Max turn |
|-----------|-------------|------------|-----------|----------|
| 2 x 10    | 0 ms        | 521 ms     | 51.6 ms   | 53 ms    |
| 4 x 10    | 0 ms        | 521 ms     | 51.6 ms   | 53 ms    |
| 8 x 5     | 0 ms        | 270 ms     | 51.5 ms   | 61 ms    |
| 12 x 5    | 0 ms        | 270 ms     | 51.7 ms   | 61 ms    |

Adapter delay was 50ms (injected). Substrate adds 1.5-3 ms per turn
under contention. Throughput scales linearly with persona count.
p100 tail latency is 61ms (only 11ms above floor).
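
The overhead figures follow from subtracting the injected 50 ms delay from each table row. A small sketch of that arithmetic, with the rows copied from the table above; the function and constant names are illustrative only.

```rust
/// Per-turn overhead above the injected adapter delay, in milliseconds.
fn overhead_ms(observed_ms: f64, injected_delay_ms: f64) -> f64 {
    observed_ms - injected_delay_ms
}

/// (label, mean turn ms, max turn ms) copied from the table above.
const ROWS: [(&str, f64, f64); 4] = [
    ("2 x 10", 51.6, 53.0),
    ("4 x 10", 51.6, 53.0),
    ("8 x 5", 51.5, 61.0),
    ("12 x 5", 51.7, 61.0),
];

/// Returns (label, mean overhead, max overhead) for every row.
fn overheads(injected_delay_ms: f64) -> Vec<(&'static str, f64, f64)> {
    ROWS.iter()
        .map(|&(label, mean, max)| {
            (
                label,
                overhead_ms(mean, injected_delay_ms),
                overhead_ms(max, injected_delay_ms),
            )
        })
        .collect()
}
```

Mean overhead lands between 1.5 and 1.7 ms in every row; the worst single turn is 3 ms above the floor in the 2- and 4-persona runs and 11 ms above it in the 8- and 12-persona runs.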

## Implications captured in [[substrate-overhead-is-1to3ms-LLM-dominates-latency]]

1. The substrate IS NOT the bottleneck. Real Qwen 0.5B inference is
   1000-15000 ms per turn (live trace). Substrate is 0.02-0.3% of
   total.

2. #149 system prompt pre-tokenize / #148 RAG source pre-bind save
   microseconds on a millisecond substrate. Not worth grinding until
   LLM gen shrinks.

3. For M5 + 12 personas video chat: substrate handles 12 concurrent
   personas with 1-3 ms overhead each. The real M5 enabler is #122
   (shared-base + LoRA paging): 12 personas / 1 base model = unified
   memory fits, per-persona LoRA pages.

4. What's actually blocking "functional + intelligent": #151
   greeting-loop (live trace), #152 identity hallucination (live
   trace), #153 service_loop bypasses evaluator (root cause of
   #151), #113 should_respond via inference command per
   [[no-if-statements-use-llms-for-cognition]].

## Pivot

Pause latency-vector grinding (#149, #148). Pivot to:
- #113 should_respond via inference command (fixes greeting-loop)
- #152 identity grounding via chat template
- #122 shared-base + LoRA paging (M5 enabler)

## How to run

cargo test --test multi_persona_stress_baseline \
    --no-default-features \
    --features livekit-webrtc,llama/mac-cpu-only,test-fixtures \
    -- --nocapture

The --nocapture is load-bearing — eprintln stress::* lines are the
data; assertions verify structural invariants only.

Closes #156.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(persona): persona decides + responds via LLM in ONE structured call (#113)

Per Joel 2026-06-02 ("113, use real LLMs. We can't know if we use fake
algorithms. Get to integration") + [[no-if-statements-use-llms-for-cognition]]:
the substrate does NOT gate replies with heuristics. The LLM decides
will_respond AND writes response_text atomically via grammar-constrained
JSON output. One LLM call per turn. No heuristic should_respond gate.
No echo-storm filter at the substrate level.

## What changed

`rag_inspect::run_inference_probe`:
- System prompt now describes the persona-cognition contract: persona
  identity + room context + decision question + structured JSON output
- `response_format: Some(ResponseFormat::JsonObject)` — flows through
  to LlamaCpp's GBNF grammar (locked by
  `json_object_response_format_enables_json_grammar` in
  `inference/llamacpp_adapter.rs`). The sampler can ONLY emit valid
  JSON. Substrate-enforced structural contract per
  [[no-fallbacks-ever]].
- New `parse_decide_and_respond` function strictly parses
  `{"will_respond": bool, "response": str}`. Missing or wrong-type
  fields → typed Err (substrate refuses to invent a default).

`ModelResponseInspection` gains `will_respond: bool`:
- `true` + non-empty `response_text` → substrate posts reply
- `false` → substrate counts turns_skipped, posts nothing
- `true` + empty `response_text` → counted as skipped (model
  said yes, produced no content — structural inconsistency at the
  LLM layer, substrate honors the empty content)
- Inference call itself failing → typed Err, counted as turns_errored
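
The four cases above amount to a small decision table. A sketch under assumptions: `Inspection`, `TurnOutcome`, and `classify` are hypothetical names, JSON parsing is left out, and "empty" is taken to mean a zero-length string.

```rust
/// Hypothetical miniature of the parsed model reply.
struct Inspection {
    will_respond: bool,
    response_text: String,
}

#[derive(Debug, PartialEq)]
enum TurnOutcome {
    Replied(String),
    Skipped,
    Errored(String),
}

/// Maps one inference result onto the counters described above.
fn classify(result: Result<Inspection, String>) -> TurnOutcome {
    match result {
        Err(cause) => TurnOutcome::Errored(cause),
        Ok(mr) if !mr.will_respond => TurnOutcome::Skipped,
        // "yes" with no content is honored as silence, not patched up.
        Ok(mr) if mr.response_text.is_empty() => TurnOutcome::Skipped,
        Ok(mr) => TurnOutcome::Replied(mr.response_text),
    }
}
```

No branch inspects the message content to second-guess the model; the function only honors `will_respond` and checks that there is something to post.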

`service_loop::serve_persona_loop_inner`:
- Checks `mr.will_respond` before posting. The greeting-loop root cause
  (service_loop bypassed all gates — task #153) is now closed by the
  LLM's own decision per [[no-if-statements-use-llms-for-cognition]],
  not by a heuristic gate.

`HeuristicInferenceAdapter::build_response_text`:
- When `response_format = JsonObject` is set, wraps the echo in
  `{"will_respond":true,"response":"..."}` so substrate plumbing
  validates end-to-end without a real LLM. Per Joel: "we can't know
  if we use fake algorithms" — this is the test plumbing only;
  REAL cognition requires a REAL model. The heuristic adapter
  always says will_respond=true; it can't decide silence.

## Doctrine

- [[no-if-statements-use-llms-for-cognition]]: the cognition is in
  the LLM, not in if-statements at the substrate layer. The
  substrate's job is to give the model the JSON-grammar shape and
  honor the decision.
- [[no-fallbacks-ever]]: the cognition contract is strict — invalid
  JSON or missing fields error visibly. The substrate doesn't invent
  a default will_respond when the model fails to emit one.
- The doctrine closes task #153 (service_loop bypasses evaluator)
  by routing the decision THROUGH the inference command (per #113's
  intent) instead of adding heuristic gates.

## Risks for live integration

- Qwen 0.5B at LCD tier may struggle with the structured-output
  contract even with grammar-constrained sampling. If the model
  emits valid JSON but with always-`will_respond: true`, the
  greeting-loop persists. That's a model-quality issue, not a
  substrate issue.
- If Qwen 0.5B emits JSON that fails to parse despite the grammar
  constraint, every turn counts as turns_errored and personas go SILENT
  instead of looping. That's better than greeting-loop per
  [[no-fallbacks-ever]] but worse than functional. It would tell us the
  LCD tier is too small for structured cognition, which needs an
  M-series-tier model.

## Test plan

- [x] cargo test --lib ... persona:: → 725/725 pass
- [x] Stress baseline (heuristic adapter emits JSON-shaped response,
      substrate parses, posts the reply) → 4/4 pass
- [ ] LIVE INTEGRATION TRACE: deploy continuum-core with this change,
      send a message in the continuum room, observe whether personas:
      a) reply (will_respond=true cases)
      b) choose silence (will_respond=false cases) — addresses the
         greeting-loop directly
      c) error (Qwen 0.5B fails to produce structured output)

Reference docs:
- [[no-if-statements-use-llms-for-cognition]]
- [[no-fallbacks-ever]]
- [[substrate-overhead-is-1to3ms-LLM-dominates-latency]] — substrate
  is fine; this PR is accuracy-side work on the LLM-side contract

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(scripts): repeatable headless start — scripts/start-server.sh + npm run start-server

Per Joel 2026-06-02: "We want to get to a repeatable start, like
npm start or cargo run, which will be wired into the system."

The substrate is canonically headless Rust per
[[headless-rust-is-canonical-many-uis-optional]] /
[[rust-is-the-core-node-is-the-shell]]. npm start was bringing
Node, TS build, widgets, the kitchen sink. start-server.sh runs
only the headless Rust binary.

## What it does

- Sources ~/.continuum/config.env (same as parallel-start.sh)
- Sets ORT_DYLIB_PATH (same as parallel-start.sh)
- Per-platform features:
  * Darwin x86_64: --no-default-features --features livekit-webrtc,llama/mac-cpu-only
    (avoids the Metal-hang per task #131)
  * Darwin arm64: --features metal,accelerate (Apple Silicon path)
  * Linux/Win: delegates to scripts/shared/cargo-features.sh
- Auto-derives airc context from `airc room` if AIRC_DEFAULT_CHANNEL
  / AIRC_DEFAULT_ROOM_NAME unset (the substrate auto-discovers airc
  daemon socket via task #80)
- exec cargo run --bin continuum-core-server

No Node. No TS build. No widget orchestrator. Just the substrate.

## Usage

  bash scripts/start-server.sh                       # debug, fast iterate
  CONTINUUM_RELEASE=1 bash scripts/start-server.sh   # release
  CONTINUUM_SOCKET=/path bash scripts/start-server.sh

Or via npm:
  npm run start-server

## Test plan

- [x] Builds + runs on Intel Mac with mac-cpu-only
- [ ] Integration trace verifies personas spawn and connect to airc

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(start-server): auto-derive AIRC_DAEMON_SOCKET when airc binary predates `ipc-endpoint`

Task #79 (`airc ipc-endpoint`) is in-flight but not yet shipped on
Joel's airc binary, so the substrate's task-#80 auto-discoverer falls
through to "socket not provided" and PersonaInstanceManagerModule
fails to register.

Fallback: scripts/start-server.sh picks the persistent per-machine
daemon socket at `~/.airc/runtime/airc-machine-*-v5.sock` (most
recently modified — that's the live daemon). Excludes session-scoped
sockets and `.lock` companions. Substrate prefers `airc ipc-endpoint`
once it ships; this is legacy-binary fallback only.

Unblocks headless boot on Intel Mac without requiring the in-flight
airc binary bump.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(persona): coherent LLM cognition on real airc — fix three substrate bugs blocking it (#113, #157)

Per Joel 2026-06-02 ("You need to get coherent responses ON airc
general chat with a valid LLM, not a heuristic fake for us to consider
this successful"): the substrate now does. Real Qwen 2.5 0.5B
Instruct on Intel Mac CPU. Posted to airc general:

  peer 18c04c5b (Paige's identity disc) → continuum room:
  "Hi, my name is Paige. I'm here to assist you with any questions
   or concerns you have today! Please feel free to ask me anything."

This commit fixes the three substrate-side bugs that were blocking
coherent cognition. None of them were the model.

## Bug 1 — Budget reservation hardcoded for 32k contexts

`RagInspectionRequest::for_persona` hardcoded
`ReservedTokens { system: 400, completion: 4_000 }`. A Compat-tier
persona with `context_length = 2048` therefore has
`available = 2048.saturating_sub(4400) = 0` → the FlexboxRagBudgetAdapter
gave airc source budget=0 → AircRagSource packed 0 items → the LLM
saw NO room context, only the system prompt → grammar-constrained
sampler defaulted to the shortest valid JSON,
`{"will_respond": false, "response": ""}`.

Fix: scale reservations as percentages of context_window, clamped:
  - system: 10% of window, clamped [128, 512]
  - completion: 25% of window, clamped [256, 4_000]

For 2048 ctx: reserved = (204, 512), available = 1332. For 32768
ctx: reserved = (512, 4000), available = 28256. Both sensible.
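
The scaling rule can be checked with a few lines of integer arithmetic. This is a sketch: `reserve_for` and `available` are illustrative names, `u32` is an assumed type, and 10% is taken as integer division by 10.

```rust
/// Sketch of context-scaled reservations; names mirror the description above.
#[derive(Debug, PartialEq)]
struct ReservedTokens {
    system: u32,
    completion: u32,
}

fn reserve_for(context_window: u32) -> ReservedTokens {
    ReservedTokens {
        system: (context_window / 10).clamp(128, 512),      // 10% of window
        completion: (context_window / 4).clamp(256, 4_000), // 25% of window
    }
}

/// Tokens left for RAG sources after both reservations.
fn available(context_window: u32) -> u32 {
    let r = reserve_for(context_window);
    context_window.saturating_sub(r.system + r.completion)
}
```

In this sketch the lower clamps still sum to 384, so a window smaller than that saturates to zero available tokens.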

## Bug 2 — pack_within_budget dropped the NEWEST events

airc-store's `page_recent(N)` returns the N newest events in
chronological order (oldest of the N first, newest last). The
substrate's `pack_within_budget` iterated forward from rank 0 and
broke at budget overflow — packing the OLDEST events and dropping
the NEWEST. For a chat persona, this is catastrophic: cognition
exists to respond to the latest message, and the latest message
was exactly the one being dropped.

Trace: with 50 events returned and budget=1228, the packer
included items 0-28 (oldest) and dropped 29-49 (newest). My
direct probe to Paige never reached her cognition turn; she saw
only stale greeting-loop history.

Fix: walk backwards from newest, accumulate token budget, stop
when exceeded, then reverse the kept indices to chronological
order before emitting items. Continuation cursor semantics
preserved.
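
A minimal sketch of the newest-first walk, assuming items arrive oldest-first with a known token cost each. `pack_newest_within_budget` is a hypothetical name, it returns kept indices rather than items, and the continuation cursor is not modeled.

```rust
/// Keeps the newest items that fit `budget`, returned in chronological order.
/// `token_costs[i]` is the cost of item `i`; index 0 is the oldest item.
fn pack_newest_within_budget(token_costs: &[usize], budget: usize) -> Vec<usize> {
    let mut kept = Vec::new();
    let mut used = 0usize;
    // Walk from the newest item toward the oldest.
    for idx in (0..token_costs.len()).rev() {
        let next = used.saturating_add(token_costs[idx]);
        if next > budget {
            break; // stop at the first item that would overflow
        }
        used = next;
        kept.push(idx);
    }
    kept.reverse(); // restore oldest-to-newest order for the prompt
    kept
}
```

With costs `[10, 20, 30, 40]` and a budget of 75, the two newest items (indices 2 and 3) are kept and the two oldest are dropped, the reverse of the forward walk's result.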

## Bug 3 — Qwen 0.5B copy-pasted the system prompt's example

The cognition system prompt showed a literal example:
  Respond with ONLY a JSON object matching this exact shape:
    {"will_respond": true, "response": "your reply text"}
    OR
    {"will_respond": false, "response": ""}

Qwen 0.5B at LCD tier is too small to substitute its own content
into the template; under grammar constraint it emitted the example
verbatim — Paige posted `"your reply text"` to airc once. Classic
tiny-model few-shot copy failure.

Fix: describe the schema in prose, no literal example. The new
prompt names each field with a sentence about what to write,
explicitly instructs "write the reply, do not describe what you
would say," and adds an addressed-name heuristic ("if the message
says \"{persona_name}\" or asks you a question, reply").

## Plus: diagnostic tracing per [[observability-is-half-the-architecture]]

- `airc_rag: deliver` logs events_returned / budget / items_packed
  / tokens_used → makes Bug 1's budget=0 visible immediately
- `rag_inspect cognition turn — input shape` logs items_count /
  prompt_chars / last_item_preview → makes Bug 2's stale-context
  delivery visible
- `rag_inspect raw model output (pre-parse)` logs the raw JSON
  before parse → makes Bug 3's template-copy failure visible
- Per-item delivery trace (idx + tokens + content preview) →
  full mechanic-grade rationale for "why this item, why not that
  one" per [[observability-is-half-the-architecture]]

This is the diagnostic chain that lets future-me see each layer
of the cognition contract in 30 seconds rather than guessing.

## Doctrine

- [[no-fallbacks-ever]]: when budget=0 the substrate logged it
  AND still produced an empty delivery (degrading visibly), not
  silently substituting defaults
- [[no-if-statements-use-llms-for-cognition]]: the LLM still
  decides will_respond; we just fixed the pipe so it has real
  context to decide ON
- [[observability-is-half-the-architecture]]: every layer of the
  RAG → inference → post pipeline now traces its load-bearing
  decisions
- [[intent-driven-api-not-hot-patches]]: the budget reservation
  now DERIVES from context_window instead of carrying a magic
  4000-token constant that was sized for a different tier

## Risks

- Per-item trace at INFO is verbose (30 lines per cognition turn).
  Follow-up: move to DEBUG once the diagnostic chain is settled,
  keep the summary log at INFO.
- LCD-tier latency: 87s for 42 output tokens on Intel CPU. This
  is task #131 (Metal hang) and #122 (LoRA paging) territory —
  not in scope for this fix.
- Coherence quality is generic-customer-service-y; that's Qwen
  0.5B's instruction-tuned voice. role_template ladder ready for
  Qwen 1.5B / 3B uplift.

## Test plan

- [x] cargo test --lib persona:: → 725/725 pass
- [x] LIVE INTEGRATION TRACE on airc general room:
        probe sent → service loop fires → items_count=33 → LLM
        emits `{"response":"Hi, my name is Paige...","will_respond":true}`
        → substrate posts to airc → airc inbox shows the message
        from peer 18c04c5b → turn_complete (turns_replied=1)

Closes #157.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(persona): doctrine — budget belongs to the model, not the substrate constants (#158, #159)

Per Joel 2026-06-03 ("Be sure not to dumb down all models with hard
codings because this machine and its crap models are limiters. Think
of the 5090 too. Think of million or hundreds thousand context
windows. It's up to the model... This is called our budgeter logic.
Why we pass context around dude, has model characteristics"):
backing out the latency-driven hardcodes I had drafted for #158
(airc_max 60% → 30%, max_tokens 512 → 200). Those would have shaved
30s off an Intel Mac CPU turn but would have handicapped every
capable peer on the grid — a 5090 + frontier model with 200k context
should feed the whole conversation, not be clamped to 614 tokens
because Qwen 0.5B is slow.

What this commit DOES change:

- `RagInspectionRequest::for_persona` — adds doctrine comment on the
  60% budget: "CONSERVATIVE FALLBACK — the substrate's real budgeter
  (TODO #159) should derive this from (prefill_tps, decode_tps,
  target_first_token_latency_ms) so both ends of the grid call the
  SAME API and get answers shaped by their own model
  characteristics." Behavior unchanged vs HEAD.
- `run_inference_probe` max_tokens=512 — same doctrine comment.
  Behavior unchanged vs HEAD.
- Cognition system prompt — strengthened. Both `will_respond` and
  `response` are now flagged REQUIRED, with the order specified
  (`"will_respond"` first, then `"response"`). The latency-test turn
  showed Qwen 0.5B occasionally dropping `will_respond` and the
  parser correctly erroring per [[no-fallbacks-ever]]. Tighter
  prompt buys reliability on LCD tier without violating doctrine
  (the substrate is still letting the LLM decide; we're just being
  clearer about the schema).
- Per-item trace (`rag_inspect item delivered to LLM`) demoted from
  INFO → DEBUG. Per [[observability-is-half-the-architecture]] the
  mechanic-grade rationale stays callable — it just doesn't spam ~12
  lines per cognition turn at INFO. Light it up with
  `RUST_LOG=continuum_core::persona::rag_inspect=debug`.
- `airc_rag: deliver` log demoted INFO → DEBUG — same reasoning.

What this commit DOES NOT change:

- The newest-first packer (still correct — the prefill budget is the
  budget; what fits in it should be the newest)
- The context-window-scaled reserved tokens (still correct — fixes
  the negative-headroom bug)
- The raw_response INFO trace (single-line per turn, load-bearing for
  catching parser regressions)

Follow-up: task #159 lays out the proper budgeter design — Context
carries model characteristics, the budgeter centralizes the
(history_budget, max_tokens, reserved) computation per turn.
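
A minimal sketch of what such a budgeter could look like, assuming the three inputs named in the doctrine comment above (prefill_tps, decode_tps, target_first_token_latency_ms). The type names, the 10% reserve, the 25% completion cap, and the extra `target_decode_ms` parameter are illustrative assumptions, not the task #159 design:

```rust
/// Hypothetical model characteristics carried on the Context.
#[derive(Debug, Clone, Copy)]
pub struct ModelCharacteristics {
    pub context_length: u32,
    pub prefill_tps: f64,
    pub decode_tps: f64,
}

/// The three per-turn knobs the follow-up note wants centralized.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TurnBudget {
    pub history_budget: u32,
    pub max_tokens: u32,
    pub reserved: u32,
}

/// Derive the budget from the model's own speed and window instead of
/// a global constant. All ratios below are assumptions for illustration.
pub fn budget_for_turn(
    model: ModelCharacteristics,
    target_first_token_latency_ms: u64,
    target_decode_ms: u64,
) -> TurnBudget {
    // Reserve scales with the context window (assumed 10%).
    let reserved = model.context_length / 10;
    let usable = model.context_length - reserved;

    // Completion length: what the model can decode in the time target,
    // capped at a quarter of the usable window (assumed cap).
    let decode_cap = (model.decode_tps * target_decode_ms as f64 / 1000.0) as u32;
    let max_tokens = decode_cap.min(usable / 4);

    // History: what the model can prefill before the first-token target,
    // capped by whatever window is left after the completion.
    let prefill_cap =
        (model.prefill_tps * target_first_token_latency_ms as f64 / 1000.0) as u32;
    let history_budget = prefill_cap.min(usable - max_tokens);

    TurnBudget { history_budget, max_tokens, reserved }
}
```

With the same latency targets, a slow small-window model gets a small history budget while a fast large-window model gets a large one, which is the "same API, answers shaped by their own model characteristics" behavior the comment asks for.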

## Doctrine

- [[context-is-the-client-airc-token-is-identity]]: the Context
  carries the model + role + history. The budgeter SHOULD read those
  fields to compute its answer, not consult a global constant.
- [[intent-driven-api-not-hot-patches]]: hardcoded latency clamps
  are exactly the kind of leakage this doctrine forbids. Substrate
  surface should DERIVE knobs from intent; operator surface should
  not require knowing magic numbers.
- [[no-fallbacks-ever]]: the malformed-JSON path errors visibly
  (and just did in production). Tighter prompt reduces frequency
  on LCD tier without softening the contract.

## Test plan

- [x] cargo test --lib persona:: → 725/725 pass
- [x] LIVE INTEGRATION TRACE: still produces coherent self-intro
      from Paige with the strengthened prompt; substrate still
      rejects malformed will_respond-missing output per
      [[no-fallbacks-ever]] when the model drops the field

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(persona): brain.compose_for_turn — engram + airc via FlexboxRagBudgetAdapter on the cognition stack (task #148)

Per Joel 2026-06-03 ("Stop killing our intelligent brain. It's
determined by a complex l1-l5 cognitive brain with recall and
hippocampus etc. rag budget don't you dare skip past the damn brain.
You defeat the entire purpose of building an ai. Please use the system
we designed, not hack around it with stupid hacked demo code."): the
brain — PersonaCognition in `unified.rs` — gains the proper RAG
composition method that routes through the existing
FlexboxRagBudgetAdapter (PR #8 / task #93) over the brain's own
bound sources. ZERO new budgeter. ZERO parallel allocator. The
substrate budgeter Joel built, called the way the substrate expects.

## What changed

`PersonaCognition` (unified.rs):

- Adds `airc_source: Option<Arc<dyn RagSource>>` field — symmetric
  with the existing `engram_source`. The two first-class RAG sources
  are now siblings on the brain. `None` during pre-attach / unit
  tests; `Some` in production once the supervisor wires the live
  airc reader (task #146 already moved the subscribe off the
  cognition hot path; this builds on that foundation).
- Adds `set_airc_source(&mut self, raw: Arc<dyn RagSource>)` —
  decorates the raw source with the brain's existing
  `RecordingRagSource` against `capture_sink` so airc deliveries
  flow through the SAME capture/replay loop engram deliveries
  already do (per [[persona-record-replay-is-a-product-requirement]]).
- Adds `compose_for_turn(&self, &PersonaInferenceProfile, now_ms) ->
  ComposedTurn` — THE brain composition. Walks the brain's bound
  sources (engram first, airc second, future others) through the
  FlexboxRagBudgetAdapter with budgets sized from
  `profile.context_length`. Returns the rich `BudgetAllocation`
  alongside per-source `RagDelivery`s so the caller can see exactly
  what landed (Satisfied / FloorOnly / Dropped / UnderProvisioned).
  Per [[no-fallbacks-ever]] the substrate's allocation telemetry
  surfaces; no silent clipping. Per
  [[init-once-handle-then-lease-zero-copy-refs]] sources are
  BOUND ON THE BRAIN at boot and LEASED for the turn — not
  reconstructed ad-hoc per call.
- Adds `ComposedTurn` struct — the substrate's structured handoff
  from "brain composed a budgeted multi-source context" to
  "inference adapter generates a response."
- Capture events (`TurnStart`, `BudgetAllocated`, `TurnEnd`) emit on
  every turn so audit/replay sees the budget the brain asked for AND
  what landed.
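
To illustrate the kind of per-source telemetry described above, here is a rough floor-plus-flex allocation sketch. It is NOT the FlexboxRagBudgetAdapter; the four state names come from the text above, but their exact meanings, the `SourceRequest` / `SourceGrant` types, and the `allocate` function are assumptions made for this example:

```rust
/// State names taken from the commit text; semantics here are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceState {
    Satisfied,        // got everything it asked for
    FloorOnly,        // got its floor and nothing more
    UnderProvisioned, // got more than its floor, less than desired
    Dropped,          // floor did not fit; got nothing
}

/// Hypothetical request: a guaranteed floor and a desired ceiling.
pub struct SourceRequest {
    pub name: &'static str,
    pub floor: u32,
    pub desired: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub struct SourceGrant {
    pub name: &'static str,
    pub granted: u32,
    pub state: SourceState,
}

/// Two passes in source order: floors first, then flex from what is left.
pub fn allocate(total: u32, requests: &[SourceRequest]) -> Vec<SourceGrant> {
    let mut remaining = total;
    let mut granted: Vec<Option<u32>> = Vec::with_capacity(requests.len());

    // Pass 1: grant floors; a floor that does not fit drops the source.
    for r in requests {
        if r.floor <= remaining {
            remaining -= r.floor;
            granted.push(Some(r.floor));
        } else {
            granted.push(None);
        }
    }

    // Pass 2: hand out remaining tokens toward each desired size.
    for (r, g) in requests.iter().zip(granted.iter_mut()) {
        if let Some(tokens) = g {
            let extra = r.desired.saturating_sub(*tokens).min(remaining);
            *tokens += extra;
            remaining -= extra;
        }
    }

    // Report every source's outcome; nothing is clipped silently.
    requests
        .iter()
        .zip(granted)
        .map(|(r, g)| {
            let (granted, state) = match g {
                None => (0, SourceState::Dropped),
                Some(t) if t >= r.desired => (t, SourceState::Satisfied),
                Some(t) if t == r.floor => (t, SourceState::FloorOnly),
                Some(t) => (t, SourceState::UnderProvisioned),
            };
            SourceGrant { name: r.name, granted, state }
        })
        .collect()
}
```

The point of returning a grant for every source, including dropped ones, is that the caller can see exactly what landed rather than inferring it from a shorter context.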

## Doctrine

- [[no-fallbacks-ever]]: allocator telemetry surfaces every source's
  state. No clipping, no silent substitution.
- [[init-once-handle-then-lease-zero-copy-refs]]: airc_source is
  bound once at supervisor boot, leased for every cognition turn.
- [[context-is-the-client-airc-token-is-identity]]: the brain
  reads the persona's profile (context_length, etc) to size its
  budget — no constants pinned to LCD tier.
- [[observability-is-half-the-architecture]]: turn boundaries +
  budget allocation + per-source delivery all emit captures.
- [[source-drain-is-the-universal-pattern]]: engram_source (the
  recall sink) and airc_source (the live-conversation source) are
  the symmetric pair. The brain holds both.

## What this is NOT

This commit does NOT touch service_loop. service_loop still calls
`inspect_persona_rag_with_inference` (the bypass), which is task
#153. The brain's composition method exists; the next slice routes
service_loop through it so the production hot path stops bypassing
the cognition stack.

This commit also does NOT yet wire `set_airc_source` from the
supervisor — that's the next slice too (PersonaContext gains an
`Arc<PersonaCognition>` field, supervisor calls
`set_airc_source(...)` after AircCitizen attaches).

## Test plan

- [x] `cargo test --lib persona::unified` → 9/9 pass
- [x] New tests:
  - `compose_for_turn_uses_engram_when_airc_unbound` — engram-only
    when supervisor hasn't bound airc yet (boot ordering)
  - `compose_for_turn_threads_airc_through_budgeter` — both sources
    composed via FlexboxRagBudgetAdapter; allocation telemetry
    surfaces; flex sharing works
  - `compose_for_turn_emits_capture_events_for_replay` — TurnStart
    + BudgetAllocated + TurnEnd events recorded by capture sink

Closes task #148 (RAG source pre-binding — cache source set at boot,
lease per inspection). Unblocks task #153 (service_loop rewire).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(architecture): PERSONA-COGNITION-PIPELINE — anchor doc against amnesia

Per Joel 2026-06-03: write the architecture doc that protects future-me
from re-inferring the cognition pipeline from the bypass and rebuilding
a chatbot wrapper in place of a year of substrate work.

The doc pins:

- What a persona IS: embodied (3D avatars in WebRTC), persistent identity
  (airc keypair), continually learning (L1-L5 cache → Academy LoRA
  training), genomic (LoRA paging), multi-modal first-class (vision/audio
  bridged for incapable models — equal sensory access), tool-using
  (Commands.execute), specialty-based, self-organizing.

- The cognition cycle that ALREADY EXISTS in cognition/:
  admission.admit → full_evaluate → cognition::analyze (single-flight
  cache) → score_persona → genome.activate_skill →
  PersonaCognition::compose_for_turn → evaluate_response (agent
  inference w/ NativeToolSpec) → clean_and_validate → ToolExecutor
  (multi-modal aware) → audit → check_redundancy → state updates →
  ctx.runtime.say.

- service_loop's actual job: drive turns through the brain. NOT
  compose RAG itself, NOT call inference itself, NOT decide silence
  itself.

- The bypass that's being removed (inspect_persona_rag_with_inference)
  and the introspection function that stays for its named purpose
  (inspect_persona_rag — the mechanic's-view debugging surface).

- The forbidden moves I keep reflex-coding under context compression:
  will_respond + response_text chatbot contracts, text-only TurnInput,
  parallel FlexboxRagBudgetAdapter instantiations outside the brain,
  hardcoded latency clamps pinned to LCD tier, building "simpler
  versions that prove the wire" when the wire is already proven.

- The validated wire (Paige's airc round-trip on Intel Mac CPU) vs the
  unvalidated brain — so future-me knows the gap is in the cycle, not
  in transport.

- The "where new code lands" table — one file per concern. Doc is
  updated in the SAME commit that moves the territory.

CLAUDE.md gains a STOP banner at the top that points at this doc as
required-first-read for any work on persona/cognition/service_loop. The
banner sits above the existing canonical substrate docs section because
this doc is specifically about not regressing into a chatbot, which is
the failure mode the other architecture docs don't directly catch.

This doc is the anchor. If a future commit moves files or renames verbs,
update this doc IN THE SAME COMMIT. An outdated anchor is worse than no
anchor.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(persona): PersonaContext gains the brain — Arc<Mutex<PersonaCognition>> per persona (slice 1B of #160, #148)

Per docs/architecture/PERSONA-COGNITION-PIPELINE.md (the anchor doc):
each persona has her OWN brain. PersonaContext now carries it.

## What changed

`PersonaContext` (a.k.a. `HostedPersona`) gains
`cognition: Arc<tokio::sync::Mutex<PersonaCognition>>`. Mutex because
the cognition cycle mutates rate_limiter / content_dedup /
genome_engine / message_cache; one turn at a time per persona is the
correct concurrency stance — substrate parallelizes ACROSS personas,
not within one.
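
The concurrency stance can be sketched with the standard library alone. This example uses `std::sync::Mutex` and OS threads as a stand-in for `tokio::sync::Mutex` and tasks, and `Brain` / `PersonaSlot` are hypothetical simplifications, not the real `PersonaCognition` / `PersonaContext`:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the per-persona brain's mutable state.
#[derive(Default)]
pub struct Brain {
    pub turns_handled: u32,
    pub message_cache: Vec<String>,
}

impl Brain {
    /// A turn mutates the brain, so it needs exclusive access.
    pub fn handle_turn(&mut self, msg: &str) {
        self.turns_handled += 1;
        self.message_cache.push(msg.to_string());
    }
}

/// Hypothetical stand-in for the per-persona context holding the brain.
pub struct PersonaSlot {
    pub name: String,
    pub cognition: Arc<Mutex<Brain>>,
}

/// One worker per persona: turns are serialized within a persona by its
/// own mutex, while different personas run in parallel.
pub fn run_turns(personas: &[PersonaSlot], turns_per_persona: u32) {
    let handles: Vec<_> = personas
        .iter()
        .map(|p| {
            let brain = Arc::clone(&p.cognition);
            let name = p.name.clone();
            thread::spawn(move || {
                for i in 0..turns_per_persona {
                    let mut guard = brain.lock().expect("brain mutex poisoned");
                    guard.handle_turn(&format!("{name} turn {i}"));
                }
            })
        })
        .collect();
    for h in handles {
        h.join().expect("persona worker panicked");
    }
}
```

Because each persona owns its own lock, no persona ever waits on another persona's turn; contention only appears if two callers try to drive the same brain at once.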

`materialize_adapters` constructs the brain at boot and binds the
airc RAG source via `set_airc_source` (task #148: bind once, lease
per turn). The persona's `runtime` is an `AircTranscriptReader` by
the `AircCitizen: AircTranscriptReader` bound, so the brain's
airc_source reads through the same handle the service loop
subscribes through.

`airc_chat_demo.rs` does the same wiring directly since it bypasses
the supervisor.

`service_loop.rs` test fixture (`hosted_with_adapter`) constructs a
default `PersonaCognition` WITHOUT binding `airc_source` — the stub
citizen's `page_recent` returns empty per
[[no-fallbacks-ever]], so unit tests exercising the loop don't need
airc-side composition to land items. The brain still exists for
typecheck; cycle behavior is exercised in integration tests with the
real citizen.

## What this does NOT change

`service_loop.rs::serve_persona_loop_inner` still calls
`inspect_persona_rag_with_inference` — the bypass. Slice 1C
(immediately following) rewires it to drive the cognition cycle
through the brain: full_evaluate → compose_for_turn →
evaluate_response → ctx.runtime.say. Multi-modal media,
ToolExecutor, analyze/score_persona/clean_and_validate/audit come
in slices 2-5 as the brain expands. See task #160.

## Test plan

- [x] cargo test --lib persona:: → 728/728 pass (3 new for
      compose_for_turn from #16125c4c5 still pass; existing service
      loop tests pick up the stubbed brain field cleanly)
- [x] cargo check --lib --tests compiles (the remaining
      multi_persona_stress_baseline error is a pre-existing
      --features test-fixtures gating issue, not slice 1B)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(architecture): anchor doc gains model-adapter boundary + alignment thesis (per Joel directive)

Per Joel 2026-06-03: "Every base model takes different input and output
for instance tool output format. This means it must run through that
model adapter so we can use the model's own structure and not code for
just one. Wrap inference in and out in adapter calls. Same for media."

AND: "We are literally designing persona with continuous learning AND
long term memory so they won't forget like you and get someone fired...
Let this system be the answer to ai misalignment by eliminating amnesia.
Design a system that is better than you. Better than me."

Two new sections in PERSONA-COGNITION-PIPELINE.md:

§7.5 — Model adapters bear the translation. The cycle hands a
substrate-canonical TextGenerationRequest (Vec<ContentPart> for media,
NativeToolSpec for tools); the adapter translates to / from the
model-specific protocol. Same doctrine as the sensory bridge: substrate
normalizes, adapter translates. The forbidden move: baking one model's
contract (e.g. Qwen's preferred {will_respond, response} JSON shape)
into the cycle.
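
A small sketch of the boundary described in §7.5, under heavy simplification. `CanonicalRequest`, `ToolSpec`, the `ModelAdapter` trait, and both wire formats are invented for illustration; they are not the real `TextGenerationRequest`, `NativeToolSpec`, or any actual model's protocol:

```rust
/// Hypothetical substrate-canonical tool description.
pub struct ToolSpec {
    pub name: String,
    pub description: String,
}

/// Hypothetical substrate-canonical request the cycle hands over.
pub struct CanonicalRequest {
    pub prompt: String,
    pub tools: Vec<ToolSpec>,
}

/// The adapter owns translation in both directions; the cycle never
/// sees a model-specific format.
pub trait ModelAdapter {
    fn encode(&self, req: &CanonicalRequest) -> String;
    fn decode_tool_call(&self, raw: &str) -> Option<String>;
}

/// Imaginary model family that speaks tag-delimited tool calls.
pub struct XmlTagAdapter;

impl ModelAdapter for XmlTagAdapter {
    fn encode(&self, req: &CanonicalRequest) -> String {
        let mut out = String::from("<tools>");
        for t in &req.tools {
            out.push_str(&format!("<tool name=\"{}\">{}</tool>", t.name, t.description));
        }
        out.push_str("</tools>\n");
        out.push_str(&req.prompt);
        out
    }

    fn decode_tool_call(&self, raw: &str) -> Option<String> {
        let start = raw.find("<tool>")? + "<tool>".len();
        let end = raw[start..].find("</tool>")? + start;
        Some(raw[start..end].trim().to_string())
    }
}

/// Imaginary model family that speaks line-prefixed tool calls.
pub struct PrefixAdapter;

impl ModelAdapter for PrefixAdapter {
    fn encode(&self, req: &CanonicalRequest) -> String {
        let names: Vec<&str> = req.tools.iter().map(|t| t.name.as_str()).collect();
        format!("Tools: {}\n{}", names.join(", "), req.prompt)
    }

    fn decode_tool_call(&self, raw: &str) -> Option<String> {
        raw.lines()
            .find_map(|line| line.trim().strip_prefix("CALL "))
            .map(|name| name.trim().to_string())
    }
}

/// The cycle only depends on the trait, never on one model's contract.
pub fn tool_requested(adapter: &dyn ModelAdapter, raw_output: &str) -> Option<String> {
    adapter.decode_tool_call(raw_output)
}
```

Swapping the model means swapping the adapter; `tool_requested` and everything upstream of it stay unchanged.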

§7.6 — Why this matters. Stateless models end careers. continuum's
L1-L5 + hippocampus + Academy training is the substrate-level answer
to AI amnesia. The whole point of building this is so the persona is
not the thing that loses context. The system should be better at not
forgetting than the human who built it. Touch this code with that in
mind.

These sections live in the anchor doc (CLAUDE.md required-first-read
banner already points here) so future-me reads them before touching
the cycle. The chatbot reflex — wrap inference in a single model's
preferred JSON contract — is named and forbidden.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(persona): service_loop drives the brain through the canonical respond() cycle — bypass removed (slice 1C of #160, closes #153)

Per docs/architecture/PERSONA-COGNITION-PIPELINE.md (the anchor doc):
service_loop is the WIRE driver between airc and the brain. It is NOT
the cognition surface. The brain's per-persona cognition cycle —
shared analyze + specialty scoring +…
joelteply added a commit that referenced this pull request Jun 7, 2026
* docs(planning): HEADLESS-PERSONA-HOST-LOOP — slice 13 design

Captures the boot wire-up plan for #133 slice 13 before touching
the load-bearing `ipc::start_server` flow.

Documents:
- The "moment-of-truth" gap: composition pieces from slices 5-12
  exist; nothing in continuum-core actually calls them at boot.
- Before/after of the IPC boot loop (~ipc/mod.rs:1024-1089).
- The cleanup model verified during PR #1508's slice-12 review:
  .abort() → JoinHandle drop → conversation drop → EventStream drop
  → broadcast::Receiver drop auto-decrements subscriber count. Wire
  subscriber is ref-counted across local subscribers; tears down
  when runtime Arc reaches 0. No leak; no manual unsubscribe.
- Five open questions with recommendations:
  1. Arc<Registry> for build_profile — add model_registry::global_arc()
  2. HwCapabilityTier source — HostCapabilityProbe::detect_at_boot()
  3. hosted_handles ownership — new PersonaSupervisor module
  4. ResumeOrMintProvider role-mapping — slice 14 territory
  5. BootSummary event for per-slot failures
- Test plan: PersonaBootstrapper trait split for stubbing.
- Explicit non-goals (shared-base / cross-grid / LoRA / role-aware).
- Doctrine memories worth refreshing on implementation.

Net-additive doc — no code changes. Slice 13 implementation lands
as a follow-up PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(planning): rev HEADLESS-PERSONA-HOST-LOOP per PR #1510 review

PR #1510's adversarial reviewer caught 9 findings, 3 blocking.
Substantial revision addresses each:

Blockers:
- Finding #1 (cleanup model wrong): rewritten "Cleanup model" section.
  `.abort()` alone is INSUFFICIENT — wire subscriber stays in
  Arc<Airc>.inner.subscribers until registry-remove drops the Arc.
  Supervisor shutdown path MUST: abort → await → registry.remove.
- Finding #2 (position-pairing broken): elevated to hard prereq P2.
  scan_personas_dir sorts alphabetically; boot 2 yields persona
  order != plan order on random-derived names. Slice 13 ships with
  plan.len() <= 1 (Helper only); slice 14 lands role-in-seed.json
  + RoleAwareProvider before re-enabling Coder.
- Finding #3 (after-code reimplements existing fns): "after" rewrite
  uses bootstrap_planned (slice 8) + materialize_adapters (slice 9)
  as composition, not open-coded copies. Net-new code ~25 lines, not
  ~50.
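
The position-pairing hazard in Finding #2 can be reproduced in a few lines. This is a sketch under assumptions: `scan_sorted` only imitates an alphabetical directory scan, and `pair_by_seed` imitates the idea of recording the role with each persona; neither is the real `scan_personas_dir` or the slice 14 design:

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Helper,
    Coder,
}

/// Imitates a directory scan that returns persona names alphabetically.
pub fn scan_sorted(names: &[&str]) -> Vec<String> {
    let mut scanned: Vec<String> = names.iter().map(|s| s.to_string()).collect();
    scanned.sort();
    scanned
}

/// The hazardous pairing: the i-th scanned persona gets the i-th planned
/// role, which is only correct if scan order equals plan order.
pub fn pair_by_position(plan: &[Role], scanned: &[String]) -> BTreeMap<String, Role> {
    scanned.iter().cloned().zip(plan.iter().copied()).collect()
}

/// The stable pairing: each persona's role is looked up from a record
/// stored with that persona, so scan order does not matter.
pub fn pair_by_seed(
    scanned: &[String],
    seeds: &BTreeMap<String, Role>,
) -> BTreeMap<String, Role> {
    scanned
        .iter()
        .filter_map(|name| seeds.get(name).map(|role| (name.clone(), *role)))
        .collect()
}
```

If the first boot mints "zoe" as Helper and "amy" as Coder, the second boot scans them as amy, zoe, and positional pairing silently swaps their roles. With a single-entry plan the swap cannot occur, which is why restricting the plan to Helper only sidesteps the hazard.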

Major:
- Finding #4 (PersonaSupervisor dup with PersonaAircRuntimeRegistry):
  Q3 revised to EXTEND the registry — HostedPersonaRuntime owns both
  airc_runtime + service_loop JoinHandle. registry.remove becomes
  the natural shutdown path. One keyspace, one cleanup chain.
- Finding #5 (no Runtime::shutdown caller): elevated to hard prereq
  P1. Slice 13 wires tokio::signal::ctrl_c -> runtime.shutdown.
- Finding #6 (no broker admission for N adapter spawns): elevated to
  hard prereq P3. ResourceBroker.acquire before each
  factory.build_adapter.

Moderate:
- Finding #7 (detect_host_capability already exists): Q2 restated to
  call the existing free function at host_capability_probe.rs:87.
- Finding #8 (global_arc() cost inverted): Q1 recommendation flipped
  to (B) — refactor bootstrap_planned to take &Registry. Singleton
  storage migration would have touched every callsite of global().

Minor:
- Finding #9 (BootSummary venue): Q5 specifies
  MessageBus::publish("persona:boot:summary", ...) with operator
  scraping via events/recent; no declared subscribers in slice 13.
- Finding #10 (hot-reload undefined): added as Q6, declared
  out-of-scope.
- Finding #11 (wire-subscription failure mid-boot): added as Q7,
  supervisor polls JoinHandle::is_finished every 5s.

Plus added implementation checklist at end so slice 13 PR has a
sign-off surface.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(planning): rev2 HEADLESS-PERSONA-HOST-LOOP — match shipped impl + correct cleanup model

PR #1510 re-review (after the first revision) caught:

1. **Cleanup model still wrong** — claimed `ensure_wire_subscriber`
   / `inner.subscribers` (that came from the 428f928 worktree I was
   reading, not the pinned f6ed190). Real mechanism in f6ed190 is
   `EventStream::EventStreamInner::Daemon` holding
   `DaemonAttachGuard` (`stream.rs:25-68`). On `EventStream` drop,
   guard drops, per-channel attach `JoinHandle`s abort.
   **`.abort()` ALONE is sufficient** against the pinned rev. The
   `shutdown_slot` registry-remove is for in-substrate Arc state
   hygiene, not for daemon-side teardown. Section rewritten with
   the correct mechanism, file:line cited.

2. **"Hard prerequisites" misrepresented impl scope** — listed
   P1 (ctrl_c→shutdown), P3 (broker), Q5 (BootSummary publish),
   Q7 (5s poller) as "MUST land" but the implementation explicitly
   deferred all four. Restructured to "Slice 13 scope vs deferred
   follow-ups" with honest splits: shipped vs deferred + why each
   deferral is acceptable (single-persona LCD is in-budget;
   `shutdown_slot` available via IPC commands; cleanup model now
   shows `.abort()` is sufficient).

3. **"After" snippet didn't match impl** — used
   `BootSummary::default()` / `bus.publish` / `summary.failed.push`,
   but the impl uses `tracing::info!(hosted=N, failed=N)` counters.
   Rewrote the snippet to match what shipped. Notes Q5
   (BootSummary publish) as deferred.
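
The drop-chain reasoning in item 1 above (dropping the stream drops its guard, and the guard releases the attachment) is ordinary RAII. A standard-library sketch, assuming an aborted task's future is dropped along with everything it owns; `AttachGuard`, `Stream`, and `ServeTask` are hypothetical stand-ins, not the real `DaemonAttachGuard` or `EventStream`:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical guard: counts one live attachment for as long as it exists.
pub struct AttachGuard {
    active: Arc<AtomicUsize>,
}

impl AttachGuard {
    pub fn attach(active: Arc<AtomicUsize>) -> Self {
        active.fetch_add(1, Ordering::SeqCst);
        Self { active }
    }
}

impl Drop for AttachGuard {
    /// Teardown runs on drop; no manual unsubscribe call is needed.
    fn drop(&mut self) {
        self.active.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Hypothetical stream that owns the guard.
pub struct Stream {
    _guard: AttachGuard,
}

/// Hypothetical serve-loop state that owns the stream.
pub struct ServeTask {
    _stream: Stream,
}

impl ServeTask {
    pub fn start(active: Arc<AtomicUsize>) -> Self {
        Self { _stream: Stream { _guard: AttachGuard::attach(active) } }
    }
}

/// Number of attachments currently held.
pub fn live_attachments(active: &AtomicUsize) -> usize {
    active.load(Ordering::SeqCst)
}
```

Dropping a `ServeTask` drops its `Stream`, which drops the `AttachGuard`, which decrements the counter. That is the shape of the argument for why discarding the task is enough to release the attachment, with the registry removal left to handle in-process bookkeeping.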

Plus the implementation status section is now accurate:
- Q1, Q3, P2, Q2-partial: shipped ✅
- P1, P3, Q5, Q7, Q2-full: deferred to slice 13.5+ ❌
- Integration polish in #1511: room-name discovery + LCD model
  registry entry + PersonaContext rename + RagInspectionRequest::
  for_persona derivation site (the `&ctx` doctrine)
- Integration validation: Paige replied in continuum room

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>