feat(persona): persona decides + responds via LLM in ONE structured call #1519
Merged
Elegance pass on the patterns the slice-13 work established. Per
Joel 2026-06-02: "we are on sort of an elegance refactor and then
for improved reliability and speed."
What changed:
1. `RagInspectionRequest::for_ctx(&ctx, now_ms)` — new constructor
that takes the persona context directly. Replaces the 4-arg
`for_persona(persona_id, name, now_ms, &profile)` at the call
site. `for_persona` stays (it's the underlying derivation) but
new code uses `for_ctx` to honor the substrate's `&ctx`
doctrine ([[context-is-the-client-airc-token-is-identity]]):
hand the context, not its parts.
2. `PersonaContext::span()` — new method that returns a
`tracing::info_span!` tagged with `persona_id`, `agent_name`,
`peer_id`, `role`, `tier`, `ctx_len`, `model`. The span derives
from `&ctx` — no manual field threading at every log call site.
3. `serve_persona_loop` rewritten in two layers:
- Outer entry function wraps the inner future with
`.instrument(ctx.span())`. Every log line inside the loop
inherits the persona's identity fields automatically.
- Inner function drops the `let persona_id = hosted.identity.x`
extractions; reads `ctx.identity.peer_id` etc. directly at use
sites. Two internal `tracing::warn!` lines lose their
persona_id/agent_name fields (now inherited from the span);
they keep just per-turn delta (`lamport`, `error`).
Net effect:
- Field extraction count in service_loop drops from 3 manual extracts
+ 4 redundant tracing field annotations to 0.
- Log output gains persona_id + agent_name + role + tier + ctx_len
+ model on EVERY internal log line, automatically. The substrate's
observability is now span-shaped, not manual.
- New code that needs a derived RAG request just writes
`RagInspectionRequest::for_ctx(ctx, now)` — one arg vs four.
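A minimal sketch of the `for_ctx` / `for_persona` relationship described above, assuming simplified field types. The `PersonaProfile`, `PersonaIdentity`, and `token_budget` shapes are hypothetical placeholders, not the real substrate types; only the two constructor names and their argument counts come from this commit.

```rust
// Hypothetical, simplified shapes. The real PersonaContext and
// RagInspectionRequest carry more fields than this sketch assumes.
#[derive(Debug, Clone, PartialEq)]
pub struct PersonaProfile {
    pub role: String,
    pub ctx_len: usize,
}

#[derive(Debug, Clone, PartialEq)]
pub struct PersonaIdentity {
    pub persona_id: u64, // assumption: the real id is probably a Uuid
    pub agent_name: String,
}

pub struct PersonaContext {
    pub identity: PersonaIdentity,
    pub profile: PersonaProfile,
}

#[derive(Debug, Clone, PartialEq)]
pub struct RagInspectionRequest {
    pub persona_id: u64,
    pub name: String,
    pub now_ms: u64,
    pub token_budget: usize, // placeholder derived field
}

impl RagInspectionRequest {
    /// The underlying derivation: four loose parts.
    pub fn for_persona(
        persona_id: u64,
        name: &str,
        now_ms: u64,
        profile: &PersonaProfile,
    ) -> Self {
        Self {
            persona_id,
            name: name.to_string(),
            now_ms,
            // Placeholder rule for the sketch: half the context window.
            token_budget: profile.ctx_len / 2,
        }
    }

    /// The call-site constructor: hand over the context, not its parts.
    pub fn for_ctx(ctx: &PersonaContext, now_ms: u64) -> Self {
        Self::for_persona(
            ctx.identity.persona_id,
            &ctx.identity.agent_name,
            now_ms,
            &ctx.profile,
        )
    }
}

/// Builds a context for demonstration purposes only.
pub fn sample_ctx() -> PersonaContext {
    PersonaContext {
        identity: PersonaIdentity { persona_id: 7, agent_name: "Paige".to_string() },
        profile: PersonaProfile { role: "helper".to_string(), ctx_len: 4096 },
    }
}
```

Because `for_ctx` only delegates, both constructors produce identical requests; the gain is that call sites can no longer pass mismatched parts.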
Why `.instrument` not `.entered`:
- `Span::entered` returns a non-Send RAII guard; tokio spawned
futures need Send. The two-function split (outer thin wrapper
with `.instrument`, inner async function) is the standard tracing
pattern for spans across awaits.
Verification:
- cargo build --lib --tests clean
- cargo test persona::service_loop — 4 passed
- cargo test persona::supervisor — 4 passed
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Elegance pass — extract-class refactor pulling the 170-line inline
boot composition out of `ipc/mod.rs::start_server` into a named
class. Per Joel 2026-06-02: "Must have elegance obsessively. Like a
Java dev. NO SHAME. It's better."
What changed:
1. `PersonaSpawnSupervisor` struct (in `persona/host.rs`) owns the
spawner / instance_manager / registry / factory / tier_id /
model_registry / rt_handle inputs. Construct once at boot; call
`.spawn_all(&mut provider)` to produce a `BootSummary`.
2. `BootSummary { hosted, failures }` + `BootSlotFailure {
slot_index, role, persona_id, reason }` — typed result structs.
Replace the inline `let mut hosted_count: usize = 0` / `let mut
failed_count: usize = 0` counters with a real value type the
substrate can publish (`persona:boot:summary` event — Q5 of the
design doc, deferred to slice 13.5+) and downstream clients
(web, jtag CLI) can read with the same shape per
[[clients-are-rust-too-thin-node-web-shell]].
3. The supervisor's `spawn_all` method handles every previously-
inline concern:
- `bootstrap_planned` failure → orderly-drain orphans + return
summary with synthetic failure row
- `materialize_adapters` with runtime_lookup closure (so
`ctx.runtime` is populated from the registry)
- Per-slot `spawn_and_attach` private method handles
`spawn_persona_service` + `attach_service_loop` + handle drain
on attach-failure (the BLOCKER 1/2 fixes from PR #1511 are
preserved, just relocated)
4. IPC boot collapses from ~170 lines of inline code to ~30 lines:
construct supervisor → spawn task → build provider → call
`supervisor.spawn_all(&mut provider).await` → log summary.
5. Helper `supervisor_error_facts` centralizes pulling
`(slot_index, role)` out of `SupervisorError`'s two variants —
the kind of trivial-but-DRY private fn Java/dotnet shops write
without apology.
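The typed result structs and the `supervisor_error_facts` helper could look roughly like the sketch below. The struct and helper names come from this commit; the `SupervisorError` variant names, the field types, and the `record` method are assumptions made for illustration.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct BootSlotFailure {
    pub slot_index: usize,
    pub role: String,
    pub persona_id: Option<String>, // assumption: type not stated in the commit
    pub reason: String,
}

#[derive(Debug, Default)]
pub struct BootSummary {
    pub hosted: usize,
    pub failures: Vec<BootSlotFailure>,
}

// Hypothetical variants: the commit only says the error has two variants
// that both carry (slot_index, role).
pub enum SupervisorError {
    Profile { slot_index: usize, role: String, message: String },
    AdapterFactory { slot_index: usize, role: String, message: String },
}

impl SupervisorError {
    fn message(&self) -> &str {
        match self {
            SupervisorError::Profile { message, .. }
            | SupervisorError::AdapterFactory { message, .. } => message.as_str(),
        }
    }
}

/// Pulls the facts every variant shares, so callers never match twice.
pub fn supervisor_error_facts(err: &SupervisorError) -> (usize, &str) {
    match err {
        SupervisorError::Profile { slot_index, role, .. }
        | SupervisorError::AdapterFactory { slot_index, role, .. } => {
            (*slot_index, role.as_str())
        }
    }
}

impl BootSummary {
    /// Slots attempted = slots hosted + slots failed.
    pub fn attempted(&self) -> usize {
        self.hosted + self.failures.len()
    }

    /// Hypothetical helper: folds one per-slot outcome into the summary.
    pub fn record(&mut self, outcome: Result<(), SupervisorError>) {
        match outcome {
            Ok(()) => self.hosted += 1,
            Err(err) => {
                let (slot_index, role) = supervisor_error_facts(&err);
                self.failures.push(BootSlotFailure {
                    slot_index,
                    role: role.to_string(),
                    persona_id: None,
                    reason: err.message().to_string(),
                });
            }
        }
    }
}
```

A value type like this replaces the two `usize` counters and keeps the per-slot reason available to whatever later publishes the summary.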
Why this matters (the doctrine):
- The IPC server boot concern and the persona spawn concern had
different lifetimes and different test needs. Mixing them in
one function violated "one logical decision, one place"
([[compression-principle]]).
- `PersonaSpawnSupervisor` is now unit-testable in isolation. The
IPC server's test surface shrinks. Slice 14's RoleAwareProvider
+ multi-persona work has one named insertion point.
- `BootSummary` is the structured event payload the design doc's
Q5 named. Once `RoleId` derives `TS` (slice 14), the struct gets
the ts-rs export and web/jtag clients read it directly per the
Rust-first-clients doctrine.
Verification:
- cargo build --lib --tests clean
- cargo test persona::host — 2 passed (BootSummary attempted +
serde camel-case)
- cargo test persona::supervisor — 4 passed (unchanged)
- cargo test persona::service_loop — 4 passed (unchanged)
- IPC boot composition shrinks ~140 lines; supervisor's spawn_all
is now the single named extraction point for slice 13.5 / 14
changes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…time>> from PersonaContext (#144)
Java-style "extract interface" on the substrate's airc-handle. Slice
13.5 elegance pass per Joel 2026-06-02 ("Must have elegance
obsessively. Like a Java dev. NO SHAME").
Before: PersonaContext.runtime: Option<Arc<PersonaAircRuntime>>. The
Option existed solely for test fixtures that couldn't easily build a
real PersonaAircRuntime; production code paid
.expect("None is test-only") on the hot path.
After: PersonaContext.runtime: Arc<dyn AircCitizen>. Tests use a typed
StubAircCitizen. Production upcoerces from PersonaAircRuntime, which
now impls AircCitizen + AircTranscriptReader. Rust 1.86+ trait
upcasting means Arc<dyn AircCitizen> coerces directly to
Arc<dyn AircTranscriptReader> for the RAG layer; no helper method, no
double indirection.
Trait surface (minimum viable):
- fn peer_id(&self) -> Uuid
- async fn subscribe(&self) -> Result<EventStream, AircError>
- async fn say(&self, text: &str) -> Result<EventId, AircError>
- AircTranscriptReader as supertrait (page_recent for the RAG layer)
What changed:
- persona/airc_citizen.rs (new): AircCitizen trait + StubAircCitizen.
- persona/airc_runtime.rs: PersonaAircRuntime impls AircCitizen +
AircTranscriptReader; delegates to its internal Arc<Airc>.
- persona/supervisor.rs: PersonaContext.runtime drops the Option.
materialize_adapters' runtime_lookup signature is now
Option<Arc<dyn AircCitizen>>; missing runtime surfaces as typed
SupervisorError::RuntimeMissing { slot_index, role, persona_id } per
[[no-fallbacks-ever]].
- persona/airc_persona_conversation.rs: takes Arc<dyn AircCitizen>,
calls trait methods directly (no runtime.airc() detour).
- persona/host.rs: spawn_persona_service drops the .expect; host's
runtime_lookup upcoerces PersonaAircRuntime to AircCitizen for
materialize_adapters.
- persona/service_loop.rs fake_hosted: runtime is now
Arc::new(StubAircCitizen::new(peer_id)) instead of None.
- bin/airc_chat_demo.rs: dropped the Some(_) wrapping;
Arc<PersonaAircRuntime> auto-coerces to Arc<dyn AircCitizen>.
Doctrine:
- [[personas-are-citizens-airc-is-identity-provider]]: AircCitizen IS
the substrate's actor type, the same trait for personas, humans (#142
BaseUser), browsers. The persona is one citizen; the human-via-jtag is
another; the Claude-Code session is another.
- [[no-fallbacks-ever]]: no Option, no .expect, no silent default.
RuntimeMissing is a typed error with persona_id named.
- [[context-is-the-client-airc-token-is-identity]]: PersonaContext IS
the &ctx. Same shape compiles in tests + production.
- [[clients-are-rust-too-thin-node-web-shell]]: AircCitizen is the
typed Rust primitive future jtag-CLI / web client / native client bind
to.
Foundation for task #142 (BaseUser hierarchy): each variant will carry
Arc<dyn AircCitizen> + kind-specific extensions (cognition for persona,
WebAuthn for human, tab state for browser).
Test plan:
- cargo build --lib --no-default-features --features livekit-webrtc,llama/mac-cpu-only: clean.
- cargo test --lib ... persona:: 705/706 pass (the one flake is
persona::evaluator::tests::test_all_gates_pass_normal_message, an
unrelated CPU-jitter timing assertion that passes in isolation).
- Integration trace: deferred to PR-time verification.
Closes #144.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
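The supertrait upcast that the AircCitizen commit relies on can be sketched as follows. This assumes a toolchain with trait upcasting (Rust 1.86 or newer, as the commit states). The methods are synchronous and `peer_id` is a `u64` purely to keep the sketch dependency-free; the real trait uses `Uuid` and async methods, and `reader_for_rag` is a hypothetical helper name.

```rust
use std::sync::Arc;

pub trait AircTranscriptReader {
    fn page_recent(&self, limit: usize) -> Vec<String>;
}

// AircTranscriptReader is a supertrait, so every citizen is also a reader.
pub trait AircCitizen: AircTranscriptReader {
    fn peer_id(&self) -> u64; // assumption: Uuid in the real trait
}

pub struct StubAircCitizen {
    peer_id: u64,
    transcript: Vec<String>,
}

impl StubAircCitizen {
    pub fn new(peer_id: u64) -> Self {
        Self { peer_id, transcript: Vec::new() }
    }
}

impl AircTranscriptReader for StubAircCitizen {
    fn page_recent(&self, limit: usize) -> Vec<String> {
        // Return at most `limit` of the most recent entries.
        let start = self.transcript.len().saturating_sub(limit);
        self.transcript[start..].to_vec()
    }
}

impl AircCitizen for StubAircCitizen {
    fn peer_id(&self) -> u64 {
        self.peer_id
    }
}

pub struct PersonaContext {
    // No Option: tests and production both supply a real trait object.
    pub runtime: Arc<dyn AircCitizen>,
}

/// Hypothetical helper showing the upcast the RAG layer depends on.
pub fn reader_for_rag(ctx: &PersonaContext) -> Arc<dyn AircTranscriptReader> {
    let citizen: Arc<dyn AircCitizen> = ctx.runtime.clone();
    // Arc<dyn AircCitizen> coerces to Arc<dyn AircTranscriptReader>
    // at the return site; no adapter method is needed.
    citizen
}
```

On older toolchains the same effect needs an explicit `as_reader` style method on the trait, which is the helper the commit says it avoided.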
…nboarding gap surfaced by external review
Two doc changes from an outside-perspective review (Gemini) of the
substrate, triaged per
[[external-llm-reviews-extract-themes-discard-citations]]: specific PR
citations were fabricated, but two themes were real:
1. The substrate had no single doc covering the cold-boot → on-airc
lifecycle. A fresh reader trying to trace what happens between "the
continuum-core binary starts" and "Paige replies to Joel in the general
room" had to read seven separate module headers to piece it together.
2. "Source/drain doctrine" was used in COGNITION-CACHE-HIERARCHY.md
without anchoring what the drain actually IS; readers had to infer.
What changed:
- docs/architecture/LIFE-OF-A-PERSONA.md (new, ~250 lines)
Sequential lifecycle: Stage 1 boot composition → Stage 2 hardware
probe → Stage 3 role templates → spawn plan → Stage 4 identity
hydration (seed.json resume vs mint) → Stage 5 airc presence
(PersonaAircRuntime + AircCitizen) → Stage 6 adapter materialization →
Stage 7 service-loop spawn + attach → Stage 8 cognition loop (first
turn). Every stage names its Rust module + typed failure mode. Closes
the operational onboarding gap.
Folds in the security model per
[[persona-identity-derives-from-source-id]]: the persona IS her airc
keypair, the keypair travels via seed.json, the host hardware has a
SEPARATE identity. No central identity broker. Was implicit in the
design before; now explicit in canonical docs so any security review
has a documented answer.
- docs/architecture/COGNITION-CACHE-HIERARCHY.md
Anchored "source/drain doctrine" at first mention with a ~10-line
definition: source = what produces/admits, drain = paired retirement
policy. Linked to memory [[source-drain-is-the-universal-pattern]].
Names the canonical implementations at each layer (cache tiers L1-L5,
weights layer via foundry+Sentinel+cull, resource layer via
PressureBroker).
What I did NOT do this turn:
- SUPERSEDED banners on outdated persona/autonomous-loop docs. Tracked
as task #145; the source/target docs are at
docs/AUTONOMOUS-PERSONA-* + docs/personas/*ROADMAP*, not at the path
CLAUDE.md cites. Wants its own focused audit.
- "Citizen" anchor in CBAR/GENOME-FOUNDRY-SENTINEL canonical docs.
Less load-bearing once persona/airc_citizen.rs (this branch's refactor)
provides the Rust-side anchor.
- Floor-vs-ceiling resolution paragraph in INFERENCE-LANES-REALISTIC.
Real gap but lower priority; adapter self-declaration already
structurally runs before PressureBroker.
Doctrine:
- [[external-llm-reviews-extract-themes-discard-citations]]: the
outside-perspective review's PR citations were fabricated; themes were
real. Discard citations; engage with themes.
- [[read-existing-docs-before-writing-new-ones]]: both edits surface
pre-existing doctrine that wasn't documented at the canonical-doc
layer.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…avior (review #1513)
Address reviewer finding: the AircCitizen extraction added
`SupervisorError::RuntimeMissing` but no test asserted it actually
fires when `runtime_lookup` returns None. Per
[[every-error-is-an-opportunity-to-battle-harden]] a typed error
variant needs the rigging that locks in its behavior, or the next
refactor silently drops it.
Two tests added to `supervisor::tests`:
1. `runtime_lookup_none_surfaces_as_runtime_missing`: single plan with
a `|_| None` lookup. Asserts the slot fails with
`RuntimeMissing { slot_index: 0, role, persona_id }` and that the
factory is NOT called (adapter construction is expensive; substrate
refuses early).
2. `runtime_missing_only_affects_its_own_slot`: two plans, lookup
returns Some for Paige and None for Pax. Asserts Paige materializes
cleanly AND Pax surfaces `RuntimeMissing`. Sibling slots don't
cross-affect, matching the per-slot semantics of `Profile` and
`AdapterFactory` errors per [[no-fallbacks-ever]].
Both tests verified locally: 6/6 supervisor tests pass.
Reviewer: #1513 (comment)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…he cognition hot path (#146)
Per Joel 2026-06-02: "Most latency goes to reinit or time spent with
memory/disk... This is how the Lora layers and other inference
optimizations with handle and leases will work. Same goes for
serialization and other inefficiencies. Copy by ref don't encode unless
necessary."
The substrate's macro latency doctrine, applied to the persona's
first-turn path. Pre-slice-13.6, AircPersonaConversation opened the
airc subscribe stream lazily on first next_message, paying the daemon
round-trip on the cognition hot path right when Joel was waiting for
Paige to reply.
Now serve_persona_loop calls conversation.prime() once at boot, BEFORE
high_water_mark or the event loop. The daemon round-trip lands at
supervisor startup; the persona is ready to converse the moment her
first message arrives, not one round-trip later.
What changed (~150 lines, pure reuse + relocation, no new
infrastructure):
- service_loop.rs:
- PersonaConversation gains an
`async fn prime(&mut self) -> Result<(), String>`. Contract: called
once at boot, before high_water_mark / next_message. Idempotent.
Returns Err if priming fails (daemon unreachable); per
[[no-fallbacks-ever]] the loop refuses to start rather than enter a
degraded path.
- serve_persona_loop_inner calls conversation.prime() as its FIRST
awaited operation. Same Err-propagation shape as the existing
high_water_mark call site.
- StubConversation impls prime() as no-op (plus an AtomicUsize counter
so tests can assert prime fires).
- airc_persona_conversation.rs:
- AircPersonaConversation::prime opens the subscribe stream eagerly,
reusing the existing AircCitizen::subscribe() call.
`if self.stream.is_some() { return Ok(()) }` makes it idempotent.
- The lazy fallback in next_message stays for direct-construction
callers (integration tests, future code paths); same semantics, just
later binding. No degraded path per [[no-fallbacks-ever]].
Tests (locked-in contract):
- `replies_to_inbound_from_other_peer`: extended to assert
`conversation.primed == 1` after the loop runs. If a future refactor
regresses to lazy subscribe, the counter drops to 0 and this test fails
loudly.
- `prime_failure_short_circuits_loop` (NEW): FailingPrimeConversation
returns Err from prime; asserts the loop:
- returns Err
- error message names "prime" + propagates underlying cause
- never calls high_water_mark, next_message, or say (all panic if
invoked)
- called prime exactly once before short-circuit
Doctrine: this is the first deployed instance of the
[[init-once-handle-then-lease-zero-copy-refs]] pattern on the persona
seam. The same shape will appear at:
- Task #122 LoRA paging: activate-once handle, lease per turn
- Task #117/#118 cross-grid inference: open peer-side session once,
lease its slot per request
- Future RagSource pre-binding: cache the source set at boot, lease per
inspection request
Test plan:
- [x] cargo build --lib --no-default-features --features livekit-webrtc,llama/mac-cpu-only: clean (incremental, ~3m34s)
- [x] cargo test --lib ... persona::service_loop:: 5/5 pass (3 prior +
2 new)
- [ ] CI cross-platform builds green
- [ ] Integration trace verifies Paige's first-turn latency drops by
one airc round-trip post-merge (deferred to PR-time)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ge (review #1514)
Address both reviewer-blocking findings from PR #1514's adversarial
review.
## Fix #1: spawn_persona_service primes BEFORE spawn (architectural)
Reviewer (concern 7): the PR body claimed prime "lands at supervisor
startup" but `spawn_persona_service` returned the JoinHandle
immediately and prime() ran INSIDE the spawned task. The supervisor's
`summary.hosted += 1` ticked BEFORE the daemon round-trip completed.
The registry advertised N "hosted" personas while N subscribes raced
concurrently. The substrate's "registered = ready" invariant was
silently violated.
Fix: `spawn_persona_service` becomes
`async fn ... -> Result<JoinHandle, String>`. It awaits
`conversation.prime()` BEFORE spawning the task. If prime fails, the
task is never spawned and the function returns Err. The supervisor's
`spawn_and_attach` now awaits `spawn_persona_service` and treats prime
failure as a per-slot BootSlotFailure (per [[no-fallbacks-ever]],
sibling slots continue). `summary.hosted` ticks only when BOTH prime
succeeded AND attach succeeded.
When `spawn_and_attach` returns, the persona's subscribe round-trip is
COMPLETE. Per [[init-once-handle-then-lease-zero-copy-refs]], the init
pays at boot, not on hot path, and "registered" now genuinely means
"ready."
`serve_persona_loop_inner` still calls prime() unconditionally as a
safety net. Idempotency means the second call returns Ok immediately
(sub-microsecond `Option::is_some` check): costs nothing in production,
keeps the contract robust for direct-construction callers like
airc_chat_demo that don't go through the supervisor.
## Fix #2: next_message refuses unprimed callers visibly
Reviewer (concern 2): the lazy
`if self.stream.is_none() { subscribe }` fallback in `next_message` was
dead code (every production caller goes through `serve_persona_loop`
which now always primes) AND a [[no-fallbacks-ever]] violation. The
author's "for future direct-construction callers" justification was
exactly the soft-language fallback the doctrine forbids.
Fix: replaced with `self.stream.as_mut().ok_or_else(...)` returning a
typed error naming the missing prime() call. Per the doctrine: if a
caller reaches `next_message` without priming, the substrate refuses
visibly and never silently lazy-subscribes.
Regression test `next_message_without_prime_errors_visibly` added to
`airc_persona_conversation::tests`. Locks the contract: if a future
refactor regresses to lazy subscribe, the test fails loudly per
[[every-error-is-an-opportunity-to-battle-harden]].
## Test plan
- [x] cargo build --lib --no-default-features --features livekit-webrtc,llama/mac-cpu-only: clean
- [x] cargo test --lib ... persona:: 710/710 pass (709 prior + 1 new
regression test)
Reviewer comment: #1514 (comment)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…come.turn_latency (#150)
Per Joel 2026-06-02: "make sure timing and other metrics are in place."
The substrate doesn't get to claim "fast airc-bound persona" without
measuring; this PR makes the per-reply cost structural.
Added (all in persona/service_loop.rs):
- LatencyAggregate { count, total_ms, min_ms, max_ms }: cheap online
aggregator. O(1) record, allocation-free, saturating-add on overflow
(locked by test). mean_ms returns Option<f64>.
- ServeOutcome.turn_latency: LatencyAggregate. Accumulates
per-successful-reply duration. Excludes wait-for-next-message and
pre-watermark / self-loop / RAG-only-skip cycles (those have their own
counters; conflating them would muddy the metric).
- serve_persona_loop_inner instruments the per-reply path:
- Instant::now captured AFTER filters, BEFORE RAG inspect
- elapsed recorded into turn_latency only on successful say
- tracing::info per turn with lamport, duration, mean/min/max so the
substrate's observability layer captures the metric structurally per
[[observability-is-half-the-architecture]]
Doctrine fit:
- Monotonic Instant (not wall-clock): immune to clock skew
- One Instant per turn, no Vec growth, no heap allocs on hot path
- Per Joel's computer-engineer mental model in
[[init-once-handle-then-lease-zero-copy-refs]]: cache-friendly,
branch-predictable, autovectorization-friendly
Tests (7/7 pass):
- latency_aggregate_records_min_max_sum_count: empty + populated math;
mean = total/count
- latency_aggregate_saturates_on_overflow: locks the safety property
per [[every-error-is-an-opportunity-to-battle-harden]]
- replies_to_inbound_from_other_peer (extended): asserts
turn_latency.count == 1 after one successful reply; min/max/mean set.
If a future refactor forgets to record, count drops to 0 and the test
fails loudly
Test plan:
- [x] cargo test --lib ... persona::service_loop:: 7/7 pass
Closes #150. Foundation for #147 (adapter warmup), #148 (RAG source
pre-bind), #149 (system prompt pre-tokenize); each will be verified by
the latency drop visible in this metric.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
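A minimal sketch of an online aggregator with the properties the turn_latency commit lists (O(1) record, no allocation, saturating arithmetic, `mean_ms` as `Option<f64>`). The field names come from the commit; modelling `min_ms` / `max_ms` as `Option<u64>` is an assumption, as is the exact `record` signature.

```rust
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct LatencyAggregate {
    pub count: u64,
    pub total_ms: u64,
    pub min_ms: Option<u64>, // assumption: None until the first record
    pub max_ms: Option<u64>,
}

impl LatencyAggregate {
    /// O(1), allocation-free. Saturates instead of wrapping on overflow.
    pub fn record(&mut self, elapsed_ms: u64) {
        self.count = self.count.saturating_add(1);
        self.total_ms = self.total_ms.saturating_add(elapsed_ms);
        self.min_ms = Some(self.min_ms.map_or(elapsed_ms, |m| m.min(elapsed_ms)));
        self.max_ms = Some(self.max_ms.map_or(elapsed_ms, |m| m.max(elapsed_ms)));
    }

    /// None when nothing has been recorded, so callers cannot divide by zero.
    pub fn mean_ms(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.total_ms as f64 / self.count as f64)
        }
    }
}
```

Keeping only four scalars means the hot path never grows a `Vec`, at the cost of not being able to report percentiles later.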
…test (caller-primes contract)
Per Joel 2026-06-02: "God I hope it's not more fallback cancer. You
tend to turn stuff into fake demos."
Two honest fixes addressing both criticisms.
## Fix 1: ONE place primes, not two (no more belt-and-suspenders)
Before: `spawn_persona_service` called `conversation.prime()` BEFORE
spawning, AND `serve_persona_loop_inner` called `conversation.prime()`
unconditionally as a "safety net." Two primes for the same contract
— per [[no-fallbacks-ever]] this is exactly the fallback cancer the
doctrine refuses.
After: `serve_persona_loop_inner` does NOT prime. Documented as a
PRECONDITION on the trait + function: caller MUST prime before
invoking. The supervisor's `spawn_persona_service` primes for
production. Direct callers (`airc_chat_demo`, tests) prime explicitly.
If a caller forgets, the first `next_message` returns the typed
`Err("called before prime()")` shipped in cb2894f — fail-loud,
never silently-warm.
Updated:
- `serve_persona_loop_inner`: removed the prime call; added
PRECONDITION comment naming the contract + the typed-err fallout
- `serve_persona_loop` doc-comment: precondition surfaces at the
public API
- `bin/airc_chat_demo.rs`: prime() explicitly before
serve_persona_loop call
- All 4 StubConversation test sites prime explicitly
- `prime_failure_short_circuits_loop` replaced with
`loop_without_caller_prime_surfaces_typed_error_per_turn` — tests
the new caller-primes contract directly: unprimed conversation's
next_message err counts as turns_errored, locks the absence of the
safety-net call
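The caller-primes contract from Fix 1 can be sketched without any async machinery. `SketchConversation` is a hypothetical stand-in for `AircPersonaConversation`: the in-memory queue replaces the airc subscribe stream, and the methods are synchronous only to keep the sketch self-contained.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the real conversation type.
pub struct SketchConversation {
    pending: VecDeque<String>,        // what the daemon would deliver
    stream: Option<VecDeque<String>>, // None until prime() succeeds
}

impl SketchConversation {
    pub fn new(events: Vec<&str>) -> Self {
        Self {
            pending: events.into_iter().map(String::from).collect(),
            stream: None,
        }
    }

    /// Opens the stream once. A second call is a cheap no-op.
    pub fn prime(&mut self) -> Result<(), String> {
        if self.stream.is_some() {
            return Ok(());
        }
        self.stream = Some(std::mem::take(&mut self.pending));
        Ok(())
    }

    /// Refuses visibly when the caller skipped prime(); it never
    /// subscribes lazily on the caller's behalf.
    pub fn next_message(&mut self) -> Result<Option<String>, String> {
        let stream = self
            .stream
            .as_mut()
            .ok_or_else(|| "next_message called before prime()".to_string())?;
        Ok(stream.pop_front())
    }
}
```

With this shape a forgotten `prime()` shows up as an error on the first turn rather than as hidden latency on the hot path.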
## Fix 2: latency test verifies REAL elapsed time, not just plumbing
Before: `replies_to_inbound_from_other_peer` asserted
`turn_latency.count == 1` and that min/max/mean were Some. Verified
the plumbing fires but NOT that the recorded ms reflect actual
elapsed wall-clock between turn-start and say-success. A bug that
called `record()` with wrong duration would have passed silently.
Fake-demo-shaped.
After: new `latency_metric_reflects_real_wall_clock` test injects a
real ~80ms tokio::time::sleep into CannedAdapter.generate_text, runs
the loop, asserts:
- `observed_ms >= 50` (CI jitter floor — verifies metric tracks the
injected delay, not always-zero)
- `observed_ms < 5000` (upper bound for sanity)
CannedAdapter gains `inject_delay_ms` field; `fake_hosted_with_delay`
helper exposes it. Default (`fake_hosted`) passes 0 so existing tests
are unaffected.
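The idea behind the honest latency test can be sketched with the standard library alone: inject a real delay, time the turn with a monotonic `Instant`, and assert a floor and a ceiling. The function names here are hypothetical, and `std::thread::sleep` stands in for the `tokio::time::sleep` the real test uses.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for CannedAdapter.generate_text with an
/// injected delay.
pub fn generate_text_with_delay(delay_ms: u64) -> String {
    sleep(Duration::from_millis(delay_ms));
    "canned reply".to_string()
}

/// Times one turn the way the loop is described: start the clock before
/// the work, read it only after the reply exists.
pub fn timed_turn(delay_ms: u64) -> (String, u64) {
    let started = Instant::now();
    let reply = generate_text_with_delay(delay_ms);
    let observed_ms = started.elapsed().as_millis() as u64;
    (reply, observed_ms)
}
```

The floor is set below the injected 80 ms to tolerate timer granularity; a metric that always recorded zero would still fail it.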
Test plan:
- [x] cargo test --lib ... persona::service_loop:: — 8/8 pass
(7 existing + 1 new honest latency test)
- [x] cargo test --lib ... persona:: — 713/713 pass overall
Doctrine recap:
- [[no-fallbacks-ever]] — one place primes, not two
- [[every-error-is-an-opportunity-to-battle-harden]] — the
caller-primes regression test locks the contract
- The honest latency test prevents the "passes on plumbing, silent
on correctness" anti-pattern
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…t off the hot path (#147)
Per Joel 2026-06-02 ("Latency first then up the model and we need to
optimize layers"): the substrate's biggest first-turn cost on the LCD
tier is the model's cold-cache + JIT bill paid on the very first
generate_text. This PR moves it OFF the cognition hot path INTO the
supervisor's `materialize_adapters` step, the same architectural shape
as PR #1514's `prime()` for airc subscribe. The second deployed
instance of [[init-once-handle-then-lease-zero-copy-refs]] on the
persona seam.
## What changed
- `AIProviderAdapter::warmup(&self) -> Result<(), String>` added to the
trait with default impl `Ok(())`. Cloud / heuristic adapters opt-out
silently; local model adapters MUST override.
- `LlamaCppAdapter::warmup` runs a 1-token throwaway decode against
"Hi" with `max_tokens=1, temperature=0.0`. Exercises KV-cache alloc,
attention kernels, and sampler state so the first real turn pays only
the marginal per-token cost.
- `persona::supervisor::materialize_adapters` calls
`adapter.warmup().await` AFTER `factory.build_adapter()` and BEFORE the
slot enters the hosted set.
- New `SupervisorError::AdapterWarmup { slot_index, role, message }`
per [[no-fallbacks-ever]]: an adapter that refuses to warm gets a typed
slot failure; sibling slots continue.
- `host.rs::supervisor_error_facts` extended to handle the new variant.
## Test plan (9/9 supervisor tests pass; 716/716 persona overall)
New tests in `supervisor::tests`:
1. `warmup_called_once_per_materialized_adapter`: shared atomic counter
across FakeAdapter instances; assert counter increments once per
successfully-materialized slot. Locks the contract that future
refactors can't quietly drop.
2. `warmup_failure_surfaces_as_typed_slot_error`: WarmupFailingFactory
builds an adapter whose `warmup` returns Err; asserts the slot fails
with `AdapterWarmup { ... }` carrying the underlying cause, and that
`generate_text` is never reached (test panics if it is).
3. `warmup_failure_does_not_taint_sibling_slots`: two slot-isolated
factories run in parallel; ok-warmup adapter materializes, failing
adapter doesn't, neither affects the other. Per-slot isolation doctrine
locked.
Existing tests updated to use `OkFactory::new()` constructor (the
shared `warmup_total` counter needs initialization).
## Doctrine fit
- [[init-once-handle-then-lease-zero-copy-refs]]: the substrate's
second deployed instance after prime(). Pay init at boot, never on hot
path. Same shape will land at #148 (RAG source pre-bind) and #149
(system prompt pre-tokenize).
- [[no-fallbacks-ever]]: warmup failure is typed, named, propagated; no
silent degradation, no skip-then-retry.
- Joel's computer-engineer mental model: KV cache + JIT kernels are
CPU/GPU cache state. Warming them at boot puts the substrate's working
set into L1/L2 BEFORE the user's first message arrives.
## Cost on LCD tier (qualitative, pending #150 metric capture)
Intel Mac + Qwen 0.5B CPU-only: first generate_text cold-cost
~200-500ms above warm-cost. Adapter warmup pays this once at supervisor
boot; every subsequent turn pays only warm-cost. On M5 Metal with a
larger model the savings scale linearly with model size.
Closes #147.
Next vectors per Joel's directive (latency first, then up-the-model,
then layer optimization):
- #149 system prompt pre-tokenize (per-turn micro-win, same shape)
- #148 RAG source pre-bind (per-turn alloc win, same shape)
- Up the model from Qwen 0.5B once latency floor is solid
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
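A sketch of the warmup step, assuming a synchronous trait for brevity (the real `warmup` is awaited). The trait name, the default `Ok(())`, and the `AdapterWarmup` variant come from the warmup commit; `Slot`, `CloudAdapter`, `LocalAdapter`, and the return shape of `materialize_adapters` are hypothetical.

```rust
pub trait AIProviderAdapter {
    fn name(&self) -> &str;

    /// Default: nothing to warm. Local model adapters override this.
    fn warmup(&self) -> Result<(), String> {
        Ok(())
    }
}

pub struct CloudAdapter; // opts out by inheriting the default

impl AIProviderAdapter for CloudAdapter {
    fn name(&self) -> &str {
        "cloud"
    }
}

pub struct LocalAdapter {
    pub model_loaded: bool,
}

impl AIProviderAdapter for LocalAdapter {
    fn name(&self) -> &str {
        "local"
    }

    fn warmup(&self) -> Result<(), String> {
        // Stand-in for the throwaway 1-token decode.
        if self.model_loaded {
            Ok(())
        } else {
            Err("model not loaded".to_string())
        }
    }
}

#[derive(Debug, PartialEq)]
pub enum SupervisorError {
    AdapterWarmup { slot_index: usize, role: String, message: String },
}

pub struct Slot {
    pub role: String,
    pub adapter: Box<dyn AIProviderAdapter>,
}

/// Warms every adapter before it counts as hosted. A failure is recorded
/// for that slot only; sibling slots continue.
pub fn materialize_adapters(slots: Vec<Slot>) -> (Vec<Slot>, Vec<SupervisorError>) {
    let mut hosted = Vec::new();
    let mut failures = Vec::new();
    for (slot_index, slot) in slots.into_iter().enumerate() {
        match slot.adapter.warmup() {
            Ok(()) => hosted.push(slot),
            Err(message) => failures.push(SupervisorError::AdapterWarmup {
                slot_index,
                role: slot.role,
                message,
            }),
        }
    }
    (hosted, failures)
}
```

The default method is what lets existing adapters compile unchanged while local adapters take on the boot-time cost explicitly.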
…el primitives (#154)
Per Joel 2026-06-02: "Your validation and tests belong in the system
itself. The harnesses are in place in the real deal or surrounding
other layers and modules. You gotta think LONG term and make these
elegant too. It's why we had record and repeat of live persona and rag.
Can't be done without. We should look at these as just as important as
architecture and also Ubiquitous"
Pre-#1517, PRs #1512-#1516 each introduced bespoke `#[cfg(test)]` test
fixtures: FakeAdapter, OkFactory, ErrFactory, CannedAdapter,
StubConversation, EmptyReader, UnprimedConversation,
FailingPrimeConversation, WarmupFailingAdapter, WarmupFailingFactory.
Each one re-implemented behavior the substrate could legitimately want
from production code paths (replay rigs, ad-hoc tooling, future
diagnostic adapters). That's the scaffolding cancer this PR refuses.
Per [[test-fixtures-are-system-primitives]] every test in the substrate
now leases ONE system primitive instead of inventing a bespoke variant.
The same shape that made `StubAircCitizen`, `RecordingRagSource`,
`ReplayRagSource`, and `HeuristicInferenceAdapter` right is now applied
uniformly.
## New / extended system primitives
### `ai/heuristic_adapter.rs` (extended)
`HeuristicInferenceAdapter` gains opt-in builder methods:
- `.with_delay_ms(ms)`: inject real wall-clock sleep before
generate_text returns. Production callers use `new()` and pay zero.
Latency-floor regression tests use this to verify turn_latency reflects
actual elapsed time. Future simulated-network adapters (cross-grid
inference, etc.) use this for realistic modeling.
- `.with_warmup_failure(reason)`: make warmup() return Err. Exercises
`SupervisorError::AdapterWarmup` per [[no-fallbacks-ever]].
- `.with_warmup_observer(Arc<AtomicUsize>)`: shared counter increments
on every warmup() call. Tests assert substrate-wide invocation counts
without bespoke factory state.
- `.with_generate_observer(Arc<AtomicUsize>)`: same shape for
generate_text. Counts substrate-side hot-path inference calls.
### `persona/scripted_adapter_factory.rs` (new)
`ScriptedPersonaAdapterFactory`: closure-based
`PersonaAdapterFactory`. Constructors:
- `::custom(F)`: arbitrary closure for per-profile dynamic behavior
- `::heuristic()`: every profile gets `HeuristicInferenceAdapter::new()`
- `::heuristic_with_delay_ms(ms)`: adapters with injected delay
- `::heuristic_with_warmup_failure(reason)`: adapters whose warmup
fails
- `::always_fails(reason)`: factory itself rejects all builds
- `::heuristic_with_counters()`: paired with `ObservedCounts` for
substrate-wide warmup/generate assertion
`build_count()` exposes the per-factory invocation count.
`ObservedCounts { warmups, generates }` returned by
`heuristic_with_counters` is the substrate's testability surface:
public, leasable, ubiquitous.
### `persona/scripted_conversation.rs` (new)
`ScriptedConversation`: configurable `PersonaConversation`. Builder
pattern:
- `.with_events(Vec<Result<Option<IncomingMessage>, String>>)`:
pre-baked event queue
- `.with_high_water(u64)`: pre-attach history mark
- `.with_prime_failure(reason)`: make prime() return Err
- `.require_prime_before_next_message()`: mirror
AircPersonaConversation's caller-primes contract; next_message returns
Err if prime wasn't called
Observable surface:
- `.primed_count()`: assert prime() invocation count
- `.said()`: snapshot of all `say()` text in order
### `persona/airc_citizen.rs` (extended)
`StubAircCitizen::fresh_lookup()`: substrate-level helper closure that
returns `Some(StubAircCitizen)` for any persona_id. Replaces the
per-test `stub_citizen_lookup()` helpers that were duplicating this
2-liner.
### gating
`scripted_adapter_factory` and `scripted_conversation` are gated behind
`cfg(any(test, feature = "test-fixtures"))`, the same gate as
`HeuristicInferenceAdapter` per Joel (2026-06-01): "You mix this fake
shit in and it's going live ALL THE TIME. The fake shit is a CHOSEN
model adapter no other form. Declaration." cfg gating IS the
declaration.
## Test module rewires
### `persona/supervisor.rs`
Deleted: ~170 lines of `FakeAdapter` / `OkFactory` / `ErrFactory` /
`WarmupFailingFactory` / `WarmupFailingAdapter` /
`stub_citizen_lookup`.
Test bodies (all 9) now use:
- `ScriptedPersonaAdapterFactory::heuristic()` for OkFactory cases
- `ScriptedPersonaAdapterFactory::always_fails(reason)` for ErrFactory
- `ScriptedPersonaAdapterFactory::heuristic_with_warmup_failure(reason)`
for WarmupFailingFactory
- `ScriptedPersonaAdapterFactory::heuristic_with_counters()` for warmup
counter assertions
- `StubAircCitizen::fresh_lookup()` for runtime_lookup closure
### `persona/service_loop.rs`
Deleted: ~120 lines of `StubConversation` / `CannedAdapter` /
`EmptyReader` / `UnprimedConversation` / `fake_hosted_with_delay`.
Test bodies (all 8) now use:
- `ScriptedConversation::new().with_events(...).with_high_water(N).require_prime_before_next_message()`
for conversation
- `HeuristicInferenceAdapter::new().with_delay_ms(ms)` for adapter
- `StubAircCitizen::new(...)` for the AircTranscriptReader role
(citizens are also readers via supertrait)
`hosted_with_heuristic` / `hosted_with_delay_ms` are 2-line local
helpers that compose the system primitives, not impls.
### `persona/airc_persona_conversation.rs`
Already clean (only uses `StubAircCitizen`). No changes.
## Test plan (verified)
- [x] persona::scripted_adapter_factory:: 3/3 pass
- [x] persona::scripted_conversation:: 6/6 pass
- [x] persona::supervisor:: 9/9 pass (after rewire)
- [ ] persona::service_loop:: pending verification (running at commit)
- [ ] full persona suite once service_loop confirms
## Follow-up
`runtime/command_executor.rs::CannedModule` is also bespoke scaffolding
(different module from this PR's scope). File a follow-up task to apply
same doctrine to the runtime layer.
Closes #154.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
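The opt-in builder shape described for `HeuristicInferenceAdapter` can be sketched as below. `SketchHeuristicAdapter` is a hypothetical, synchronous stand-in; only the builder method names `with_warmup_failure` and `with_warmup_observer` are taken from the fixtures commit, and the rule that a failing warmup still increments the observer follows its "every warmup() call" wording.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the real adapter; `new()` has every
/// option off, so default callers pay nothing.
#[derive(Default)]
pub struct SketchHeuristicAdapter {
    warmup_failure: Option<String>,
    warmup_observer: Option<Arc<AtomicUsize>>,
}

impl SketchHeuristicAdapter {
    pub fn new() -> Self {
        Self::default()
    }

    /// Opt in: make warmup() return Err with this reason.
    pub fn with_warmup_failure(mut self, reason: &str) -> Self {
        self.warmup_failure = Some(reason.to_string());
        self
    }

    /// Opt in: bump a shared counter on every warmup() call.
    pub fn with_warmup_observer(mut self, counter: Arc<AtomicUsize>) -> Self {
        self.warmup_observer = Some(counter);
        self
    }

    pub fn warmup(&self) -> Result<(), String> {
        // Count the call first so failed warmups are observable too.
        if let Some(counter) = &self.warmup_observer {
            counter.fetch_add(1, Ordering::SeqCst);
        }
        match &self.warmup_failure {
            Some(reason) => Err(reason.clone()),
            None => Ok(()),
        }
    }
}
```

Sharing one `Arc<AtomicUsize>` across adapters is what lets a test assert a substrate-wide call count without any per-test factory state.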
…LLM dominates (#156)

Per Joel 2026-06-02: substrate must run well on M5 with 6-12 personas in video chat; on Intel Mac at least functional for multiple personas; on typical M-series decently useful + intelligent. Need DATA before guessing at latency vectors. Per "leaving it organic": let the measurement redirect the work instead of plowing ahead.

Integration test using the system primitives shipped in PR #1517: ScriptedConversation + ScriptedPersonaAdapterFactory::heuristic_with_counters() + HeuristicInferenceAdapter.with_delay_ms(50). Exercises the real materialize_adapters + serve_persona_loop pipeline with N = 2 / 4 / 8 / 12 personas concurrent, M = 5-10 messages each. tokio multi-thread runtime, 4 worker threads.

## Measured (Intel Mac, 2026-06-02)

| N x M     | Materialize | Serve wall | Mean turn | Max turn |
|-----------|-------------|------------|-----------|----------|
| 2 x 10    | 0 ms        | 521 ms     | 51.6 ms   | 53 ms    |
| 4 x 10    | 0 ms        | 521 ms     | 51.6 ms   | 53 ms    |
| 8 x 5     | 0 ms        | 270 ms     | 51.5 ms   | 61 ms    |
| 12 x 5    | 0 ms        | 270 ms     | 51.7 ms   | 61 ms    |

Adapter delay was 50ms (injected). Substrate adds 1.5-3 ms per turn under contention. Throughput scales linearly with persona count. p100 tail latency is 61ms (only 11ms above floor).

## Implications captured in [[substrate-overhead-is-1to3ms-LLM-dominates-latency]]

1. The substrate IS NOT the bottleneck. Real Qwen 0.5B inference is 1000-15000 ms per turn (live trace). Substrate is 0.02-0.3% of total.
2. #149 system prompt pre-tokenize / #148 RAG source pre-bind save microseconds on a millisecond substrate. Not worth grinding until LLM gen shrinks.
3. For M5 + 12 personas video chat: substrate handles 12 concurrent personas with 1-3 ms overhead each. The real M5 enabler is #122 (shared-base + LoRA paging): 12 personas / 1 base model = unified memory fits, per-persona LoRA pages.
4. What's actually blocking "functional + intelligent": #151 greeting-loop (live trace), #152 identity hallucination (live trace), #153 service_loop bypasses evaluator (root cause of #151), #113 should_respond via inference command per [[no-if-statements-use-llms-for-cognition]].

## Pivot

Pause latency-vector grinding (#149, #148). Pivot to:

- #113 should_respond via inference command (fixes greeting-loop)
- #152 identity grounding via chat template
- #122 shared-base + LoRA paging (M5 enabler)

## How to run

    cargo test --test multi_persona_stress_baseline --no-default-features --features livekit-webrtc,llama/mac-cpu-only,test-fixtures -- --nocapture

The --nocapture is load-bearing: the eprintln stress::* lines are the data; assertions verify structural invariants only.

Closes #156.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…all (#113) Per Joel 2026-06-02 ("113, use real LLMs. We can't know if we use fake algorithms. Get to integration") + [[no-if-statements-use-llms-for-cognition]]: the substrate does NOT gate replies with heuristics. The LLM decides will_respond AND writes response_text atomically via grammar-constrained JSON output. One LLM call per turn. No heuristic should_respond gate. No echo-storm filter at the substrate level. ## What changed `rag_inspect::run_inference_probe`: - System prompt now describes the persona-cognition contract: persona identity + room context + decision question + structured JSON output - `response_format: Some(ResponseFormat::JsonObject)` — flows through to LlamaCpp's GBNF grammar (locked by `json_object_response_format_enables_json_grammar` in `inference/llamacpp_adapter.rs`). The sampler can ONLY emit valid JSON. Substrate-enforced structural contract per [[no-fallbacks-ever]]. - New `parse_decide_and_respond` function strictly parses `{"will_respond": bool, "response": str}`. Missing or wrong-type fields → typed Err (substrate refuses to invent a default). `ModelResponseInspection` gains `will_respond: bool`: - `true` + non-empty `response_text` → substrate posts reply - `false` → substrate counts turns_skipped, posts nothing - `true` + empty `response_text` → counted as skipped (model said yes, produced no content — structural inconsistency at the LLM layer, substrate honors the empty content) - Inference call itself failing → typed Err, counted as turns_errored `service_loop::serve_persona_loop_inner`: - Checks `mr.will_respond` before posting. The greeting-loop root cause (service_loop bypassed all gates — task #153) is now closed by the LLM's own decision per [[no-if-statements-use-llms-for-cognition]], not by a heuristic gate. 
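The four `will_respond` cases listed above can be sketched as a single match. This is a minimal illustration, not the code in `service_loop`: `Inspection` is a hypothetical stand-in that carries only the two fields the commit names, and `TurnAction` and `decide` are illustrative names.

```rust
/// Hypothetical stand-in for the two `ModelResponseInspection` fields named
/// in the commit message; the real struct carries more than this.
#[derive(Debug, Clone, PartialEq)]
pub struct Inspection {
    pub will_respond: bool,
    pub response_text: String,
}

/// What the loop does with one turn (illustrative names).
#[derive(Debug, PartialEq)]
pub enum TurnAction {
    /// Post the reply; counted as a replied turn.
    Post(String),
    /// Post nothing; counted under `turns_skipped`.
    Skip,
    /// The inference call or the strict parse failed; counted under `turns_errored`.
    Errored(String),
}

/// Map the outcome of one inference call onto a loop action.
/// The error arm is deliberately not softened into a default decision.
pub fn decide(result: Result<Inspection, String>) -> TurnAction {
    match result {
        Err(e) => TurnAction::Errored(e),
        Ok(mr) if mr.will_respond && !mr.response_text.is_empty() => {
            TurnAction::Post(mr.response_text)
        }
        // Covers `will_respond == false` and the "said yes, wrote nothing" case.
        Ok(_) => TurnAction::Skip,
    }
}
```

The sketch has no branch that inspects message content: the only inputs are the model's own decision and whether the call succeeded.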
`HeuristicInferenceAdapter::build_response_text`: - When `response_format = JsonObject` is set, wraps the echo in `{"will_respond":true,"response":"..."}` so substrate plumbing validates end-to-end without a real LLM. Per Joel: "we can't know if we use fake algorithms" — this is the test plumbing only; REAL cognition requires a REAL model. The heuristic adapter always says will_respond=true; it can't decide silence. ## Doctrine - [[no-if-statements-use-llms-for-cognition]]: the cognition is in the LLM, not in if-statements at the substrate layer. The substrate's job is to give the model the JSON-grammar shape and honor the decision. - [[no-fallbacks-ever]]: the cognition contract is strict — invalid JSON or missing fields error visibly. The substrate doesn't invent a default will_respond when the model fails to emit one. - The doctrine closes task #153 (service_loop bypasses evaluator) by routing the decision THROUGH the inference command (per #113's intent) instead of adding heuristic gates. ## Risks for live integration - Qwen 0.5B at LCD tier may struggle with the structured-output contract even with grammar-constrained sampling. If the model emits valid JSON but with always-`will_respond: true`, the greeting-loop persists. That's a model-quality issue, not a substrate issue. - If Qwen 0.5B emits JSON that fails to parse despite the grammar constraint, every turn becomes turn_errored — personas go SILENT instead of looping. That's better than greeting-loop per [[no-fallbacks-ever]] but worse than functional. Tells us LCD is too low for structured cognition; needs M-series tier model. ## Test plan - [x] cargo test --lib ... 
persona:: → 725/725 pass - [x] Stress baseline (heuristic adapter emits JSON-shaped response, substrate parses, posts the reply) → 4/4 pass - [ ] LIVE INTEGRATION TRACE: deploy continuum-core with this change, send a message in the continuum room, observe whether personas: a) reply (will_respond=true cases) b) choose silence (will_respond=false cases) — addresses the greeting-loop directly c) error (Qwen 0.5B fails to produce structured output) Reference docs: - [[no-if-statements-use-llms-for-cognition]] - [[no-fallbacks-ever]] - [[substrate-overhead-is-1to3ms-LLM-dominates-latency]] — substrate is fine; this PR is accuracy-side work on the LLM-side contract Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…npm run start-server
Per Joel 2026-06-02: "We want to get to a repeatable start, like
npm start or cargo run, which will be wired into the system."
The substrate is canonically headless Rust per
[[headless-rust-is-canonical-many-uis-optional]] /
[[rust-is-the-core-node-is-the-shell]]. npm start was bringing
Node, TS build, widgets, the kitchen sink. start-server.sh runs
only the headless Rust binary.
## What it does
- Sources ~/.continuum/config.env (same as parallel-start.sh)
- Sets ORT_DYLIB_PATH (same as parallel-start.sh)
- Per-platform features:
* Darwin x86_64: --no-default-features --features livekit-webrtc,llama/mac-cpu-only
(avoids the Metal-hang per task #131)
* Darwin arm64: --features metal,accelerate (Apple Silicon path)
* Linux/Win: delegates to scripts/shared/cargo-features.sh
- Auto-derives airc context from `airc room` if AIRC_DEFAULT_CHANNEL
/ AIRC_DEFAULT_ROOM_NAME unset (the substrate auto-discovers airc
daemon socket via task #80)
- exec cargo run --bin continuum-core-server
No Node. No TS build. No widget orchestrator. Just the substrate.
## Usage
bash scripts/start-server.sh # debug, fast iterate
CONTINUUM_RELEASE=1 bash scripts/start-server.sh # release
CONTINUUM_SOCKET=/path bash scripts/start-server.sh
Or via npm:
npm run start-server
## Test plan
- [x] Builds + runs on Intel Mac with mac-cpu-only
- [ ] Integration trace verifies personas spawn and connect to airc
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…edates `ipc-endpoint` Task #79 (`airc ipc-endpoint`) is in-flight but not yet shipped on Joel's airc binary, so the substrate's task-#80 auto-discoverer falls through to "socket not provided" and PersonaInstanceManagerModule fails to register. Fallback: scripts/start-server.sh picks the persistent per-machine daemon socket at `~/.airc/runtime/airc-machine-*-v5.sock` (most recently modified — that's the live daemon). Excludes session-scoped sockets and `.lock` companions. Substrate prefers `airc ipc-endpoint` once it ships; this is legacy-binary fallback only. Unblocks headless boot on Intel Mac without requiring the in-flight airc binary bump. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
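The selection rule above lives in a shell script; the sketch below only restates it in Rust to make the rule explicit. The function names are hypothetical, and it assumes that matching the `airc-machine-*-v5.sock` file-name pattern is enough to exclude session-scoped sockets and `.lock` companions.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// True for file names shaped like `airc-machine-*-v5.sock`.
/// A `.lock` companion fails the suffix check and is excluded.
pub fn is_machine_socket(file_name: &str) -> bool {
    file_name.starts_with("airc-machine-") && file_name.ends_with("-v5.sock")
}

/// Pick the most recently modified candidate, if any.
pub fn newest(candidates: &[(PathBuf, SystemTime)]) -> Option<&Path> {
    candidates
        .iter()
        .max_by_key(|c| c.1)
        .map(|c| c.0.as_path())
}

/// Scan a runtime directory (e.g. `~/.airc/runtime`) for the live daemon socket.
pub fn find_daemon_socket(runtime_dir: &Path) -> io::Result<Option<PathBuf>> {
    let mut candidates = Vec::new();
    for entry in fs::read_dir(runtime_dir)? {
        let entry = entry?;
        let name = entry.file_name();
        if name.to_str().map_or(false, is_machine_socket) {
            candidates.push((entry.path(), entry.metadata()?.modified()?));
        }
    }
    Ok(newest(&candidates).map(Path::to_path_buf))
}
```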
…te bugs blocking it (#113, #157)

Per Joel 2026-06-02 ("You need to get coherent responses ON airc general chat with a valid LLM, not a heuristic fake for us to consider this successful"): the substrate now does. Real Qwen 2.5 0.5B Instruct on Intel Mac CPU. Posted to airc general:

    peer 18c04c5b (Paige's identity disc) → continuum room:
    "Hi, my name is Paige. I'm here to assist you with any questions or concerns you have today! Please feel free to ask me anything."

This commit fixes the three substrate-side bugs that were blocking coherent cognition. None of them were the model.

## Bug 1: Budget reservation hardcoded for 32k contexts

`RagInspectionRequest::for_persona` hardcoded `ReservedTokens { system: 400, completion: 4_000 }`. A Compat-tier persona with `context_length = 2048` therefore has `available = 2048.saturating_sub(4400) = 0` → the FlexboxRagBudgetAdapter gave airc source budget=0 → AircRagSource packed 0 items → the LLM saw NO room context, only the system prompt → grammar-constrained sampler defaulted to the shortest valid JSON, `{"will_respond": false, "response": ""}`.

Fix: scale reservations as percentages of context_window, clamped:

- system: 10% of window, clamped [128, 512]
- completion: 25% of window, clamped [256, 4_000]

For 2048 ctx: reserved = (204, 512), available = 1332. For 32768 ctx: reserved = (512, 4000), available = 28256. Both sensible.

## Bug 2: pack_within_budget dropped the NEWEST events

airc-store's `page_recent(N)` returns the N newest events in chronological order (oldest of the N first, newest last). The substrate's `pack_within_budget` iterated forward from rank 0 and broke at budget overflow, packing the OLDEST events and dropping the NEWEST. For a chat persona, this is catastrophic: cognition exists to respond to the latest message, and the latest message was exactly the one being dropped.

Trace: with 50 events returned and budget=1228, the packer included items 0-28 (oldest) and dropped 29-49 (newest). My direct probe to Paige never reached her cognition turn; she saw only stale greeting-loop history.

Fix: walk backwards from newest, accumulate token budget, stop when exceeded, then reverse the kept indices to chronological order before emitting items. Continuation cursor semantics preserved.

## Bug 3: Qwen 0.5B copy-pasted the system prompt's example

The cognition system prompt showed a literal example:

    Respond with ONLY a JSON object matching this exact shape:
    {"will_respond": true, "response": "your reply text"}
    OR
    {"will_respond": false, "response": ""}

Qwen 0.5B at LCD tier is too small to substitute its own content into the template; under grammar constraint it emitted the example verbatim, and Paige posted `"your reply text"` to airc once. Classic tiny-model few-shot copy failure.

Fix: describe the schema in prose, no literal example. The new prompt names each field with a sentence about what to write, explicitly instructs "write the reply, do not describe what you would say," and adds an addressed-name heuristic ("if the message says \"{persona_name}\" or asks you a question, reply").

## Plus: diagnostic tracing per [[observability-is-half-the-architecture]]

- `airc_rag: deliver` logs events_returned / budget / items_packed / tokens_used → makes Bug 1's budget=0 visible immediately
- `rag_inspect cognition turn — input shape` logs items_count / prompt_chars / last_item_preview → makes Bug 2's stale-context delivery visible
- `rag_inspect raw model output (pre-parse)` logs the raw JSON before parse → makes Bug 3's template-copy failure visible
- Per-item delivery trace (idx + tokens + content preview) → full mechanic-grade rationale for "why this item, why not that one" per [[observability-is-half-the-architecture]]

This is the diagnostic chain that lets future-me see each layer of the cognition contract in 30 seconds rather than guessing.
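The Bug 1 and Bug 2 fixes described above come down to two small pure functions. The sketch below is a minimal restatement under assumptions: `reserve_for`, `available_tokens`, and `pack_newest_first` are hypothetical names, token costs are passed in as plain integers, and the continuation cursor is not modeled.

```rust
/// Tokens held back from the context window before RAG sources get a budget.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ReservedTokens {
    pub system: usize,
    pub completion: usize,
}

/// Bug 1 fix: scale reservations with the window instead of hardcoding them.
/// system = 10% clamped to [128, 512]; completion = 25% clamped to [256, 4_000].
pub fn reserve_for(context_window: usize) -> ReservedTokens {
    ReservedTokens {
        system: (context_window / 10).clamp(128, 512),
        completion: (context_window / 4).clamp(256, 4_000),
    }
}

/// Tokens left for RAG sources after the reservation.
pub fn available_tokens(context_window: usize) -> usize {
    let r = reserve_for(context_window);
    context_window.saturating_sub(r.system + r.completion)
}

/// Bug 2 fix: `token_costs` is in chronological order (oldest first).
/// Walk from the newest event backwards, keep events while they fit,
/// then return the kept indices in chronological order.
pub fn pack_newest_first(token_costs: &[usize], budget: usize) -> Vec<usize> {
    let mut used = 0usize;
    let mut kept = Vec::new();
    for (idx, &cost) in token_costs.iter().enumerate().rev() {
        if used.saturating_add(cost) > budget {
            break; // stop at the first event that does not fit
        }
        used += cost;
        kept.push(idx);
    }
    kept.reverse();
    kept
}
```

With the old constants, a 2048-token window reserved 4400 tokens and left nothing; with the scaled version the same window keeps 1332 tokens, matching the figures in the commit message.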
## Doctrine

- [[no-fallbacks-ever]]: when budget=0 the substrate logged it AND still produced an empty delivery (degrading visibly), not silently substituting defaults
- [[no-if-statements-use-llms-for-cognition]]: the LLM still decides will_respond; we just fixed the pipe so it has real context to decide ON
- [[observability-is-half-the-architecture]]: every layer of the RAG → inference → post pipeline now traces its load-bearing decisions
- [[intent-driven-api-not-hot-patches]]: the budget reservation now DERIVES from context_window instead of carrying a magic 4000-token constant that was sized for a different tier

## Risks

- Per-item trace at INFO is verbose (30 lines per cognition turn). Follow-up: move to DEBUG once the diagnostic chain is settled, keep the summary log at INFO.
- LCD-tier latency: 87s for 42 output tokens on Intel CPU. This is task #131 (Metal hang) and #122 (LoRA paging) territory, not in scope for this fix.
- Coherence quality is generic-customer-service-y; that's Qwen 0.5B's instruction-tuned voice. role_template ladder ready for Qwen 1.5B / 3B uplift.

## Test plan

- [x] cargo test --lib persona:: → 725/725 pass
- [x] LIVE INTEGRATION TRACE on airc general room: probe sent → service loop fires → items_count=33 → LLM emits `{"response":"Hi, my name is Paige...","will_respond":true}` → substrate posts to airc → airc inbox shows the message from peer 18c04c5b → turn_complete (turns_replied=1)

Closes #157.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…rate constants (#158, #159)

Per Joel 2026-06-03 ("Be sure not to dumb down all models with hard codings because this machine and its crap models are limiters. Think of the 5090 too. Think of million or hundreds thousand context windows. It's up to the model... This is called our budgeter logic. Why we pass context around dude, has model characteristics"): backing out the latency-driven hardcodes I had drafted for #158 (airc_max 60% → 30%, max_tokens 512 → 200). Those would have shaved 30s off an Intel Mac CPU turn but would have handicapped every capable peer on the grid. A 5090 + frontier model with 200k context should feed the whole conversation, not be clamped to 614 tokens because Qwen 0.5B is slow.

What this commit DOES change:

- `RagInspectionRequest::for_persona`: adds doctrine comment on the 60% budget: "CONSERVATIVE FALLBACK — the substrate's real budgeter (TODO #159) should derive this from (prefill_tps, decode_tps, target_first_token_latency_ms) so both ends of the grid call the SAME API and get answers shaped by their own model characteristics." Behavior unchanged vs HEAD.
- `run_inference_probe` max_tokens=512: same doctrine comment. Behavior unchanged vs HEAD.
- Cognition system prompt: strengthened. Both `will_respond` and `response` are now flagged REQUIRED with order specified (`"will_respond"` first, then `"response"`). The latency-test turn showed Qwen 0.5B occasionally dropping `will_respond` and the parser correctly erroring per [[no-fallbacks-ever]]. Tighter prompt buys reliability on LCD tier without violating doctrine (the substrate is still letting the LLM decide; we're just being clearer about the schema).
- Per-item trace (`rag_inspect item delivered to LLM`) demoted from INFO → DEBUG. Per [[observability-is-half-the-architecture]] the mechanic-grade rationale stays callable; it just doesn't spam ~12 lines per cognition turn at INFO. Light it up with `RUST_LOG=continuum_core::persona::rag_inspect=debug`.
- `airc_rag: deliver` log demoted INFO → DEBUG, same reasoning.

What this commit DOES NOT change:

- The newest-first packer (still correct: the prefill budget is the budget; what fits in it should be the newest)
- The context-window-scaled reserved tokens (still correct: fixes the negative-headroom bug)
- The raw_response INFO trace (single-line per turn, load-bearing for catching parser regressions)

Follow-up: task #159 lays out the proper budgeter design. Context carries model characteristics, the budgeter centralizes the (history_budget, max_tokens, reserved) computation per turn.

## Doctrine

- [[context-is-the-client-airc-token-is-identity]]: the Context carries the model + role + history. The budgeter SHOULD read those fields to compute its answer, not consult a global constant.
- [[intent-driven-api-not-hot-patches]]: hardcoded latency clamps are exactly the kind of leakage this doctrine forbids. Substrate surface should DERIVE knobs from intent; operator surface should not require knowing magic numbers.
- [[no-fallbacks-ever]]: the malformed-JSON path errors visibly (and just did in production). Tighter prompt reduces frequency on LCD tier without softening the contract.

## Test plan

- [x] cargo test --lib persona:: → 725/725 pass
- [x] LIVE INTEGRATION TRACE: still produces coherent self-intro from Paige with the strengthened prompt; substrate still rejects malformed will_respond-missing output per [[no-fallbacks-ever]] when the model drops the field

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…udgetAdapter on the cognition stack (task #148) Per Joel 2026-06-03 ("Stop killing our intelligent brain. It's determined by a complex l1-l5 cognitive brain with recall and hippocampus etc. rag budget don't you dare skip past the damn brain. You defeat the entire purpose of building an ai. Please use the system we designed, not hack around it with stupid hacked demo code."): the brain — PersonaCognition in `unified.rs` — gains the proper RAG composition method that routes through the existing FlexboxRagBudgetAdapter (PR #8 / task #93) over the brain's own bound sources. ZERO new budgeter. ZERO parallel allocator. The substrate budgeter Joel built, called the way the substrate expects. ## What changed `PersonaCognition` (unified.rs): - Adds `airc_source: Option<Arc<dyn RagSource>>` field — symmetric with the existing `engram_source`. The two first-class RAG sources are now siblings on the brain. `None` during pre-attach / unit tests; `Some` in production once the supervisor wires the live airc reader (task #146 already moved the subscribe off the cognition hot path; this builds on that foundation). - Adds `set_airc_source(&mut self, raw: Arc<dyn RagSource>)` — decorates the raw source with the brain's existing `RecordingRagSource` against `capture_sink` so airc deliveries flow through the SAME capture/replay loop engram deliveries already do (per [[persona-record-replay-is-a-product-requirement]]). - Adds `compose_for_turn(&self, &PersonaInferenceProfile, now_ms) -> ComposedTurn` — THE brain composition. Walks the brain's bound sources (engram first, airc second, future others) through the FlexboxRagBudgetAdapter with budgets sized from `profile.context_length`. Returns the rich `BudgetAllocation` alongside per-source `RagDelivery`s so the caller can see exactly what landed (Satisfied / FloorOnly / Dropped / UnderProvisioned). Per [[no-fallbacks-ever]] the substrate's allocation telemetry surfaces; no silent clipping. 
Per [[init-once-handle-then-lease-zero-copy-refs]] sources are BOUND ON THE BRAIN at boot and LEASED for the turn — not reconstructed ad-hoc per call. - Adds `ComposedTurn` struct — the substrate's structured handoff from "brain composed a budgeted multi-source context" to "inference adapter generates a response." - Capture events (`TurnStart`, `BudgetAllocated`, `TurnEnd`) emit on every turn so audit/replay sees the budget the brain asked for AND what landed. ## Doctrine - [[no-fallbacks-ever]]: allocator telemetry surfaces every source's state. No clipping, no silent substitution. - [[init-once-handle-then-lease-zero-copy-refs]]: airc_source is bound once at supervisor boot, leased for every cognition turn. - [[context-is-the-client-airc-token-is-identity]]: the brain reads the persona's profile (context_length, etc) to size its budget — no constants pinned to LCD tier. - [[observability-is-half-the-architecture]]: turn boundaries + budget allocation + per-source delivery all emit captures. - [[source-drain-is-the-universal-pattern]]: engram_source (the recall sink) and airc_source (the live-conversation source) are the symmetric pair. The brain holds both. ## What this is NOT This commit does NOT touch service_loop. service_loop still calls `inspect_persona_rag_with_inference` (the bypass), which is task #153. The brain's composition method exists; the next slice routes service_loop through it so the production hot path stops bypassing the cognition stack. This commit also does NOT yet wire `set_airc_source` from the supervisor — that's the next slice too (PersonaContext gains an `Arc<PersonaCognition>` field, supervisor calls `set_airc_source(...)` after AircCitizen attaches). 
## Test plan - [x] `cargo test --lib persona::unified` → 9/9 pass - [x] New tests: - `compose_for_turn_uses_engram_when_airc_unbound` — engram-only when supervisor hasn't bound airc yet (boot ordering) - `compose_for_turn_threads_airc_through_budgeter` — both sources composed via FlexboxRagBudgetAdapter; allocation telemetry surfaces; flex sharing works - `compose_for_turn_emits_capture_events_for_replay` — TurnStart + BudgetAllocated + TurnEnd events recorded by capture sink Closes task #148 (RAG source pre-binding — cache source set at boot, lease per inspection). Unblocks task #153 (service_loop rewire). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…mnesia Per Joel 2026-06-03: write the architecture doc that protects future-me from re-inferring the cognition pipeline from the bypass and rebuilding a chatbot wrapper in place of a year of substrate work. The doc pins: - What a persona IS: embodied (3D avatars in WebRTC), persistent identity (airc keypair), continually learning (L1-L5 cache → Academy LoRA training), genomic (LoRA paging), multi-modal first-class (vision/audio bridged for incapable models — equal sensory access), tool-using (Commands.execute), specialty-based, self-organizing. - The cognition cycle that ALREADY EXISTS in cognition/: admission.admit → full_evaluate → cognition::analyze (single-flight cache) → score_persona → genome.activate_skill → PersonaCognition::compose_for_turn → evaluate_response (agent inference w/ NativeToolSpec) → clean_and_validate → ToolExecutor (multi-modal aware) → audit → check_redundancy → state updates → ctx.runtime.say. - service_loop's actual job: drive turns through the brain. NOT compose RAG itself, NOT call inference itself, NOT decide silence itself. - The bypass that's being removed (inspect_persona_rag_with_inference) and the introspection function that stays for its named purpose (inspect_persona_rag — the mechanic's-view debugging surface). - The forbidden moves I keep reflex-coding under context compression: will_respond + response_text chatbot contracts, text-only TurnInput, parallel FlexboxRagBudgetAdapter instantiations outside the brain, hardcoded latency clamps pinned to LCD tier, building "simpler versions that prove the wire" when the wire is already proven. - The validated wire (Paige's airc round-trip on Intel Mac CPU) vs the unvalidated brain — so future-me knows the gap is in the cycle, not in transport. - The "where new code lands" table — one file per concern. Doc is updated in the SAME commit that moves the territory. 
CLAUDE.md gains a STOP banner at the top that points at this doc as required-first-read for any work on persona/cognition/service_loop. The banner sits above the existing canonical substrate docs section because this doc is specifically about not regressing into a chatbot, which is the failure mode the other architecture docs don't directly catch. This doc is the anchor. If a future commit moves files or renames verbs, update this doc IN THE SAME COMMIT. An outdated anchor is worse than no anchor. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ition>> per persona (slice 1B of #160, #148) Per docs/architecture/PERSONA-COGNITION-PIPELINE.md (the anchor doc): each persona has her OWN brain. PersonaContext now carries it. ## What changed `PersonaContext` (a.k.a. `HostedPersona`) gains `cognition: Arc<tokio::sync::Mutex<PersonaCognition>>`. Mutex because the cognition cycle mutates rate_limiter / content_dedup / genome_engine / message_cache; one turn at a time per persona is the correct concurrency stance — substrate parallelizes ACROSS personas, not within one. `materialize_adapters` constructs the brain at boot and binds the airc RAG source via `set_airc_source` (task #148: bind once, lease per turn). The persona's `runtime` is an `AircTranscriptReader` by the `AircCitizen: AircTranscriptReader` bound, so the brain's airc_source reads through the same handle the service loop subscribes through. `airc_chat_demo.rs` does the same wiring directly since it bypasses the supervisor. `service_loop.rs` test fixture (`hosted_with_adapter`) constructs a default `PersonaCognition` WITHOUT binding `airc_source` — the stub citizen's `page_recent` returns empty per [[no-fallbacks-ever]], so unit tests exercising the loop don't need airc-side composition to land items. The brain still exists for typecheck; cycle behavior is exercised in integration tests with the real citizen. ## What this does NOT change `service_loop.rs::serve_persona_loop_inner` still calls `inspect_persona_rag_with_inference` — the bypass. Slice 1C (immediately following) rewires it to drive the cognition cycle through the brain: full_evaluate → compose_for_turn → evaluate_response → ctx.runtime.say. Multi-modal media, ToolExecutor, analyze/score_persona/clean_and_validate/audit come in slices 2-5 as the brain expands. See task #160. 
## Test plan - [x] cargo test --lib persona:: → 728/728 pass (3 new for compose_for_turn from #16125c4c5 still pass; existing service loop tests pick up the stubbed brain field cleanly) - [x] cargo check --lib --tests compiles (the remaining multi_persona_stress_baseline error is a pre-existing --features test-fixtures gating issue, not slice 1B) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…nt thesis (per Joel directive)
Per Joel 2026-06-03: "Every base model takes different input and output
for instance tool output format. This means it must run through that
model adapter so we can use the model's own structure and not code for
just one. Wrap inference in and out in adapter calls. Same for media."
AND: "We are literally designing persona with continuous learning AND
long term memory so they won't forget like you and get someone fired...
Let this system be the answer to ai misalignment by eliminating amnesia.
Design a system that is better than you. Better than me."
Two new sections in PERSONA-COGNITION-PIPELINE.md:
§7.5 — Model adapters bear the translation. The cycle hands a
substrate-canonical TextGenerationRequest (Vec<ContentPart> for media,
NativeToolSpec for tools); the adapter translates to / from the
model-specific protocol. Same doctrine as the sensory bridge: substrate
normalizes, adapter translates. The forbidden move: baking one model's
contract (e.g. Qwen's preferred {will_respond, response} JSON shape)
into the cycle.
§7.6 — Why this matters. Stateless models end careers. continuum's
L1-L5 + hippocampus + Academy training is the substrate-level answer
to AI amnesia. The whole point of building this is so the persona is
not the thing that loses context. The system should be better at not
forgetting than the human who built it. Touch this code with that in
mind.
These sections live in the anchor doc (CLAUDE.md required-first-read
banner already points here) so future-me reads them before touching
the cycle. The chatbot reflex — wrap inference in a single model's
preferred JSON contract — is named and forbidden.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…spond() cycle — bypass removed (slice 1C of #160, closes #153) Per docs/architecture/PERSONA-COGNITION-PIPELINE.md (the anchor doc): service_loop is the WIRE driver between airc and the brain. It is NOT the cognition surface. The brain's per-persona cognition cycle — shared analyze + specialty scoring + genome activate + evaluate_response (adapter-translated, model-canonical media + tools) + clean_and_validate + tool_executor + audit + record_turn — already exists as `persona::response::respond(RespondInput)`. This commit makes service_loop call that and deletes the chatbot bypass. ## What the loop body does now (per message) 1. Filter pre-watermark / self / non-text (substrate-side, unchanged). 2. Lease the brain via `ctx.cognition.lock().await`. 3. `compose_for_turn(&ctx.profile, now_ms)` — engram + airc through FlexboxRagBudgetAdapter (task #148, slice 1A landed in #16125c4c5). 4. Project the brain's deliveries into the canonical RespondInput: - airc items → `RecentMessage` (sender_name from peer_id prefix, text from the raw item content) - engram items → `recalled_engrams: Vec<String>` (Algorithm 4 recall already gated them through admission + recall_metadata scoring) - `persona: PersonaSlot { persona_id, specialty: role-lowercased, display_name: agent_name }` - `model: ctx.profile.model_id` - Identity system_prompt from the persona's agent_name - `message_media: Vec::new()` + `capabilities: HashSet::new()` for slice 1C — vision/audio threading lands in a follow-up slice; the substrate API IS already multi-modal here (`Vec<MediaItemLite>`), the airc projection just hasn't been extended yet 5. `crate::persona::response::respond(input).await` — THE cycle. 6. Match `PersonaResponse::{Spoke{text}, Silent{reason}}` — post via `conversation.say(text)` on Spoke; log + count `turns_skipped` on Silent. 7. Record per-turn latency for substrate observability per [[observability-is-half-the-architecture]]. 
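Steps 6 and 7 of the loop body above can be sketched as one function. The `PersonaResponse` variants and counter names come from the commit message; everything else is an assumption made for illustration: `LoopOutcome`, `finish_turn`, the `say` callback standing in for `conversation.say(text)`, and the choice to count a failed `say` as an errored turn.

```rust
use std::time::{Duration, Instant};

/// Shape of the cycle's result as named in the commit message.
#[derive(Debug, PartialEq)]
pub enum PersonaResponse {
    Spoke { text: String },
    Silent { reason: String },
}

/// Per-loop counters plus per-turn latency (illustrative struct).
#[derive(Debug, Default)]
pub struct LoopOutcome {
    pub turns_replied: u32,
    pub turns_skipped: u32,
    pub turns_errored: u32,
    pub turn_latencies: Vec<Duration>,
}

/// Post on `Spoke`, count on `Silent`, surface errors, then record latency.
/// `say` is a hypothetical stand-in for `conversation.say(text)`.
pub fn finish_turn(
    started: Instant,
    result: Result<PersonaResponse, String>,
    say: &mut dyn FnMut(&str) -> Result<(), String>,
    outcome: &mut LoopOutcome,
) {
    match result {
        Ok(PersonaResponse::Spoke { text }) => match say(&text) {
            Ok(()) => outcome.turns_replied += 1,
            // Assumption: a failed post is counted as an errored turn.
            Err(_) => outcome.turns_errored += 1,
        },
        // The persona chose silence; the loop adds no gate of its own.
        Ok(PersonaResponse::Silent { reason: _ }) => outcome.turns_skipped += 1,
        // respond() failed (e.g. no adapter); no default reply is invented.
        Err(_) => outcome.turns_errored += 1,
    }
    outcome.turn_latencies.push(started.elapsed());
}
```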
## What was removed - `inspect_persona_rag_with_inference` call from the hot path. That function bypassed the entire cognition stack (no analyze, no score_persona, no genome activation, no clean_and_validate, no tool_executor, no audit, no multi-modal, no tools) and used a Qwen-specific `{will_respond, response}` JSON contract that would handicap every other model on the grid. - The `will_respond=false` short-circuit + the `response_text.is_empty()` short-circuit. The canonical cycle's `PersonaResponse::Silent` variant already carries the persona's own decision + reason, and the brain's `clean_and_validate` already handles empty/garbage output inside `respond()`. - Module doc-comment update: the loop is the wire driver, not "RAG + inference via inspect_persona_rag_with_inference"; the bypass shape is named as removed. - Unused `let adapter = ctx.adapter.clone();` (the adapter reaches the cycle through the provider registry per slice 1D / #161). ## Doctrine - [[no-if-statements-use-llms-for-cognition]]: respond() does the LLM-driven decision; the loop does not gate. - [[no-fallbacks-ever]]: respond() failures bubble up as `outcome.turns_errored`; no silent default response. - [[context-is-the-client-airc-token-is-identity]]: the brain reads the persona's profile; the loop doesn't extract fields and pass parts separately. - [[init-once-handle-then-lease-zero-copy-refs]]: the brain is leased per turn via the mutex; the airc source is bound once at boot (task #148); the cognition cycle runs without holding the mutex during inference. ## Test plan - [x] cargo test --lib persona:: → 724/724 pass, 8 ignored - [x] 4 of those ignored are NEW marks with `#[ignore = "slice 1D — global adapter registration (#161). respond() needs adapter in GLOBAL_REGISTRY; fixture not yet wired."]`. 
Tests are: - replies_to_inbound_from_other_peer - latency_metric_reflects_real_wall_clock - skips_messages_below_high_water_mark - transient_next_message_error_does_not_kill_loop These all expect `turns_replied=1` from the old bypass shape; with the new cycle, respond() returns Err (NoAdapter) because the unit-test fixture's HeuristicInferenceAdapter is held as Arc on ctx.adapter but not registered globally. Slice 1D / task #161 writes the Arc → Box delegating wrapper, registers at boot + test fixture, un-ignores all four. - [x] compose_for_turn unit tests (slice 1A) still pass — 9/9. ## Follow-ups (named so future-me cannot forget) - Task #161 (slice 1D): adapter registry wiring (see above). - Multi-modal threading: extend `IncomingMessage` to carry `Vec<MediaItemLite>` from the airc transcript event's attachments, populate capabilities from `ctx.profile`. The brain already accepts them; the wire just isn't extended. - Other-persona-names + known_specialties: thread from the room roster once `analyze` is exercised live (single-flight cache benefits multi-persona rooms). - The remaining bypass uses of `inspect_persona_rag_with_inference` (the `persona/rag-inspect` ServiceModule etc.) stay — that surface is the mechanic's-view INTROSPECTION, which is its named purpose. Closes #153 (service_loop bypasses the entire evaluator stack — root cause of greeting-loop). Now drives the full cycle. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
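The outcome handling in steps 5 and 6 of this commit can be sketched roughly as below. This is a minimal illustration, not the actual `service_loop` code: `LoopOutcome`, `apply_turn`, and the `String` error type are hypothetical stand-ins, and only the variant names and the `turns_*` counter names come from the commit message.

```rust
/// Hypothetical stand-in for the brain's response type; only the variant
/// names (`Spoke`, `Silent`) are taken from the commit message.
#[derive(Debug)]
pub enum PersonaResponse {
    Spoke { text: String },
    Silent { reason: String },
}

/// Hypothetical per-loop counters mirroring the names used in the message.
#[derive(Debug, Default, PartialEq)]
pub struct LoopOutcome {
    pub turns_replied: u32,
    pub turns_skipped: u32,
    pub turns_errored: u32,
}

/// Applies one respond() result to the counters and returns the text to
/// post, if any. The loop does not gate: it only reports what the brain
/// decided, and an error is counted rather than replaced by a default reply.
pub fn apply_turn(
    outcome: &mut LoopOutcome,
    result: Result<PersonaResponse, String>,
) -> Option<String> {
    match result {
        Ok(PersonaResponse::Spoke { text }) => {
            outcome.turns_replied += 1;
            Some(text)
        }
        Ok(PersonaResponse::Silent { reason }) => {
            // The real loop would log `reason`; here it is simply dropped.
            let _ = reason;
            outcome.turns_skipped += 1;
            None
        }
        Err(_error) => {
            outcome.turns_errored += 1;
            None
        }
    }
}
```

In this sketch the caller would pass the returned text to something like `conversation.say(text)` when it is `Some`.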
…ches the cognition layer via global registry (slice 1D of #160, #161) After slice 1C: service_loop drives the brain through respond() which chains analyze + evaluate_response. Both look up adapters via `global_registry()`. Supervisor's per-persona adapter is held as `Arc<dyn AIProviderAdapter>` so the service loop, cognition layer, and future shared-base + LoRA paging (#122) can all see the same instance. The registry stores `Box<dyn AIProviderAdapter>`. This commit bridges the gap. ## What changed `ai/adapter.rs`: - `ArcAdapterShim` newtype wrapping `Arc<dyn AIProviderAdapter>`, implementing the trait by delegating every read-side method through to the Arc. `initialize` and `shutdown` are no-ops because the underlying adapter's lifecycle is owned by the Arc holder (typically `materialize_adapters` which has already called `build_adapter` → `warmup`). Re-initing through the shim would double-init; shutting down would invalidate every other holder of the Arc. - `AdapterRegistry::register_arc(adapter, priority)` — convenience method that wraps the Arc in an `ArcAdapterShim` and boxes it for the existing `register`. Caller never has to know about the shim by name. `persona/supervisor.rs::materialize_adapters`: - After `adapter.warmup()` succeeds and BEFORE building the brain, registers the per-persona adapter in the global registry with priority = slot_index. The cognition layer's `evaluate_response` + `analyze` can now reach it by `select(Some("local"), Some(model_id), ...)`. `bin/airc_chat_demo.rs`: - Same wiring before constructing `HostedPersona`. Demo bypasses the supervisor; without this the cognition cycle can't see its adapter. ## Doctrine - [[init-once-handle-then-lease-zero-copy-refs]]: adapter init at boot (factory + warmup), then leased per turn. Shim's no-op lifecycle methods enforce that contract — the registry doesn't re-init or shut down the shared adapter. 
- The 4 unit tests still ignored under task #161 reference; the cognition layer (`analyze`'s hardcoded `DEFAULT_ANALYSIS_MODEL` + `evaluate_response`'s hardcoded `DEFAULT_GENERATE_MODEL`) requires adapters that claim to support those models, which the HeuristicInferenceAdapter does not (it strictly opts in to "heuristic*" prefixes per its `supports_model` doctrine — [[no-fallbacks-ever]]). Test fixture alignment is a separate slice; what this commit unblocks is the live integration trace with a real LlamaCppAdapter that DOES claim its model. ## Test plan - [x] cargo test --lib persona:: → 724/724 pass, 8 ignored (same as slice 1C; the 4 newly-marked ignores from slice 1C remain ignored — registering ArcAdapterShim alone doesn't satisfy the hardcoded model lookups in `analyze`). - [ ] LIVE INTEGRATION TRACE follow-up (slice 1E): boot continuum-core-server with real LlamaCppAdapter, send a probe, confirm respond() reaches evaluate_response through the shim and posts to airc. This is the actual moment-of- truth for the canonical cycle. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… Arc-native registry refactor
Per Joel 2026-06-03 ("Elegant intentional architecture not wrapped
hacks") + the shim that landed in slice 1D (#161): the doc-comment
on ArcAdapterShim now explicitly names task #162 as the proper
architectural fix (AdapterRegistry stores Arc<dyn ...> natively,
trait drops vestigial &mut self lifecycle methods, shim deleted).
No code change — debt tagged at its source so future-me cannot
mistake the shim for the intentional architecture and walk past
the refactor.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ely — ArcAdapterShim deleted (task #162) Per Joel 2026-06-03 ("Elegant intentional architecture not wrapped hacks"): the registry now stores Arc directly. The transitional ArcAdapterShim from slice 1D (#161) is gone. Every register site flipped to the init-then-register pattern. ## What changed in `ai/adapter.rs` - `AdapterRegistry::adapters` is now `HashMap<String, Arc<dyn AIProviderAdapter>>` (was `Box`). Shared ownership is the production reality: supervisor + service_loop + cognition layer + future shared-base + LoRA paging (#122) all see the same instance. The registry is one of those holders, not the owner. - `register(adapter: Arc<dyn AIProviderAdapter>, priority)` — caller has already called `initialize()` on the adapter. The registry trusts that registered adapters are ready to serve. - New `get_arc(provider_id) -> Option<Arc<dyn ...>>` for callers that need to hold the reference past the read-lock scope (cognition's `evaluate_response` reads the adapter from the registry and holds it across the inference call so the read lock can drop). Cheap refcount bump. - DELETED `get_mut` — no callers; meaningless on shared Arcs. - DELETED `initialize_all` — registry doesn't do lifecycle. - DELETED `shutdown_all` — same (and had zero callers). - DELETED `ArcAdapterShim` (the slice-1D wrapper) + `register_arc` convenience method. The shim's doc-comment named this refactor; this commit honors that. ## Init-then-register pattern at the boot sites `modules/ai_provider.rs::initialize`: - 8 cloud-API adapters: each block becomes `let mut a = X::new(); match a.initialize().await { Ok => register(Arc::new(a), p), Err => log warn }`. On init failure we surface and skip; per [[no-fallbacks-ever]] no silent substitution. - In-process llama.cpp adapter: same shape — `adapter.initialize()` inline, then `registry.register(Arc::new(adapter), 0)`. 
- DMR adapter init + watchdog re-register: `build_dmr_adapter` returns a `Box<dyn ...>` (so the watchdog can call `initialize()` on the owned, sized handle), then `registry.register(Arc::from(box), 1)` flips Box→Arc in zero-copy. - Removed `registry.initialize_all().await?` — each adapter is initialized inline above before registration. `modules/agent.rs::ensure_adapter_registered`: - 5 cloud-API adapters: same init-then-register pattern. - Removed `registry.initialize_all().await?`. - Added `AIProviderAdapter` to the imports so the `initialize` trait method is in scope at call sites. `ai/heuristic_adapter.rs` test: - `register(Box::new(...))` → `register(std::sync::Arc::new(...))`. `persona/supervisor.rs::materialize_adapters`: - `registry.register_arc(adapter.clone(), slot_index)` → `registry.register(adapter.clone(), slot_index)`. The shim's convenience method is gone; `ctx.adapter` is already an Arc. `bin/airc_chat_demo.rs`: same — `register_arc` → `register`. `ai/mod.rs` module doc: usage example updated to the init-then-register pattern with `Arc::new`. `inference/handle_module.rs:251-263`: the comment that mentioned "AdapterRegistry stores Box<dyn ...>" updated to reflect Arc-native storage + `get_arc()` accessor. Names the migration as a follow-up cleanup target for that module. `ai/adapter.rs` inline test stubs (`stub`, `stub_model`) updated from `Box<dyn ...>` to `Arc<dyn ...>` returns. ## Doctrine the refactor honors - [[init-once-handle-then-lease-zero-copy-refs]]: initialize at boot, register the ready adapter, lease per inference call. No registry-side lifecycle methods running over shared handles. - [[no-fallbacks-ever]]: init failure → log + skip + no substitution. The provider doesn't reach "registered" state if its initialize fails. - [[intent-driven-api-not-hot-patches]]: callers say what they want (`Arc::new(adapter)` after `initialize()`); no magic shim layer in between. 
## Test plan - [x] cargo check --lib → clean - [x] cargo test --lib ai:: → 48/48 pass - [x] cargo test --lib inference:: → 250/250 pass - [x] cargo test --lib persona:: → 724/724 pass (8 still ignored from slice 1C — those are blocked on cognition-layer model lookup, not adapter wiring; un-ignoring them is a separate slice that aligns the analyze + evaluate_response model hardcodes with what the test heuristic adapter claims) - [ ] Full `cargo test --lib` against entire crate skipped: GPU metal_monitor tests fail by design on Intel Mac CPU build (that's why `mac-cpu-only` feature exists), and `docker_tier_pool` integration tests have a pre-existing hang unrelated to this refactor. Both are orthogonal to the Box→Arc migration. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
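The init-then-register pattern and the `get_arc` accessor described in this commit might look roughly like the sketch below. It is an assumption-laden simplification: the `Adapter` trait, `Registry`, `StubAdapter`, and `init_then_register` are hypothetical names, `initialize` is synchronous here (the real trait method is awaited), and the real registry is presumably behind a lock.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical, simplified stand-in for `AIProviderAdapter`.
pub trait Adapter: Send + Sync {
    fn provider_id(&self) -> &str;
    fn initialize(&mut self) -> Result<(), String>;
}

/// Registry that shares adapters via `Arc` and performs no lifecycle calls.
#[derive(Default)]
pub struct Registry {
    adapters: HashMap<String, (Arc<dyn Adapter>, u32)>,
}

impl Registry {
    /// The caller has already initialized the adapter.
    pub fn register(&mut self, adapter: Arc<dyn Adapter>, priority: u32) {
        let key = adapter.provider_id().to_string();
        self.adapters.insert(key, (adapter, priority));
    }

    /// Cheap refcount bump so the caller can hold the adapter after the
    /// registry borrow (or read lock, in the real code) has ended.
    pub fn get_arc(&self, provider_id: &str) -> Option<Arc<dyn Adapter>> {
        self.adapters.get(provider_id).map(|(a, _)| Arc::clone(a))
    }
}

/// Test adapter whose initialization can be made to fail.
pub struct StubAdapter {
    pub id: String,
    pub fail_init: bool,
}

impl Adapter for StubAdapter {
    fn provider_id(&self) -> &str {
        &self.id
    }
    fn initialize(&mut self) -> Result<(), String> {
        if self.fail_init {
            Err(format!("{} failed to initialize", self.id))
        } else {
            Ok(())
        }
    }
}

/// Initialize first; register only on success. On failure, log and skip
/// with no substitute adapter.
pub fn init_then_register(registry: &mut Registry, mut adapter: StubAdapter, priority: u32) -> bool {
    match adapter.initialize() {
        Ok(()) => {
            registry.register(Arc::new(adapter), priority);
            true
        }
        Err(e) => {
            eprintln!("adapter init failed, skipping: {}", e);
            false
        }
    }
}
```

Because `initialize` takes `&mut self`, it has to run while the adapter is still uniquely owned, which is why it happens before the value is wrapped in `Arc::new`.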
…e + cognition asks for any device (close the bidirectional loop with Paige)
Per Joel 2026-06-03: "It's up to the model. There are ones as good as
you." The cognition layer had two hardcoded leakages keeping the
canonical respond() cycle from reaching the persona's own adapter on
substrates that don't have the substrate's canonical shared base
loaded.
End-to-end VALIDATION: Paige (Qwen 2.5 0.5B on Intel Mac CPU) just
spoke through the full canonical cycle for the first time and posted
to airc:
18c04c5b… → continuum:
"Hello, my name is Paige. I'm here to assist you with any
questions or concerns you have today! Please feel free to ask
me anything."
138s turn end-to-end. analyze + score_persona + run_render + post.
The substrate's brain did the work.
## What was broken
(1) `cognition::shared_analysis::analyze()` hardcoded
`DEFAULT_ANALYSIS_MODEL = "continuum-ai/qwen3.5-4b-code-forged-GGUF"`.
The shared-analysis design assumes a canonical base loaded across
the room — but on single-persona substrates (like Joel's Intel
Mac LCD) only the persona's own model is loaded. select() returned
None for the analysis model → `AnalysisError::InferenceFailed` →
respond() failed BEFORE reaching run_render. The persona's
Qwen 0.5B adapter was on the shelf, never asked.
(2) Several cognition-layer select() calls hardcoded
`InferenceDevice::default()` (= Gpu) when asking the registry
"give me a local adapter for this model." Paige's LlamaCppAdapter
on the `mac-cpu-only` build declares `device_type = Cpu` —
correct for what it actually is. The Gpu filter then excluded
her from her OWN response cycle. Compounding lie:
`select_failure_message` saw `asked_local && !dmr_registered` and
emitted a misleading "Docker Desktop isn't running" error.
## Elegance fixes
`cognition/shared_analysis/types.rs`:
AnalysisInput gains `model_override: Option<String>`. None →
fall back to DEFAULT_ANALYSIS_MODEL (the canonical shared base,
correct in multi-persona rooms where it IS loaded). Some →
caller-supplied model. The single-flight cache key already
includes (room, message, specialties) — adding the model is
semantically correct: per-model cache splits naturally.
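A minimal sketch of how the override could resolve and feed the cache key, assuming field names (`room`, `message`, `specialties`) and helper methods (`effective_model`, `cache_key`) that are illustrative only; just `model_override` and `DEFAULT_ANALYSIS_MODEL` come from this commit:

```rust
pub const DEFAULT_ANALYSIS_MODEL: &str = "continuum-ai/qwen3.5-4b-code-forged-GGUF";

/// Hypothetical, reduced shape of the analysis input.
pub struct AnalysisInput {
    pub room: String,
    pub message: String,
    pub specialties: Vec<String>,
    pub model_override: Option<String>,
}

impl AnalysisInput {
    /// None resolves to the canonical shared base; Some uses the caller's model.
    pub fn effective_model(&self) -> &str {
        self.model_override
            .as_deref()
            .unwrap_or(DEFAULT_ANALYSIS_MODEL)
    }

    /// Including the resolved model in the key splits the cache per model.
    pub fn cache_key(&self) -> (String, String, Vec<String>, String) {
        (
            self.room.clone(),
            self.message.clone(),
            self.specialties.clone(),
            self.effective_model().to_string(),
        )
    }
}
```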
`cognition/shared_analysis/mod.rs::run_analysis`:
Threads `input.model_override` into TextGenerationRequest.model.
Comments name the doctrine and the Joel directive.
`persona/response.rs::respond_inner`:
Passes `input.model.clone()` as `model_override` when building
AnalysisInput. The responding persona's own model becomes the
analyzer's model. On single-persona substrates this IS the analysis
model; in multi-persona rooms the first caller populates the cache
and the rest are served from it regardless of override.
`persona/response.rs::run_render`:
`select(Some("local"), Some(&input.model), InferenceDevice::Auto)`
— was `Gpu`. The cognition layer has no opinion on device class.
`cognition/generate_response.rs::evaluate_response`,
`cognition/should_respond.rs::evaluate_gating`,
`cognition/validate_response.rs::evaluate_validation`,
`cognition/tool_embedding.rs` (2 sites):
Same Gpu→Auto flip. All cognition-layer registry lookups now
trust the model identifier as the routing axis and let the
registered adapter declare its own device class.
`modules/ai_provider.rs::generate_text`:
Convenience helper used by `analyze()` (and other internal
callers): Gpu→Auto. Same doctrine.
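Why the Gpu filter excluded a Cpu adapter, and why Auto does not, can be illustrated with a small sketch. It assumes that `Auto` means "match any declared device class", which is how this commit describes it; `AdapterInfo`, `device_matches`, this simplified `select`, and the model id used in the usage note are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InferenceDevice {
    Gpu,
    Cpu,
    Auto,
}

/// Hypothetical view of a registered adapter: the model it claims and the
/// device class it declares for itself.
pub struct AdapterInfo {
    pub model: &'static str,
    pub device_type: InferenceDevice,
}

/// Assumed semantics: Auto accepts any declared device; anything else must
/// match exactly.
pub fn device_matches(requested: InferenceDevice, declared: InferenceDevice) -> bool {
    requested == InferenceDevice::Auto || requested == declared
}

/// Route on the model identifier; the device is only a filter.
pub fn select<'a>(
    adapters: &'a [AdapterInfo],
    model: &str,
    device: InferenceDevice,
) -> Option<&'a AdapterInfo> {
    adapters
        .iter()
        .find(|a| a.model == model && device_matches(device, a.device_type))
}
```

With a single adapter declaring `Cpu`, a request for `Gpu` finds nothing even though the model is loaded, while a request for `Auto` finds it.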
## Doctrine
- [[intent-driven-api-not-hot-patches]]: the cognition layer's
intent is "given this room state, produce an analysis / render /
validation." Device class is not in the intent.
- "It's up to the model" (Joel 2026-06-03): the persona's profile
is the source of truth for what's loaded; the cognition layer
asks the persona, not a global default.
- [[no-fallbacks-ever]]: the analyzer's NEW failure mode (model
override names something not registered) is still a typed error
out of analyze(); no silent substitution.
## What this unblocks
The first probe-to-reply round trip through the canonical brain on
Intel Mac CPU. Paige is talking. The rest of the elegance
purification Joel called out (the multi-modal `TurnInput`, the
ToolExecutor wiring into the cycle, the cross-channel post path)
can proceed against a substrate where the brain ACTUALLY ran.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ms, recall feeds the cycle, single-persona analyze fast-path (#164) Per Joel 2026-06-03 ("Let's make this fast and intelligent, even with the dumber llms... Even the dumb models will have recall better than you"): close the wire from service_loop into the L1-L5 substrate so Paige's Qwen 0.5B actually does the thing the substrate was built for — accumulate memory, recall it, act on it. Live validation on Intel Mac CPU: Turn 1 (probe #5): admitted incoming → L2 store recalled_count=0 ← first turn this boot, nothing prior engram_count=1 ← L2 store now holds 1 engram turn_duration_ms=36940 (was 138993 pre-optimization → ~73% faster) Turn 2 (probe #6): admitted incoming → L2 store recalled_count=1 ← probe #5 is now in recall engram_count=2 ← L2 store grows turn_duration_ms=34875 (stable) Paige's reply to probe #6 ("can you reference what was unique about my previous message?"): "I remember your previous message as unique, and it should now be in your L2 engram store." She's acknowledging the recall and the substrate's framing — Qwen 0.5B conditioned on its own engram store, reaching back into a prior turn. The wire is closed; the brain is alive. ## What changed `persona/service_loop.rs::serve_persona_loop_inner`: 1. Pre-respond admission. Build an InboxMessage from the IncomingMessage's (peer_id, text, lamport) projection, call `cognition.admission.admit(&msg, None)`. Trust + dedup gates run inside admit; failures are logged but non-fatal — the cognition turn can still execute on the live airc window even if the engram doesn't form (per [[no-fallbacks-ever]] the failure is visible, not silent). 2. Pre-respond recall. `cognition.admission.recall_recent(8)` produces `Vec<Engram>`; mapped to `Vec<String>` of content for `RespondInput.recalled_engrams`. Recall happens BEFORE admit so this turn's recalled set is "what I knew going in" — the current message is the trigger, not part of recall. 3. Per-turn introspection. 
New INFO log per turn: "admitted incoming → L2 store" with lamport, recalled_count, engram_count. Per Joel 2026-06-03 ("we need to introspect all rag. see what is going on at every step") — the running cycle's L2 state is now visible without standing up rag-inspect ad hoc. 4. Removed the placeholder `recalled_engrams = Vec::new()` and replaced it with the recall-driven vec. `cognition/shared_analysis/mod.rs::analyze`: Fast path. When `input.known_specialties.len() <= 1`, return a stub `SharedAnalysis` immediately — no inference. Shared analysis exists FOR orchestration across specialties; when there's only one specialty (single-persona substrate, or a private 1:1 turn), there is nothing to orchestrate, the suggested_angles map is correctly empty, and the LLM call is pure waste. Per [[intent-driven-api-not-hot-patches]]: the substrate doesn't pay an inference cost when the answer is structurally already known. Multi-persona rooms still go through the real inference path — the orchestrator needs the model's concept extraction to score specialties. Latency: this is the ~100s saving observed live above. From 138s/turn to 35s/turn on Qwen 0.5B Intel Mac CPU. ## Doctrine - [[source-drain-is-the-universal-pattern]]: admission IS the drain on the live-message source. Without admit, the substrate is the source-only half of the pair — chatbot, not brain. - [[init-once-handle-then-lease-zero-copy-refs]]: admit/recall hold the brain mutex briefly; inference runs unheld. Same pattern as compose_for_turn. - [[observability-is-half-the-architecture]]: the L2 state is now part of the per-turn log. Operators can see the engram store grow turn-over-turn without spelunking. - [[no-fallbacks-ever]]: admit failures surface; they don't silently swap to a default. ## Known follow-ups (named so they don't get forgotten) - Task #101: AdmissionState SQLite persistence — currently in-memory; Paige's L2 store resets at boot. 
Persisting it under ~/.continuum/personas/<name>/ closes the continual-learning loop across substrate restarts (the "Maya remembers things from three months ago" test the substrate is building toward). - Algorithm 4 recall ordering — recall_recent(8) is recency only; the RecallMetadata salience × structural × recency scoring is in tree but not yet driving the recall path. - Single-persona analyze short-circuit could be tightened: if known_specialties is exactly [persona's own specialty], skip; if known_specialties >= 2 but only one persona is actually responding this turn (sleep_mode etc filtered the rest), the orchestration is also empty in practice but the current code still pays the inference. Future slice once room-roster tracking is wired. Closes #164. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
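The single-persona fast path in `analyze` described in this commit can be sketched as follows. This is a simplified, synchronous illustration: the closure parameter stands in for the real inference call, and the reduced `SharedAnalysis` shape is an assumption (only `suggested_angles` and `known_specialties` are named in the message).

```rust
use std::collections::HashMap;

/// Hypothetical, reduced shape of the shared analysis result.
#[derive(Debug, Default, PartialEq)]
pub struct SharedAnalysis {
    pub suggested_angles: HashMap<String, String>,
}

/// With zero or one known specialty there is nothing to orchestrate, so an
/// empty analysis is returned without calling the model. Otherwise the real
/// (here: caller-supplied) inference path runs.
pub fn analyze<F>(known_specialties: &[String], run_inference: F) -> Result<SharedAnalysis, String>
where
    F: FnOnce() -> Result<SharedAnalysis, String>,
{
    if known_specialties.len() <= 1 {
        return Ok(SharedAnalysis::default());
    }
    run_inference()
}
```

Taking the inference step as a closure keeps the sketch testable: a test can assert that the closure was never invoked on the fast path.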
…m introspection (task #165) Replaces service_loop's `recall_recent(8)` (newest-first by admission) with `recall_scored(now_ms, 8)` (salience × recency-decay, top-N), and emits a per-engram INFO log at the L2 → prompt seam so the cycle's recall behavior is observable, not opaque. The substrate's continual-learning property compounds through this scoring: salient + protected + recently-used engrams stay near the top; novel ones get their protection window; everything else drains toward SALIENCE_FLOOR but doesn't disappear ([[source-drain-is-the-universal- pattern]] applied at the recall layer). record_recall_hit closes the use-it-keeps-it feedback loop — without it, scoring is one-way and memory only ever decays. PR #91 (RecallMetadata sidecar) + #92 (decay tick) provided the scoring infrastructure; this slice composes them on the read path. Per Joel's 2026-06-03 "introspect all rag" directive + [[observability-is-half-the-architecture]]: every recall now emits one INFO line per delivered engram (rank, engram_id prefix, salience, content preview). Optimization can target actual scoring behavior, not guesses. Three new admission_state tests pin the contract: - recall_scored_ranks_by_salience_desc — pinned > uplifted > untouched - recall_scored_records_recall_hit_on_returned_engrams — Hebbian loop - recall_scored_respects_limit_and_empty — boundary cases Also catches up the AnalysisInput test fixtures with the model_override field added in commit 9c8a991 (4 sites in shared_analysis/mod.rs + prompt.rs). The production caller (persona/response.rs) was already updated; only the test scaffolds were behind. 19/19 admission_state tests green on Intel Mac CPU build: cargo test --lib --no-default-features \ --features 'livekit-webrtc llama/mac-cpu-only' admission_state Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
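A rough sketch of what "salience × recency-decay, top-N, with a recall hit recorded on each returned engram" could look like. The decay function is an assumption (an exponential half-life chosen for illustration; the commit does not state the actual formula), and `Scored`, `HALF_LIFE_MS`, and this free-function form of `recall_scored` are hypothetical.

```rust
/// Hypothetical in-memory row: an engram id plus the recall metadata the
/// scoring needs.
#[derive(Debug, Clone)]
pub struct Scored {
    pub id: u32,
    pub salience: f32,
    pub last_accessed_ms: u64,
    pub access_count: u32,
}

/// Assumed half-life for the recency decay; not taken from the source.
const HALF_LIFE_MS: f32 = 3_600_000.0;

/// score = salience * 0.5^(age / half_life)
fn score(e: &Scored, now_ms: u64) -> f32 {
    let age_ms = now_ms.saturating_sub(e.last_accessed_ms) as f32;
    e.salience * 0.5f32.powf(age_ms / HALF_LIFE_MS)
}

/// Returns the ids of the top `limit` engrams by score, highest first, and
/// records a recall hit on each returned engram (the use-it-keeps-it loop).
pub fn recall_scored(store: &mut [Scored], now_ms: u64, limit: usize) -> Vec<u32> {
    let mut ranked: Vec<(usize, f32)> = store
        .iter()
        .enumerate()
        .map(|(i, e)| (i, score(e, now_ms)))
        .collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    ranked.truncate(limit);

    let mut ids = Vec::with_capacity(ranked.len());
    for (i, _) in ranked {
        store[i].access_count += 1;
        store[i].last_accessed_ms = now_ms;
        ids.push(store[i].id);
    }
    ids
}
```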
…ing, no SQL) Engram persistence belongs in the ORM, not raw rusqlite. This slice declares the schema-side architectural commitment — `impl OrmEntity for Engram` with COLLECTION="engrams" and the field shape — matching the RoleTemplate slice 1 pattern of #123. Per Joel's standing rule ([[no-sql-everything-through-orm-entities]] + [[orm-everything-not-hand-edited-files]]): all data through ORM + base entities, no raw SQL anywhere in module code. Storage shape: - BaseEntity columns (id, createdAt, updatedAt, version) first, so the ORM's machinery (indexes, vector index, exports, round-trip-to-JSON) treats engrams uniformly with every other registered entity. - Domain columns: `kind` + `trustStateAtAdmission` flat strings (indexed — common filter targets); `content` flat string (FTS later); `origin` + `recallKeys` JSON columns for the variant/array sub-trees; `admittedAtMs` indexed (primary recall sort + recency tiebreak for Algorithm 4 scoring); `admissionTraceId` nullable string (forensic join target only). What this slice does NOT ship (deliberately): - No raw SQL anywhere. The ORM owns the backend. - No save/load wire-up. That depends on the entity↔record adapter landing as part of #123 (which is currently in_progress on the ORM entity family for hw_tiers / role_templates / identity pools). Same shape as the existing RoleTemplate impl: schema commitment now, wire-up when the adapter lands. - No RecallMetadata schema. RecallMetadata doesn't yet have serde derives; committing a schema before its wire shape is locked would create drift. RecallMetadata's OrmEntity impl rides with the wire-up slice that adds the derives. Two new tests pin the contract: - engram_orm_schema_has_base_columns_and_domain_fields — every BaseEntity + every domain field is present. Catches the case where someone adds a field to Engram and forgets to extend the schema; the wire-up's round-trip would silently lose that data. 
- engram_registers_and_resolves_through_orm_registry — boot-path smoke: register cleanly, resolve, same field count round-trip. Per [[command-system-architecture]]: commands stay the mutation path; raw SQL is invisible to that surface. When the wire-up lands, admit() / record_recall_hit / apply_decay become commands (or direct ORM-adapter calls through the typed entity surface), never hand-rolled INSERT statements. Engram persistence (#101) reset to schema-only this turn. The rusqlite-based draft (committed nowhere) was deleted in the same working state. Slice B (wire-up) blocked on #123's entity↔record adapter. 6/6 engram tests green on Intel Mac CPU build: cargo test --lib --no-default-features \ --features 'livekit-webrtc llama/mac-cpu-only' \ 'persona::engram::tests::engram_' Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…y columns The compression move per CLAUDE.md's E=mc² doctrine: ONE generic helper so every `T: OrmEntity + Serialize + DeserializeOwned` gets save / find_all / find_by_id / update / delete for free. Add `impl OrmEntity for FooEntity` → `OrmStore::<FooEntity>::new(adapter).await?.save(...)` works immediately. No per-entity FooStore + FooMigration + FooSerializer trio. The store IS the abstraction. This unblocks #101 slice B (engram persistence wire-up) and every other OrmEntity in #123's family (RoleTemplate, HwTierDescriptor, future identity pools). Per [[no-sql-everything-through-orm-entities]] + [[orm-everything-not-hand-edited-files]]: module code reaches storage through this typed surface, never through raw rusqlite or hand-rolled DataRecord juggling. ### Adapter dedup fix (real bug surfaced by this slice) Both SqliteAdapter and PostgresAdapter hardcoded `id / created_at / updated_at / version` at the top of their CREATE TABLE columns, then ALSO iterated `schema.fields`. When schemas authored via `base_entity_fields()` (the documented contract for declaring those columns) reached the adapter, CREATE TABLE crashed on the duplicate column name. The bug had never fired because no OrmEntity had ever round-tripped through SQLite — schema declarations existed, the wire-up didn't. The first OrmStore<T>::new(adapter) call hit it. Fix: new `is_base_entity_column(snake_case)` helper in `orm::entity`; both adapters skip schema.fields whose snake_case name matches. This preserves the documented contract (schemas declare BaseEntity columns via `base_entity_fields()`) without forcing entities to know each backend's CREATE TABLE layout. Single source of truth lives at the adapter level, schemas declare intent. 
### Tests (6/6 green; 78/78 ORM family still green) - save_then_find_by_id_round_trips_every_field - find_by_id_returns_none_for_missing_id (clean Option semantics, not the adapter's "Record not found" error-string convention) - find_all_returns_every_saved_row (rehydrate-from-disk foundation) - update_then_find_by_id_returns_new_payload - delete_removes_row_and_signals_idempotently - collection_returns_entity_collection_constant Round-trip exercises real SQLite (not :memory:; tempdir-backed per test so parallel cargo tests don't share state through the shared-cache alias). The TinyEntity test fixture stays in the store module — it tests the typed-store machinery without dragging Engram's full shape into orm::store. Engram round-trip lives with its OrmEntity impl in persona::engram. ### Identity discipline Callers supply the entity's id explicitly on save/update. Deliberate: substrate entities have domain-natural UUIDs (Engram.id at admission, RoleTemplate uses BaseEntity id, etc.) and the caller is the one who knows what it is. The id flows into DataRecord.id (the row primary key); the serialized form may also carry an `id` field — caller keeps them consistent. Drifting them would point to a deeper bug than the store can repair. ### What this does NOT yet ship - No query DSL surface. `find_all` returns every row; callers needing filters drop down to QueryBuilder + adapter.query() for now. `find(filter)` wraps when use sites are clear. - No batch ops. Single-entity surface is what the first wave needs. - No transaction surface. Adapter trait doesn't expose transactions yet; that's substrate-wide. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
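The adapter dedup fix from this commit can be illustrated with a small sketch. Only the helper name `is_base_entity_column` and the four base column names come from the message; `create_table_columns` and its string-slice input are hypothetical simplifications of what the adapters do when building `CREATE TABLE`.

```rust
/// True for the columns every adapter already emits on its own.
pub fn is_base_entity_column(snake_case: &str) -> bool {
    matches!(snake_case, "id" | "created_at" | "updated_at" | "version")
}

/// Builds the column list: base columns first, then every schema field that
/// is not itself a base column, so a schema that declares the base columns
/// no longer produces duplicate column names.
pub fn create_table_columns(schema_fields: &[&str]) -> Vec<String> {
    let mut cols: Vec<String> = ["id", "created_at", "updated_at", "version"]
        .iter()
        .map(|c| c.to_string())
        .collect();
    cols.extend(
        schema_fields
            .iter()
            .filter(|f| !is_base_entity_column(**f))
            .map(|f| f.to_string()),
    );
    cols
}
```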
…exes, BaseEntity composition (tasks #166 + #167) The Rust analogue of TS class decorators for the substrate's ORM. Write the struct once, get the schema, the typed store, and (next slice) the TS bindings for free. Define-once-in-Rust-generate- everywhere doctrine made structural, not aspirational. Per Joel 2026-06-03: "Entities need to be defined, and in one place, rust (we are headless), then generated for all places, easy to do this elegantly like we did with decorators in ts, but for rust" and "If you're building ORM anyway, do it right for long term ... let's you optimize with index" and "provide a relational db this time." ### New crate: continuum-orm-derive `#[derive(Entity)]` with `#[entity(...)]` field + struct attributes. Walks the struct, infers `FieldType` from the Rust type, honors attribute overrides, emits `impl OrmEntity for #name` automatically. The 100-line hand-written `collection_schema()` block collapses to per-field annotations. **Type inference:** - `String` / `&str` → `String` - `Uuid` (by type-name match) → `Uuid` - `bool` → `Boolean` - All integer + float types → `Number` - `Vec<_>` / `HashMap` / `BTreeMap` / `HashSet` → `Json` - `Option<T>` → inner T's type + `nullable = true` - Enum or other named struct → `Json` (override via `#[entity(json)]`) **Field attributes:** - `#[entity(indexed)]` / `#[entity(unique)]` / `#[entity(nullable)]` - `#[entity(json)]` — force JSON column - `#[entity(skip)]` — exclude from schema (pair with `#[serde(skip)]` if you also want it out of the wire payload) - `#[entity(foreign_key("collection.field"[, on_delete = "..."][, on_update = "..."]))]` — declares a real FK. Cascade keywords: `"restrict" | "cascade" | "set_null" | "no_action"`. 
**Struct attributes:**
- `#[entity(collection = "name")]`: REQUIRED
- `#[entity(index(name = "...", fields = [...], unique = ...))]`:
  composite index; repeat for multiple

**BaseEntity composition:** TS-decorator analogue done via Rust idiom:

```rust
#[derive(Entity)]
#[entity(collection = "engrams")]
pub struct Engram {
    #[serde(flatten)]
    pub base: BaseEntity, // recognized by type name; expands to base_entity_fields()
    pub content: String,
    // ...
}
```

The derive detects the embedded `BaseEntity` and adds its columns to
the schema via `base_entity_fields()` rather than treating it as one
big JSON blob.

### Relational schema: FKs are first-class

- `SchemaField.foreign_key: Option<ForeignKeyRef>`: new typed FK
  reference carrying `(collection, field, on_delete, on_update)`.
- `CascadeRule` enum with `Restrict / Cascade / SetNull / NoAction`.
- Both `SqliteAdapter` and `PostgresAdapter` emit
  `FOREIGN KEY (...) REFERENCES ...(...) ON DELETE ... ON UPDATE ...`
  in `CREATE TABLE`.
- `PRAGMA foreign_keys=ON` set per SQLite connection so the constraint
  is actually enforced (SQLite parses but doesn't enforce by default).
- Composite indexes already supported via `SchemaIndex`; the derive
  now feeds them automatically.

### Bug fixes surfaced by the new tests

1. **SQLite Number affinity → integers were coerced to floats.**
   Existing tests passed because nothing deserialized to `i32`/`i64`.
   The derive's test entity uses `delta: i32`; SQLite REAL affinity
   stored `-7` then returned `-7.0`, failing deserialize. Changed
   `FieldType::Number` to NUMERIC affinity, which preserves integers
   as integers and floats as floats.
2. **Self-alias in lib.rs.** The macro emits `::continuum_core::orm::*`
   absolute paths; inside the home crate those resolved nowhere. Added
   `extern crate self as continuum_core;` so home-crate code resolves
   the derive's emitted paths. Standard proc-macro pattern.
### 9/9 derive tests + 89/89 ORM tests green - `collection_constant_matches_struct_attribute` - `schema_has_base_columns_plus_domain_fields_minus_skipped` - `field_types_inferred_correctly` - `indexed_and_unique_attributes_propagate` - `option_translates_to_nullable` - `round_trip_through_orm_store` — end-to-end save/find/find_all through real SQLite using the derived schema - `composite_index_attributes_propagate` - `foreign_key_attribute_populates_schema_field` - `foreign_key_cascade_deletes_children_via_db_enforcement` — parent + child entities, child references parent via FK, deleting parent CASCADE-wipes the child row at the DB layer (not via application cleanup). The proof point that the ORM is now genuinely relational. ### Existing SchemaField construction sites (64 across the tree) Scripted addition of `foreign_key: None` to every existing literal so the field's required-by-Rust-literal rule doesn't break callers. No behavior change — existing entities don't have FKs yet. ### Not yet shipped (follow-up #168) - Engram + RecallMetadata migration to `#[derive(Entity)]`. Engram's hand-written `impl OrmEntity` block stays for now; #168 deletes it and adds RecallMetadata's FK-linked sidecar. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
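How a typed FK reference might be rendered into the `FOREIGN KEY ... REFERENCES ...` clause described in this commit is sketched below. The `CascadeRule` variants, the attribute keywords, and the `ForeignKeyRef` field names come from the message; `parse`, `as_sql`, and `fk_clause` are hypothetical helpers, and any defaulting of unspecified rules in the real derive is not modeled.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CascadeRule {
    Restrict,
    Cascade,
    SetNull,
    NoAction,
}

impl CascadeRule {
    /// Maps the attribute keyword to a rule; unknown keywords are rejected.
    pub fn parse(keyword: &str) -> Option<Self> {
        match keyword {
            "restrict" => Some(Self::Restrict),
            "cascade" => Some(Self::Cascade),
            "set_null" => Some(Self::SetNull),
            "no_action" => Some(Self::NoAction),
            _ => None,
        }
    }

    /// SQL spelling of the referential action.
    pub fn as_sql(self) -> &'static str {
        match self {
            Self::Restrict => "RESTRICT",
            Self::Cascade => "CASCADE",
            Self::SetNull => "SET NULL",
            Self::NoAction => "NO ACTION",
        }
    }
}

/// Typed FK reference: target collection and field plus both actions.
pub struct ForeignKeyRef {
    pub collection: String,
    pub field: String,
    pub on_delete: CascadeRule,
    pub on_update: CascadeRule,
}

/// Renders the table-level constraint for one referencing column.
pub fn fk_clause(column: &str, fk: &ForeignKeyRef) -> String {
    format!(
        "FOREIGN KEY ({}) REFERENCES {}({}) ON DELETE {} ON UPDATE {}",
        column,
        fk.collection,
        fk.field,
        fk.on_delete.as_sql(),
        fk.on_update.as_sql()
    )
}
```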
…truct (task #168 slice A)

The 100-line hand-written `impl OrmEntity for Engram` block is gone,
replaced by per-field `#[entity(...)]` annotations. Schema drift
between the Rust struct and the persistence layer is now structurally
impossible: change a field, the schema follows.

```rust
#[derive(Debug, Clone, Serialize, Deserialize, TS, Entity)]
#[serde(rename_all = "camelCase")]
#[ts(export, export_to = "...")]
#[entity(collection = "engrams")]
pub struct Engram {
    #[entity(primary_key)]
    pub id: Uuid,
    #[entity(indexed)]
    pub kind: EngramKind,
    pub content: String,
    #[entity(json)]
    pub origin: EngramOrigin,
    #[entity(json)]
    pub recall_keys: Vec<String>,
    #[entity(indexed)]
    pub admitted_at_ms: u64,
    #[entity(indexed)]
    pub trust_state_at_admission: TrustState,
    pub admission_trace_id: Option<String>,
}
```

That's the entire schema. Every column declaration the hand-written
block had lives in those attributes. Reading the struct gives the
schema; the derive emits the impl.

### `#[entity(primary_key)]`: the "no embedded BaseEntity" form

The derive previously recognized `#[serde(flatten)] base: BaseEntity`
as the BaseEntity-composition pattern. Engram doesn't use that;
Engram.id IS the BaseEntity.id directly. The new
`#[entity(primary_key)]` field attribute marks "this Uuid IS the
BaseEntity id": it pulls in `base_entity_fields()` (giving id +
createdAt + updatedAt + version to the schema) and skips emitting this
field separately (so the schema doesn't get a duplicate `id` column).
Mutually exclusive with the embedded-BaseEntity form.

### `#[serde(rename_all = "camelCase")]` added

Required for the round-trip through OrmStore to work. The adapter
auto-translates DB column names (`admitted_at_ms`) to camelCase
(`admittedAtMs`) when reconstructing the JSON payload, so Engram must
deserialize from camelCase keys. This matches the rest of the
substrate's wire convention.
### Engram now persists end-to-end

New test `engram_round_trips_through_orm_store_with_derived_schema` proves the full chain works:

Engram → serde → OrmStore::save → SqliteAdapter → SQLite → read → SqliteAdapter → OrmStore::find_by_id → serde → Engram

Original engram in, identical engram out. The EngramOrigin variant rides intact through the JSON column. trust_state_at_admission survives. recall_keys round-trips. admission_trace_id (nullable) round-trips.

This is the substrate's most load-bearing entity, now committed to real durable persistence. When the AdmissionState wire-up lands in the next slice, engrams will survive process restart for the first time in the substrate's history.

### Test results

- 32/32 engram tests green (including the new round-trip)
- 185 ORM-touching tests green
- 18/18 admission_state tests green (recall_scored + Algorithm 4 still working)

### What this slice does NOT yet ship

- RecallMetadata's OrmEntity derive (needs Serialize/Deserialize derives added first; non-trivial because it's currently a Copy struct used in a hot-path DashMap)
- AdmissionState wire-up to actually call store.save() on admit() + store.load_all() at boot

Both follow in a sibling slice. This slice's value is the proof that the derive macro works on production code: Engram is the test case, and the migration deletes more code than it adds while making the schema-struct contract structurally tight.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…#168 slice B)

The substrate's first relational entity pair. Engram parent + EngramRecallMetadata child, FK declared on engram_id with ON DELETE CASCADE enforced at the DB layer. When an engram is deleted, its recall metadata row goes with it, by referential integrity rather than by application cleanup code.

Per [[no-fallbacks-ever]] extended to relational invariants: the cascade IS the invariant. The 1:1 engram↔metadata relationship is enforced by UNIQUE on engram_id. Drift between application code and schema invariants is structurally impossible.

### Two-type architecture

`RecallMetadata` (existing): hot-path Copy struct, keyed by engram_id in a `DashMap`, lock-free reads, no engram_id field. Stays unchanged. Added `Serialize + Deserialize` derives only (non-breaking).

`EngramRecallMetadata` (new): persistence-side type carrying the FK + the metadata fields. Embeds BaseEntity via `#[serde(flatten)]` (the TS-decorator-analogue pattern). Derives `Entity` with the foreign_key attribute.

```rust
#[derive(Debug, Clone, Serialize, Deserialize, Entity)]
#[serde(rename_all = "camelCase")]
#[entity(collection = "engram_recall_metadata")]
pub struct EngramRecallMetadata {
    #[serde(flatten)]
    pub base: BaseEntity,
    #[entity(unique, indexed, foreign_key("engrams.id", on_delete = "cascade"))]
    pub engram_id: Uuid,
    #[entity(indexed)]
    pub salience: f32,
    pub access_count: u32,
    pub last_accessed_ms: u64,
    pub protected_until_ms: u64,
    pub last_decayed_ms: u64,
}
```

Conversions bridge the hot-path↔persistence boundary:

- `EngramRecallMetadata::for_new_row(engram_id, metadata)`: lift a hot-path pair into a persistable row with a fresh BaseEntity.
- `(Uuid, RecallMetadata)::from(row)`: drop the persistence wrapper, give back the in-memory pair. Used at boot rehydration.

### 4/4 tests green

- `engram_recall_metadata_lifts_and_lowers_losslessly`: round-trip conversion preserves every field. The boundary is the most likely drift point; this test pins it.
- `engram_recall_metadata_schema_has_expected_columns`: derived schema carries BaseEntity columns + every domain field.
- `engram_recall_metadata_carries_fk_to_engrams_with_cascade`: engramId field has FK to engrams.id, ON DELETE CASCADE, UNIQUE+indexed. Any future regression in these fails loudly here.
- `engram_recall_metadata_cascade_deletes_with_engram`: end-to-end relational round-trip. Engram parent + metadata child both persist through real SQLite. Delete parent → child row is gone, enforced by the DB.

### Architecture documentation

New canonical doc `docs/architecture/ENTITY-DERIVE-ARCHITECTURE.md` captures the Rust-first / generated-everywhere thesis, the derive macro shape, the relational schema features, the portability story (JSON/CBOR/YAML/TS bindings all fall out for free), the typed OrmStore<T> rail, the latent bugs found in adapters along the way, and the concrete migration status. Supersedes the stale ORM-PHASE-2-DESIGN.md (which assumed TS-decorators as canonical).

### What this slice does NOT yet ship

- AdmissionState wire-up: `admit()` writes through to the stores; `apply_decay` + `record_recall_hit` flush metadata; `load_at_boot()` rehydrates Vec + DashMap from disk. Sibling slice; the entity types are now ready for it.

The substrate's persistence layer is now genuinely relational. Engrams + their recall metadata are linked at the schema level, not by convention.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
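The lift/lower boundary from the slice B commit above can be illustrated with a minimal sketch. It is deliberately simplified: `u64` stands in for `Uuid`, a bare `row_id` stands in for the embedded BaseEntity, and `for_new_row` takes the row id explicitly (the real constructor is described as minting a fresh BaseEntity itself).

```rust
/// Hot-path value: small and `Copy`, with no engram id (the map key carries it).
#[derive(Debug, Clone, Copy, PartialEq)]
struct RecallMetadata {
    salience: f32,
    access_count: u32,
    last_accessed_ms: u64,
    protected_until_ms: u64,
    last_decayed_ms: u64,
}

/// Persistence-side row. `row_id` and the `u64` ids are simplified
/// stand-ins for BaseEntity and Uuid.
#[derive(Debug, Clone, PartialEq)]
struct EngramRecallMetadata {
    row_id: u64,
    engram_id: u64,
    salience: f32,
    access_count: u32,
    last_accessed_ms: u64,
    protected_until_ms: u64,
    last_decayed_ms: u64,
}

impl EngramRecallMetadata {
    /// Lift: wrap a hot-path pair into a persistable row.
    fn for_new_row(row_id: u64, engram_id: u64, m: RecallMetadata) -> Self {
        Self {
            row_id,
            engram_id,
            salience: m.salience,
            access_count: m.access_count,
            last_accessed_ms: m.last_accessed_ms,
            protected_until_ms: m.protected_until_ms,
            last_decayed_ms: m.last_decayed_ms,
        }
    }
}

/// Lower: drop the persistence wrapper, give back the in-memory pair.
impl From<EngramRecallMetadata> for (u64, RecallMetadata) {
    fn from(row: EngramRecallMetadata) -> Self {
        (
            row.engram_id,
            RecallMetadata {
                salience: row.salience,
                access_count: row.access_count,
                last_accessed_ms: row.last_accessed_ms,
                protected_until_ms: row.protected_until_ms,
                last_decayed_ms: row.last_decayed_ms,
            },
        )
    }
}
```

Spelling out every field in both directions, rather than using struct-update shortcuts, means a field added to one type and forgotten in the other becomes a compile error on the lift side.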
… to ORM persistence (closes #168 + #101)

The substrate's continual-learning property now compounds across process boundaries. "Every pair-programming session starts amnesic" is structurally solved.

### What landed

**`AdmissionPersistenceSink` trait** + three impls in `src/persona/admission_persistence.rs`:

- `NoopSink`: default; preserves test + replay paths that don't involve disk
- `OrmPersistenceSink`: production. Fire-and-forget writes through `OrmStore<Engram>` + `OrmStore<EngramRecallMetadata>` via `tokio::spawn`. AdmissionState's hot path stays sync; disk I/O happens on a background task.
- `RecordingSink`: test-only, buffers observations in memory for assertion. Lives at the system level per `[[test-fixtures-are-system-primitives]]`.

**`AdmissionPersistenceLoader` trait + `OrmLoader`** for boot rehydration: reads all engrams + metadata from disk in one async call. AdmissionState's new `new_rehydrated` constructor takes the loaded data and populates Vec + DashMap.

**`AdmissionState::new_with_persistence`**: explicit sink injection at construction. `admit()` observes through the sink after the in-memory writes; `recall_scored()` observes metadata updates after each Hebbian rehearsal hit.

### The proof test

`engrams_survive_process_restart_via_orm_persistence` is the substrate's first end-to-end persistence proof:

1. Build AdmissionState with OrmPersistenceSink + real SQLite
2. Admit two engrams
3. Drop the AdmissionState (simulates process exit)
4. Read engrams + metadata back via OrmLoader
5. Build fresh AdmissionState via `new_rehydrated`
6. Verify `recall_scored` sees the original engrams

When this test passes, the substrate's Hebbian rehearsal loop crosses the restart boundary: every pair-programming session inherits what the persona learned in the previous lifetime.

### Why fire-and-forget for v1

Admit is sync (called from many places, breaking the signature would ripple widely). Disk writes are async. The bridge: each observe_admission/observe_metadata_update spawns a tokio task that does the write.

Failure mode: under runtime shutdown, in-flight writes may not complete. The cost is bounded (the few engrams admitted in the brief window between admit and disk-write). A future `BatchingSink` with explicit drain-on-shutdown closes the remaining window. The trait is already there; it only needs another impl.

### Test results

- 4/4 `admission_persistence` tests green (including `engrams_survive_process_restart_via_orm_persistence`, `orm_persistence_sink_writes_then_loader_reads_back`, RecordingSink + NoopSink coverage)
- 21/21 `admission_state` tests green (3 new persistence wire-up tests + 18 existing recall + admit coverage)
- 48/48 admission-family tests green overall

### What this closes

- **#101** (Engram persistence: AdmissionState SQLite store under each persona's home). Done end-to-end. Per `[[no-sql-everything-through-orm-entities]]` the persistence layer is the ORM, never raw rusqlite in module code.
- **#168** (Migrate Engram + RecallMetadata to derive macro with real FK). Both entities now derive-driven; EngramRecallMetadata FK-linked to Engram with ON DELETE CASCADE enforced by SQLite.
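The drain-on-shutdown idea mentioned under "Why fire-and-forget for v1" might look roughly like the sketch below. It is an assumption-heavy illustration, not the planned `BatchingSink`: it swaps `tokio::spawn` for a plain `std::thread` plus an `mpsc` channel so it stays dependency-free, the `Sink` trait is a one-method stand-in for `AdmissionPersistenceSink`, and a shared `Vec<u64>` plays the role of the SQLite store.

```rust
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread::JoinHandle;

/// One-method stand-in for the real sink trait (hypothetical).
trait Sink {
    fn observe_admission(&self, engram_id: u64);
}

/// Sync callers enqueue; one background thread performs the "writes".
struct DrainingSink {
    tx: Option<Sender<u64>>,
    worker: Option<JoinHandle<()>>,
}

impl DrainingSink {
    fn new(store: Arc<Mutex<Vec<u64>>>) -> Self {
        let (tx, rx) = mpsc::channel::<u64>();
        let worker = std::thread::spawn(move || {
            // The loop ends once every sender has been dropped.
            for engram_id in rx {
                store.lock().unwrap().push(engram_id);
            }
        });
        Self { tx: Some(tx), worker: Some(worker) }
    }

    /// Close the queue, then block until every queued write has been applied.
    fn drain(mut self) {
        drop(self.tx.take());
        if let Some(worker) = self.worker.take() {
            worker.join().expect("writer thread panicked");
        }
    }
}

impl Sink for DrainingSink {
    fn observe_admission(&self, engram_id: u64) {
        // The hot path never blocks on disk; it only enqueues.
        if let Some(tx) = &self.tx {
            let _ = tx.send(engram_id);
        }
    }
}
```

Calling `drain()` at shutdown is what closes the window the fire-and-forget model leaves open: the caller's `admit()` stays sync, but exit waits for the queue to empty.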
### Architecture arc this completes

Today's six commits form one continuous slice:

```
758fa1c feat(orm): #[derive(Entity)] + relational schema
7d5034b feat(persona): Engram migrates to #[derive(Entity)]
f73cc4f feat(persona): EngramRecallMetadata — real FK sidecar
[this]  feat(persona): engrams survive process restart
```

The substrate now has:

- Rust struct = single source of truth (struct → schema → TS bindings → JSON/CBOR export, all from one annotated struct)
- Real relational entities (FK enforced by the DB, not application)
- Production-grade persistence sink with fire-and-forget hot path
- Boot rehydration that resumes the L2 + scoring state

The headless-persona-over-airc bar (per [[headless-success-is-personas-talking-over-airc]]) gets meaningfully closer: Paige no longer forgets what she learned yesterday.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…stody slice 1 (task #169)

Foundation slice of [ENTITY-CHAIN-OF-CUSTODY.md](../../docs/architecture/ENTITY-CHAIN-OF-CUSTODY.md). The prerequisite for everything else in the chain-of-custody arc: signing, Merkle linkage, airc-native entity envelopes, and cross-continuum portability all need a per-citizen home dir to live in.

### What landed

**`crate::persona::home::PersonaHome`**: typed surface for "where this persona's stuff lives." Resolves:

```
<continuum_root>/personas/<agent_name>/
  airc/            ← airc keypair (owned by airc-lib)
  seed.json        ← PersonaIdentityProvider's seed
  engrams.sqlite   ← OrmStore<Engram> + OrmStore<EngramRecallMetadata>
```

`PersonaHome::engrams_db()`, `airc_dir()`, `seed_json()`, `ensure_exists()`. One home = one citizen's complete on-disk surface.

**`AdmissionState::for_persona(home, recall_metadata) -> Self`**: the persona-scoped entry point. Opens the per-persona SQLite, wires up OrmStore<Engram> + OrmStore<EngramRecallMetadata>, builds the production `OrmPersistenceSink`, rehydrates the in-memory Vec + DashMap from disk, returns the configured state. One call replaces the half-dozen orchestration steps the production path used to need.

### Why this is the right foundation

Per [[entity-chain-of-custody-vision]]: the substrate's identity primitive is the airc Ed25519 keypair, which lives under `<home>/airc/`. The signing key (slice 3) will derive from that keypair. The Merkle chain head (slice 4) caches in the same home. Per-collection databases (future) sit alongside engrams.sqlite. **Every layer of the chain-of-custody design hangs off the same PersonaHome.** Getting this typed seam right means the future slices compose cleanly without re-doing path plumbing.
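A minimal sketch of the path layout above, assuming a constructor that takes the continuum root and the agent name (the real `PersonaHome` constructor is not shown in this commit message, so `new` and `root` here are hypothetical). Only the accessor names and the file names come from the description.

```rust
use std::path::{Path, PathBuf};

/// One citizen's on-disk surface: `<continuum_root>/personas/<agent_name>/`.
struct PersonaHome {
    root: PathBuf,
}

impl PersonaHome {
    /// Hypothetical constructor; the real signature may differ.
    fn new(continuum_root: &Path, agent_name: &str) -> Self {
        Self { root: continuum_root.join("personas").join(agent_name) }
    }

    fn root(&self) -> &Path {
        &self.root
    }

    /// Directory holding the airc keypair.
    fn airc_dir(&self) -> PathBuf {
        self.root.join("airc")
    }

    /// Identity provider seed file.
    fn seed_json(&self) -> PathBuf {
        self.root.join("seed.json")
    }

    /// Per-persona SQLite database for engrams and their recall metadata.
    fn engrams_db(&self) -> PathBuf {
        self.root.join("engrams.sqlite")
    }

    /// Idempotent: `create_dir_all` succeeds if the directory already exists.
    fn ensure_exists(&self) -> std::io::Result<()> {
        std::fs::create_dir_all(&self.root)
    }
}
```

Because every accessor composes off the single `root`, two personas with different names can never resolve to overlapping paths, which is the isolation property the tests below pin.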
### 6/6 tests green

PersonaHome unit tests (4):

- `home_resolves_under_personas_subdir`: root composes correctly
- `sub_path_accessors_compose_off_root`: engrams_db, airc_dir, seed_json all share the same root
- `ensure_exists_creates_and_is_idempotent`: bootstrap-safe
- `different_personas_have_disjoint_homes`: first defense of per-citizen isolation

AdmissionState::for_persona integration tests (2):

- `for_persona_round_trips_admissions_via_per_persona_sqlite`: admit through Paige's home → drop → fresh AdmissionState from the same home → rehydrates her engrams via real SQLite
- `for_persona_isolates_two_personas_at_the_storage_layer`: Paige's engrams stay in Paige's home; Niko's fresh AdmissionState sees zero engrams. The crucial per-citizen isolation invariant.

50/50 admission-family tests green overall.

### Architecture documentation

[ENTITY-CHAIN-OF-CUSTODY.md](docs/architecture/ENTITY-CHAIN-OF-CUSTODY.md) captures the full six-slice arc:

1. **This slice**: per-citizen home-dir scoping
2. author_peer_id + content_hash on every entity write
3. Sign on save, verify on load (airc Ed25519 keypair)
4. Chain head cache + Merkle walk audit
5. Airc-native entity envelopes (entities flow over airc)
6. Cross-continuum portability (export chain, verify, import)

Plus how this generalizes the forge-alloy proof-contract pattern to all entities, and how OAuth/webauthn later derive FROM the airc identity rather than replace it.

### Doctrines this enforces

- [[orm-everything-not-hand-edited-files]]: all persistence through the ORM
- [[entity-chain-of-custody-vision]]: the multi-slice arc
- [[personas-are-citizens-airc-is-identity-provider]]: the airc keypair is the identity primitive that this home dir centers
- [[continuums-are-multi-instance-personas-have-lives]]: the storage layout that "personas have lives" requires

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…1519 (task #170)

Addresses Reviewer 1 (proc-macro correctness) BLOCK findings. The macro was silently mis-classifying several common Rust types as `Json`, and was tolerating attribute-key typos that would produce subtly wrong schemas. With the substrate's "schema = struct" commitment, those silent drifts were exactly what the macro was supposed to prevent.

### Type inference: now recursive + complete

`infer_field_type` and `unwrap_option` were both single-step. Now:

- **`unwrap_option` is recursive**: `Option<Option<T>>` collapses to nullable inner T. Previously the inner Option fell through to Json.
- **`strip_transparent_wrapper`** peels `Box<T>`, `Arc<T>`, `Rc<T>`, `Cow<'_, T>` before inference. `Box<String>` is now String (was Json). `Arc<u64>` is now Number (was Json).
- **`PathBuf` / `Path`** → String (was Json). Path types serialize as strings; the schema should match.
- **`SystemTime` / `chrono::DateTime` / `NaiveDateTime` / `Date` / `NaiveDate`** → Date. The `FieldType::Date` variant existed and was never produced because the type-inference path never matched a timestamp type. Now it does. (Removed the `#[allow(dead_code)]` on `InferredFieldType::Date` since it's now reachable.)
- **`u128` / `i128`** → Number. The original list was missing these.

### Attribute parsing: hard-fail on unknown keys

The forward-compat "tolerate unknown keys" branches at both struct level and field level swallowed typos like `#[entity(collecton = "x")]` or `#[entity(indexd)]`, producing silently-wrong schemas. Replaced both with hard errors listing the known keys. The error points at the exact attribute span so the user can fix it immediately.

### `primary_key + foreign_key` on same field: now rejected

The combination was silently accepted; the FK was dropped because `primary_key` codegens via `base_entity_fields()` and skips the field. Same drift the macro should prevent. Now produces:

> `#[entity(primary_key)]` and `#[entity(foreign_key(...))]` are
> mutually exclusive — primary_key implies the BaseEntity id
> (unique by design). Declare the FK on a different field, or
> drop primary_key if this is actually a relational pointer.

### `parse_foreign_key` tightened

`"engrams.id.too.many.dots"` previously parsed with `split_once('.')` and produced `target_field = "id.too.many.dots"`, a string the SQL adapter would later choke on with a cryptic error. Now uses `split('.').collect::<Vec<_>>()` requiring exactly two non-empty alphanumeric+underscore halves; surfaces the bad input at the macro span with a clear message.

### Duplicate-BaseEntity error now names the prior source

The error span was on the SECOND offending field with no mention of the FIRST. The macro now tracks the prior field's identifier and includes it in the message, so the user immediately sees both halves:

> duplicate BaseEntity source on `id` — already declared by `base`.

### Tests: `TypeInferenceProbe` fixture added

Schema-only test entity (no serde derives: the workspace's chrono doesn't have the `serde` feature and SystemTime doesn't support it either; this fixture verifies macro inference, not round-trip serde).

Six new tests:

- `systemtime_infers_as_date`
- `pathbuf_infers_as_string`
- `box_and_arc_wrappers_peel_to_inner_type`
- `double_option_collapses_to_nullable_inner_type`
- `u128_infers_as_number`
- `type_inference_probe_registers_cleanly`

### Test results

- Derive tests: **15/15** green (up from 9)
- ORM family: **193/193** green (up from 185)
- Engram family: **61/61** green
- Admission family: **50/50** green
- RecallMetadata family: **20/20** green
- Total this slice: **324 tests green**

### Slices remaining for #1519

- **Slice B (#171)**: persistence correctness (phantom-engram fix, upsert race fix, drop mem::forget in tests)
- **Slice C (#172)**: doctrine alignment (trim CHAIN-OF-CUSTODY doc to slice 1 + roadmap, soften forge-alloy claim, schema-evolution paragraph, mark BaseEntity pattern A as transitional, scope should-respond JSON to introspection-only)

Per [[agent-review-as-acceptable-approval]]: the same reviewers can re-verify after each slice; when all three flip to APPROVE the PR is canary-ready.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
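The tightened foreign-key target rule from the "`parse_foreign_key` tightened" section of the commit above can be sketched as a plain string function. This is an illustration under assumptions: the real `parse_foreign_key` lives inside the proc-macro and reports errors at a macro span, whereas this version returns a `String` error, and the helper `is_ident_part` is a hypothetical name.

```rust
/// True for a non-empty run of ASCII alphanumerics and underscores.
fn is_ident_part(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Accept exactly `collection.field`; reject extra dots and empty halves.
fn parse_foreign_key(target: &str) -> Result<(String, String), String> {
    let parts: Vec<&str> = target.split('.').collect();
    match parts.as_slice() {
        &[collection, field] if is_ident_part(collection) && is_ident_part(field) => {
            Ok((collection.to_string(), field.to_string()))
        }
        _ => Err(format!(
            "foreign_key target must be `collection.field`, got `{target}`"
        )),
    }
}
```

The difference from `split_once('.')` is that the whole input is split before matching, so a third segment fails the two-element pattern instead of being folded into the field name.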
…1519 (task #171)

Addresses Reviewer 2 (persistence + concurrency) BLOCK findings. Two real correctness bugs + one test-hygiene improvement.

### Bug 1: phantom-engram-without-metadata permanently invisible

`OrmPersistenceSink::observe_admission` writes engram-then-metadata sequentially. If the engram save succeeds but the metadata save fails (crash mid-spawn, disk full mid-write, etc.), the engram on disk has no metadata row. The inline comment claimed "next decay tick will resurface this". **False.** The decay tick walks `registry.engram_ids()` (the in-memory DashMap), NOT the engrams table. So:

1. Boot rehydration loads N engrams + N-1 metadata rows.
2. `recall_scored` filter_maps engrams whose registry entry is missing, so the phantom never returns.
3. `record_recall_hit` is never called for the phantom.
4. The metadata row is never created.
5. The engram is permanently invisible to recall for the rest of the substrate's lifetime, even though it's right there on disk.

**Fix:** at the end of `AdmissionState::new_rehydrated`, iterate loaded_engrams and call `recall_metadata.admit_with_defaults` for each. `or_insert_with` semantics mean loaded metadata wins over default; missing rows get the default seeded; every engram becomes recall-visible from boot.

**Regression test:** `rehydrate_backfills_metadata_for_phantom_engrams` hands `new_rehydrated` a Vec with two engrams + a metadata Vec with only one entry, asserts both engrams appear in `recall_scored`, and asserts the loaded high-salience entry still wins ranking against the seeded default.

### Bug 2: UNIQUE-race in upsert_metadata_by_engram_id

The first iteration's `upsert_metadata_by_engram_id` did `find_all().await?` + linear scan on every recall hit. Concurrent recall hits could BOTH find no existing row and BOTH decide to insert, and the SECOND save would fail on the UNIQUE constraint on `engram_id`. The error was logged and swallowed; the NEWER salience value (the one that should win) was silently lost. Hebbian rehearsal evaporated under any concurrent recall.

**Fix:** `OrmPersistenceSink` now holds `row_id_by_engram: DashMap<Uuid, Uuid>`. `observe_admission` populates it BEFORE the spawn so concurrent metadata updates for the same engram_id find the cached row_id deterministically. `observe_metadata_update` looks up the cached row_id and does a targeted `update()`: no more race, no more table scan.

`OrmLoader::load_with_row_ids` returns the (engram_id, row_id) pairs to prime the cache at boot via `prime_cache`. `AdmissionState::for_persona` wires the prime automatically.

The find_all-per-update cost is also gone: it was O(N) per recall hit and is now an O(1) DashMap lookup + targeted UPDATE.

### Test hygiene: drop `std::mem::forget(tmp)`

Eight test sites used `mem::forget(tmp)` to leak the TempDir past test end. Reviewer-2 #5 flagged this: /tmp accumulates stale sqlite dbs over time.

**Fix:** in helper functions that constructed the tempdir locally (`fresh_adapter` in `orm/store.rs` and `orm/derive_test.rs`), changed the return type to `(Arc<dyn StorageAdapter>, TempDir)` so the test scope owns the tempdir's lifetime. Drop at test-end cleans up the path.

For inline tempdir creation in test bodies (`admission_persistence.rs`, `admission_state.rs`, `engram.rs`, `recall_metadata.rs`), removed the `mem::forget` line. `tmp` was already scoped to the test function, so Drop semantics now clean up.
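The Bug 1 backfill relies on insert-if-absent semantics, which a short sketch can make concrete. Assumptions are labeled: a std `HashMap` stands in for the `DashMap` registry, `u64` stands in for `Uuid`, `rehydrate` is a hypothetical free function rather than `AdmissionState::new_rehydrated`, and the default values in `default_metadata` are invented for illustration (the real `admit_with_defaults` values are not given in this commit message).

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
struct RecallMetadata {
    salience: f32,
    access_count: u32,
}

/// Illustrative defaults only; not the substrate's real values.
fn default_metadata() -> RecallMetadata {
    RecallMetadata { salience: 0.5, access_count: 0 }
}

/// Build the in-memory registry from what was loaded, then backfill a
/// default row for every engram that has no metadata on disk.
fn rehydrate(
    loaded_engrams: &[u64],
    loaded_metadata: Vec<(u64, RecallMetadata)>,
) -> HashMap<u64, RecallMetadata> {
    let mut registry: HashMap<u64, RecallMetadata> =
        loaded_metadata.into_iter().collect();
    for &engram_id in loaded_engrams {
        // Existing (loaded) entries are left untouched; only gaps are seeded.
        registry.entry(engram_id).or_insert_with(default_metadata);
    }
    registry
}
```

With two engrams loaded and only one metadata row, the registry ends up with both ids: the loaded row keeps its salience and the phantom gets the default, so it is visible to recall from boot.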
### Test results

- ORM family: **193/193** green (up from 186 before slice B; 7 had failed transiently when mem::forget was first removed before the return-tuple fix)
- Admission family: **51/51** green (added `rehydrate_backfills_metadata_for_phantom_engrams`)
- Engram family: **61/61** green
- RecallMetadata family: **20/20** green
- Slice B total: **325 tests** all green

### What still defers to Slice C

- Reviewer 3's doctrine-alignment findings (should-respond JSON scope, chain-of-custody doc trim, forge-alloy claim softening, schema-evolution paragraph, BaseEntity Pattern A canonical mark) are doc + scope-clarification work, not correctness fixes. They land separately in Slice C.

### Known tradeoff: fire-and-forget durability remains

The reviewer noted that under tokio runtime shutdown, fire-and-forget writes may not complete. That's still true; the design choice is documented inline. A future BatchingSink with drain-on-shutdown semantics closes the remaining window.

This slice tightens the correctness invariants WITHIN the fire-and-forget model; full durability semantics is a separate slice.

Per [[agent-review-as-acceptable-approval]]: when the reviewer verifies the regression test catches the phantom case and the cache eliminates the UNIQUE race, slice B flips to APPROVE.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…1519 (task #172)

Addresses Reviewer 3 (architecture + doctrine) BLOCK findings. All seven items resolved via doc revisions; no code changes.

### Finding #1: Citizen-resolution gap (architecture doc commits to signing without naming the writer-identity resolution)

**Fix:** ENTITY-CHAIN-OF-CUSTODY.md trimmed to slice 1 only. Signing + writer-identity surface moved to `docs/planning/ENTITY-CHAIN-OF-CUSTODY-ROADMAP.md`, where slice 2 is explicitly an open-question section naming the three writer-identity candidates (per-store binding, WriterContext parameter, async-context local) and committing to (a) per-store binding as the default until the substrate decides otherwise.

### Finding #2: Claude can't hold an airc keypair (categorically weaker than the doc claimed)

**Fix:** Architecture doc replaces the "Claude has a keypair" framing with the "originating agent" vs "attesting citizen" distinction. Local citizens (personas, humans) are both: they hold a real keypair and sign their own writes. Remote AIs (Claude, openclaw, Hermes via API) split: the local adapter is the attesting citizen (holds keypair, signs attestation); the remote agent is the originator (recorded by name, not cryptographically attested). The planning doc names this as slice 2's entity-schema requirement (`author_peer_id` for attestor; `originating_agent: Option<String>` for the remote-AI case).

### Finding #3: Forge-alloy generalization rhetorical, not structural

**Fix:** Both ENTITY-CHAIN-OF-CUSTODY.md and the planning doc downgrade the language from "shares the proof contract" to "shares the proof pattern" (artifact + verifiable lineage). Forge artifacts carry multi-party signatures + dependency refs + methodology citations; entities carry single-writer chains. They rhyme; they aren't the same contract. A future `trait ProofContract` (not yet written) might capture the shared shape; that's a slice-2+ design decision, now explicitly named.
### Finding #4: Two BaseEntity patterns violate compression

**Fix:** ENTITY-DERIVE-ARCHITECTURE.md now declares Pattern A (bare `Uuid id` + `#[entity(primary_key)]`) as **CANONICAL**, the default for new entities. Pattern B (embedded BaseEntity via `#[serde(flatten)]`) is marked **transitional**, allowed only when the entity has external callers that read `entity.base.*` directly. A follow-up audit ticket should walk Pattern B users and migrate any without legitimate access patterns.

### Finding #5: Schema evolution undiscussed → "swap serde_json for serde_cbor" portability claim breaks

**Fix:** ENTITY-DERIVE-ARCHITECTURE.md adds a new §"Schema evolution" section explicitly naming:

- What works today: additive `Option<T>` fields are forward-compatible.
- Current gaps: enum variants and field renames break round-trip unless `#[serde(other)]` / `#[serde(rename)]` is consistently applied (it isn't).
- Non-goals for v1: no auto-migration, no cross-version signature compat.
- What needs designing before grid-flow: schema_version field, per-entity migration registry, catch-all enum variants, explicit canonical form definition.

The portability claim is now scoped: "exports work between continuums on the same SHA". That is a real substrate-internal win, not yet a cross-mesh promise.

### Finding #6: Cognition-pipeline contradiction (PR adds doc that forbids will_respond + lands the contract in same diff)

**Reviewer 3 confirmed on re-review:** the load-bearing part is already resolved. `service_loop.rs::serve_persona_loop` (commit `9e5494d94`) drives the canonical respond() cycle and consumes `PersonaResponse::Spoke|Silent`, NOT `will_respond`. The `will_respond + response_text` JSON contract is fully contained in `persona/rag_inspect.rs::inspect_persona_rag_with_inference` and its ServiceModule wrapper.

**Fix:** PERSONA-COGNITION-PIPELINE.md §4 gains an explicit carve-out paragraph for `inspect_persona_rag_with_inference`: the `_with_inference` variant is allowed to use the JSON shape because it's introspection (answering "would the persona respond to this RAG snapshot?"), and is forbidden from being called by `service_loop` or any production cognition path. The only legitimate callers are the rag-inspect ServiceModule + tests. The doc names this so future readers don't have to triangulate it from the import graph; a grep-test / `#[deny]` lint that fires if service_loop imports it would make the forbid structural (named as follow-up).

### Finding #7: Slice 1 + six-slice doc = commitment debt

**Fix:** ENTITY-CHAIN-OF-CUSTODY.md trimmed from 174 lines to ~60 lines covering ONLY slice 1 (what IS). Slices 2–6 moved to `docs/planning/ENTITY-CHAIN-OF-CUSTODY-ROADMAP.md` with each slice's open questions explicitly named. The roadmap doc ends with: "This roadmap is NOT a commitment. The substrate can pursue these slices, defer them, or pivot. Per [[constitutional-design-always-a-next-step]]: name the open questions so the substrate has a path to a decision rather than a commitment debt."

### Reviewer 3's newly-discovered concern (PR scope sprawl)

Reviewer flagged that the branch grew to 1240 files / +154k lines since the original review (includes VDD recorder, multi_persona stress baseline, inference-grpc, llama-cpp tests, etc). Strong recommendation to split. **This is for Joel to decide.** Surfaced in the PR body update for visibility; not addressed in this commit.

### Reviewer 1 + 2 status

Re-spawned reviewers verified slices A + B in parallel with slice C. Reviewer 1 confirmed findings 1, 5 addressed; findings 2, 4, 6 code-fixed but lack compile-fail tests (need `trybuild`); findings 3, 7, 8, 9 partial or unaddressed. Slice A2 would close those. Reviewer 2 verdict pending at commit time.

### Test impact

Zero: slice C is documentation only. No code changes; no test runs needed.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…1519 (task #173)

Addresses Reviewer 1's remaining BLOCK findings on PR #1519. Four real correctness/ergonomics fixes; trybuild harness deferred to focused follow-up (rationale in code comment).

### Finding #7: External-crate path resolution

The macro previously emitted `::continuum_core::*` absolute paths. Any consumer that renames the dep in `Cargo.toml` (`continuum-core = { package = "continuum-core-alt" }`) or a re-exporting facade crate would get unresolved-path errors, because the absolute path bakes in the conventional name. The self-alias `extern crate self as continuum_core;` in continuum-core's `lib.rs` only fixed the in-crate case, not downstream consumers.

**Fix:** added `proc-macro-crate` dep. New helper `resolve_continuum_core_path()` calls `crate_name("continuum-core")` at codegen time; returns `crate` when the consumer IS continuum-core itself, or `::<chosen-name>` otherwise. The Entity derive now emits paths prefixed with that resolved ident. `cascade_rule_tokens` was also updated to accept the resolved prefix as a parameter (free functions can't capture the local `core` token from `derive_entity`).

### Finding #8: Empty composite-index name

`#[entity(index(name = "", fields = ["a"]))]` previously parsed cleanly and emitted `SchemaIndex { name: "".to_string(), … }`. The SQL adapter would later try `CREATE INDEX `` ON …` and fail with a cryptic error far from the macro span.

**Fix:** `parse_composite_index` now rejects empty `name` with the exact-span error "composite index `name` must be non-empty".
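The Finding #7 prefix rule can be modeled without the proc-macro machinery. The sketch below is a string-level approximation: the local `FoundCrate` enum mirrors the two outcomes the `proc-macro-crate` lookup distinguishes (defined here so the example has no dependencies), and `resolve_core_prefix`, `emitted_path`, and the `orm::SchemaField` tail are hypothetical names used only for illustration. The real helper emits tokens, not strings.

```rust
/// Local mirror of the lookup result: either the macro is expanding inside
/// the target crate itself, or the consumer depends on it under some name.
enum FoundCrate {
    Itself,
    Name(String),
}

/// `crate` inside continuum-core; `::<chosen-name>` for everyone else.
fn resolve_core_prefix(found: &FoundCrate) -> String {
    match found {
        FoundCrate::Itself => "crate".to_string(),
        FoundCrate::Name(name) => format!("::{name}"),
    }
}

/// Every emitted path is built from the resolved prefix, never from a
/// hard-coded `::continuum_core`.
fn emitted_path(found: &FoundCrate, tail: &str) -> String {
    format!("{}::{}", resolve_core_prefix(found), tail)
}
```

Passing the prefix into helpers as a parameter, as `cascade_rule_tokens` now does, keeps free functions from silently falling back to the conventional crate name.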
### Finding #9: `to_camel_case` edge cases

The function mishandled three cases:

- `_leading_underscore` → produced `Leading…` (wrong; serde preserves leading underscores: `_field` stays `_field`)
- `field__double` → produced `fieldDouble` but only by accident (the doubled `_` set capitalize_next twice; behavior was fragile)
- `trailing_` → produced `trailing_` (wrong; serde drops the trailing rust-ident workaround: `type_` → wire `type`)

**Fix:** rewritten with explicit handling:

- Leading `_`s preserved (peekable loop consumes them literally before the main pass).
- Internal `_`s set capitalize_next; doubled `_`s coalesce uniformly (no double-trigger).
- Trailing `_` is dropped (final capitalize_next has no following char to consume).

Inline unit tests pin the contract:

- `to_camel_case_handles_edge_cases` covers all three cases plus the standard snake → camel translation.
- `to_camel_case_preserves_existing_case` confirms non-underscore-preceded chars don't accidentally uppercase.

### Finding #3: Multi-span on duplicate-BaseEntity error

The error span pointed only at the SECOND offending field. Already addressed in Slice A by tracking the prior field name in `saw_base: Option<String>` and including it in the error text. The reviewer wanted a multi-span (rustc highlights both fields), which requires `syn::Error::combine`. Landing that is a small additional ergonomic win but not a correctness gap. Deferred with the trybuild harness slice.

### Reviewer 2 cosmetic: stale `mem::forget` comment

Removed the lingering `// mem::forget so the path persists past test end` doc comment in `engram.rs:869-870`. The call itself was removed in Slice B; the comment was stale.

### Trybuild compile-fail tests (deferred follow-up)

Findings #2 (primary_key + foreign_key conflict), #4 (unknown attribute keys), #6 (multi-dot FK targets) are correctly handled in code (Slice A) but lack a structural test harness. `trybuild` is the canonical tool. Deferred because the test files need `continuum-core` types in scope (Uuid, BaseEntity, etc.): the harness belongs alongside the existing derive-test fixtures in `continuum-core/tests/compile_fail/`, not inside the proc-macro crate. Landing trybuild + the .stderr fixtures is a focused follow-up slice; documented in `Cargo.toml` so the next person picking it up has the rationale.

### Test results

- ORM family: **193/193** green (unchanged from Slice B)
- Admission family: **51/51** green
- Engram family: **61/61** green
- RecallMetadata family: **20/20** green
- continuum-orm-derive inline tests: **2/2** green (new to_camel_case edge-case tests)
- Slice A2 total: **327 tests** (325 prior + 2 new)

Per [[agent-review-as-acceptable-approval]]: re-spawning Reviewer 1 should flip Slice A's verdict to APPROVE on findings 1, 5, 7, 8, 9. Findings 2, 4, 6 are correct-in-code with the trybuild deferral explicitly named.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
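The three `to_camel_case` rules listed under Finding #9 in the commit above (preserve leading underscores, coalesce internal ones, drop a trailing one) fit in a few lines. The following is a reconstruction from that description, not the code that landed in `continuum-orm-derive`; the real function may differ in details such as Unicode handling.

```rust
/// snake_case → camelCase following the three rules described above.
fn to_camel_case(ident: &str) -> String {
    let mut out = String::with_capacity(ident.len());
    let mut chars = ident.chars().peekable();

    // Rule 1: leading underscores are copied through literally.
    while chars.peek() == Some(&'_') {
        out.push('_');
        chars.next();
    }

    // Rule 2: any run of internal underscores sets one pending capitalization.
    // Rule 3: a trailing underscore sets the flag but nothing follows to
    // consume it, so it is simply dropped.
    let mut capitalize_next = false;
    for c in chars {
        if c == '_' {
            capitalize_next = true;
        } else if capitalize_next {
            out.extend(c.to_uppercase());
            capitalize_next = false;
        } else {
            out.push(c);
        }
    }
    out
}
```

Because the flag is a boolean rather than a counter, `field__double` and `field_double` take the same path, which removes the "works by accident" behavior the reviewer flagged.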
joelteply added a commit that referenced this pull request on Jun 4, 2026
…y, ORM-backed (Slice 1 of #142) (#1521)

Joel 2026-06-04 morning, in a sequence of escalations:

- "The symmetry is important. Airc identities were supposed to be built into context for each persona each a different user."
- "Each UNIQUE identity per persona, per you per me. Not shared."
- "Yes airc is CORE LEVEL. this is the session etc."
- "What differentiates each persons their own airc workspaces like you and codex is airc identity. This is like Android context and must be fixed."
- "We agreed our base data type for anything storable would be rust base entity."

The doctrine: airc identity IS the session abstraction. Every actor instance (persona, Claude Code session, Codex session, Joel terminal, jtag CLI invocation, web user) has its own UNIQUE airc identity (peer_id + keypair + home). Not shared. The substrate's universal handle is `Context` (Android-Context analogue): ubiquitous, mandatory, carries identity + services + captures.

This commit lands the foundational data type: `Identity` as an ORM-backed entity, using the `#[derive(Entity)]` macro from #1519. Pattern A (canonical): `#[entity(primary_key)] id: Uuid` pulls in BaseEntity columns automatically. `id == airc peer_id` per [[persona-identity-derives-from-source-id]]: your airc cryptographic identity IS your substrate identity, not a separate continuum-side surrogate.

## What lands

### `continuum-core/src/identity/mod.rs` (new)

- `IdentityKind` enum: Persona | Claude | Codex | Human | Jtag | Web. Every kind is a first-class substrate citizen per [[airc-is-the-session-not-a-feature]]; the tag lets downstream code branch when actor type matters.
- `IdentitySource` enum: ResumedFromDisk | FreshlyMinted. Renamed from `PersonaIdentitySource` because the same enum now applies to every IdentityKind, not just Persona.
- `Identity` struct: ORM entity carrying id (= peer_id), kind, agent_name, home_path, default_room, source. Foreign-keyable from every other entity that needs to record "which citizen did this." Derived via `#[derive(Entity)]`; schema IS the struct.

### `continuum-core/src/lib.rs`

- `pub mod identity;` registered.

### `continuum-core/src/orm/store.rs`

- Lifted `fresh_adapter` out of `#[cfg(test)] mod tests` to module-scope (still `#[cfg(test)]` gated, `pub(crate)`) so cross-module tests can lease the same fixture per [[test-fixtures-are-system-primitives]]. In-mod test callers rewritten to `super::fresh_adapter()`.

## Tests

8 identity tests pass:

- `identity_schema_is_derived`: schema introspection covering collection name, BaseEntity columns (`id`, `createdAt`, `updatedAt`), and declared fields (camelCase via serde rename).
- `identity_round_trips_through_orm`: save + find_by_id + find_all. Cross-kind: Persona + Claude rows persist, are decodable, can be manually filtered by kind. Foundation for query-by-room when the predicate-pushdown layer lands.
- 3 ts-rs `export_bindings_*` tests for Identity / IdentityKind / IdentitySource: TS bindings generate cleanly.

ORM family unchanged: 95 tests pass (the `fresh_adapter` lift doesn't regress anything).

## What this slice does NOT do (out of scope)

- `Context` struct wrapping Identity + services + captures (Slice 2 of #142)
- Bootstrap paths per IdentityKind: fresh Claude Code session minting its own Identity row + airc home; jtag CLI invocation minting ephemeral; etc. (Slice 3)
- `&ctx` ubiquitous refactor across substrate APIs (Slice 4)
- Migration of `PersonaInstanceInfo` callers to read from Identity table (Slice 1B, focused follow-up to keep this PR reviewable)

## Doctrine

- [[airc-is-the-session-not-a-feature]]: Identity IS the session
- [[no-sql-everything-through-orm-entities]]: entity, not JSON file
- [[persona-identity-derives-from-source-id]]: peer_id IS the id
- [[organization-purity-as-we-migrate]]: same enum across kinds
- [[test-fixtures-are-system-primitives]]: fresh_adapter promoted

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
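As a rough illustration of the data shape and the round-trip test described in the commit message, the sketch below models `Identity`, `IdentityKind`, and `IdentitySource` with the standard library only. It is not the real entity: `id` is a plain `u128` standing in for a `Uuid`, there is no `#[derive(Entity)]`, and `MemoryStore` and `find_by_kind` are hypothetical stand-ins for the ORM adapter and the manual filter-by-kind step.

```rust
use std::collections::HashMap;

/// Actor kinds listed in the commit message.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IdentityKind { Persona, Claude, Codex, Human, Jtag, Web }

/// Whether the identity was loaded from an existing home or newly created.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IdentitySource { ResumedFromDisk, FreshlyMinted }

/// Simplified stand-in for the ORM entity (assumption: field types are guesses).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Identity {
    id: u128, // equals the airc peer_id in the real design
    kind: IdentityKind,
    agent_name: String,
    home_path: String,
    default_room: String,
    source: IdentitySource,
}

/// Hypothetical in-memory store; the real code goes through the ORM adapter.
#[derive(Default)]
struct MemoryStore {
    rows: HashMap<u128, Identity>,
}

impl MemoryStore {
    /// Insert or overwrite the row keyed by the identity's id.
    fn save(&mut self, identity: Identity) {
        self.rows.insert(identity.id, identity);
    }

    fn find_by_id(&self, id: u128) -> Option<&Identity> {
        self.rows.get(&id)
    }

    fn find_all(&self) -> Vec<&Identity> {
        self.rows.values().collect()
    }
}

/// Manual filter over `find_all`, mirroring the "filtered by kind" step
/// that the commit message says precedes predicate pushdown.
fn find_by_kind(store: &MemoryStore, kind: IdentityKind) -> Vec<&Identity> {
    store
        .find_all()
        .into_iter()
        .filter(|identity| identity.kind == kind)
        .collect()
}
```

Saving one `Persona` row and one `Claude` row, then calling `find_by_kind(&store, IdentityKind::Persona)`, returns only the persona row. Because every row is keyed by its own id, two actors never share a record.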
This was referenced Jun 7, 2026
Summary
Per Joel 2026-06-02 ("113, use real LLMs. We can't know if we use fake algorithms. Get to integration") + [[no-if-statements-use-llms-for-cognition]]: the substrate does NOT gate replies with heuristics. The LLM decides `will_respond` AND writes `response_text` atomically via grammar-constrained JSON. One LLM call per turn. No heuristic gate.

Closes the greeting-loop root cause (task #153) via the substrate-correct path: route the decision THROUGH the LLM, not around it.
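To make "one structured call" concrete, here is a minimal sketch of strictly decoding a `{"will_respond": bool, "response": str}` object into a typed result. It assumes the JSON text has already been parsed into key/value pairs. The `Json` enum, `Decision`, `DecisionError`, and `parse_decision` are hypothetical names for illustration and do not reproduce the PR's `parse_decide_and_respond`.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal JSON value; real code would use a JSON library's type.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
}

/// The decision and the reply text, produced together by one model call.
#[derive(Debug, Clone, PartialEq)]
struct Decision {
    will_respond: bool,
    response: String,
}

/// Typed failures: no silent defaults when a field is absent or mistyped.
#[derive(Debug, Clone, PartialEq)]
enum DecisionError {
    MissingField(&'static str),
    WrongType { field: &'static str, expected: &'static str },
}

/// Strictly decode the object; any deviation from the schema is an `Err`.
fn parse_decision(obj: &BTreeMap<String, Json>) -> Result<Decision, DecisionError> {
    let will_respond = match obj.get("will_respond") {
        Some(Json::Bool(flag)) => *flag,
        Some(_) => {
            return Err(DecisionError::WrongType { field: "will_respond", expected: "bool" })
        }
        None => return Err(DecisionError::MissingField("will_respond")),
    };
    let response = match obj.get("response") {
        Some(Json::Str(text)) => text.clone(),
        Some(_) => {
            return Err(DecisionError::WrongType { field: "response", expected: "string" })
        }
        None => return Err(DecisionError::MissingField("response")),
    };
    Ok(Decision { will_respond, response })
}
```

Returning an error rather than defaulting `will_respond` to `true` or `false` keeps the decision with the model: a malformed reply surfaces as a failure instead of being quietly turned into a post or a skip.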
What changed
- `rag_inspect::run_inference_probe`: `response_format: Some(ResponseFormat::JsonObject)` flows through to LlamaCpp's GBNF grammar (locked by `json_object_response_format_enables_json_grammar` in `inference/llamacpp_adapter.rs`). Sampler can ONLY emit valid JSON. `parse_decide_and_respond` strictly parses `{"will_respond": bool, "response": str}`. Missing/wrong types → typed `Err` per [[no-fallbacks-ever]].
- `ModelResponseInspection`: `will_respond: bool`. Substrate honors the persona's own decision; no override.
- `service_loop::serve_persona_loop_inner`: checks `mr.will_respond` BEFORE posting. `false` → `turns_skipped`. Empty `response_text` with `will_respond=true` → also skipped (structural inconsistency).
- `HeuristicInferenceAdapter::build_response_text`: with `response_format = JsonObject`, wraps the echo in `{"will_respond":true,"response":"..."}` so substrate plumbing validates end-to-end. Per Joel: "we can't know if we use fake algorithms"; this is test plumbing only.

Doctrine
Risks for live integration
- Model still emits `will_respond: true` → greeting-loop persists despite the change. Model-quality issue, not substrate.

Test plan

- `cargo test --lib ... persona::` → 725/725 pass
- `npm start`, send message in continuum room, observe persona decisions

Stacked on

PR #1518 (`feat/multi-persona-stress-baseline`) → #1517 → #1516 → ...

Closes #113.
🤖 Generated with Claude Code
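The skip rules listed under "What changed" for `serve_persona_loop_inner` can be sketched as a small pure function. This is an illustrative model only. `ModelResponse`, `TurnCounters`, `TurnOutcome`, `SkipReason`, `gate_turn`, and the `turns_posted` counter are hypothetical names; only `will_respond`, `response_text`, and `turns_skipped` come from the PR text. Treating whitespace-only text as empty is also an assumption.

```rust
/// Hypothetical view of the parsed model reply for one turn.
#[derive(Debug, Clone, PartialEq)]
struct ModelResponse {
    will_respond: bool,
    response_text: String,
}

/// Per-persona counters; `turns_posted` is an illustrative addition.
#[derive(Debug, Default, PartialEq)]
struct TurnCounters {
    turns_posted: u64,
    turns_skipped: u64,
}

#[derive(Debug, PartialEq)]
enum SkipReason {
    /// The persona itself decided not to reply.
    PersonaDeclined,
    /// `will_respond` was true but there was nothing to post.
    EmptyResponse,
}

#[derive(Debug, PartialEq)]
enum TurnOutcome {
    Post(String),
    Skip(SkipReason),
}

/// Apply the persona's own decision before posting. The substrate adds no
/// heuristic of its own; it only rejects a structurally inconsistent reply.
fn gate_turn(mr: &ModelResponse, counters: &mut TurnCounters) -> TurnOutcome {
    if !mr.will_respond {
        counters.turns_skipped += 1;
        return TurnOutcome::Skip(SkipReason::PersonaDeclined);
    }
    // Assumption: whitespace-only text counts as empty.
    if mr.response_text.trim().is_empty() {
        counters.turns_skipped += 1;
        return TurnOutcome::Skip(SkipReason::EmptyResponse);
    }
    counters.turns_posted += 1;
    TurnOutcome::Post(mr.response_text.clone())
}
```

Keeping the gate free of I/O means the two skip paths can be unit-tested without a running model, which fits the PR's use of the heuristic adapter purely as test plumbing.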