fix(ui): coalesce save_rooms_to_delegate to cap open-time memory burst (#246)#311
Conversation
Addresses #246 — Ivvor reported River's tab climbing at 100-400 MB/s on page open / refresh, peaking 3-4 GB commonly and 16 GB on the worst incident, then "suddenly all freed" (renderer OOM-killed). Steady-state operation was stable; the burst is specific to the multi-room cold-open / refresh path. Root cause: `save_rooms_to_delegate` had no coalescing. Every per-room state notification and catch-up delta on cold open spawns its own `spawn_local(save_rooms_to_delegate())` (from `update_room_state_inner` and `apply_delta_inner`); each call cloned the full ROOMS map, ciborium- serialized it, allocated a fresh 710 KB delegate WASM blob, and bincode- wrapped the lot for the WebSocket. With N rooms and several events per room landing close together during startup, dozens of these MB-scale saves stacked up faster than the WebSocket drained them. WASM linear memory is grow-only, so each peak became permanent RSS until the renderer was killed. Fix - Extract a generic `coalesce_save` primitive that mirrors the shape `save_outbound_dms_to_delegate` already used (futures mutex + AtomicBool dirty flag). A chain of N rapid callers now produces at most 2 actual saves: the in-flight one, plus one catch-up that observes dirty re-set after it returned. Crucially the expensive `ROOMS.read().clone()` + serialize now lives inside the inner `do_save_rooms_to_delegate` and only runs when a save actually executes, not once per queued caller. - `save_rooms_to_delegate` becomes a thin wrapper that delegates to the coalesce primitive. Ride-along - Cache the chat delegate's `DelegateKey` in a `LazyLock` so the 710 KB WASM allocation + BLAKE3 hash only happens once per session instead of per `send_delegate_request` / `fire_load_*_request` call (three hot sites). The delegate code is `include_bytes!`-bundled and invariant for the session. Test coverage - Two new unit tests in `chat_delegate.rs` pin both bounds of the coalesce loop: `coalesce_save_drains_dirty_re_entry_into_one_extra_save` (re-entry mid-save → exactly two iterations) and `coalesce_save_runs_once_when_no_re_entry` (single iteration when nothing re-dirties). Use function-local statics so coalesce-state lifetimes are 'static (matching production) and each test owns fresh state. - Full `cargo test -p river-ui --bins` and `cargo check -p river-ui --target wasm32-unknown-unknown --features no-sync` both green (209/209 tests pass). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… refactor outbound DMs onto shared helper Round-2 fixes for #311's multi-model review. Three convergent findings (Codex P2 + skeptical-reviewer M1 + testing-reviewer Important #1) all flagged the same correctness bug in the first cut of `coalesce_save`: a queued caller whose own loop runs zero iterations (because another caller's catch-up save drained the dirty flag) returned a synthetic `Ok(())`, hiding the actual save's failure. The legacy-migration path at `response_handler.rs:739` gates `mark_legacy_migration_done()` on `Ok` — a false-Ok there permanently marks migration done while the data was never persisted, silently losing the migrated rooms. Changes - Add `last_result_store: &Mutex<Result<(), String>>` parameter to `coalesce_save`. Update it after every save iteration. When the loop runs zero iterations (queued-caller case), return the stored result rather than synthetic `Ok(())`. Add `ROOMS_SAVE_LAST_RESULT` and `OUTBOUND_DMS_SAVE_LAST_RESULT` statics for the two call sites. - Refactor `save_outbound_dms_to_delegate` to use the same `coalesce_save` helper (testing-reviewer Important #4 + big-picture confirmed it). Removes the hand-rolled duplicate loop and means the new fix to the queued-caller propagation invariant automatically applies to the outbound-DM path too. - Document the `do_save` snapshot-freshness contract on `coalesce_save` (skeptical M3) — snapshot must live inside the closure body, not via captured state, or the catch-up save writes a stale snapshot. - Document memory-ordering rationale (skeptical L1) — the `Release`/`AcqRel` ordering carries correctness on hypothetical multi-threaded targets; on wasm32 the orderings are no-ops and the mutex's task serialization carries correctness. - Update AGENTS.md (big-picture #2) — the outbound-DMs section was pre-existing-stale (still described `IN_FLIGHT`/`DIRTY` atomics per the original PR #259 shape, before PR #259's re-review migrated it to the `Mutex<()>` + `AtomicBool` form to close a TOCTOU window). Now correctly describes the shared `coalesce_save` helper with the `last_result_store` rationale. New tests - `coalesce_save_returns_err_from_last_iteration` — pins `last_result` contract: Ok followed by Err returns Err. - `coalesce_save_queued_caller_sees_real_failure_not_synthetic_ok` — uses `block_on(join(...))` so two coalesce_save futures race on a single-threaded executor; both must surface the save failure rather than the synthetic `Ok(())`. This is the regression test for the convergent-finding bug. - `cached_chat_delegate_key_matches_uncached_construction` (testing Important #3) — verifies `CHAT_DELEGATE_KEY` equals what the previous per-call `Delegate::from(...).key().clone()` construction would produce, so silent drift (parameters arg changed, WASM include path changed) can't silently misroute every delegate request. Test coverage: 4 coalesce_save tests + 1 cache test, all green. Full `cargo test -p river-ui --bins`: 212/212. WASM check green. cargo fmt clean. No delegate/contract WASM or migration TOML touched — the cache equality test pins that the cached key matches the historically-computed key, so users get the fix transparently on UI republish with no migration entry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sanity
left a comment
There was a problem hiding this comment.
Comprehensive PR Review (Round 1)
Summary
- Type: fix(ui)
- Linked Issues: #246 (memory blowup), with #309 and #310 carved out as separate work
- Review tier: Full (concurrency / async surface)
- Reviewers run: code-first, testing, skeptical, big-picture + Codex external pass
- HEAD SHA reviewed:
5e25d868(now superseded by611779eawith round-2 fixes — see Verdict)
Code-First Analysis
Alignment: matches. The PR description, the #246 investigation comment, and the diff agree on the mechanism (uncoalesced save_rooms_to_delegate + per-call 710 KB delegate WASM clone) and the fix (extract the save_outbound_dms_to_delegate coalescing pattern; cache DelegateKey in LazyLock). Code-first reviewer approved.
Testing Assessment
Coverage Level: "minimum viable to pin the pattern" but undersized for the risk surface in round 1.
| Test Type | Status | Notes |
|---|---|---|
| Unit | Round 1 had 2 happy-path tests (1 save / 2 saves). Round 2 adds 3 more: last-iteration-Err propagation, queued-caller-sees-real-failure, and CHAT_DELEGATE_KEY equality |
|
| Integration | N/A | No integration harness for UI persistence; unit layer is the only realistic test surface |
| Simulation | N/A | UI-only change, no protocol/wire surface |
| E2E | N/A | Local-node smoke (open + room create + send) was clean |
Skeptical Findings
Risk Level: medium → reduced after round-2 fixes
| Concern | Severity | Location | Status |
|---|---|---|---|
Queued caller returns synthetic Ok(()) while catch-up save failed |
Must Fix | chat_delegate.rs:73-84 (round 1) |
Fixed in round 2 — coalesce_save now takes a last_result_store: &Mutex<Result<(), String>> and zero-iteration callers return the stored result, not synthetic Ok |
| Tests don't exercise actual concurrency | Should Fix | chat_delegate.rs:966-1021 (round 1) |
Fixed in round 2 — coalesce_save_queued_caller_sees_real_failure_not_synthetic_ok uses block_on(join(...)) to race two coalesce_save futures and verify both surface the error |
do_save snapshot-freshness contract is implicit |
Should Fix | chat_delegate.rs:60-68 |
Fixed in round 2 — explicit doc comment on coalesce_save |
| Memory-ordering comment doesn't note WASM single-threadedness | Consider | chat_delegate.rs:70-74 |
Fixed in round 2 — doc comment now spells out the wasm32 vs multi-threaded reasoning |
The Codex external pass independently flagged the same last_result bug (P2). Convergent findings from three independent reviewers gave the fix high confidence.
Big Picture Assessment
Goal Alignment: yes — matches both stated intent and the #246 investigation.
Anti-Patterns Detected: none.
Removed Code Concerns: none in round 1; round 2 deletes the hand-rolled outbound-DMs coalesce loop in favour of the shared helper, which strictly improves correctness (the new fix applies to both call sites).
Scope Assessment: focused. Side bugs (#309, #310) correctly carved out.
Documentation
- Code docs: complete (round 2 adds the snapshot-freshness contract and memory-ordering rationale).
- AGENTS.md: was pre-existing-stale (lines 296-300 still described the
IN_FLIGHT/DIRTYatomic pair that PR #259's re-review had already replaced). Round 2 updates it to describe the sharedcoalesce_savehelper with thelast_result_storerationale.
Recommendations
Must Fix (Blocking) — addressed in round 2
- ✅ Propagate save errors to queued callers in
coalesce_save. Convergent finding (Codex P2 + skeptical M1 + testing #1). Addressed bylast_result_storeparameter + zero-iteration fallback.
Should Fix (Important) — addressed in round 2
- ✅ Refactor
save_outbound_dms_to_delegateto use the newcoalesce_savehelper (testing #4 + big-picture). The hand-rolled duplicate would have rotted; this also makes the must-fix above apply to both paths. - ✅ Add error-path test coverage for
coalesce_save(testing #1). - ✅ Add equality test pinning
CHAT_DELEGATE_KEYto the un-cached construction (testing #3). - ✅ Document
do_savesnapshot-freshness contract on the helper (skeptical M3). - ✅ Update AGENTS.md outbound-DMs description (big-picture).
Consider (Suggestions)
- Optional: deduplicate the three identical "Reuse the session-cached delegate key" call-site comments into a one-liner pointer (big-picture polish, non-blocking).
create_chat_delegate_container(chat_delegate.rs:1880) is correctly left unmigrated toLazyLock— it returns a fullDelegateContainer(not a key), is called once at startup, and caching the wholeDelegatewould hold the 710 KB WASM bytes alive forever (skeptical L2 + big-picture both confirmed this is the right call).- Branch is 7 commits behind
origin/mainat round-1 SHA. None of the new commits touchchat_delegate.rs, so a clean rebase is expected. Done before merge.
Verdict
State: Needs Changes — Re-review Required After Fix
HEAD SHA reviewed: 5e25d868 (round 1)
Current PR HEAD: 611779ea (round 2 fixes pushed)
Round 1 surfaced one Must-Fix (the false-Ok queued-caller bug) and five Should-Fix findings. All are addressed in 611779ea, which adds ~190 lines: the last_result_store plumbing, the outbound-DMs refactor onto the shared helper, three new unit tests (including the block_on(join(...)) concurrency regression test for the must-fix), the docstring updates, and the AGENTS.md fix. Per the multi-model-review rule and the skill's own re-review policy, the full review must now be re-run against 611779ea before merge — that's the next step.
[AI-assisted - Claude]
…e statics, recover from mutex poisoning Round-3 fixes for #311's round-2 multi-model review. Three findings: 1. **Regression test passed for the wrong reason** (testing + skeptical round-2 reviewers, with skeptical empirically verifying against the round-1 buggy code). `coalesce_save_queued_caller_sees_real_failure_not_synthetic_ok` used `futures::pending!()` which returns Pending without arming a waker — `block_on(join(f1, f2))` polled f1 to completion before f2 even started, so f2 never took the zero-iteration branch. The round-1 (buggy) code passed the test, confirming zero regression coverage. Replaced with a local `YieldOnce` future that calls `cx.waker().wake_by_ref()` so the executor actually interleaves the two futures. Tightened the assertion from `c >= 1 && c <= 2` to `assert_eq!(c, 2)` so a future refactor that drops `last_result` propagation (f2 runs its own iteration → COUNT=3, or returns synthetic Ok via the initial local_result) is distinguishable. 2. **Bundle the three coalesce statics into a `CoalesceState` struct** (skeptical-reviewer M3). Each save path used to declare a `_MUTEX` + `_DIRTY` + `_LAST_RESULT` triple and pass them positionally to `coalesce_save`. A future caller adding a third save path that mixed up which triple it grabbed would compile and silently corrupt coalescing across paths. The new `CoalesceState::new()` is `const`-callable so it fits in a `static` — call sites now just pass `&ROOMS_SAVE_STATE` / `&OUTBOUND_DMS_SAVE_STATE` and the type system makes mismatch impossible. 3. **Recover from mutex poisoning instead of silently returning synthetic `Ok(())`** (skeptical-reviewer M2). On `panic = unwind` profiles (`cargo test`, dev builds — not release WASM which is `panic = abort`), `std::sync::Mutex` poisoning is reachable. The round-2 fallback was `.unwrap_or_else(|_| Ok(()))` — which is exactly the bug the round-2 fix existed to prevent. Both lock sites now `match` on `PoisonError::into_inner()` so we recover the inner value instead of synthesizing `Ok`. New test - `coalesce_save_running_caller_does_not_inherit_stale_last_result` (testing-reviewer "nice to have") — seeds `state.last_result` with a stale `Err`, runs a fresh non-queued caller, asserts the caller returns its own iteration's `Ok` rather than the stale store. Pins that `ran_any` actually gates the return value. Doc updates per round-2 reviewers - Memory-ordering comment now spells out that on the default `wasm32-unknown-unknown` target (no `+atomics`) LLVM lowers the Release/AcqRel orderings to plain loads/stores (skeptical L4). - `last_result` doc adds explicit over-pessimism note: a stale `Err` in the store is strictly better than a false `Ok` for the legacy-migration consumer (testing-reviewer Consider). - `CoalesceState` doc references PR #259's TOCTOU rationale so the history isn't lost when a future reader hits the struct cold. Test coverage: 5 coalesce_save tests + 1 cache-key test, all green. `cargo test -p river-ui --bins`: 213/213. WASM check green. `cargo fmt --check` clean. No delegate/contract WASM or migration TOML touched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sanity
left a comment
There was a problem hiding this comment.
Comprehensive PR Review (Round 2)
Summary
- HEAD SHA reviewed:
611779ea(round 2). Now superseded byc18ddeb4(round 3) — see Verdict. - Review tier: Full (concurrency / async surface, second round)
- Reviewers run: code-first, testing, skeptical, big-picture + Codex external pass
- CI on
611779ea: all-pass (build, ui-playwright-tests, check-delegate-migration, check-room-contract-migration, license/cla). Re-verification needed onc18ddeb4per the per-commit-SHA rule.
Code-First Analysis
Verdict: Approve-with-suggestions (non-blocking polish only).
Implementation matches PR description; do_save snapshot contract honored at both call sites; tests pin the round-1 must-fix.
Testing Assessment
Verdict: Important — the regression test for the convergent-finding bug passed for the wrong reason.
coalesce_save_queued_caller_sees_real_failure_not_synthetic_ok used futures::pending!() after COUNT.fetch_add. Because pending!() returns Pending without arming a waker AND nothing else in f1's await chain returns Pending (uncontended futures::lock::Mutex::lock() is Ready immediately), block_on(join(f1, f2)) polled f1 to completion before f2 even started. f2 never took the zero-iteration branch and returned Err via its own iteration — making the test pass without exercising what it claimed to.
Convergent skeptical-reviewer finding M1 + Codex round-2 caught this empirically by running the round-1 (buggy) code against the test → still passed → confirmed zero regression coverage. Addressed in round 3 by replacing pending!() with a local YieldOnce future that calls cx.waker().wake_by_ref(), and tightening the assertion from c >= 1 && c <= 2 to assert_eq!(c, 2).
Round-2 also added the LazyLock<DelegateKey> equality test, the error-path test, and refactored outbound DMs onto the helper (all closed).
Skeptical Findings
| Concern | Severity | Status |
|---|---|---|
| Regression test passes for the wrong reason | Must Fix | Fixed in round 3 (YieldOnce + strict COUNT == 2) |
Poisoned-mutex fallback returns synthetic Ok(()) — re-introduces the bug in dev/test profiles |
Should Fix | Fixed in round 3 — both lock sites now recover via PoisonError::into_inner() |
Positional (mutex, dirty, last_result) triple is mismatch-prone for future call sites |
Should Fix | Fixed in round 3 — bundled into CoalesceState struct; const fn new() so it fits in a static |
Memory-ordering doc framing on wasm32 |
Low | Addressed in round 3 — comment now spells out the +atomics-feature distinction |
last_result over-pessimism note |
Consider | Addressed in round 3 — explicit doc that stale Err is strictly better than false Ok for the legacy-migration consumer |
Big Picture Assessment
Verdict: Clean. Confirmed: AGENTS.md update is accurate, do_save_outbound_dms_to_delegate body is byte-identical pre/post refactor, TOCTOU rationale preserved verbatim, scope still focused, no skill/doc stale.
Codex External Pass
No findings on 611779ea.
Recommendations
Must Fix (Blocking) — addressed in round 3
- ✅ Fix
coalesce_save_queued_caller_sees_real_failure_not_synthetic_okso it actually exercises the zero-iteration branch.
Should Fix — addressed in round 3
- ✅ Bundle the three coalesce statics into a
CoalesceStatestruct. - ✅ Recover from
Mutexpoisoning viaPoisonError::into_inner()at both lock sites. - ✅ Add
coalesce_save_running_caller_does_not_inherit_stale_last_resultto pin theran_anygate.
Consider — addressed in round 3
- ✅ Doc updates for memory ordering and over-pessimism.
Verdict
State: Needs Changes — Re-review Required After Fix
HEAD SHA reviewed: 611779ea (round 2)
Current PR HEAD: c18ddeb4 (round 3 fixes pushed)
Round 2 surfaced one Must-Fix (the regression-test-passes-for-wrong-reason finding, with empirical proof) and three Should-Fix findings (struct refactor, poison recovery, stale-carry-over test). All addressed in c18ddeb4 (+207/-103). Per the multi-model-review rule, the full review must re-run against c18ddeb4 before merge — that's the next step.
[AI-assisted - Claude]
…tale line refs Round-3 review (#311) flagged convergent cosmetic doc drift across big-picture, code-first, and skeptical reviewers. All non-blocking; addressed in one pass for cleanliness. - `CoalesceState::mutex` doc had a stale `chat_delegate.rs:1812-1822` cross-ref. Fold the TOCTOU rationale (single source of truth for the pattern's existence) into the doc itself so future readers don't have to chase line numbers. - `OUTBOUND_DMS_SAVE_STATE` had a 25-line doc block describing the hand-rolled pre-refactor implementation that no longer lives at that site. Replace with a one-paragraph note pointing at `CoalesceState::mutex` (where the TOCTOU rationale now lives) and cross-referencing `ROOMS_SAVE_STATE` as the other declared state. - `AGENTS.md` outbound-DMs section now names `CoalesceState` so a future agent / contributor can grep for it (big-picture + code-first both flagged). No code changes — pure docstring / AGENTS edits. Tests still 32/32 chat_delegate, 213/213 overall, fmt clean. Per the multi-model review skill's "trivial mechanical findings" allowance, a Light re-check (diff-of-the-diff) is sufficient before merge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sanity
left a comment
There was a problem hiding this comment.
Comprehensive PR Review (Round 3)
Summary
- HEAD SHA reviewed:
c18ddeb4(round 3). Then polished toc36161b2(round 3.1, doc-only). - Review tier: Full
- Reviewers run: code-first, testing, skeptical, big-picture + Codex external pass
Findings — zero blocking
| Reviewer | Result |
|---|---|
| Codex | Clean, no findings |
| Big-picture | Approve. AGENTS.md could name CoalesceState (optional). |
| Code-first | Approve-with-suggestions. 3 doc nits (stale line ref, AGENTS.md). Trace-verified the YieldOnce test interleaving end-to-end. |
| Skeptical | Clean. Empirically mutation-tested the regression pin: reverted state.last_result read to round-1 Ok(()) → test deterministically FAILS 5/5 with the expected message. Confirmed YieldOnce is correct via standalone repro. Two non-blocking polish suggestions. |
| Testing | Clean. Independently mutation-tested: same 5/5 deterministic failure on round-1 buggy code injection. YieldOnce mechanism + assert_eq!(COUNT, 2) together pin both failure modes (false-Ok and extra-save). |
Round-3 standings on the round-2 must/should-fix items, with both skeptical and testing reviewers having now empirically verified them:
- ✅ Regression test pins the bug (round-2 M1) — mutation-confirmed 5/5
- ✅
CoalesceStatestruct removes positional-mismatch class (round-2 M3) —Sync-verified viafutures::lock::Mutex<()>: Sync+ all field-typesSend - ✅ Poisoning recovery via
PoisonError::into_inner()(round-2 M2) - ✅ New
coalesce_save_running_caller_does_not_inherit_stale_last_resultpins the in-loop publish
Convergent doc-drift nits (addressed in c36161b2, doc-only follow-up)
Three reviewers (big-picture + code-first + skeptical) converged on cosmetic doc drift after the round-3 refactor:
CoalesceState::mutexdoc had a stalechat_delegate.rs:1812-1822line ref.OUTBOUND_DMS_SAVE_STATEcarried a 25-line doc block describing the pre-refactor hand-rolled implementation that no longer lives at that site.- AGENTS.md didn't name the new struct (future readers can't grep for it).
Addressed in c36161b2: +34/-44 lines, 100% docstring + AGENTS.md, zero .rs code changes. Per the multi-model review skill's "trivial mechanical findings" allowance, a Light re-check (diff-of-the-diff confirmation that the polish is truly cosmetic) is sufficient — that's done.
Non-blocking observations not addressed
- Testing nice-to-have:
coalesce_save_running_caller_does_not_inherit_stale_last_resultis slightly weaker than its name suggests — it catches "forgot to publish in-loop" but not "deleted theran_anygate". Deterministic multi-task tests for the precise scenario are hard to construct; the test still catches a real bug, just not the maximally-precise one. Deferred per testing reviewer's own recommendation. - Skeptical L2: a
debug_assert!defensive label tag onCoalesceStatewould catch a future caller declaring a third static and passing the wrong one with the wrong label. Mismatch is currently a misleading log line, not a real bug. Not worth the API noise unless a third call site appears.
CI Considerations
CI was green on c18ddeb4 (build, ui-playwright-tests, both migration checks, license/cla — all pass). New CI run pending on c36161b2; must re-verify on the new HEAD per the per-SHA rule.
Verdict
State: Ready to Merge (pending CI green on current HEAD c36161b2)
HEAD SHA reviewed: c18ddeb4 (round 3) + Light re-check on c36161b2 (round 3.1 doc polish)
Three rounds of multi-perspective + external-model review on this PR have converged with zero remaining blocking findings. Both skeptical and testing reviewers independently mutation-tested the regression pin and confirmed it distinguishes round-1 buggy code from round-3 fixed code 5/5 deterministically. The round-3.1 doc polish addressed every convergent cosmetic suggestion. Next step: confirm CI green on c36161b2, then merge.
[AI-assisted - Claude]
…m_state (#246) (#312) * perf(ui): drop dead-code parent_state clone in apply_delta/update_room_state (#246) Follow-up to PR #311. Ivvor reports memory spikes on cold open dropped from the 16GB-crashing pre-#311 level to 3-4GB-but-stable. Still big — this PR chips at the next per-event allocation source. `apply_delta_inner` and `update_room_state_inner` previously cloned the full `room_data.room_state` to pass as the `parent_state` argument to `apply_delta` / `merge`. That argument is dead-code at the macro level: - Every field's `summarize` / `delta` impl in `common/src/room_state/` takes `_parent_state` (underscore-prefixed = unused). - `freenet-scaffold-macro` 0.2.2 generates an `apply_delta` that ignores its outer `_parent_state` arg entirely; it computes a fresh `let self_clone = self.clone();` *per field* and uses that as the per-field parent. The caller-side full-state clone never reaches code that reads it. Each ingestion still does the macro-internal per-field clones (those ARE needed — fields like `MembersV1::apply_delta` read sibling fields via parent_state), but the redundant outer clone is gone. Saves one full-state clone per network delta / state update on the open-time hot path. Test coverage - New `merge_with_default_sentinel_parent_matches_merge_with_self_clone_parent` empirically pins the invariant: merging with `&default()` as parent_state produces byte-identical post-merge state to merging with `&self.clone()`. If a future refactor starts reading the outer `parent_state` (or wires it through to a field's summarize/delta), this test fails loudly and the optimization needs revisiting. - Full `cargo test -p river-ui --bins`: 242/242 pass (was 241; one new test). - WASM check green. Scope - UI-only change. Single file. No delegate/contract WASM modified, no migration entry needed. No riverctl republish needed. Honest sizing - This is a ~10% reduction per merge (the macro-internal per-field clones still dominate). It won't bring the 3-4GB residual spike down to MB; that would require fixing freenet-scaffold-macro to hoist its per-field self.clone() outside the field loop, which is a separate- crate PR. Worth landing this win on its own merits and keeping the bigger refactor as a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(ui): strengthen merge-equivalence regression to discriminate macro forwarding Round-1 review of #312 (skeptical reviewer) flagged the `merge_with_default_sentinel_parent_matches_merge_with_self_clone_parent` test was tautological on near-default data — `state_a` only differed from `ChatRoomStateV1::default()` in `recent_messages.messages`, so a hypothetical macro regression that forwards outer `_parent_state` to field-level `apply_delta` would still see equivalent values across the two paths and the test would silently pass. Strengthen `state_a` to genuinely diverge from `default()` in fields real-world `apply_delta` impls actually read: - non-default `Configuration` (`max_members = 3`, `max_recent_messages = 5` vs the default 200 / 100) — `MembersV1::apply_delta`, `MessagesV1::apply_delta`, `MemberInfoV1::apply_delta` all read these. - a non-owner `AuthorizedUserBan` — `MembersV1::apply_delta` reads `parent_state.bans` for ban-sweep. Also tightened the doc comment on the `merge` call site at the top of `update_room_state_inner` to spell out that safety on the merge path depends on BOTH the macro's `apply_delta` ignoring outer `_parent_state` AND every field-level `summarize`/`delta` impl declaring it unused (via `_parent_state`). The previous comment just deferred to the `apply_delta_inner` rationale, which only proves one of the two facts. Tests: 242/242 pass. The strengthened test still passes (substitution is correct today), and it would now FAIL if any of: - a future `freenet-scaffold-macro` revision starts forwarding outer `_parent_state` to field `apply_delta` calls, - any field's `summarize`/`delta` starts reading its `parent_state` arg, - the call sites get switched back to `&self.clone()` for parent_state (the test exercises BOTH paths against the same expected end-state). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes #246.
Summary
Coalesces
save_rooms_to_delegateso a chain of N rapid callers produces at most 2 actual saves (the in-flight one + one catch-up). Also caches the chat delegate'sDelegateKeyin aLazyLockso the 710 KB WASM allocation and BLAKE3 hash only run once per session.Why
@Ivvvor reported River climbing at 100-400 MB/s on page open / refresh, peaking 3-4 GB commonly and 16 GB on the worst incident, then "suddenly all freed" (renderer OOM-killed). Steady-state operation was stable; the burst is specific to the multi-room cold-open / refresh path.
Trace: every per-room state notification and catch-up delta on cold open spawns its own
spawn_local(save_rooms_to_delegate())fromupdate_room_state_innerandapply_delta_inner. Each call cloned the full ROOMS map, ciborium-serialized it, allocated a fresh 710 KB delegate WASM blob, and bincode-wrapped it for the WebSocket. With N rooms and several events per room landing close together during startup, dozens of these MB-scale saves stacked up faster than the WebSocket drained them. WASM linear memory is grow-only, so each peak became permanent RSS until the renderer died.Full code-trace writeup with file:line refs lives at #246 (comment).
The fix
save_outbound_dms_to_delegatealready used the right pattern (futures::lock::Mutex<()>+AtomicBoolDIRTY flag) — seechat_delegate.rs:1499-1515on main. This PR extracts that pattern as a genericcoalesce_saveprimitive and wiressave_rooms_to_delegatethrough it. The expensiveROOMS.read().clone()+ ciborium serialize now lives insidedo_save_rooms_to_delegateand only runs when a save actually executes, not once per queued caller.The
LazyLock<DelegateKey>ride-along removes a ~14 MB block of pure-waste allocations on cold open (the same 710 KB WASM blob was being cloned and re-BLAKE3-hashed from three hot call sites:fire_load_rooms_request,fire_load_outbound_dms_request,send_delegate_request). The rarecreate_chat_delegate_containerpath (called once at startup) is left untouched.Test coverage
chat_delegate.rspin both bounds of the coalesce loop:coalesce_save_drains_dirty_re_entry_into_one_extra_save— re-entry mid-save produces exactly two iterations.coalesce_save_runs_once_when_no_re_entry— single iteration when nothing re-dirties.'static(matching production use) and each#[test]owns fresh state.cargo test -p river-ui --bins— 209/209 pass.cargo check -p river-ui --target wasm32-unknown-unknown --features no-sync— green.RefCellissues, WASM stable.Deploy notes
UI-only change. No delegate or contract WASM modified, so no migration entry needed and no riverctl republish needed. Single-file diff to
ui/src/components/app/chat_delegate.rs.Related
🤖 Generated with Claude Code