Skip to content

xmtp_mls: MlsGroup::create_external_invite (produce QR-invite payload + encrypted GroupInfo)#3673

Open
tylerhawkes wants to merge 31 commits into
tyler/qr-invite-l9-openmls-bumpfrom
tyler/qr-invite-l10-create-invite
Open

xmtp_mls: MlsGroup::create_external_invite (produce QR-invite payload + encrypted GroupInfo)#3673
tylerhawkes wants to merge 31 commits into
tyler/qr-invite-l9-openmls-bumpfrom
tyler/qr-invite-l10-create-invite

Conversation

@tylerhawkes
Copy link
Copy Markdown
Contributor

@tylerhawkes tylerhawkes commented May 21, 2026

Summary

Adds the invite-creation half of the QR-invite flow.

  • MlsGroup::create_external_invite(opts) -> CreateExternalInviteOutput
  • Computes group_id_hash = sha256(group_id()), generates fresh symmetric key + nonce, exports a current-epoch GroupInfo (with ratchet tree bundled), wraps it via wrap_group_info_with_nonce, and returns serialized ExternalInvitePayload + EncryptedGroupInfoBlob protos.
  • The envelope's epoch and group_state_hash fields are populated here (the caller has the MlsGroup; the pure-codec helpers in L-5 leave them zero).
  • Refuses to operate on pre-AppData groups — external-commit semantics depend on the AppData GROUP_MEMBERSHIP component existing.
  • Refuses to operate on DM groups (same constraint as set_external_commit_policy).

Stack

Test plan

  • Integration coverage via PR-T1.
  • cargo check -p xmtp_mls passes.

Note

Add MlsGroup::create_external_invite to produce QR-invite payload and encrypted GroupInfo

  • Adds MlsGroup::create_external_invite in external_invite.rs that exports the current epoch GroupInfo, encrypts it via wrap_group_info, and builds an ExternalInvitePayload governed by the group's EXTERNAL_COMMIT_POLICY.
  • Adds MlsGroup::set_external_commit_policy and set_allow_external_commit in mod.rs to write policy into group AppData, generating a symmetric key and external group ID when enabling.
  • Adds ValidatedCommit::from_external_commit in validated_commit.rs and a corresponding Sender::NewMemberCommit path in mls_sync.rs so external commits authored by non-members are validated against policy and processed.
  • Replaces wrap_welcome/unwrap_welcome with builder-style wrap_payload_hpke/wrap_payload_symmetric APIs in payload_encryption.rs, adding WELCOME_HPKE_LABEL and XMTP_EXTERNAL_INVITE_LABEL for domain separation.
  • Removes the separate sync_api_client from ClientBuilder, XmtpMlsLocalContext, and all bindings; replaces dual api_clients() with single api_client() and introduces WorkerConfig for per-worker interval, jitter, and enablement control exposed to WASM, Node, iOS, and Android bindings.
  • Adds the xmtp_logging crate with a platform-aware XmtpLoggingBuilder, runtime LoggingHandle, OTLP telemetry support, and rolling-file logging for native targets; removes logging from xmtp_common.
  • Risk: removing sync_api_client is a breaking API change for all bindings and any callers using api_clients(); WorkerKind::Event is removed from the enum.

Macroscope summarized 57e7e30.

@claude
Copy link
Copy Markdown

claude Bot commented May 21, 2026

Claude finished @tylerhawkes's task —— View job


Code Review — 2026-05-27 18:35 UTC

Reviewed the implementation of MlsGroup::create_external_invite for producing QR-invite payloads with encrypted GroupInfo.

Summary

Overall solid implementation with good cryptographic practices and defensive validation. The code follows expected patterns for this codebase and properly delegates to well-tested helper functions. Main concerns are around test coverage gaps and a few edge cases in expiry computation.


Code Quality & Best Practices

Positive:

  • Clean separation of concerns: orchestration logic in external_invite.rs, crypto primitives delegated to xmtp_mls_common
  • Excellent documentation throughout — the module-level comments clearly explain the flow and lifecycle
  • Proper use of tracing instrumentation (#[tracing::instrument])
  • Good error handling with descriptive error messages
  • Constants are well-defined and documented (SYMMETRIC_KEY_LEN, MIN_EXTERNAL_GROUP_ID_LEN)

Minor issues:

  • compute_blob_expiry at line 182 uses u64::try_from(now_ns).unwrap_or(0) — this silently converts negative timestamps to 0, which could mask bugs. Consider adding validation or using u64::saturating_from with a comment explaining the edge case.

Potential Bugs / Edge Cases

  1. Integer conversion safety (external_invite.rs:192):

    let now_u64 = u64::try_from(now_ns).unwrap_or(0);
    candidates.push(now_u64.saturating_add(policy.expire_in_ns));

    If now_ns is negative (shouldn't be, but defensive programming), this becomes 0 and could produce unexpectedly short expiry times. Consider asserting now_ns >= 0 or documenting why negative values are impossible.

  2. Expiry edge case (external_invite.rs:200):
    When all expiry sources are 0, this returns 0 (no expiry). This is documented and correct, but worth noting that expired blobs (with expires_at_ns = 0) will be accepted indefinitely. The unwrap logic correctly handles this (see encrypted_group_info.rs:146).

  3. DM group check missing:
    The PR description states "Refuses to operate on DM groups (same constraint as set_external_commit_policy)," but I don't see a DM check in create_external_invite. Should there be a check like:

    if self.is_dm()? {
        return Err(GroupError::DmGroupMetadataForbidden);
    }

    This might be enforced implicitly (DM groups shouldn't have the policy set), but an explicit check would be clearer and safer.

  4. Pre-AppData group check (external_invite.rs:115-116):
    The code correctly refuses to operate when EXTERNAL_COMMIT_POLICY is absent, which implicitly enforces the "pre-AppData groups" constraint mentioned in the PR description. This is correct but subtle — consider adding a comment explaining this connection.


Security Concerns

Strong points:

  • Proper use of ChaCha20Poly1305 AEAD via OpenMLS provider
  • Fresh nonce generation for each encryption (critical for AEAD security) — correctly delegated to wrap_group_info which calls generate_nonce()
  • Symmetric key length validation before use
  • External group ID minimum length enforcement
  • No key or sensitive data logged (tracing uses skip_all)

Observations:

  • The symmetric key is read from EXTERNAL_COMMIT_POLICY (set by admin via earlier PR in the stack). This means all members with the policy visible can create invites, which appears to be intentional per the documentation. Worth confirming this is the desired authorization model.
  • group_state_hash uses epoch_authenticator() which is correct per OpenMLS semantics.

Performance Considerations

  • export_group_info with with_ratchet_tree = true can be expensive for large groups, but this is documented and expected (joiner needs the tree for external commit). No issues here.
  • The lock pattern load_mls_group_with_lock_async is appropriate for this operation.
  • encode_to_vec() allocations are fine for invite-creation path (infrequent operation).

Test Coverage

Codecov reports 59.13% patch coverage with 47 uncovered lines. This is the main concern.

Looking at the tests:

  • create_external_invite_fails_when_policy_not_set (lines 216-228) covers the basic error case
  • compute_blob_expiry_picks_tightest_active_value (lines 238-273) has good unit coverage for the expiry logic

Missing coverage likely includes:

  • Happy path: successful invite creation when policy IS set ⚠️
  • Malformed policy scenarios (wrong key length, short external_group_id) ⚠️
  • Edge cases around ensure_not_paused() failure
  • export_group_info error path
  • Interaction between blob_expires_at_ns hint and policy expiry

Recommendation: The PR description notes "Integration coverage via PR-T1" (unchecked). Consider adding at least one happy-path test to this PR that:

  1. Calls set_allow_external_commit(true) to enable the policy
  2. Calls create_external_invite successfully
  3. Validates the returned payload and encrypted blob can be decoded

This would significantly improve coverage and document expected usage.


Error Handling Review

The new error variants in error.rs are well-designed:

ExportGroupInfo(#[from] openmls::group::ExportGroupInfoError),          // Line 452
EncryptedGroupInfo(#[from] EncryptedGroupInfoError),                    // Line 460
ExternalCommitNotAllowed,                                               // Line 466
ExternalCommitPolicyMalformed(String),                                  // Line 473

All four are correctly marked as non-retryable in the RetryableError impl (lines 668-671). Good.


Recommendations

  1. High priority: Add happy-path test coverage showing successful invite creation with a valid policy
  2. Medium priority: Add explicit DM group check if intended (or add comment explaining why it's not needed)
  3. Low priority: Consider adding validation for negative now_ns in compute_blob_expiry, or document why it's safe
  4. Low priority: Add test case for malformed policy scenarios (wrong key/ID lengths)

Overall assessment: Well-implemented feature with good cryptographic hygiene. The main gap is test coverage of the happy path, which should be addressed before or in PR-T1.


@macroscopeapp
Copy link
Copy Markdown
Contributor

macroscopeapp Bot commented May 21, 2026

Approvability

Verdict: Needs human review

Unable to check for correctness in 57e7e30. Diff is too large for automated approval analysis. A human reviewer should evaluate this PR.

You can customize Macroscope's approvability policy. Learn more.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 21, 2026

Codecov Report

❌ Patch coverage is 59.13043% with 47 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.82%. Comparing base (b4b3789) to head (e8978a2).

Files with missing lines Patch % Lines
crates/xmtp_mls/src/groups/external_invite.rs 59.13% 47 Missing ⚠️
Additional details and impacted files
@@                         Coverage Diff                         @@
##           tyler/qr-invite-l9-openmls-bump    #3673      +/-   ##
===================================================================
- Coverage                            83.96%   83.82%   -0.14%     
===================================================================
  Files                                  401      402       +1     
  Lines                                59586    59701     +115     
===================================================================
+ Hits                                 50031    50046      +15     
- Misses                                9555     9655     +100     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@tylerhawkes tylerhawkes force-pushed the tyler/qr-invite-l9-openmls-bump branch from 8f3e6d6 to 9ba7051 Compare May 26, 2026 22:49
@tylerhawkes tylerhawkes force-pushed the tyler/qr-invite-l10-create-invite branch from ffaaf90 to c12079b Compare May 26, 2026 22:49
@tylerhawkes tylerhawkes force-pushed the tyler/qr-invite-l9-openmls-bump branch from 9ba7051 to e861d43 Compare May 26, 2026 22:59
@tylerhawkes tylerhawkes force-pushed the tyler/qr-invite-l10-create-invite branch from c12079b to 096e94d Compare May 26, 2026 22:59
@tylerhawkes tylerhawkes force-pushed the tyler/qr-invite-l9-openmls-bump branch from e861d43 to 6b218e5 Compare May 27, 2026 18:17
@tylerhawkes tylerhawkes force-pushed the tyler/qr-invite-l10-create-invite branch 2 times, most recently from 5ebf785 to e8978a2 Compare May 27, 2026 18:33
@tylerhawkes tylerhawkes force-pushed the tyler/qr-invite-l9-openmls-bump branch from 6b218e5 to b4b3789 Compare May 27, 2026 18:33
insipx and others added 16 commits May 27, 2026 16:56
<!-- Macroscope's pull request summary starts here -->
<!-- Macroscope will only edit the content between these invisible
markers, and the markers themselves will not be visible in the GitHub
rendered markdown. -->
<!-- If you delete either of the start / end markers from your PR's
description, Macroscope will append its summary at the bottom of the
description. -->
### Add `--read-only` healthcheck mode and run reverse read-only pass in
cross-talk test
- Adds a `--read-only` flag to the healthcheck subcommand that skips all
write operations and reuses an existing primary identity; the flag gates
write ops via a new `WRITES` condition bit and a new `BOOTSTRAP`
condition for empty identity stores.
- Extends the cross-talk test healthcheck phase to run a forward
write-enabled pass followed by a reverse read-only pass over older
versions, annotating result rows with a `leg` field (`fwd`/`rev`).
- Refactors `App` into a unit struct with a global `CONF` singleton and
`App::network()`/`App::network_hash()` accessors, removing per-instance
and per-call network arguments across clients, stores, and subcommands.
- Introduces `DmStore` and a `Dm` data model for persisting direct
message threads during sync, with metadata tracking and display support.
- Adds `BootstrapIdentities` health op that creates and registers three
new identities when both `BOOTSTRAP` and `WRITES` conditions are active.
- Risk: `STRICT_VERSIONING` bit shifted from `1<<0` to `1<<1`; any
serialized or compared condition values will be incorrect unless
updated.

<!-- Macroscope's review summary starts here -->

<sup><a href="https://app.macroscope.com">Macroscope</a> summarized
2326a91.</sup>
<!-- Macroscope's review summary ends here -->

<!-- macroscope-ui-refresh -->
<!-- Macroscope's pull request summary ends here -->
<!-- Macroscope's pull request summary starts here -->
<!-- Macroscope will only edit the content between these invisible
markers, and the markers themselves will not be visible in the GitHub
rendered markdown. -->
<!-- If you delete either of the start / end markers from your PR's
description, Macroscope will append its summary at the bottom of the
description. -->
### Patch `tracing-oslog` dependency to remove panics
Replaces the `tracing-oslog` crate with a forked version pinned to a
specific git revision that removes panics from the library.

<!-- Macroscope's review summary starts here -->

<sup><a href="https://app.macroscope.com">Macroscope</a> summarized
bcccc6e.</sup>
<!-- Macroscope's review summary ends here -->

<!-- macroscope-ui-refresh -->
<!-- Macroscope's pull request summary ends here -->
…spam (#3693)

Two focused changes to cut production log volume. Driven by an audit of
a 1000-line VictoriaLogs sample from 5 herald instances, where **53% of
lines were TRACE** and a single steady-state message accounted for
**~40% of all log lines**.

## Commit 1 — `chore(logs): remove steady-state log noise`
Deletes logs that fire on every poll/tick when *nothing happened* (the
largest volume sources):

- **key_package_cleaner**: drop per-poll `"No expired key packages to
delete"` trace (~40% of sampled prod lines).
- **device_sync worker**: drop the 20s `"New event: Tick"` heartbeat
trace; log only real events, with `installation_id` as a structured
field (not a `[..]` message prefix).
- **sql_key_store**: drop the bare `"own_leaf_nodes"` trace (fires on
every leaf-node read).
- **worker metrics**: downgrade per-increment `"firing {metric}"` from
`info` → `trace`.

## Commit 2 — `refactor(api): remove ApiDebugWrapper stats-on-error
spam`
On **every** API error, `ApiDebugWrapper` snapshotted all per-endpoint
request counters into `AggregateStats` and embedded its multi-line
`Debug` dump into `ApiClientError::{ErrorWithStats,
ClientWithEndpointAndStats}` — which then got logged, and re-logged on
each retry, as a verbose stats block.

This removes the error-dump path so API errors are clean one-liners:
- Delete `ApiDebugWrapper` and its module re-export.
- Remove `ClientBuilder::enable_api_debug_wrapper` and drop the call
from the mobile/node/wasm client builders.
- `WrappedXmtpApiClient` is now `TrackedStatsClient<XmtpApiClient>`.
- Remove the now-unused `ErrorWithStats` / `ClientWithEndpointAndStats`
variants and their match arms.

**Kept unchanged:** the `TrackedStatsClient` request counters and the
FFI `api_statistics()` / `api_aggregate_statistics()` /
`clear_all_statistics()` surface — only the error-message spam is
removed.

## Verification
- `cargo check` + `cargo clippy` clean on `xmtp_proto`, `xmtp_api`,
`xmtp_mls`, `xmtp_db` (the only warning, `identity.rs:842` unused `mut`,
is pre-existing and unrelated).
- `just node check` and `just wasm check` both pass after the binding
edits.

## Follow-ups (not in this PR)
- Replace the in-RAM `TrackedStatsClient` counters with OTEL metrics
(per-API-call latency/rate at the retry combinator), then retire the
stats layer once the FFI is migrated.
- Broader log standardization + OTEL traces/metrics rollout (separate
audit docs exist locally).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- Macroscope's pull request summary starts here -->
<!-- Macroscope will only edit the content between these invisible
markers, and the markers themselves will not be visible in the GitHub
rendered markdown. -->
<!-- If you delete either of the start / end markers from your PR's
description, Macroscope will append its summary at the bottom of the
description. -->
> [!NOTE]
> ### Remove API debug wrapper and reduce steady-state log noise across
MLS workers
> - Deletes the `ApiDebugWrapper` and `enable_api_debug_wrapper` builder
method; the `WrappedXmtpApiClient` type alias now resolves to
`TrackedStatsClient<XmtpApiClient>` without the debug layer.
> - Removes `ClientWithEndpointAndStats` and `ErrorWithStats` variants
from `ApiClientError`, so errors no longer carry embedded aggregate
stats.
> - Suppresses high-frequency log output: tick events in
`DeviceSyncWorker`, no-op key package cleaner runs, repeated DB
connection-outage errors, and per-metric info logs are all silenced or
downgraded to trace.
> - Behavioral Change: API errors previously wrapped by the debug
adapter are now surfaced as plain `ClientWithEndpoint` or `Client`
variants with no attached stats.
>
> <!-- Macroscope's review summary starts here -->
>
> <sup><a href="https://app.macroscope.com">Macroscope</a> summarized
480bc19.</sup>
> <!-- Macroscope's review summary ends here -->
>
<!-- macroscope-ui-refresh -->
<!-- Macroscope's pull request summary ends here -->

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When the SQLite connection pool cannot hand out a connection (exhausted
/ `timed out waiting for connection` / `Unable to open the database
file`), a **single** root cause was logged dozens of times: each
background worker logged the error itself and then propagated it to the
supervisor, which logged it again on every poll/restart. In a production
sample this turned one pool outage into a flood of duplicate `ERROR`
lines.

## Approach: the supervisor is the single logger for propagated worker
errors
The worker supervisor already receives every error a worker returns and
logs it with context (`{kind} worker error: {err}`), and routes pool
failures to a quiet reconnect. So workers should **not** log errors they
propagate.

- **Remove the per-worker error logs** that just logged-then-propagated
(`disappearing_messages`, `pending_self_remove` outer fetch,
`key_package_cleaner` fetch). The supervisor is now the sole source.
Per-group *swallow-and-continue* logs (e.g. one malformed `group_id`)
are kept, since those never reach the supervisor.
- **Fix a latent bug in `key_package_cleaner`**: it previously *logged
and swallowed* a fetch error (returned `Ok`), so a pool outage was
logged every 5s and never triggered the supervisor’s reconnect/restart
path (silent infinite retry). It now propagates the error like the other
workers.
- **Classify pool-acquisition failures as needing a reconnect**:
`StorageError::db_needs_connection()` now matches
`PlatformStorageError::Pool(_)` (the r2d2 acquisition/timeout error) in
addition to `PoolNeedsConnection`, so the supervisor takes the quiet
reconnect path for a pool outage instead of the noisy error+restart
path. This is the one place the predicate is consulted.
- **Drop a redundant log**: `"Key package deletion successful"`
immediately followed `"Deleted {N} expired key packages (up to ID …)"` —
same meaning, removed.

## Tests
- `pool_errors_need_connection` — unit test for the
`StorageError::db_needs_connection()` classification
(`PoolNeedsConnection` directly and wrapped in `Connection`; a non-pool
error is not). The `Pool(_)` arm can’t be unit-constructed
(`r2d2::PoolError` has no public ctor) so it’s covered end-to-end below.
- `pool_failure_needs_connection` — end-to-end: disconnect a persistent
store’s pool, run a real query, assert the resulting error reports
`db_needs_connection()`.

## Incidental CI fix
`xmtp_db`’s `test-utils` feature did not forward
`xmtp_proto/test-utils`, so `cargo test -p xmtp_db` only compiled under
workspace-wide feature unification (which CI uses). This masked a broken
standalone per-crate test build. Added the missing feature forward.

## Verification
- `cargo clippy --locked --all-features --all-targets --no-deps --
-Dwarnings` ✓
- `cargo fmt --check` ✓ · `cargo hakari generate --diff` ✓
- Both new tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- Macroscope's pull request summary starts here -->
<!-- Macroscope will only edit the content between these invisible
markers, and the markers themselves will not be visible in the GitHub
rendered markdown. -->
<!-- If you delete either of the start / end markers from your PR's
description, Macroscope will append its summary at the bottom of the
description. -->
> [!NOTE]
> ### Deduplicate DB pool-failure error spam across MLS workers
> - Extends `StorageError::db_needs_connection()` and
`ConnectionError::db_needs_connection()` to classify generic `Pool(_)`
acquisition failures (not just `PoolNeedsConnection`) as needing a
reconnect.
> - Implements `NeedsDbReconnect` for `MlsStoreError` so worker restart
logic can inspect wrapped DB errors.
> - Refactors `DisappearingMessagesCleaner`, `KeyPackagesCleaner`, and
`PendingSelfRemoveWorker` to propagate structured errors instead of
logging and swallowing them, so pool failures surface once via the
reconnect path rather than flooding logs.
> - Behavioral Change: workers now abort and return an error on the
first DB failure per cycle instead of continuing past errors; this
reduces log spam but changes per-cycle recovery behavior.
>
> <!-- Macroscope's review summary starts here -->
>
> <sup><a href="https://app.macroscope.com">Macroscope</a> summarized
ffe1014.</sup>
> <!-- Macroscope's review summary ends here -->
>
<!-- macroscope-ui-refresh -->
<!-- Macroscope's pull request summary ends here -->

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Third log-noise PR (after #3693, #3696), driven by a **new herald log
dump filtered to INFO-only** (2000 lines, ~15.5 min of healthy traffic —
the first sample was TRACE-flooded and couldn't show INFO top-talkers).

## Headline finding from the INFO dump
**`"Processing readd requests for N groups"` was 1705 of 2000 lines
(85%)** — `commit_log.rs` `handle_incoming_pending_readds`, which runs
every ~2s worker tick and logged at info **even when N=0**. Fixed:
early-return when empty + downgrade to debug. This single change removes
the dominant INFO offender.

## Changes (all info→debug, per-item/steady-state churn)
- **commit_log.rs**: gate `"Processing readd requests"` on non-empty
(the 85% offender); `"No commit log requests to query"` (nothing-to-do
early return); the per-readd cluster (`Sending readd request`, `Sending
readd request to`, `Sent readd request`, `Updated requested readd
sequence id`); `Updating publish cursor for conversation`
(per-conversation loop).
- **welcome_sync.rs**: two `"[{}] syncing group"` (per-group, also
dropped the duplicate inbox_id prefix); `unstuck previously paused
group` (per-group).
- **welcomes/xmtp_welcome.rs**: per-welcome routine logs (`attempting to
commit welcome`, `updating cursor to`, `storing group with welcome id`,
`Created GroupUpdated message`, `updated message cursor from welcome
metadata`). Kept the semantic state-change events (re-add → ALLOWED,
consent reset) and error logs at info/warn.
- **mls_ext/decrypted_welcome.rs**: `Trying to decrypt welcome` /
`welcome pointer` (per-welcome).
- **welcome_pointer.rs**: `Resolving welcome pointer` (kept the
retry-progress log).
- **mls_store.rs**: `returning N welcomes` (per query).
- **oneshot.rs**: `Processing oneshot welcome`.
- **subscriptions/process_welcome.rs**: 3 per-stream-event logs.
- **subscriptions/stream_messages.rs**: `encountered newly unprocessed
message` (per message).
- **xmtp_api_d14n/.../mls.rs**: `got N envelopes`, `querying welcomes`
(per query).

## Deliberately out of scope
- **mls_sync.rs** — the most sensitive code; its per-message logs are
being decided separately to preserve debugging context.
- **The structured `log_event!` event system** (the `➣`-prefixed events
like `GroupSyncGroupInactive`) — left untouched pending a separate
evaluation of whether it's worth keeping.

## Verification
- `cargo clippy --locked --all-features --all-targets --no-deps --
-Dwarnings` ✓
- `cargo fmt --check` ✓
- Log-level/message-only changes; no behavior change except the readd
early-return (skips a no-op log when there is no work).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- Macroscope's pull request summary starts here -->
<!-- Macroscope will only edit the content between these invisible
markers, and the markers themselves will not be visible in the GitHub
rendered markdown. -->
<!-- If you delete either of the start / end markers from your PR's
description, Macroscope will append its summary at the bottom of the
description. -->
> [!NOTE]
> ### Downgrade per-item welcome, sync, and query log statements from
info to debug
> Reduces log noise by lowering high-frequency, per-item `tracing`
statements across welcome processing, group sync, commit log, and MLS
query paths from `info` to `debug` level. Also adds an early-return
guard in `handle_incoming_pending_readds` to skip logging entirely when
there are no groups to process.
>
> <!-- Macroscope's review summary starts here -->
>
> <sup><a href="https://app.macroscope.com">Macroscope</a> summarized
357ed5d.</sup>
> <!-- Macroscope's review summary ends here -->
>
<!-- macroscope-ui-refresh -->
<!-- Macroscope's pull request summary ends here -->

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Continues the log-noise cleanup (after #3693). These three sites were
among the top remaining volume sources in the production sample, all
logging normal-course behavior at warn/info.

## Changes
- **`xmtp_common/src/retry.rs`** — the retry combinator (every API call
funnels through it):
- Per-attempt `"retrying function that failed"` was logged at **warn**
on *every* retry. A single retry is normal transient behavior, so a
flapping endpoint flooded prod warn logs. → **debug** (with an `attempt`
field).
- `"retry strategy exceeded max wait time"` stays at **warn** (it
signals a call that never succeeded) but now carries context:
`attempts`, `elapsed_ms`, and the error — previously it was a bare
string with no indication of which call or how long.
- **`xmtp_mls/src/groups/mls_sync.rs`** — `"[{inbox_id}] syncing group,
epoch = …"` fired at **info** on every group sync, and duplicated
`inbox_id` (both a structured field *and* the `[{}]` message prefix). →
**debug**, dropped the redundant prefix, `epoch` is now a field.
- **`xmtp_mls/src/groups/welcome_sync.rs`** — `"sync_welcomes failed,
continuing with group sync"` and `"unstick_paused_groups failed,
continuing with sync"` were **warn**, but both are explicitly non-fatal
recovered paths. → **debug**.

Net effect: prod logs at `info`+ now reflect real problems; normal
retry/sync churn drops to `debug`.

## Verification
- `cargo clippy --locked --all-features --all-targets --no-deps --
-Dwarnings` ✓
- `cargo fmt --check` ✓
- Log-level/message-only changes; no behavior change.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- Macroscope's pull request summary starts here -->
<!-- Macroscope will only edit the content between these invisible
markers, and the markers themselves will not be visible in the GitHub
rendered markdown. -->
<!-- If you delete either of the start / end markers from your PR's
description, Macroscope will append its summary at the bottom of the
description. -->
> [!NOTE]
> ### Downgrade retry and sync log levels from warn/info to debug
> - Retry attempts in `retry_async` now log at debug instead of warn;
exhaustion logs still warn but now include attempt count, elapsed_ms,
and the error message.
> - Group sync start in `MlsGroup.sync` is downgraded from info to debug
and switches to structured fields (inbox_id, installation_id, group_id,
epoch).
> - Failures in `WelcomeSync.sync_groups` for `unstick_paused_groups`
and `sync_welcomes` are downgraded from warn to debug.
>
> <!-- Macroscope's review summary starts here -->
>
> <sup><a href="https://app.macroscope.com">Macroscope</a> summarized
1e96453.</sup>
> <!-- Macroscope's review summary ends here -->
>
<!-- macroscope-ui-refresh -->
<!-- Macroscope's pull request summary ends here -->

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
## Summary

Workers in `xmtp_mls` run under `WorkerRunner`. Its outer loop
(`Worker::spawn`) stops a worker **only** when an error returned from
`run_tasks` reports `needs_db_reconnect() == true` — that's the signal
that the connection pool was dropped and the worker should stop until
reconnect.

Several workers violated this contract: they caught per-item errors with
`tracing::error!` and continued, so a dropped pool could never reach the
supervisor and the worker hot-looped against a gone pool. The signal was
also silently lost in wrapper error types that mapped `GroupError(_) =>
false`.

This cleans the workers up to follow the contract, using the
recently-cleaned `disappearing_messages` worker as the model.

## Root fix (error-type forwarding)

- `ConnectionError` gets a `db_needs_connection()` helper mirroring
`StorageError`.
- `GroupError`, `MlsStoreError`, and `SubscribeError` now implement
`NeedsDbReconnect`, forwarding their storage / connection / client /
wrapped variants.
- `DeviceSyncError::needs_db_reconnect` now forwards `Db` / `Group` /
`MlsStore` / `Subscribe` instead of only `Client`.
- `CommitLogError::needs_db_reconnect` forwards `GroupError` and narrows
`Connection` to `db_needs_connection()` (was unconditionally `true`);
removes the stale TODOs.

## Workers

- **key_package_cleaner** — propagate the expired-KP fetch error (was
swallowed → `Ok`); bubble a disconnect caught while deleting a single
key package.
- **pending_self_remove** — capture the group-load (`MlsStoreError`)
path (previously never converted into the worker error) and bubble a
disconnect from either per-group operation; fix `GroupError(_) =>
false`.
- **commit_log** — bubble a disconnect from the readd-request and
readd-validation loops.
- **device_sync** — bubble a disconnect from the per-message processing
loop.

Per-item **non-DB** failures still log and continue, as before — only a
dropped pool now stops the worker.

## Other

- Removes the dead `WorkerKind::Event` variant (no impl, no
construction, not in bindings/FFI/serde).
- Adds native-only unit tests pinning the disconnect-forwarding contract
for every worker error type (`disconnect_propagation_tests`).

## Test plan

- `cargo nextest run -p xmtp_mls disconnect_propagation` — 7/7 pass.
- `cargo check -p xmtp_mls --tests` clean.
- `just lint-rust` clean (clippy / fmt / hakari).

🤖 Generated with [Claude Code](https://claude.com/claude-code)


<!-- Macroscope's pull request summary starts here -->
<!-- Macroscope will only edit the content between these invisible
markers, and the markers themselves will not be visible in the GitHub
rendered markdown. -->
<!-- If you delete either of the start / end markers from your PR's
description, Macroscope will append its summary at the bottom of the
description. -->
> [!NOTE]
> ### Propagate DB-disconnect errors out of workers to the supervisor
> - Adds or fixes `needs_db_reconnect()` implementations across multiple
error types (`GroupError`, `ClientError`, `IdentityError`,
`SubscribeError`, `DeviceSyncError`, `PendingSelfRemoveWorkerError`) so
dropped-pool signals are forwarded up the call stack rather than
swallowed.
> - Errors in `publish_commit_logs_to_remote` and
`prepare_publish_commit_log_info` now propagate instead of being logged
and ignored, so cursors only advance on success.
> - `send_outgoing_readd_requests` now fails the operation when any
readd request fails, instead of logging and continuing.
> - `handle_incoming_pending_readds` now aborts early on retryable
validation errors to avoid hot loops.
> - Device-sync and archive import paths abort on DB errors instead of
logging warnings and continuing.
> - Behavioral Change: workers that previously swallowed DB-disconnect
errors will now surface them to the supervisor, triggering reconnect
handling.
>
> <!-- Macroscope's review summary starts here -->
>
> <sup><a href="https://app.macroscope.com">Macroscope</a> summarized
4c78768.</sup>
> <!-- Macroscope's review summary ends here -->
>
<!-- macroscope-ui-refresh -->
<!-- Macroscope's pull request summary ends here -->

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When an own-intent failed non-retryably, process_message_inner marked
the intent row Error, committed, and returned Ok(identifier) — dropping
the real GroupMessageProcessingError. process_messages recorded only a
success, so sync_until_intent_resolved returned a summary that displayed
'0 failed 1 succeeded' yet surfaced as a top-level GroupError::Sync with
no cause.

Capture the dropped cause via a ProcessedMessageOutcome struct and
record it in ProcessSummary.errored, so the summary (and its source())
report the actual failure. No DB/txn-commit or retry-rollback behavior
changes.

<!-- Macroscope's pull request summary starts here -->
<!-- Macroscope will only edit the content between these invisible
markers, and the markers themselves will not be visible in the GitHub
rendered markdown. -->
<!-- If you delete either of the start / end markers from your PR's
description, Macroscope will append its summary at the bottom of the
description. -->
> [!NOTE]
> ### Fix `SyncSummary.extend` to preserve the earliest error cause
across sync retries
> - `SyncSummary.extend` previously overwrote `self.other`
unconditionally, clobbering earlier error causes with later `None`
values; it now only sets `self.other` when `self.other` is currently
`None`.
> - `GroupError::SyncFailedToWait` now includes the `SyncSummary` in its
`Display` output so the cause is visible in error messages.
> - Per-attempt error logging inside `MlsGroup::sync` is removed;
summaries are accumulated silently and logged once by
`GroupSyncFinished`.
> - `GroupSyncIntentErrored` events no longer redundantly include the
summary metadata field.
> - Two unit tests are added to
[summary.rs](https://github.com/xmtp/libxmtp/pull/3701/files#diff-cc6e0f2ec334b2a161a0e24bf0a280e37287801dcc7c6e40c46b7d5153580240)
validating the corrected `extend` accumulation behavior.
>
> <!-- Macroscope's review summary starts here -->
>
> <sup><a href="https://app.macroscope.com">Macroscope</a> summarized
55e3eec.</sup>
> <!-- Macroscope's review summary ends here -->
>
<!-- macroscope-ui-refresh -->
<!-- Macroscope's pull request summary ends here -->
…3702)

## Problem

`SyncSummary` had two predicates that disagreed about what "errored"
means:

- **`Display`** (success banner) fires only when `publish_errors`,
`post_commit_errors`, AND `other` are all empty.
- **`is_errored()`** used `&&` for publish/post_commit
(`!publish.is_empty() && !post_commit.is_empty()`), so a sync with
*only* publish errors (or only post-commit errors) returned
`is_errored() == false` — reported as not-errored — while its `Display`
printed the `===== Errors Occurred During Sync =====` banner.

These two govern whether a summary may sit in the `Err` position
(`sync_with_conn`) and which `Display` branch is shown, so the
divergence meant an `Ok(...)` summary could render as an error block
(and vice versa).

Separately, `GroupError::Sync` / `SyncFailedToWait` box a `SyncSummary`
whose `source()` returned `None`, so the FFI `Caused by:` chain
dead-ended at the summary's flattened string — the underlying error was
never reachable.

## Changes (summary.rs only)

- **`is_errored()`**: `&&` → `||` for publish/post_commit, and unify on
one predicate. `process.errored` (per-message failures) is *deliberately
excluded* and documented: a sync that decrypts some messages and fails
others is still a successful sync of the group as a whole.
- **`Display`**: success branch is now `if !self.is_errored()` — the two
can no longer drift.
- **`source()`**: returns the first available cause (`other` → first
publish → first post_commit → first `process.errored`), so the FFI error
chain reaches the real failure.
- **6 unit tests**: clean / publish-only / post-commit-only / other /
per-message / source-priority. The publish-only test is the explicit
regression guard for the `&&` bug.

## Verification

- `cargo clippy --locked -p xmtp_mls --all-features --all-targets
--no-deps -- -Dwarnings` + `cargo fmt --check` clean
- New unit tests pass; sync/welcome/commit_log/process_message
integration suites pass (no regression from the `is_errored` behavior
change)

Pairs with #3701 (which captures the dropped intent-error cause into
`process.errored`); this PR makes the type honest, #3701 fills in the
cause. Independent-compiling; recommend landing this first.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- Macroscope's pull request summary starts here -->
<!-- Macroscope will only edit the content between these invisible
markers, and the markers themselves will not be visible in the GitHub
rendered markdown. -->
<!-- If you delete either of the start / end markers from your PR's
description, Macroscope will append its summary at the bottom of the
description. -->
> [!NOTE]
> ### Fix `SyncSummary` error detection logic and expose error cause via
`source()`
> - `is_errored()` previously required both `publish_errors` and
`post_commit_errors` to be non-empty (logical AND); it now returns
`true` when any of `other`, `publish_errors`, or `post_commit_errors` is
present.
> - `std::error::Error::source()` now returns the first available
underlying error, preferring `other`, then `publish_errors`, then
`post_commit_errors`, then per-message processing errors; previously it
returned `None`.
> - `Display` formatting now delegates to `is_errored()` instead of
duplicating the error-check logic.
> - Behavioral Change: callers relying on `is_errored()` returning
`false` when only one of publish or post-commit errors is present will
now see `true`.
>
> <!-- Macroscope's review summary starts here -->
>
> <sup><a href="https://app.macroscope.com">Macroscope</a> summarized
9d90b03.</sup>
> <!-- Macroscope's review summary ends here -->
>
> <!-- macroscope-ui-refresh -->
<!-- Macroscope's pull request summary ends here -->
…st (#3703)

## Problem

Loading clients in xdbg was dominated by TLS setup, not DB or MLS work.
Phase timing while loading 100 identities showed ~100% of the time
inside `network.client_bundle()`, with a tell-tale monotonic
+42ms-per-call stair-step — i.e. a global lock serializing ~42ms of work
per call, paid **twice per client** (api + sync channel).

Root cause: `create_tls_channel` builds a fresh tonic `Endpoint` on
every call, and `ClientTlsConfig::with_enabled_roots()` makes tonic call
`rustls_native_certs::load_native_certs()` and parse the **entire OS
trust store** into a rustls `ClientConfig` each time. On macOS that read
is serialized through Security.framework (~40ms). 200 calls × ~40ms ≈
the ~9–19s we were seeing.

## Fix

**`xmtp_api_grpc`:** cache the built `Endpoint` per `(host,
rate_limit)`. The endpoint owns an `Arc<ClientConfig>` with the
already-parsed roots, so `connect_lazy()` on a cheap clone still hands
**every client its own connection** — no change to connection topology
or trust semantics — while the native-cert load happens **once per
host**. The lock is held across the one-time build so concurrent
first-callers load certs exactly once. This benefits every platform that
creates clients (node/mobile too), not just xdbg.

**`xdbg`:** parallelize identity loading (was a sequential `for` loop).
Each client open is blocking SQLite/MLS work, so they're handed to
Tokio's blocking pool via `tokio::task::spawn_blocking` and awaited
together; `load_all_identities` / `load_n_identities` and their callers
become `async`.

## Result

`cargo xdbg -b dev generate -e message -a 1`, 100 clients:

| | load time |
|---|---|
| before | ~19s |
| after | **~185ms** |

No new dependencies. No trust-store/behavior change. Each client still
gets its own connection.

## Test plan
- [x] `cargo fmt --check`
- [x] `cargo clippy --locked --all-features --all-targets --no-deps --
-Dwarnings` (full workspace, clean)
- [x] manual: 100-client xdbg load ~19s → ~185ms, sends still succeed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- Macroscope's pull request summary starts here -->
<!-- Macroscope will only edit the content between these invisible
markers, and the markers themselves will not be visible in the GitHub
rendered markdown. -->
<!-- If you delete either of the start / end markers from your PR's
description, Macroscope will append its summary at the bottom of the
description. -->
> [!NOTE]
> ## Problem
>
> Loading clients in xdbg was dominated by TLS setup, not DB or MLS
work. Phase timing while loading 100 identities showed ~100% of the time
inside `network.client_bundle()`, with a tell-tale monotonic
+42ms-per-call stair-step — i.e. a global lock serializing ~42ms of work
per call, paid **twice per client** (api + sync channel).
>
> Root cause: `create_tls_channel` builds a fresh tonic `Endpoint` on
every call, and `ClientTlsConfig::with_enabled_roots()` makes tonic call
`rustls_native_certs::load_native_certs()` and parse the **entire OS
trust store** into a rustls `ClientConfig` each time. On macOS that read
is serialized through Security.framework (~40ms). 200 calls × ~40ms ≈
the ~9–19s we were seeing.
>
> ## Fix
>
> **`xmtp_api_grpc`:** cache the built `Endpoint` per `(host,
rate_limit)`. The endpoint owns an `Arc<ClientConfig>` with the
already-parsed roots, so `connect_lazy()` on a cheap clone still hands
**every client its own connection** — no change to connection topology
or trust semantics — while the native-cert load happens **once per
host**. The lock is held across the one-time build so concurrent
first-callers load certs exactly once. This benefits every platform that
creates clients (node/mobile too), not just xdbg.
>
> **`xdbg`:** parallelize identity loading (was a sequential `for`
loop). Each client open is blocking SQLite/MLS work, so they're handed
to Tokio's blocking pool via `tokio::task::spawn_blocking` and awaited
together; `load_all_identities` / `load_n_identities` and their callers
become `async`.
>
> ## Result
>
> `cargo xdbg -b dev generate -e message -a 1`, 100 clients:
>
> | | load time |
> |---|---|
> | before | ~19s |
> | after | **~185ms** |
>
> No new dependencies. No trust-store/behavior change. Each client still
gets its own connection.
>
> ## Test plan
> - [x] `cargo fmt --check`
> - [x] `cargo clippy --locked --all-features --all-targets --no-deps --
-Dwarnings` (full workspace, clean)
> - [x] manual: 100-client xdbg load ~19s → ~185ms, sends still succeed
>
> 🤖 Generated with [Claude Code](https://claude.com/claude-code)<!--
Macroscope's changelog starts here -->
> #### Changes since #3703 opened
>
> - Converted `load_clients_parallel`, `load_all_identities`, and
`load_n_identities` utilities to async functions [69570ca]
> - Updated `GenerateMessages::new` constructor and
`GenerateGroups::create_groups` workflow to async [69570ca]
> - Updated `Generate::run` and `Sync::run` command methods to await
async operations [69570ca]
> <!-- Macroscope's changelog ends here -->
>
<!-- Macroscope's pull request summary ends here -->

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…sable (#3706)

## Summary

Makes `xmtp_mls` background-worker scheduling configurable to fix a
**thundering-herd / stampede** problem: when many clients (250+) deploy
at the same time, their workers tick in lockstep and fire a wall of
simultaneous requests.

Adds, surfaced through the **Node, WASM, and Mobile** SDKs:

- **Configurable intervals** — a global default plus per-worker
overrides (nanoseconds, consistent with the existing
`ForkRecoveryOpts::worker_interval_ns`).
- **Optional jitter** (`worker_interval_jitter_ns`) — a one-time random
startup offset (de-syncs clients that booted together) plus per-tick
jitter (keeps them from re-converging). Fully optional; `0`/absent =
deterministic, byte-for-byte today's behavior.
- **Per-worker enable/disable** — a unified `enabled` map. The legacy
`device_sync_worker_mode` and `disable_commit_log_worker` toggles fold
into the same map (still work unchanged); `disable_workers` remains the
global kill-switch.

All new config is opt-in: `WorkerConfig::default()` reproduces existing
behavior exactly. No DB migration, no protocol change, no change to
default cadences.

## Design

Mirrors the in-tree `fork_recovery_opts` pattern: `WorkerConfig` is
carried on `XmtpMlsLocalContext` and exposed via
`XmtpSharedContext::worker_config()`. A new
`xmtp_common::time::jittered_interval_stream(base, jitter)` adds the
startup offset + per-tick jitter (`jitter == 0` short-circuits to the
existing `interval_stream`, pulling no randomness). Each worker resolves
its interval through a context helper instead of its hardcoded const,
keeping the const as the fallback default.

Workers wired: disappearing-messages, key-package-cleaner,
pending-self-remove, commit-log, device-sync heartbeat. TaskRunner is
event/dynamic-driven and left as-is. For commit-log, the existing
fork-recovery `worker_interval_ns` override still wins where set.

## SDK surface (all fields optional)

| Concept | Node | WASM | Mobile |
|---|---|---|---|
| Global interval | `workerConfig.defaultIntervalNs` |
`workerConfig.defaultIntervalNs` | `FfiWorkerConfig.default_interval_ns`
|
| Jitter | `workerConfig.jitterNs` | `workerConfig.jitterNs` |
`FfiWorkerConfig.jitter_ns` |
| Per-worker interval | `workerConfig.workerIntervalsNs` |
`workerConfig.workerIntervalsNs` | `FfiWorkerConfig.worker_intervals_ns`
|
| Per-worker disable | `workerConfig.disabledWorkers` |
`workerConfig.disabledWorkers` | `FfiWorkerConfig.disabled_workers` |

New `WorkerKind` / `FfiWorkerKind` enum mirrors the core `WorkerKind`.

## Testing

- `xmtp_common`: `jittered_interval_stream` — zero-jitter equals base;
jittered period stays within bounds; startup offset bounded.
- `xmtp_mls`: `WorkerConfig::interval` precedence (override > default >
const), zero-clamp, jitter carry, `worker_enabled` default-on; a
disabled worker is not registered.
- Existing worker + device-sync suites pass unchanged (defaults preserve
behavior).

Verified: `just lint-rust`, `just node lint`, `cargo nextest run -p
xmtp_common time`, `cargo nextest run -p xmtp_mls worker`.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… iOS/Android (#3707)

Normalize Node and WASM 1.10.0 release notes to match the structure 
already used in ios/4.10.0.md and android/4.10.0.md.

No content changes, structural and formatting fixes only.

Changes:

node-bindings/1.10.0.md:
  Add `## What's Changed` section header before intro text
  Add `## Breaking Changes` section header
    `### History sync is now manual` → moved under `## Breaking Changes`
`### Upcoming deprecation of preferences.sync()` → moved under `##
Breaking Changes`
  Add `## New Features` section header
`### Manual history sync archive management` → moved under `## New
Features`
    `### Archive-based backups` → moved under `## New Features`
  Add `## Bug Fixes` section header
`### Performance improvement` → removed subheading, text moved under `##
Bug Fixes`
  `**THIS IS A BREAKING CHANGE.**` → `**This is a breaking change.**`

wasm-bindings/1.10.0.md:
  (identical structural changes as node-bindings/1.10.0.md)

Test plan:
Documentation-only change; no code paths affected.

<!-- Macroscope's pull request summary starts here -->
<!-- Macroscope will only edit the content between these invisible
markers, and the markers themselves will not be visible in the GitHub
rendered markdown. -->
<!-- If you delete either of the start / end markers from your PR's
description, Macroscope will append its summary at the bottom of the
description. -->
> [!NOTE]
> ### Normalize Node and WASM 1.10.0 release notes structure to match
iOS/Android
> Restructures the 1.10.0 release notes for
[Node](https://github.com/xmtp/libxmtp/pull/3707/files#diff-883d74d020e02801e31065330f11fdbd52fd8892ed0e14c97ba4616d470e07df)
and
[WASM](https://github.com/xmtp/libxmtp/pull/3707/files#diff-1dd324ff90565158b66b4d4de5d306179d8970d17b88a2537449749d67fb51de)
bindings to match the section layout used by the iOS and Android notes.
Adds explicit `What's Changed`, `Breaking Changes`, `New Features`, and
`Bug Fixes` headers, moves content into the appropriate sections,
deduplicates the `preferences.sync()` deprecation notice, and normalizes
minor formatting.
>
> <!-- Macroscope's review summary starts here -->
>
> <sup><a href="https://app.macroscope.com">Macroscope</a> summarized
83ce8bf.</sup>
> <!-- Macroscope's review summary ends here -->
>
<!-- macroscope-ui-refresh -->
<!-- Macroscope's pull request summary ends here -->
…3709)

## Summary

Follow-up to #3706. `WorkerConfig` shipped with a single **global**
`jitter_ns` that applied to *every* background worker. Consumers who
only want to jitter one worker — e.g. scheduling the daily CommitLog
fork-recovery worker with 6h jitter to avoid a fleet stampede — end up
jittering the fast workers too. A 1s-interval worker
(DisappearingMessages) inheriting up to 6h of jitter is effectively
broken.

This makes jitter **per-worker**, mirroring how interval already works
via `interval_overrides`.

## Changes

- **Core** (`crates/xmtp_mls/src/worker.rs`): remove
`WorkerConfig.jitter_ns: Option<u64>`, add `jitter_overrides:
HashMap<WorkerKind, u64>`. `interval()` resolves jitter per-kind; absent
entry => `0` (deterministic). No global default jitter — that's the
foot-gun being removed.
- **Bindings** (node / wasm / mobile): drop the scalar `jitterNs` from
`WorkerConfigOptions`, add a `workerJittersNs` array of `{ kind,
jitterNs }` mirroring `workerIntervalsNs`.

Generated TS surface:
```ts
workerConfig.workerJittersNs?: Array<{ kind: WorkerKind; jitterNs: bigint }>
```

## Breaking change

Breaks the nightly-only `WorkerConfigOptions.jitterNs` field. No stable
consumers; nightly breakage is expected. Downstream consumers set
per-worker jitter instead, e.g.:
```ts
workerJittersNs: [{ kind: "CommitLog", jitterNs: 21600n * 1_000_000_000n }]
```

## Notes

- `create_client` FFI signatures unchanged (still one `worker_config`
param) — only the inside of `WorkerConfigOptions` changes, so no
SDK-wrapper or caller churn.
- `WorkerConfig::default()` behavior unchanged (empty maps,
deterministic). No DB/protocol change.

## Testing

- Core: per-kind jitter carried; jitter scoped per worker (jittered
worker gets it, un-listed worker stays `0`).
- `just lint-rust` ✓ · `just node lint` ✓ · all three bindings compile
(`bindings_node`, `bindings_wasm` wasm32, `xmtpv3`) ✓ · `xmtp_mls
worker_config` tests ✓

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- Macroscope's pull request summary starts here -->
<!-- Macroscope will only edit the content between these invisible
markers, and the markers themselves will not be visible in the GitHub
rendered markdown. -->
<!-- If you delete either of the start / end markers from your PR's
description, Macroscope will append its summary at the bottom of the
description. -->
> [!NOTE]
> ### Replace global `jitter_ns` with per-worker `jitter_overrides` in
`WorkerConfig`
> - Removes the global `jitter_ns: Option<u64>` field from
`WorkerConfig` and replaces it with `jitter_overrides:
HashMap<WorkerKind, u64>`, so jitter is scoped per worker kind with a
default of zero.
> - Updates `WorkerConfig.interval()` to resolve jitter from the
per-kind map instead of a single global value.
> - Mirrors the change across all three binding layers:
[worker_config.rs](https://github.com/xmtp/libxmtp/pull/3709/files#diff-3b21fab5bc1e30d4eb92b2d0e14461f1c2fd991164b484046e665363bb3fd48d)
(mobile/FFI),
[options.rs](https://github.com/xmtp/libxmtp/pull/3709/files#diff-5306d98c5fe6413b641310877f4657e782cd87bc1e7497e28c0478154fb062d7)
(Node), and
[client.rs](https://github.com/xmtp/libxmtp/pull/3709/files#diff-6fa64ca0f35092cf055d31985b6e071f8dda1fcac5f9f3216f813422411af7c8)
(WASM), each replacing a single `jitter_ns` field with a
`worker_jitters_ns` array of per-worker override objects.
> - Behavioral Change: workers with no explicit jitter entry now receive
`Duration::ZERO` instead of any previously configured global jitter;
callers must migrate to per-worker overrides.
>
> <!-- Macroscope's review summary starts here -->
>
> <sup><a href="https://app.macroscope.com">Macroscope</a> summarized
187d245.</sup>
> <!-- Macroscope's review summary ends here -->
>
<!-- macroscope-ui-refresh -->
<!-- Macroscope's pull request summary ends here -->

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
## Summary

Adds a `worker_turn` tracing span to each background worker's per-turn
unit of work, so telemetry/otel can report **per-worker turn latency and
counts, sliced by worker kind**.

**Why:** each worker's `run_tasks()`/`run()` is an internal infinite
loop (`while intervals.next().await { ... }`), so the existing
`worker_run` span is lifetime-long and idle-inclusive — useless for
per-turn metrics. The fix attaches `#[tracing::instrument]` to each
worker's actual per-turn method (the loop body), not the looping
function.

## What changed

A `worker_turn` span on all 6 workers' turn methods, all carrying the
byte-identical `operation = "worker_turn"` (the spanmetrics aggregation
key) and a `worker` dimension:

| Worker | turn span site |
|---|---|
| CommitLog | `tick()` |
| DisappearingMessages | `delete_expired_messages()` |
| PendingSelfRemove | `remove_pending_remove_users()` |
| KeyPackageCleaner | extracted `tick()` |
| TaskRunner | `run_and_reschedule_task()` |
| DeviceSync | extracted `handle_event()` |

- `worker = ?self.kind()` (Debug) on the 5 method sites; `TaskRunner`
uses the literal `"TaskRunner"` (its turn method is an associated fn
with no `self`). Both formats equal the existing `worker_run` span's
value, so the two span series correlate cleanly.
- `skip_all` everywhere (workers hold a `Context` with DB/API handles).
- Two behavior-preserving extractions to host the macro:
`KeyPackageCleaner::tick()` and `DeviceSync::handle_event()`. The
DeviceSync 20s `Tick` heartbeat is deliberately bypassed before dispatch
so idle heartbeats open no span (and emit no log).

## Deliberately out of scope

- Existing `worker_run` / `libxmtp_worker` spans left untouched (don't
break existing dashboards).
- Deep spans in `mls_sync.rs` (`sync`, `post_commit`, `publish_intents`,
...) not touched — they're shared by foreground user calls too, so
there's no static worker to stamp. otel traces already carry the
worker→deep-span lineage; deep-span dashboard slicing is a
telemetry-repo concern.

Design + plan included under `docs/superpowers/`.

## Test Plan

- [x] `just check crate xmtp_mls` — compiles clean
- [x] `just lint-rust` — clippy / fmt / hakari clean
- [x] `just test v3 worker` — 29/29 worker tests pass (incl. all
device_sync + disconnect-propagation)
- [x] Invariant check: exactly 6 `operation = "worker_turn"` sites,
byte-identical

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- Macroscope's pull request summary starts here -->
<!-- Macroscope will only edit the content between these invisible
markers, and the markers themselves will not be visible in the GitHub
rendered markdown. -->
<!-- If you delete either of the start / end markers from your PR's
description, Macroscope will append its summary at the bottom of the
description. -->
> [!NOTE]
> ### Add per-worker-turn tracing spans across MLS background workers
> Adds `#[tracing::instrument]` spans with `worker` and
`operation="worker_turn"` fields to each background worker's main
processing method, enabling per-turn telemetry in distributed traces.
>
> -
[`commit_log.rs`](https://github.com/xmtp/libxmtp/pull/3711/files#diff-523943065ed63a511a3a552e44cb0ba0a05fecab9163088d5588cb5c949e6a5d),
[`disappearing_messages.rs`](https://github.com/xmtp/libxmtp/pull/3711/files#diff-93ae9a4a70ee163ee3ce898babe5119780fa34ca9d91b823ee0135fda32f1160),
[`pending_self_remove.rs`](https://github.com/xmtp/libxmtp/pull/3711/files#diff-2c915110e23b0ac1caa8ae0026e6cf76cff97c528ff81f91eef1a26a5d421f81),
and
[`tasks.rs`](https://github.com/xmtp/libxmtp/pull/3711/files#diff-b3d01dc512a077cf4b625f516f8c36ea61978f070bcce720ed8d522065a268b6)
each get a span on their primary tick/run method with no logic changes.
> -
[`key_package_cleaner.rs`](https://github.com/xmtp/libxmtp/pull/3711/files#diff-f474b9d0bca387a8ef826d7b34e811f8644323a8ac5584efc6eef034cfaeeb8d)
extracts a new `tick` method to serve as the span boundary.
> -
[`device_sync/worker.rs`](https://github.com/xmtp/libxmtp/pull/3711/files#diff-3d1d107a7d85b7aaf674a2365c789f51bda6243c3a37caab478baa8672ec737e)
introduces a `handle_event` method that wraps non-`Tick` events in a
span with an additional `event` field; `Tick` events bypass the span to
avoid noise.
>
> <!-- Macroscope's review summary starts here -->
>
> <sup><a href="https://app.macroscope.com">Macroscope</a> summarized
7d32c55.</sup>
> <!-- Macroscope's review summary ends here -->
>
<!-- macroscope-ui-refresh -->
<!-- Macroscope's pull request summary ends here -->
insipx and others added 6 commits June 3, 2026 21:32
…t gating (phase 1) (#3710)

## Summary

Phase 1 of an automated release flow for libxmtp: a **git-cliff–driven**
always-open Release PR with auto-generated changelog, a **GitHub Release
"hub"** that fans out to per-SDK nightly releases with track-aware
versions, and a **cross-test gate** that blocks nightlies unless
cross-talk + cross-version are green for the exact release SHA.

### Why git-cliff (not release-plz)
This started as "use release-plz." Investigation found release-plz's
Release-PR **cannot** generate a changelog without also bumping
`Cargo.toml`'s version — and our workspace version is the **runtime MLS
protocol identity** (`PROPOSALS_MIN_PROTOCOL_VERSION <=
CARGO_PKG_VERSION` is CI-enforced; flipping it backward bricks
`enable_proposals`). So the version must never be auto-bumped.
**git-cliff** (the engine release-plz uses internally) gives the same UX
— always-open changelog PR + a print-only version oracle — while never
touching any manifest. No crates.io publish.

### What's in this PR
- **git-cliff seed:** `cliff.toml` (changelog config, anchored to stable
`v*` tags) + `.github/workflows/release-pr.yml` (always-open
changelog-only Release PR) + git-cliff in the Nix devshell.
- **SDK registry extension** (`dev/release-tools`): `versionTrack`
(`follows-libxmtp` vs `independent`), notes globs, `releaseWorkflow`,
`channels` per SDK — the extensibility spine (xmtp-js later = 2 entries,
no engine change).
- **Version-track engine:** `resolve-sdk-version` (follows-libxmtp takes
the 1.x number verbatim; independent mirrors the bump-kind onto its own
base, e.g. iOS 4.10 → 4.11), preview-next prerelease versioning
(semver-correct nightlies), git-cliff oracle parsing
(`pending-version`).
- **Cross-test gate:** `cross-test-gate` (fail-closed, SHA-exact) +
nightly wiring in `release.yml` (gate → pending → fan-out → hub), with
an idempotent hub GitHub Release carrying an SDK-version rollup.

### Not in this PR (future phases)
- Stable release-branch flow, backports, RC (phase 2).
- xmtp-js (node-sdk 6.x, browser-sdk 7.x) onboarding (phase 3).
- Wiring per-SDK nightly *filtered* notes onto the hub (thin follow-up;
globs are already data).

The runtime/manifest version (`1.11.0-dev`) is intentionally
**untouched** — verified across the whole diff.

## Test Plan
- [x] `dev/release-tools`: `yarn typecheck` clean, `yarn test` 158
passed
- [x] All 5 workflow YAML files validate
- [x] git-cliff `--bumped-version` oracle verified print-only against
the real repo (resolves to last stable tag, writes nothing)
- [x] CLI smoke tests: `pending-version`, `resolve-sdk-version` (ios →
4.11.0), `list-sdks`, `cross-test-gate` (pass + fail cases, exit 0 on
skip)
- [ ] **Manual (post-merge / dispatch on a test branch):** trigger the
Release workflow nightly path and confirm gate-skip behavior, pending
version/kind, per-SDK preview-next versions, and the hub release with
rollup. (CI workflows can't be fully unit-tested.)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- Macroscope's pull request summary starts here -->
<!-- Macroscope will only edit the content between these invisible
markers, and the markers themselves will not be visible in the GitHub
rendered markdown. -->
<!-- If you delete either of the start / end markers from your PR's
description, Macroscope will append its summary at the bottom of the
description. -->
> [!NOTE]
> ### Add git-cliff nightly release orchestration with cross-test gating
and auto-generated changelogs
> - Adds a gate job in [release.yml](.github/workflows/release.yml) that
blocks nightly releases unless cross-test CI passes for the exact commit
SHA, using the new `xmtp-release cross-test-gate` command.
> - Adds a pending job that uses `git-cliff --bump --context` to detect
unreleased conventional commits; skips the nightly entirely if nothing
is pending.
> - Nightly SDK release workflows
([iOS](.github/workflows/release-ios.yml),
[Android](.github/workflows/release-android.yml),
[Node](.github/workflows/release-node.yml),
[WASM](.github/workflows/release-wasm.yml)) now require
`pending-version` and `pending-kind` inputs and resolve versions via the
new `xmtp-release resolve-sdk-version` command.
> - Adds a hub job that creates or updates a prerelease GitHub Release
summarizing all SDK versions shipped in a nightly run, skipped if no SDK
shipped.
> - Adds [release-pr.yml](.github/workflows/release-pr.yml) to maintain
a persistent open PR that updates `CHANGELOG.md` on every push to main.
> - Adds new release-tools CLI commands (`pending-version`,
`resolve-sdk-version`, `list-sdks`, `cross-test-gate`) and supporting
library modules for gate evaluation, git-cliff context parsing, and
track-aware SDK version resolution.
> - Risk: nightly releases now fail closed — missing cross-test runs or
absent pending commits will skip or abort the release rather than
proceeding.
>
> <!-- Macroscope's review summary starts here -->
>
> <sup><a href="https://app.macroscope.com">Macroscope</a> summarized
988e379.</sup>
> <!-- Macroscope's review summary ends here -->
>
<!-- macroscope-ui-refresh -->
<!-- Macroscope's pull request summary ends here -->

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
## Summary

An already-processed welcome (duplicate delivery, or a resume past our
cursor for a group we're already in) is an **expected,
gracefully-handled** condition — the error is returned, the cursor is
handled, the welcome is skipped. But it was logged at `tracing::error!`.

The otel tracing layer (`tracing_opentelemetry`, default config) sets a
span's status to `Error` for any `error!` event fired while that span is
active — it's event-driven, not return-value-driven. So this benign
condition marked the enclosing worker/sync span `status:error` and
**inflated the telemetry error rate** even though nothing actually
failed.

## Fix

Downgrade both emitters of this condition from `error!` to `warn!`:

- **`groups/welcomes/xmtp_welcome.rs`** — the upstream `"Skipping
already processed welcome"` log inside `.process()`. This is the one
that actually fires inside the worker span and poisons its status; it's
the dominant source for the duplicate-delivery case.
- **`groups/welcome_sync.rs`** — the downstream catch-all `Err(err)`
arm, which would otherwise log the returned `WelcomeAlreadyProcessed` at
`error!` for the already-in-group path.

This mirrors the existing storage-duplicate handling a few lines up,
which already logs at `warn!`.

**Behavior is otherwise unchanged:** the error is still returned, cursor
handling is untouched. This is a telemetry/log-level-only change.

## Test Plan

- [x] `just check crate xmtp_mls` — compiles clean
- [x] `just lint-rust` — clippy / fmt / hakari clean
- [x] `just test v3 welcome` — 60/60 welcome tests pass
- [x] Confirmed no `error!` remains for this condition (both sites now
`warn!`)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- Macroscope's pull request summary starts here -->
<!-- Macroscope will only edit the content between these invisible
markers, and the markers themselves will not be visible in the GitHub
rendered markdown. -->
<!-- If you delete either of the start / end markers from your PR's
description, Macroscope will append its summary at the bottom of the
description. -->
> [!NOTE]
> ### Log already-processed welcomes at warn instead of error in
`process_new_welcome`
> Encountering an already-processed welcome is an expected,
non-retryable condition, so logging it at `error` level produced noise.
Both
[`welcome_sync.rs`](https://github.com/xmtp/libxmtp/pull/3714/files#diff-5ad5d6ca4506c4960727c32b22b9168a128f14af4d61497cd168ba5c80fb43f5)
and
[`xmtp_welcome.rs`](https://github.com/xmtp/libxmtp/pull/3714/files#diff-7ad79af03cc69f17888ed323b42554f52c5c706674e76a981c4fd4663464bb4e)
now use `warn` for this case. Error return semantics are unchanged.
>
> <!-- Macroscope's review summary starts here -->
>
> <sup><a href="https://app.macroscope.com">Macroscope</a> summarized
43d7550.</sup>
> <!-- Macroscope's review summary ends here -->
>
<!-- macroscope-ui-refresh -->
<!-- Macroscope's pull request summary ends here -->
…3713)

Two changelog improvements, both stemming from the scruffy first
generated Release PR (#3712).

## 1. Tidy the generated changelog (`cliff.toml`)

**Body-dump scruff** — the template rendered `commit.message` (full
message = subject + body). GitHub squash-merges fill the commit body
with the PR description, so each merge dumped its whole PR body into the
changelog: Macroscope HTML-comment summary blocks, `Co-authored-by:`
trailers, even a pasted Nix build-error log (~316 of #3712's 461 lines).
```diff
-{{ commit.message | upper_first }}
+{{ commit.message | split(pat="\n") | first | trim | upper_first }}
```

**Fake per-scope groups** — non-conventional commits (e.g. `xmtp_proto:
…`, `bindings/mobile: …`) made git-cliff invent junk section headers
like `Xmtp_proto` / `Bindings/mobile`. A catch-all collects them into
one `Other` group:
```diff
+  { message = ".*", group = "<!-- 8 -->Other" },
```

Result (regenerated from `main`): group headers are exactly **Features ·
Bug Fixes · Performance · Documentation · Other**, every entry a clean
subject line, zero scruff.

## 2. Per-nightly changelog in the hub release (`release.yml`)

Each nightly hub GitHub Release now appends a **"Changes since the last
nightly"** section — the git-cliff delta over
`<previous-nightly-tag>..HEAD` (`--strip header`), computed before this
run is tagged so the newest existing `v*-nightly.*` tag is genuinely the
prior one. First-ever nightly (no prior nightly tag) falls back to
since-last-stable. Empty output is dropped. Adds git-cliff to the hub
job.

> Uses an explicit `<tag>..HEAD` range rather than the config's
stable-anchored `tag_pattern`, so it diffs against the last *nightly*,
not the last stable. Relies on the §1 fixes for clean output.

## Verification

- Regenerated changelog from `main`: clean, 5 well-formed groups, no
scruff.
- Simulated the per-nightly delta (`<tag>..HEAD --strip header`): clean
grouped output, 22 bullets, no header.
- All workflow YAML validated.

Once merged, the open Release PR (#3712) regenerates clean on the next
push to main, and the next real nightly carries a proper
since-last-nightly changelog.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- Macroscope's pull request summary starts here -->
<!-- Macroscope will only edit the content between these invisible
markers, and the markers themselves will not be visible in the GitHub
rendered markdown. -->
<!-- If you delete either of the start / end markers from your PR's
description, Macroscope will append its summary at the bottom of the
description. -->
> [!NOTE]
> ### Tidy generated changelog to show subject-only commits and group
non-conventional commits into 'Other'
> - Updates
[cliff.toml](https://github.com/xmtp/libxmtp/pull/3713/files#diff-e1372c8b03c40942b5d828a90975054cb8aaed3b38189f434396f922ec41a584)
to trim commit messages to their first line only and capitalise the
first character, removing multi-line noise from changelogs.
> - Adds a catch-all parser rule in
[cliff.toml](https://github.com/xmtp/libxmtp/pull/3713/files#diff-e1372c8b03c40942b5d828a90975054cb8aaed3b38189f434396f922ec41a584)
that places non-conventional commits into an 'Other' group instead of
creating per-scope groups.
> - Updates
[release.yml](https://github.com/xmtp/libxmtp/pull/3713/files#diff-87db21a973eed4fef5f32b267aa60fcee5cbdf03c67fafdc2a9b553bb0b15f34)
to install `git-cliff`, compute a delta changelog from the previous
nightly tag to `HEAD`, and append a
>
> <!-- Macroscope's review summary starts here -->
>
> <sup><a href="https://app.macroscope.com">Macroscope</a> summarized
eecd171.</sup>
> <!-- Macroscope's review summary ends here -->
>
<!-- macroscope-ui-refresh -->
<!-- Macroscope's pull request summary ends here -->

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…iber + otel) + node migration (#3708)

## What

Introduces a new `crates/xmtp_logging` crate as the **sole owner** of
the
`tracing-subscriber` and `opentelemetry` dependencies, and migrates the
**node**
binding onto it. All production logging/tracing pipeline construction,
the OTLP
trace exporter, and the runtime-control handle now live in one place.

## Why

Logging/telemetry setup was scattered across `xmtp_common` (which sat
near the
bottom of the dep graph, so every crate that touched it pulled in
`tracing-subscriber` + the full OpenTelemetry stack) and re-implemented
per
binding. Consolidating into a dedicated crate gives us:

- one builder + one `LoggingConfig` record at the FFI boundary
- a `LoggingHandle` for runtime control (set level, enable/disable file
logging,
  enable telemetry, flush) without re-installing the subscriber
- `xmtp_common` no longer carries `opentelemetry*` /
`tracing-subscriber` as
  production deps (test/bench-only now)

## What's in the crate

- `config.rs` — canonical `Level` / `Rotation` / `ProcessType` enums and
  `FileConfig` / `TelemetryConfig` / `LoggingConfig` records.
- `filter.rs` — `filter_directive` (moved from `xmtp_common`).
- `telemetry.rs` — OTLP span exporter + `TelemetryGuard` (native-only,
`cfg(not(wasm32))`; always compiled, enabled at runtime when an endpoint
is set
  — no Rust feature flag).
- `layers/` — stdout/json fmt, rolling file, per-OS native
(logcat/os_log/
  server-compact), and browser console/perf layers.
- `builder.rs` — `XmtpLoggingBuilder::from_config(cfg).install()`
installs the
  global subscriber.
- `handle.rs` — `LoggingHandle` with reloadable level / file / telemetry
slots.
- test subscriber (the obsolete `tracing-forest` tree logger and
`TestLogReplace` / `REPLACE_IDS` are removed — OTEL spans replace them).

## Consumer changes

- **node** — fully migrated: `init_logging` builds a `LoggingConfig`
from
`LogOptions` and installs once; `flush_telemetry` delegates to the
handle.
  Drops the direct `tracing-subscriber` dep.
- **mobile / wasm / xmtp_debug** — repointed `filter_directive` to
`xmtp_logging`
  and dropped the `xmtp_common` `logging` feature. These keep their own
subscribers for now; fully migrating them to the builder is a follow-up.

## Verification

Native + wasm build, clippy, rustfmt, taplo, hakari (regenerated — drops
`tracing-log`, the removed transitive), and unit tests
(`xmtp_logging` + `xmtp_common`) all pass locally.

## Follow-ups (separate PRs)

- Migrate wasm + mobile bindings onto the builder/handle (not just
  `filter_directive`).
- Migrate apps.

🤖 Generated with [Claude Code](https://claude.com/claude-code)


<!-- Macroscope's pull request summary starts here -->
<!-- Macroscope will only edit the content between these invisible
markers, and the markers themselves will not be visible in the GitHub
rendered markdown. -->
<!-- If you delete either of the start / end markers from your PR's
description, Macroscope will append its summary at the bottom of the
description. -->
> [!NOTE]
> ### Extract logging and telemetry into a dedicated `xmtp_logging`
crate as sole owner of tracing-subscriber and OpenTelemetry
> - Creates a new `xmtp_logging` crate with a fluent builder API
(`XmtpLoggingBuilder`) that installs a global tracing subscriber with
reloadable log level, optional rolling-file output, and optional OTLP
telemetry export.
> - Migrates all logging/telemetry code out of `xmtp_common`, removing
the `logging` feature flag from that crate; all consumers now depend on
`xmtp_logging` directly.
> - Updates the Node binding's `init_logging` to use `xmtp_logging` via
a process-global `LOGGING_HANDLE`, and changes `flush_telemetry` to call
`handle.flush()` instead of shutting down the exporter.
> - Removes `TestLogReplace` and per-instance log-ID replacement from
`xmtp_common::test` and `Tester`; test harnesses use
`xmtp_logging::test_logging` instead.
> - Risk: `TestLogReplace` log-ID substitution is permanently removed
from test output; tests that relied on named replacements in log lines
will no longer see them.
>
> <!-- Macroscope's changelog starts here -->
> #### Changes since #3708 opened
>
> - Replaced `once_cell` dependency with standard library
synchronization primitives [6d0ab95]
> - Standardized workspace dependency notation across Cargo.toml files
[6d0ab95]
> - Updated Cargo.lock [6d0ab95]
> <!-- Macroscope's changelog ends here -->
>
> <!-- Macroscope's review summary starts here -->
>
> <sup><a href="https://app.macroscope.com">Macroscope</a> summarized
b45b18f.</sup>
> <!-- Macroscope's review summary ends here -->
>
> <!-- macroscope-ui-refresh -->
<!-- Macroscope's pull request summary ends here -->
…ry-run gate (#3717)

## Why

On 2026-06-04 the nightly pipeline published a bogus **libxmtp `2.0.0`**
and **iOS/Android `5.0.0`**, and the Android `5.0.0-nightly` is now
**permanently on Maven Central** (immutable). Root causes:

1. **Non-deterministic git-cliff.** The oracle installed git-cliff via
`cargo binstall git-cliff || cargo install git-cliff` — floating latest.
At the same commit, CI computed a **major** bump while local computed
minor.
2. **Nightly auto-majored.** A `BREAKING CHANGE:` footer drove a major
bump; a nightly should never auto-major.
3. **No dry-run / preview gate.** Nothing sat between "compute version"
and the irreversible publishes.

## What (three layers of protection)

- **COMPUTE** — new pure `capBumpKind` in `xmtp-release` clamps a
nightly `major` → `minor` (recomputing the version from the last stable
tag). Exposed via `pending-version --max-bump minor`. Unit-tested.
- **ASSERT** — the plan job fails if the bump is still `major` after the
cap (defense-in-depth; every release-*/hub job `needs:` the plan, so a
failure blocks all publishes). Plus a human-readable release plan
printed to the step summary.
- **PREVIEW** — git-cliff is pinned to **2.13.1**; a `dry-run` dispatch
input runs the full plan but skips every publish/tag; and
`gate`+`pending` are extracted into a reusable `release-gate-plan.yml`
so a new **`nightly-dry-run.yml`** can preview the real nightly version
logic on demand (no publish jobs exist in it — structurally cannot
publish).

## Verification (against the exact v5 failure mode)

Reproduced locally with the pinned git-cliff 2.13.1 against real
upstream `main`:

| case | result |
| --- | --- |
| real current main, capped | `1.11.0` / minor ✓ |
| v5 mode (git-cliff says major), **no cap** | `2.0.0` / major —
reproduces the bug |
| v5 mode **with `--max-bump minor`** | **`1.11.0` / minor** — clamped ✓
|
| assertion if a major slipped through | fires `::error:: refusing to
publish`, exit 1 ✓ |

167 unit tests pass; typecheck clean; all 3 workflow YAMLs valid. The
refactor was verified to NOT break non-nightly (dev/rc/final) releases
(a skipped `plan` dependency doesn't block them — `!cancelled()` +
`release-type != 'nightly'` short-circuit).

> **Post-merge step:** `workflow_dispatch` requires the workflow on the
default branch, so the `nightly-dry-run.yml` GH-Actions wiring gets its
full end-to-end check by dispatching it on `main` after merge (safe — no
publish jobs) before the next cron nightly.

## Out of scope
- Recovering the already-published `5.0.0-nightly` (impossible — Maven
Central is immutable) and the resulting Android-v5 / iOS-sync decision
(team call).
- The larger release-process simplification (unify version lines /
libsignal model / SPM-only) — tracked separately, needs team buy-in.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- Macroscope's pull request summary starts here -->
<!-- Macroscope will only edit the content between these invisible
markers, and the markers themselves will not be visible in the GitHub
rendered markdown. -->
<!-- If you delete either of the start / end markers from your PR's
description, Macroscope will append its summary at the bottom of the
description. -->
> [!NOTE]
> ### Make nightly release version deterministic, cap bump at minor, and
add dry-run gate
> - Extracts release gating and version computation into a reusable
[release-gate-plan.yml](.github/workflows/release-gate-plan.yml)
workflow that resolves the exact SHA, checks cross-test status, and
computes a pending version capped at minor bump.
> - Adds `--max-bump` to the `pending-version` CLI command via a new
`capBumpKind` utility in
[sdk-version.ts](dev/release-tools/src/lib/sdk-version.ts) that clamps
the bump kind and recomputes the version from the last-shipped baseline.
> - Replaces separate `gate` and `pending` jobs in
[release.yml](.github/workflows/release.yml) with a single `plan` job;
all release jobs now read `pending-version` and `pending-kind` from plan
outputs and pin git-cliff to 2.13.1.
> - Adds a new manual-only
[nightly-dry-run.yml](.github/workflows/nightly-dry-run.yml) workflow
and an optional `dry-run` input to `release.yml` that prints the
computed plan without publishing, tagging, or merging.
> - Risk: the workflow now fails hard if a major bump slips through
after capping, which will block nightly releases until the bump is
addressed.
>
> <!-- Macroscope's review summary starts here -->
>
> <sup><a href="https://app.macroscope.com">Macroscope</a> summarized
c775131.</sup>
> <!-- Macroscope's review summary ends here -->
>
<!-- macroscope-ui-refresh -->
<!-- Macroscope's pull request summary ends here -->

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
)

## What
The native gRPC HTTP/2 keepalive parameters are now overridable via
environment variables, read once per process in `apply_channel_options`.
**Defaults are unchanged** (fast detection).

## Why
herald-lite #70 (`UNAVAILABLE` / status 14 disconnects on herald dev)
was root-caused to libxmtp's own client-side h2 keepalive: ping every
16s, demand a PONG within 10s, even idle. On a cross-cloud Fly→AWS path
under load the PONG starves past the 10s deadline, hyper drops the
connection, tonic maps it to `UNAVAILABLE`, and the retry layer masks
it.

A **global** relax is wrong: mobile wants *fast* dead-connection
detection, herald wants *forgiving*. (A global 30/30 broke
`client::tests::should_reconnect`, which deliberately relies on ~26s
keepalive detection — its comment cites "16s interval + 10s timeout".)
So the defaults stay fast and a high-latency deployment opts out via
env.

## Env vars (all optional; unset = default)
| env var | overrides | default |
|---|---|---|
| `XMTP_GRPC_KEEPALIVE_INTERVAL_SECS` | `http2_keep_alive_interval` | 16
|
| `XMTP_GRPC_KEEPALIVE_TIMEOUT_SECS` | `keep_alive_timeout` | 10 |
| `XMTP_GRPC_TCP_KEEPALIVE_SECS` | `tcp_keepalive` (0 disables) | 16 |
| `XMTP_GRPC_KEEPALIVE_WHILE_IDLE` | `keep_alive_while_idle` | true |

Read once via a `LazyLock` (servers set them before start), so every
connection after the first is a struct copy, not an env read.
Native-only (wasm fetch has no h2 knobs). herald rides `@xmtp/node-sdk`,
so env vars need **no** binding/SDK change — it just sets them in its
Fly deploy.
@tylerhawkes tylerhawkes force-pushed the tyler/qr-invite-l10-create-invite branch from e8978a2 to e8ceed3 Compare June 4, 2026 19:38
&self,
policy: xmtp_proto::xmtp::mls::message_contents::ExternalCommitPolicyV1,
) -> Result<(), GroupError> {
self.ensure_not_paused().await?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium groups/mod.rs:2139

set_external_commit_policy and set_allow_external_commit do not check ConversationType::Dm before queuing the intent, unlike every other metadata-mutating API in this file (update_group_name, update_app_data, update_group_description, update_group_image_url_square, update_permission_policy, update_admin_list). A migrated DM can therefore have EXTERNAL_COMMIT_POLICY persisted, enabling QR/external-commit state on a conversation type the codebase explicitly forbids elsewhere (e.g., create_external_invite documents DM prohibition). Consider adding the standard DM guard to reject the operation early.

Also found in 1 other location(s)

crates/xmtp_mls/src/groups/external_invite.rs:107

create_external_invite never checks the group's conversation_type before exporting a QR invite. The PR explicitly requires DM groups to be rejected, but this implementation only checks pause state and policy presence. If a DM group carries an EXTERNAL_COMMIT_POLICY entry (for example from malformed or legacy state), this method will still mint an external invite for that DM, violating the intended restriction.

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/xmtp_mls/src/groups/mod.rs around line 2139:

`set_external_commit_policy` and `set_allow_external_commit` do not check `ConversationType::Dm` before queuing the intent, unlike every other metadata-mutating API in this file (`update_group_name`, `update_app_data`, `update_group_description`, `update_group_image_url_square`, `update_permission_policy`, `update_admin_list`). A migrated DM can therefore have `EXTERNAL_COMMIT_POLICY` persisted, enabling QR/external-commit state on a conversation type the codebase explicitly forbids elsewhere (e.g., `create_external_invite` documents DM prohibition). Consider adding the standard DM guard to reject the operation early.

Also found in 1 other location(s):
- crates/xmtp_mls/src/groups/external_invite.rs:107 -- `create_external_invite` never checks the group's `conversation_type` before exporting a QR invite. The PR explicitly requires DM groups to be rejected, but this implementation only checks pause state and policy presence. If a DM group carries an `EXTERNAL_COMMIT_POLICY` entry (for example from malformed or legacy state), this method will still mint an external invite for that DM, violating the intended restriction.

Comment on lines +2775 to +2784
for mutation in &delta.mutations {
let key = match mutation {
TlsMapMutation::Insert { key, .. }
| TlsMapMutation::Update { key, .. }
| TlsMapMutation::Delete { key } => key,
};
if key != &joiner_id {
return Err(CommitValidationError::ExternalCommitAppDataUpdateOutOfScope);
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium groups/validated_commit.rs:2775

enforce_app_data_update_scope accepts TlsMapMutation::Update and TlsMapMutation::Delete mutations for the joiner's inbox id, so an external commit can pass validation while deleting or mutating the joiner's GROUP_MEMBERSHIP entry instead of inserting it. This violates the function's documented contract that "the joiner is authorized to insert their own entry — but ONLY their own" and leaves the tree membership and AppData membership out of sync. Consider rejecting Update and Delete mutations to ensure the joiner can only insert their own entry.

-    for mutation in &delta.mutations {
-        let key = match mutation {
-            TlsMapMutation::Insert { key, .. }
-            | TlsMapMutation::Update { key, .. }
-            | TlsMapMutation::Delete { key } => key,
-        };
-        if key != &joiner_id {
-            return Err(CommitValidationError::ExternalCommitAppDataUpdateOutOfScope);
-        }
+    for mutation in &delta.mutations {
+        let key = match mutation {
+            TlsMapMutation::Insert { key, .. } => key,
+            _ => {
+                return Err(CommitValidationError::ExternalCommitAppDataUpdateOutOfScope);
+            }
+        };
+        if key != &joiner_id {
+            return Err(CommitValidationError::ExternalCommitAppDataUpdateOutOfScope);
+        }
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/xmtp_mls/src/groups/validated_commit.rs around lines 2775-2784:

`enforce_app_data_update_scope` accepts `TlsMapMutation::Update` and `TlsMapMutation::Delete` mutations for the joiner's inbox id, so an external commit can pass validation while deleting or mutating the joiner's `GROUP_MEMBERSHIP` entry instead of inserting it. This violates the function's documented contract that "the joiner is authorized to insert their own entry — but ONLY their own" and leaves the tree membership and AppData membership out of sync. Consider rejecting `Update` and `Delete` mutations to ensure the joiner can only insert their own entry.

/// Sugar wrapper over [`MlsGroup::set_external_commit_policy`] that
/// only flips the master switch and leaves the time-window controls
/// at their defaults (no automatic expiry / no staleness bound).
pub async fn set_allow_external_commit(&self, allowed: bool) -> Result<(), GroupError> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium groups/mod.rs:2171

set_allow_external_commit(true) unconditionally generates new symmetric_key and external_group_id values via generate_symmetric_key() and generate_external_group_id(). When external commits are already enabled, this call rotates the invite material and invalidates previously issued QR codes, deep links, and uploaded blobs indexed by the old external_group_id. Consider reading the existing policy first and re-using its key/id when allow_external_commit is already true to preserve stability across calls.

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/xmtp_mls/src/groups/mod.rs around line 2171:

`set_allow_external_commit(true)` unconditionally generates new `symmetric_key` and `external_group_id` values via `generate_symmetric_key()` and `generate_external_group_id()`. When external commits are already enabled, this call rotates the invite material and invalidates previously issued QR codes, deep links, and uploaded blobs indexed by the old `external_group_id`. Consider reading the existing policy first and re-using its key/id when `allow_external_commit` is already `true` to preserve stability across calls.

/// explicit nonce management is required (e.g. tests). Reusing a `(key,
/// nonce)` pair across distinct ciphertexts breaks ChaCha20Poly1305's security
/// argument.
pub fn wrap_group_info_with_nonce(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 High invite/encrypted_group_info.rs:92

wrap_group_info_with_nonce encrypts plaintext with AEAD but writes epoch, group_state_hash, and expires_at_ns into EncryptedGroupInfoBlobV1 in cleartext without including them as AEAD associated data. An attacker can tamper with expires_at_ns (e.g., changing an expired timestamp to 0 or a far-future value) and unwrap_group_info will accept the modified blob because the AEAD tag only authenticates ciphertext. This allows expired QR invites to remain valid and breaks integrity of the envelope metadata. Consider including these fields as AEAD associated data so tampering is detected during decryption.

Also found in 1 other location(s)

crates/xmtp_mls_common/src/mls_ext/payload_encryption.rs:139

unwrap_payload_hpke silently accepts a missing secondary ciphertext at line 139 by returning vec![] whenever wrapped_secondary_payload is empty. Because the presence of the second ciphertext is not authenticated anywhere in wrapped_payload, an attacker can strip wrapped_secondary_payload in transit and the receiver will still accept the first plaintext as valid while seeing an empty secondary payload instead of a tamper error. Any protocol field carried in the secondary blob loses integrity protection for its presence.

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/xmtp_mls_common/src/invite/encrypted_group_info.rs around line 92:

`wrap_group_info_with_nonce` encrypts `plaintext` with AEAD but writes `epoch`, `group_state_hash`, and `expires_at_ns` into `EncryptedGroupInfoBlobV1` in cleartext without including them as AEAD associated data. An attacker can tamper with `expires_at_ns` (e.g., changing an expired timestamp to `0` or a far-future value) and `unwrap_group_info` will accept the modified blob because the AEAD tag only authenticates `ciphertext`. This allows expired QR invites to remain valid and breaks integrity of the envelope metadata. Consider including these fields as AEAD associated data so tampering is detected during decryption.

Also found in 1 other location(s):
- crates/xmtp_mls_common/src/mls_ext/payload_encryption.rs:139 -- `unwrap_payload_hpke` silently accepts a missing secondary ciphertext at line `139` by returning `vec![]` whenever `wrapped_secondary_payload` is empty. Because the presence of the second ciphertext is not authenticated anywhere in `wrapped_payload`, an attacker can strip `wrapped_secondary_payload` in transit and the receiver will still accept the first plaintext as valid while seeing an empty secondary payload instead of a tamper error. Any protocol field carried in the secondary blob loses integrity protection for its presence.

let joiner_leaf = staged_commit
.update_path_leaf_node()
.ok_or(CommitValidationError::ExternalCommitMissingPathLeaf)?;
let joiner_inbox_id = inbox_id_from_credential(joiner_leaf.credential())?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical groups/validated_commit.rs:986

from_external_commit extracts the joiner's inbox id from the credential (line 986) and checks that every Add proposal's credential string matches it (line 1000), but it never verifies that the joiner's installation key (or any added installation key) is actually registered in the association state for that inbox id. In contrast, from_staged_commit performs this check (lines 760-778) by looking up each participant's installation_id in the AssociationState at the relevant sequence_id. Without this lookup, a crafted external commit can claim any inbox id and join (or add installations) with signature keys that have no legitimate association to that inbox, bypassing XMTP's installation↔inbox binding. Consider adding the same association-state verification that from_staged_commit performs — this will require threading in a context (or equivalent identity-lookup capability) and making the function async.

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/xmtp_mls/src/groups/validated_commit.rs around line 986:

`from_external_commit` extracts the joiner's inbox id from the credential (line 986) and checks that every `Add` proposal's credential string matches it (line 1000), but it never verifies that the joiner's installation key (or any added installation key) is actually registered in the association state for that inbox id. In contrast, `from_staged_commit` performs this check (lines 760-778) by looking up each participant's `installation_id` in the `AssociationState` at the relevant `sequence_id`. Without this lookup, a crafted external commit can claim any inbox id and join (or add installations) with signature keys that have no legitimate association to that inbox, bypassing XMTP's installation↔inbox binding. Consider adding the same association-state verification that `from_staged_commit` performs — this will require threading in a `context` (or equivalent identity-lookup capability) and making the function async.


let epoch = openmls_group.epoch().as_u64();
let group_state_hash = openmls_group.epoch_authenticator().as_slice().to_vec();
let blob_expires_at_ns = compute_blob_expiry(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Low groups/external_invite.rs:149

create_external_invite computes blob_expires_at_ns using xmtp_common::time::now_ns() before returning to the caller, but the caller uploads encrypted_group_info only after this method returns. When policy.expire_in_ns is used, the relative expiry is anchored to invite creation time instead of upload time, so any delay between creation and upload can make a freshly uploaded blob already stale or expired.

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/xmtp_mls/src/groups/external_invite.rs around line 149:

`create_external_invite` computes `blob_expires_at_ns` using `xmtp_common::time::now_ns()` before returning to the caller, but the caller uploads `encrypted_group_info` only after this method returns. When `policy.expire_in_ns` is used, the relative expiry is anchored to invite creation time instead of upload time, so any delay between creation and upload can make a freshly uploaded blob already stale or expired.

tylerhawkes and others added 9 commits June 4, 2026 20:02
…lpers) (#3670)

New module `xmtp_mls_common::invite::payload` providing pure-protocol
helpers around the `ExternalInvitePayload` proto:

- `generate_key()` / `generate_nonce()` — CSPRNG-backed material
generation.
- `group_id_hash(&[u8])` — sha256(group_id) for the service lookup key.
- `is_expired(payload, now_ns)` — ns-precision expiry check (zero = no
expiry).
- `validate_version(payload)` — rejects zero and future versions.
…ion (#3669)

## Summary

Pure-refactor PR — no behavior change.

- Moves `crates/xmtp_mls/src/groups/mls_ext/welcome_wrapper.rs` to
`crates/xmtp_mls_common/src/mls_ext/payload_encryption.rs`.
- Renames the four helpers from welcome-specific names to generic ones:
  - `wrap_welcome` -> `wrap_payload_hpke`
  - `unwrap_welcome` -> `unwrap_payload_hpke`
  - `wrap_welcome_symmetric` -> `wrap_payload_symmetric`
  - `unwrap_welcome_symmetric` -> `unwrap_payload_symmetric`
- The HPKE label becomes a parameter (was hardcoded to the welcome
label). Welcome callers still pass
`xmtp_configuration::WELCOME_HPKE_LABEL`; the QR-invite path (downstream
PR) will pass `XMTP_EXTERNAL_INVITE_LABEL`.
- All call sites in xmtp_mls updated to import from the new path.

## Why this PR

Sets up the encryption primitive used by the QR-invite (external commit)
flow. The functions are already generic — they don't depend on
welcome semantics. Lifting them to `xmtp_mls_common` lets the invite
module use them without pulling in `xmtp_mls`.

## Preserved XWING codepoint compat (#3662)

The relocated `wrap_payload_hpke` / `unwrap_payload_hpke` still call
`wrapper_algorithm.to_hpke_config()`, which lives in
`xmtp_id::WrapperAlgorithm` and pins `XWingMLKEM768Draft6` to
`KemAlgorithm::XWingDraft06Obsolete` (codepoint `0x004D`) for wire
compat with v1.9 / v1.10 clients. Workspace `Cargo.toml`'s
`[patch.crates-io]` block pinning the xmtp hpke-rs fork is untouched.
Verified post-rebase: `round_trip_xwing_hpke` test passes.
## Why

Creating an XMTP client required **two gRPC connections**. They were not
a message/payer or primary/secondary split — they were two
`tonic::Channel`s to the **same host**, wrapping the **same** client
type, built from the **same** backend config. The second ("sync")
connection was introduced on the assumption that a dedicated sync
channel buys isolation; that isolation was never realized (same
endpoint, no stream/concurrency separation), and the second client was
used by exactly two unary reads.

The redundancy cost every client an extra TCP connection + file
descriptor (the iOS SDK even kept a whole second client cache and dialed
the backend twice), with no measured benefit.

This collapses the design to a single connection.

> Note: the d14n payer/message split (`xmtp_api_d14n`) is a separate
internal concern and is **not** touched here.

## What changed

Commit-by-commit (each compiles + is self-contained):

1. **core** — remove `sync_api_client` field + `sync_api()` accessor
from `XmtpMlsLocalContext`/`XmtpSharedContext`; collapse
`ClientBuilder::api_clients(api, sync_api)` → `api_client(api)`; the two
`.sync_api()` readers (`query_group_messages`, `load_identity_updates`)
now use `.api()`.
2. **tests/benches** — every `.api_clients(a, b)` call site reduced to
`.api_client(a)`.
3. **apps** — cli + xmtp_debug build one client.
4. **node + wasm** — `create_client_inner` drops its second client
param; bindings build the backend once.
5. **mobile FFI** — **breaking**: `create_client` drops its second
`sync_api: Arc<XmtpApiClient>` argument (plus its tests/bench call
sites).
6. **sdks** — iOS + Android connect once, cache once, pass once. Android
`xmtpv3.kt` regenerated.

## Verification

- `just check`, `just wasm check`, `just lint` (rust), `just node lint`
— all green locally.
- `just android check` — green (bindings regenerated).
- Pre-existing markdown-lint failures in untouched repo docs
(`sdks/ios/README.md`, `VALIDATION.md`, `TEST_SCENARIOS.md`) were left
alone — not part of this change.

## ⚠️ Follow-up required before merge

- **iOS `xmtpv3.swift` must be regenerated on macOS.** The mobile FFI
signature changed, but the iOS xcframework + Swift glue can only be
regenerated on macOS/Xcode (`ios-xcframeworks-fast` has no Linux build),
so the generated `xmtpv3.swift` in this branch still shows the old 2-arg
signature. The hand-written `Client.swift` already targets the new 1-arg
signature. Run `./sdks/ios/dev/bindings` on a Mac then `just ios check`.
- Minor: `sdks/ios/dev/.setup` resolves the repo root via `git
rev-parse` with no jj fallback (Android's already has one), which breaks
iOS regen from a jj workspace.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- Macroscope's pull request summary starts here -->
<!-- Macroscope will only edit the content between these invisible
markers, and the markers themselves will not be visible in the GitHub
rendered markdown. -->
<!-- If you delete either of the start / end markers from your PR's
description, Macroscope will append its summary at the bottom of the
description. -->
> [!NOTE]
> ### Collapse two gRPC connections into one across all client bindings
> - Removes the dedicated sync API client from `ClientBuilder`,
`XmtpMlsLocalContext`, and the `XmtpSharedContext` trait in
[builder.rs](https://github.com/xmtp/libxmtp/pull/3721/files#diff-112b6dd965ad29739b36ccfb7f1283b1dbc20ec772591a4487a0898c6c5650f6)
and
[context.rs](https://github.com/xmtp/libxmtp/pull/3721/files#diff-ca0115cb0e326f7fa5c9bc660a4a1c1aff68c42a4c048eb79529bba4575aa909);
all network calls now route through a single API client.
> - Replaces the `.api_clients(api, sync_api)` builder method with
`.api_client(api)` across all call sites: mobile FFI, Node, WASM,
Android, iOS, and test utilities.
> - Operations previously routed through the sync client (group message
queries, identity update fetches) now use the primary API client.
> - Behavioral Change: the `create_client_does_not_hit_network` test
expectation for `get_identity_updates_v2` increases from 2 to 3 because
all identity-update reads are now counted on the single client.
>
> <!-- Macroscope's changelog starts here -->
> #### Changes since #3721 opened
>
> - Modified `FfiXmtpClient` to support graceful shutdown and worker
configuration [6f64938]
> - Updated root resolution logic in setup script to support multiple
version control systems [6f64938]
> <!-- Macroscope's changelog ends here -->
>
> <!-- Macroscope's review summary starts here -->
>
> <sup><a href="https://app.macroscope.com">Macroscope</a> summarized
5ab010b.</sup>
> <!-- Macroscope's review summary ends here -->
>
> <!-- macroscope-ui-refresh -->
<!-- Macroscope's pull request summary ends here -->
… generic AppDataUpdate intent

Defense-in-depth master switch + per-component declarative external
committer authorization for the QR-invite flow (MLS External Commits,
RFC 9420 §12.4.3.2).

- ComponentId::EXTERNAL_COMMIT_POLICY = 0x800C in xmtp_mls_common.
- Type-dispatch entry: ComponentType::Bytes for the new id.
- crates/xmtp_mls/src/groups/external_commit_policy.rs (new file):
  - load_external_commit_policy(group) -> Option<ExternalCommitPolicyV1>
  - is_external_commit_allowed(group) -> bool
  - external_committer_permissions_for(group, id) -> Option<ComponentPermissions>
- MlsGroup::set_external_commit_policy(policy) — granular setter
  writes the full ExternalCommitPolicyV1 via the generic
  AppDataUpdate intent (AppDataUpdateOp::Replace, since the
  component is full-replace Bytes-typed).
- MlsGroup::set_allow_external_commit(bool) — sugar wrapper.

Depends on L-0 (generic IntentKind::AppDataUpdate) for the intent
machinery. No new IntentKind variant added — the generic path covers
this and every future full-replace component setter.

No bootstrap synth: absent EXTERNAL_COMMIT_POLICY = disabled default.
First setter call creates the entry via AppDataUpdateOperation::Update
upsert semantics. Zero risk of cross-client serialization drift.
@tylerhawkes tylerhawkes force-pushed the tyler/qr-invite-l10-create-invite branch from e8ceed3 to 57e7e30 Compare June 6, 2026 05:13
@tylerhawkes tylerhawkes requested a review from a team as a code owner June 6, 2026 05:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants