Skip to content

xmtp_mls: Client::join_group_by_external_invite (atomic external-commit join)#3674

Open
tylerhawkes wants to merge 1 commit into
tyler/qr-invite-l10-create-invitefrom
tyler/qr-invite-l11-join-by-invite
Open

xmtp_mls: Client::join_group_by_external_invite (atomic external-commit join)#3674
tylerhawkes wants to merge 1 commit into
tyler/qr-invite-l10-create-invitefrom
tyler/qr-invite-l11-join-by-invite

Conversation

@tylerhawkes
Copy link
Copy Markdown
Contributor

@tylerhawkes tylerhawkes commented May 21, 2026

Summary

Adds the join half of the QR-invite flow.

  • Client::join_group_by_external_invite(invite_payload_bytes, encrypted_blob_bytes) -> JoinByExternalInviteOutput<Context>
  • Decodes both protos, unwraps the encrypted GroupInfo, verifies the group_id_hash matches sha256(decrypted.group_id) (defense-in-depth against a malicious service swapping ciphertexts).
  • Validates the payload version and expiry.
  • Builds an atomic external commit: ExternalInit + by-value Adds for the joiner's other installations + AppDataUpdate(s) registering the joiner in the group membership component. Single commit, no fast-follow.
  • Publishes the commit, HPKE-Welcomes the other installations, then re-exports a refreshed GroupInfo at the new epoch (wrapped with the same key + fresh nonce) for the next scanner.
  • Output carries the new MlsGroup and the serialized refreshed blob.

Stack

Test plan

  • Integration coverage via PR-T1.
  • cargo check -p xmtp_mls passes.

Note

Add Client::join_group_by_external_invite for atomic external-commit group joining

  • Adds a new client/external_invite.rs module implementing Client::join_group_by_external_invite, which performs an atomic MLS external commit to join a group from a QR-style invite payload.
  • The method decodes and validates the ExternalInvitePayload, parses GroupInfo, fetches key packages for co-resident installations, and builds an external commit with inline Add proposals and an AppDataUpdate recording the new member.
  • After committing, it persists a StoredGroup row with Pending membership, publishes the commit with an HMAC binding, sends HPKE-wrapped Welcome messages to other installations, validates post-join policy, and returns a re-encrypted GroupInfo blob for upload back to the invite service.
  • Adds JoinByExternalInviteError for structured error reporting across all failure modes in the join flow.

Macroscope summarized b5b9a53.

@tylerhawkes tylerhawkes requested a review from a team as a code owner May 21, 2026 08:21
@claude
Copy link
Copy Markdown

claude Bot commented May 21, 2026

Claude finished @tylerhawkes's task —— View job


Code Review — 2026-06-06 05:15 UTC

After reviewing the PR changes, I've identified the following issues and observations:

Critical Issue

Database error silently ignored in persist_joined_group (line 745)

The store_or_ignore result is discarded with let _ =, making this the only place in the entire codebase where database persistence errors are silently swallowed. This is particularly problematic here because:

  • The openmls state has already been persisted (line 466 finalize())
  • If the libxmtp StoredGroup row fails to persist, the group exists in MLS state but is invisible to libxmtp APIs (find_groups, etc.)
  • The function returns Ok(()) even when persistence failed
  • This creates state inconsistency that's difficult to diagnose

Recommendation: Propagate the error:

stored_group.store_or_ignore(&context.db())?;
Ok(())

Macroscope Comment Clarification

The Macroscope review comment about hmac_keys(0..=0) at line 699 appears to be based on a misunderstanding. The hmac_keys() function uses wall-clock time epochs (30-day periods via utils::time::hmac_epoch()) for HMAC key rotation, NOT MLS group epochs. The parameter 0..=0 means "current 30-day period", which is correct for deduplication purposes. The pattern hmac_keys(0..=0) is used consistently throughout the codebase (see prepare_group_messages at mls_sync.rs:4294).

Positive Security Observations

  1. Symmetric key zeroization (line 611): Properly zeroizes the decryption key after use
  2. Expiry validation: Enforces time-based expiry on encrypted blobs (line 237)
  3. Post-join correlation check (lines 570-582): Validates external_group_id matches after join to detect stale QRs or service swaps
  4. Identity updates refresh (lines 261-267): Fetches latest association state before enumerating installations to avoid missing recently rotated installations

Test Coverage

The PR description notes 0% patch coverage (177 lines missing). While integration tests are planned for PR-T1, consider adding unit tests for:

  • Error path validation (malformed payloads, expired blobs, etc.)
  • Edge cases (no other installations, missing key packages)
  • The compute_refreshed_blob_expiry logic

Summary: One critical issue with database error handling that should be fixed. The Macroscope HMAC epoch comment is not valid. Otherwise, the implementation follows sound security practices.


@macroscopeapp
Copy link
Copy Markdown
Contributor

macroscopeapp Bot commented May 21, 2026

Approvability

Verdict: Needs human review

4 blocking correctness issues found. This PR introduces a substantial new feature (753 lines) implementing QR-invite external commit join flow with complex MLS protocol interactions and network operations. Additionally, there are multiple unresolved review comments including 2 high-severity issues (publishing before validation, hardcoded sync cursor) that warrant human review.

You can customize Macroscope's approvability policy. Learn more.

@tylerhawkes tylerhawkes force-pushed the tyler/qr-invite-l10-create-invite branch from ffaaf90 to c12079b Compare May 26, 2026 22:49
@tylerhawkes tylerhawkes force-pushed the tyler/qr-invite-l11-join-by-invite branch 2 times, most recently from df41a55 to dab038e Compare May 26, 2026 22:59
Comment on lines +749 to +750
// End-to-end test coverage for this entry point lives in the QR-invite
// integration test (T-1, libxmtp-integration-test). The join path
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium client/external_invite.rs:749

persist_joined_group discards the Result from stored_group.store_or_ignore(&context.db()) with let _ = , so database errors are silently swallowed and the function returns Ok(()) even when persistence fails. This leaves MLS state persisted (via openmls finalize) but the libxmtp StoredGroup row missing, causing the group to be invisible to libxmtp APIs. Consider propagating the error by returning store_or_ignore(...)? or handling the Err case explicitly.

-    let _ = stored_group.store_or_ignore(&context.db());
+    stored_group.store_or_ignore(&context.db()).map_err(|e| {
+        ClientError::Generic(format!(
+            "join_group_by_external_invite: store_or_ignore failed: {e}"
+        ))
+    })?;
     Ok(())
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/xmtp_mls/src/client/external_invite.rs around lines 749-750:

`persist_joined_group` discards the `Result` from `stored_group.store_or_ignore(&context.db())` with `let _ = `, so database errors are silently swallowed and the function returns `Ok(())` even when persistence fails. This leaves MLS state persisted (via openmls finalize) but the libxmtp `StoredGroup` row missing, causing the group to be invisible to libxmtp APIs. Consider propagating the error by returning `store_or_ignore(...)?` or handling the `Err` case explicitly.

Evidence trail:
crates/xmtp_mls/src/client/external_invite.rs lines 728-751 (REVIEWED_COMMIT): `persist_joined_group` function discards `Result` on line 749 with `let _ =` and returns `Ok(())` on line 750. Caller at line 485 uses `?` operator expecting error propagation.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 26, 2026

Codecov Report

❌ Patch coverage is 0% with 177 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (tyler/qr-invite-l10-create-invite@57e7e30). Learn more about missing BASE report.

Files with missing lines Patch % Lines
crates/xmtp_mls/src/client/external_invite.rs 0.00% 177 Missing ⚠️
Additional details and impacted files
@@                         Coverage Diff                          @@
##             tyler/qr-invite-l10-create-invite    #3674   +/-   ##
====================================================================
  Coverage                                     ?   83.82%           
====================================================================
  Files                                        ?      411           
  Lines                                        ?    60484           
  Branches                                     ?        0           
====================================================================
  Hits                                         ?    50700           
  Misses                                       ?     9784           
  Partials                                     ?        0           

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@tylerhawkes tylerhawkes force-pushed the tyler/qr-invite-l11-join-by-invite branch from dab038e to f0d63d4 Compare May 27, 2026 18:17
@tylerhawkes tylerhawkes force-pushed the tyler/qr-invite-l10-create-invite branch from 5ebf785 to e8978a2 Compare May 27, 2026 18:33
@tylerhawkes tylerhawkes force-pushed the tyler/qr-invite-l11-join-by-invite branch from f0d63d4 to a6ec61d Compare May 27, 2026 18:33
@tylerhawkes tylerhawkes force-pushed the tyler/qr-invite-l10-create-invite branch from e8978a2 to e8ceed3 Compare June 4, 2026 19:38
@tylerhawkes tylerhawkes force-pushed the tyler/qr-invite-l10-create-invite branch from e8ceed3 to 57e7e30 Compare June 6, 2026 05:13
@tylerhawkes tylerhawkes force-pushed the tyler/qr-invite-l11-join-by-invite branch from a6ec61d to b5b9a53 Compare June 6, 2026 05:13
let welcome_bytes = welcome_msg
.tls_serialize_detached()
.map_err(JoinByExternalInviteError::GroupInfoTlsDecode)?;
let welcome_metadata_bytes = WelcomeMetadata { message_cursor: 0 }.encode_to_vec();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 High client/external_invite.rs:527

WelcomeMetadata { message_cursor: 0 } is hard-coded for HPKE-wrapped Welcomes to other installations, but process_new_welcome uses message_cursor to set the recipient's sync cursor. Each co-resident installation will store the membership-change message at cursor 0 instead of the actual commit position, causing subsequent syncs to replay history or mis-order messages. Consider passing the real cursor value from the external commit's publication response.

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @crates/xmtp_mls/src/client/external_invite.rs around line 527:

`WelcomeMetadata { message_cursor: 0 }` is hard-coded for HPKE-wrapped Welcomes to other installations, but `process_new_welcome` uses `message_cursor` to set the recipient's sync cursor. Each co-resident installation will store the membership-change message at cursor 0 instead of the actual commit position, causing subsequent syncs to replay history or mis-order messages. Consider passing the real cursor value from the external commit's publication response.

}
if policy.external_group_id.as_slice() != payload_external_group_id.as_slice() {
return Err(JoinByExternalInviteError::ExternalGroupIdMismatch.into());
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Low client/external_invite.rs:582

compute_refreshed_blob_expiry uses now_ns captured at the start of the function, so expire_in_ns policies produce a TTL shortened by the full join latency. When that latency exceeds the relative expiry window, the returned refreshed_encrypted_group_info blob is already expired. Consider capturing a fresh now_ns just before the expiry computation so the TTL reflects the actual remaining validity window.

-        let refreshed_expires_at_ns = compute_refreshed_blob_expiry(&policy, now);
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @crates/xmtp_mls/src/client/external_invite.rs around line 582:

`compute_refreshed_blob_expiry` uses `now_ns` captured at the start of the function, so `expire_in_ns` policies produce a TTL shortened by the full join latency. When that latency exceeds the relative expiry window, the returned `refreshed_encrypted_group_info` blob is already expired. Consider capturing a fresh `now_ns` just before the expiry computation so the TTL reflects the actual remaining validity window.

Comment on lines +461 to +469
let (mls_group, bundle) = builder
.build(provider.rand(), provider.crypto(), signer, |_| true)
.map_err(|e| {
JoinByExternalInviteError::ExternalCommitBuilderFailed(format!("build: {e}"))
})?
.finalize(&provider)
.map_err(|e| {
JoinByExternalInviteError::ExternalCommitBuilderFailed(format!("finalize: {e}"))
})?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium client/external_invite.rs:461

If send_group_messages fails, the method returns PublishError after builder.finalize() has already persisted the new MLS group state locally and persist_joined_group has created the libxmtp wrapper row. The client is left thinking it joined the group at the new epoch even though the external commit never reached XMTP delivery, so the operation is not safely retryable and the local state diverges from the rest of the group. Consider reordering so that publishing happens before local persistence, or implementing cleanup/rollback logic on publish failure.

-        let (mls_group, bundle) = builder
-            .build(provider.rand(), provider.crypto(), signer, |_| true)
-            .map_err(|e| {
-                JoinByExternalInviteError::ExternalCommitBuilderFailed(format!("build: {e}"))
-            })?
-            .finalize(&provider)
-            .map_err(|e| {
-                JoinByExternalInviteError::ExternalCommitBuilderFailed(format!("finalize: {e}"))
-            })?;
+        let (commit, bundle) = builder
+            .build(provider.rand(), provider.crypto(), signer, |_| true)
+            .map_err(|e| {
+                JoinByExternalInviteError::ExternalCommitBuilderFailed(format!("build: {e}"))
+            })?;
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @crates/xmtp_mls/src/client/external_invite.rs around lines 461-469:

If `send_group_messages` fails, the method returns `PublishError` after `builder.finalize()` has already persisted the new MLS group state locally and `persist_joined_group` has created the libxmtp wrapper row. The client is left thinking it joined the group at the new epoch even though the external commit never reached XMTP delivery, so the operation is not safely retryable and the local state diverges from the rest of the group. Consider reordering so that publishing happens before local persistence, or implementing cleanup/rollback logic on publish failure.

created_at_ns,
);

// 8. Publish the PublicMessage external commit to XMTP delivery.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 High client/external_invite.rs:493

join_group_by_external_invite publishes the external commit and device welcomes (lines 503-561) before verifying that the joined group's EXTERNAL_COMMIT_POLICY.external_group_id matches the QR payload (lines 580-582). If the invite service returns a GroupInfo for the wrong group encrypted under the same symmetric key, the client joins and publishes to that wrong group before ExternalGroupIdMismatch is returned. Move the external_group_id validation (steps 10a-10b) before the publish steps (steps 8-9) so the mismatch is caught before any network side effects occur.

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @crates/xmtp_mls/src/client/external_invite.rs around line 493:

`join_group_by_external_invite` publishes the external commit and device welcomes (lines 503-561) before verifying that the joined group's `EXTERNAL_COMMIT_POLICY.external_group_id` matches the QR payload (lines 580-582). If the invite service returns a `GroupInfo` for the wrong group encrypted under the same symmetric key, the client joins and publishes to that wrong group before `ExternalGroupIdMismatch` is returned. Move the external_group_id validation (steps 10a-10b) before the publish steps (steps 8-9) so the mismatch is caught before any network side effects occur.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant