xmtp_mls: Client::join_group_by_external_invite (atomic external-commit join)#3674
xmtp_mls: Client::join_group_by_external_invite (atomic external-commit join)#3674tylerhawkes wants to merge 1 commit into
Conversation
|
Claude finished @tylerhawkes's task —— View job Code Review — 2026-06-06 05:15 UTCAfter reviewing the PR changes, I've identified the following issues and observations: Critical IssueDatabase error silently ignored in The
Recommendation: Propagate the error: stored_group.store_or_ignore(&context.db())?;
Ok(())Macroscope Comment ClarificationThe Macroscope review comment about Positive Security Observations
Test CoverageThe PR description notes 0% patch coverage (177 lines missing). While integration tests are planned for PR-T1, consider adding unit tests for:
Summary: One critical issue with database error handling that should be fixed. The Macroscope HMAC epoch comment is not valid. Otherwise, the implementation follows sound security practices. |
ApprovabilityVerdict: Needs human review 4 blocking correctness issues found. This PR introduces a substantial new feature (753 lines) implementing QR-invite external commit join flow with complex MLS protocol interactions and network operations. Additionally, there are multiple unresolved review comments including 2 high-severity issues (publishing before validation, hardcoded sync cursor) that warrant human review. You can customize Macroscope's approvability policy. Learn more. |
ffaaf90 to
c12079b
Compare
df41a55 to
dab038e
Compare
| // End-to-end test coverage for this entry point lives in the QR-invite | ||
| // integration test (T-1, libxmtp-integration-test). The join path |
There was a problem hiding this comment.
🟡 Medium client/external_invite.rs:749
persist_joined_group discards the Result from stored_group.store_or_ignore(&context.db()) with let _ = , so database errors are silently swallowed and the function returns Ok(()) even when persistence fails. This leaves MLS state persisted (via openmls finalize) but the libxmtp StoredGroup row missing, causing the group to be invisible to libxmtp APIs. Consider propagating the error by returning store_or_ignore(...)? or handling the Err case explicitly.
- let _ = stored_group.store_or_ignore(&context.db());
+ stored_group.store_or_ignore(&context.db()).map_err(|e| {
+ ClientError::Generic(format!(
+ "join_group_by_external_invite: store_or_ignore failed: {e}"
+ ))
+ })?;
Ok(())🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/xmtp_mls/src/client/external_invite.rs around lines 749-750:
`persist_joined_group` discards the `Result` from `stored_group.store_or_ignore(&context.db())` with `let _ = `, so database errors are silently swallowed and the function returns `Ok(())` even when persistence fails. This leaves MLS state persisted (via openmls finalize) but the libxmtp `StoredGroup` row missing, causing the group to be invisible to libxmtp APIs. Consider propagating the error by returning `store_or_ignore(...)?` or handling the `Err` case explicitly.
Evidence trail:
crates/xmtp_mls/src/client/external_invite.rs lines 728-751 (REVIEWED_COMMIT): `persist_joined_group` function discards `Result` on line 749 with `let _ =` and returns `Ok(())` on line 750. Caller at line 485 uses `?` operator expecting error propagation.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## tyler/qr-invite-l10-create-invite #3674 +/- ##
====================================================================
Coverage ? 83.82%
====================================================================
Files ? 411
Lines ? 60484
Branches ? 0
====================================================================
Hits ? 50700
Misses ? 9784
Partials ? 0 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
dab038e to
f0d63d4
Compare
5ebf785 to
e8978a2
Compare
f0d63d4 to
a6ec61d
Compare
e8978a2 to
e8ceed3
Compare
e8ceed3 to
57e7e30
Compare
a6ec61d to
b5b9a53
Compare
| let welcome_bytes = welcome_msg | ||
| .tls_serialize_detached() | ||
| .map_err(JoinByExternalInviteError::GroupInfoTlsDecode)?; | ||
| let welcome_metadata_bytes = WelcomeMetadata { message_cursor: 0 }.encode_to_vec(); |
There was a problem hiding this comment.
🟠 High client/external_invite.rs:527
WelcomeMetadata { message_cursor: 0 } is hard-coded for HPKE-wrapped Welcomes to other installations, but process_new_welcome uses message_cursor to set the recipient's sync cursor. Each co-resident installation will store the membership-change message at cursor 0 instead of the actual commit position, causing subsequent syncs to replay history or mis-order messages. Consider passing the real cursor value from the external commit's publication response.
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @crates/xmtp_mls/src/client/external_invite.rs around line 527:
`WelcomeMetadata { message_cursor: 0 }` is hard-coded for HPKE-wrapped Welcomes to other installations, but `process_new_welcome` uses `message_cursor` to set the recipient's sync cursor. Each co-resident installation will store the membership-change message at cursor 0 instead of the actual commit position, causing subsequent syncs to replay history or mis-order messages. Consider passing the real cursor value from the external commit's publication response.
| } | ||
| if policy.external_group_id.as_slice() != payload_external_group_id.as_slice() { | ||
| return Err(JoinByExternalInviteError::ExternalGroupIdMismatch.into()); | ||
| } |
There was a problem hiding this comment.
🟢 Low client/external_invite.rs:582
compute_refreshed_blob_expiry uses now_ns captured at the start of the function, so expire_in_ns policies produce a TTL shortened by the full join latency. When that latency exceeds the relative expiry window, the returned refreshed_encrypted_group_info blob is already expired. Consider capturing a fresh now_ns just before the expiry computation so the TTL reflects the actual remaining validity window.
- let refreshed_expires_at_ns = compute_refreshed_blob_expiry(&policy, now);🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @crates/xmtp_mls/src/client/external_invite.rs around line 582:
`compute_refreshed_blob_expiry` uses `now_ns` captured at the start of the function, so `expire_in_ns` policies produce a TTL shortened by the full join latency. When that latency exceeds the relative expiry window, the returned `refreshed_encrypted_group_info` blob is already expired. Consider capturing a fresh `now_ns` just before the expiry computation so the TTL reflects the actual remaining validity window.
| let (mls_group, bundle) = builder | ||
| .build(provider.rand(), provider.crypto(), signer, |_| true) | ||
| .map_err(|e| { | ||
| JoinByExternalInviteError::ExternalCommitBuilderFailed(format!("build: {e}")) | ||
| })? | ||
| .finalize(&provider) | ||
| .map_err(|e| { | ||
| JoinByExternalInviteError::ExternalCommitBuilderFailed(format!("finalize: {e}")) | ||
| })?; |
There was a problem hiding this comment.
🟡 Medium client/external_invite.rs:461
If send_group_messages fails, the method returns PublishError after builder.finalize() has already persisted the new MLS group state locally and persist_joined_group has created the libxmtp wrapper row. The client is left thinking it joined the group at the new epoch even though the external commit never reached XMTP delivery, so the operation is not safely retryable and the local state diverges from the rest of the group. Consider reordering so that publishing happens before local persistence, or implementing cleanup/rollback logic on publish failure.
- let (mls_group, bundle) = builder
- .build(provider.rand(), provider.crypto(), signer, |_| true)
- .map_err(|e| {
- JoinByExternalInviteError::ExternalCommitBuilderFailed(format!("build: {e}"))
- })?
- .finalize(&provider)
- .map_err(|e| {
- JoinByExternalInviteError::ExternalCommitBuilderFailed(format!("finalize: {e}"))
- })?;
+ let (commit, bundle) = builder
+ .build(provider.rand(), provider.crypto(), signer, |_| true)
+ .map_err(|e| {
+ JoinByExternalInviteError::ExternalCommitBuilderFailed(format!("build: {e}"))
+ })?;🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @crates/xmtp_mls/src/client/external_invite.rs around lines 461-469:
If `send_group_messages` fails, the method returns `PublishError` after `builder.finalize()` has already persisted the new MLS group state locally and `persist_joined_group` has created the libxmtp wrapper row. The client is left thinking it joined the group at the new epoch even though the external commit never reached XMTP delivery, so the operation is not safely retryable and the local state diverges from the rest of the group. Consider reordering so that publishing happens before local persistence, or implementing cleanup/rollback logic on publish failure.
| created_at_ns, | ||
| ); | ||
|
|
||
| // 8. Publish the PublicMessage external commit to XMTP delivery. |
There was a problem hiding this comment.
🟠 High client/external_invite.rs:493
join_group_by_external_invite publishes the external commit and device welcomes (lines 503-561) before verifying that the joined group's EXTERNAL_COMMIT_POLICY.external_group_id matches the QR payload (lines 580-582). If the invite service returns a GroupInfo for the wrong group encrypted under the same symmetric key, the client joins and publishes to that wrong group before ExternalGroupIdMismatch is returned. Move the external_group_id validation (steps 10a-10b) before the publish steps (steps 8-9) so the mismatch is caught before any network side effects occur.
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @crates/xmtp_mls/src/client/external_invite.rs around line 493:
`join_group_by_external_invite` publishes the external commit and device welcomes (lines 503-561) before verifying that the joined group's `EXTERNAL_COMMIT_POLICY.external_group_id` matches the QR payload (lines 580-582). If the invite service returns a `GroupInfo` for the wrong group encrypted under the same symmetric key, the client joins and publishes to that wrong group before `ExternalGroupIdMismatch` is returned. Move the external_group_id validation (steps 10a-10b) before the publish steps (steps 8-9) so the mismatch is caught before any network side effects occur.
Summary
Adds the join half of the QR-invite flow.
Client::join_group_by_external_invite(invite_payload_bytes, encrypted_blob_bytes) -> JoinByExternalInviteOutput<Context>group_id_hashmatchessha256(decrypted.group_id)(defense-in-depth against a malicious service swapping ciphertexts).MlsGroupand the serialized refreshed blob.Stack
Test plan
Note
Add
Client::join_group_by_external_invitefor atomic external-commit group joiningclient/external_invite.rsmodule implementingClient::join_group_by_external_invite, which performs an atomic MLS external commit to join a group from a QR-style invite payload.ExternalInvitePayload, parsesGroupInfo, fetches key packages for co-resident installations, and builds an external commit with inlineAddproposals and anAppDataUpdaterecording the new member.StoredGrouprow withPendingmembership, publishes the commit with an HMAC binding, sends HPKE-wrapped Welcome messages to other installations, validates post-join policy, and returns a re-encryptedGroupInfoblob for upload back to the invite service.JoinByExternalInviteErrorfor structured error reporting across all failure modes in the join flow.Macroscope summarized b5b9a53.