feat: add e3 key forward secrecy [skip-line-limit]#1591
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
📝 WalkthroughWalkthroughAdds per-E3 forward-secrecy cipher support: new Cipher key APIs, E3CipherExtension to generate/encrypt/store per-E3 AES-GCM keys under a master cipher, lifecycle purge and hydrate restore, keyshare changes to use per-E3/multithread ciphers, routing updates for timeout-terminal events, and builder wiring/tests. ChangesPer-E3 Cipher Extension Workflow
Sequence DiagramsequenceDiagram
participant RequestRouter
participant E3CipherExtension
participant MasterCipher
participant KVStore
participant E3Context
participant ThresholdKeyshareExtension
RequestRouter->>E3CipherExtension: E3Requested(e3_id)
E3CipherExtension->>E3CipherExtension: generate AES-256-GCM Cipher (per-E3)
E3CipherExtension->>MasterCipher: encrypt(per-E3 key bytes)
E3CipherExtension->>KVStore: store encrypted bytes at //e3_keys/{e3_id}
E3CipherExtension->>E3Context: inject per-E3 Cipher (E3_CIPHER_KEY)
RequestRouter->>ThresholdKeyshareExtension: create ThresholdKeyshareParams
ThresholdKeyshareExtension->>E3Context: resolve per-E3 cipher (or fallback master)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
crates/request/src/domain/routing.rs (1)
204-219: 💤 Low valueMinor test duplication: consolidate or remove redundant test.
Both
stage_changed_to_failed_ignored_when_completed(line 204) andstage_changed_to_failed_ignored_when_already_completed(line 390) test the same routing behavior:E3StageChanged(Failed)is ignored when the E3 is already completed. The only difference is theprevious_stagevalue (CiphertextReadyvsCommitteeFinalized), but the router doesn't branch onprevious_stage—onlynew_stage.Consider consolidating into a single test or removing one.
Also applies to: 390-403
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/request/src/domain/routing.rs` around lines 204 - 219, Two tests, stage_changed_to_failed_ignored_when_completed and stage_changed_to_failed_ignored_when_already_completed, duplicate the same assertion that RequestRouter::route returns RoutingDecision::Ignore for E3StageChanged with new_stage E3Stage::Failed when the e3_id is in the completed set; remove one of them (or merge into a single test) so only one test asserts that behavior using from_data(E3StageChanged { e3_id, new_stage: E3Stage::Failed }) with completed containing the id and asserting RequestRouter::route(&msg, &completed) == RoutingDecision::Ignore; keep the most clearly named test and update its previous_stage value if you want to still exercise a different prior stage for clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/request/src/e3_cipher.rs`:
- Around line 101-104: The current Err branch in the create_and_store result
swallows the error (tracing::error!(e3_id = %data.e3_id, "failed to create E3
cipher: {e}")) which allows silent fallback to the master cipher and breaks
forward secrecy; change this handling either to propagate the failure (return
Err(...) / propagate the create_and_store error from the enclosing function so
the round aborts) or, if degraded operation is acceptable, explicitly downgrade
to tracing::warn! and set a clearly-documented fallback flag/event (emit a bus
error or status entry) so downstream code knows the per-E3 cipher is missing;
update the create_and_store error branch accordingly and reference the same
e3_id in the new return/error/event so callers can correlate the failure.
---
Nitpick comments:
In `@crates/request/src/domain/routing.rs`:
- Around line 204-219: Two tests, stage_changed_to_failed_ignored_when_completed
and stage_changed_to_failed_ignored_when_already_completed, duplicate the same
assertion that RequestRouter::route returns RoutingDecision::Ignore for
E3StageChanged with new_stage E3Stage::Failed when the e3_id is in the completed
set; remove one of them (or merge into a single test) so only one test asserts
that behavior using from_data(E3StageChanged { e3_id, new_stage: E3Stage::Failed
}) with completed containing the id and asserting RequestRouter::route(&msg,
&completed) == RoutingDecision::Ignore; keep the most clearly named test and
update its previous_stage value if you want to still exercise a different prior
stage for clarity.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0040c354-515a-4f3c-8b12-d80f8c63a2da
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (10)
crates/ciphernode-builder/src/ciphernode_builder.rscrates/crypto/src/cipher.rscrates/events/src/enclave_event/e3_failed.rscrates/events/src/store_keys.rscrates/keyshare/src/actors/threshold_keyshare.rscrates/keyshare/src/ext.rscrates/request/Cargo.tomlcrates/request/src/domain/routing.rscrates/request/src/e3_cipher.rscrates/request/src/lib.rs
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/keyshare/src/actors/threshold_keyshare.rs (1)
2228-2232: ⚡ Quick winExercise the actor path with distinct ciphers.
These tests prove the
Cipherprimitive rejects the wrong key, but they never driveThresholdKeysharewith differentcipherandmultithread_ciphervalues. A regression back toself.cipherinhandle_shares_generatedwould still pass here becausestart_actorwires the same key into both fields.Also applies to: 2350-2422
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/keyshare/src/actors/threshold_keyshare.rs` around lines 2228 - 2232, Create and use two distinct Cipher instances when constructing ThresholdKeyshareParams so tests exercise the actor path with different keys: call Cipher::from_password twice (e.g., for "test-password" and a different password) and pass one into the cipher field and the other into multithread_cipher before calling ThresholdKeyshare::new (and likewise update the other test block referenced around the 2350-2422 region); this ensures start_actor can't mask a regression where handle_shares_generated incorrectly uses self.cipher instead of the multithreaded cipher.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@crates/keyshare/src/actors/threshold_keyshare.rs`:
- Around line 2228-2232: Create and use two distinct Cipher instances when
constructing ThresholdKeyshareParams so tests exercise the actor path with
different keys: call Cipher::from_password twice (e.g., for "test-password" and
a different password) and pass one into the cipher field and the other into
multithread_cipher before calling ThresholdKeyshare::new (and likewise update
the other test block referenced around the 2350-2422 region); this ensures
start_actor can't mask a regression where handle_shares_generated incorrectly
uses self.cipher instead of the multithreaded cipher.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3b8f2384-5c52-47fc-b4e3-211bfb965739
📒 Files selected for processing (2)
crates/keyshare/src/actors/threshold_keyshare.rscrates/keyshare/src/ext.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/keyshare/src/ext.rs
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/keyshare/src/domain/share_generation.rs (1)
63-67: ⚡ Quick winMake the two cipher roles non-positional.
Lines 64-67 take two adjacent
&Ciphervalues with opposite downstream contracts. A caller can swap them without a compiler error and break either Multithread proof handling or local own-share recovery at runtime. A tiny wrapper struct or distinct newtypes would make this API much harder to misuse.♻️ Proposed refactor
+pub(crate) struct SharePlanCiphers<'a> { + pub transport: &'a Cipher, + pub local: &'a Cipher, +} + pub(crate) fn build_shares_generated_plan( - transport_cipher: &Cipher, - local_cipher: &Cipher, + ciphers: SharePlanCiphers<'_>, share_enc_preset: BfvPreset, party_id: u64, threshold_m: u64, @@ - transport_cipher, + ciphers.transport, @@ - local_cipher, + ciphers.local,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/keyshare/src/domain/share_generation.rs` around lines 63 - 67, The parameters transport_cipher and local_cipher of build_shares_generated_plan are two plain &Cipher references whose roles can be accidentally swapped; introduce distinct wrapper types (e.g., TransportCipher(Cipher) and LocalCipher(Cipher) or a small struct with named fields) and change the function signature to accept &TransportCipher and &LocalCipher (or the struct) instead of two &Cipher to make the roles non-positional; update all callers to wrap their Cipher values in the new types and adjust any pattern matching or method accessors to unwrap the inner Cipher where needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@crates/keyshare/src/domain/share_generation.rs`:
- Around line 63-67: The parameters transport_cipher and local_cipher of
build_shares_generated_plan are two plain &Cipher references whose roles can be
accidentally swapped; introduce distinct wrapper types (e.g.,
TransportCipher(Cipher) and LocalCipher(Cipher) or a small struct with named
fields) and change the function signature to accept &TransportCipher and
&LocalCipher (or the struct) instead of two &Cipher to make the roles
non-positional; update all callers to wrap their Cipher values in the new types
and adjust any pattern matching or method accessors to unwrap the inner Cipher
where needed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3a951f0f-c68e-4848-b9e4-b526566e757f
📒 Files selected for processing (2)
crates/keyshare/src/actors/threshold_keyshare.rscrates/keyshare/src/domain/share_generation.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/keyshare/src/actors/threshold_keyshare.rs
|
still working on this! |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/keyshare/src/actors/threshold_keyshare.rs (1)
2130-2136:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winPreserve the phase-specific timeout event here too.
Unlike the encryption-key and threshold-share timeout handlers above, this branch never republishes
DecryptionKeySharedCollectionFailedbefore emittingE3Failed. That drops the structuredmissing_parties/reasonpayload for the decryption phase and makes this timeout path behave differently in history/debug consumers.🧾 Keep timeout telemetry consistent
self.decryption_key_shared_collector = None; + self.bus.publish_without_context(msg.clone())?; self.bus.publish_without_context(E3Failed { e3_id: msg.e3_id.clone(), failed_at_stage: E3Stage::CommitteeFinalized, reason: FailureReason::DecryptionTimeout,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/keyshare/src/actors/threshold_keyshare.rs` around lines 2130 - 2136, This branch currently clears self.decryption_key_shared_collector and publishes E3Failed without first emitting the phase-specific timeout event; restore parity by publishing DecryptionKeySharedCollectionFailed (including the collector's missing_parties/related payload and reason FailureReason::DecryptionTimeout) before publishing E3Failed, using the same data sourced from self.decryption_key_shared_collector (or a snapshot of it) so consumers receive the structured timeout telemetry for the decryption phase; ensure you still clear self.decryption_key_shared_collector afterwards and keep the E3Failed publish as-is.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@crates/keyshare/src/actors/threshold_keyshare.rs`:
- Around line 2130-2136: This branch currently clears
self.decryption_key_shared_collector and publishes E3Failed without first
emitting the phase-specific timeout event; restore parity by publishing
DecryptionKeySharedCollectionFailed (including the collector's
missing_parties/related payload and reason FailureReason::DecryptionTimeout)
before publishing E3Failed, using the same data sourced from
self.decryption_key_shared_collector (or a snapshot of it) so consumers receive
the structured timeout telemetry for the decryption phase; ensure you still
clear self.decryption_key_shared_collector afterwards and keep the E3Failed
publish as-is.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d48bfb54-ef43-4eda-a75f-e6d144682f52
📒 Files selected for processing (5)
crates/ciphernode-builder/src/ciphernode_builder.rscrates/keyshare/src/actors/threshold_keyshare.rscrates/keyshare/src/domain/share_generation.rscrates/keyshare/src/ext.rscrates/multithread/src/multithread.rs
💤 Files with no reviewable changes (1)
- crates/keyshare/src/ext.rs
|
WIP WIP WIP |
WIP, will continue after critical path
--
Each E3 round gets a randomly-generated AES-256-GCM key ("E3 key"). All sensitive cryptographic material for that round (BFV keyshares, threshold shares, decryption shares) is encrypted with this per-E3 key rather than the node's master cipher.
The E3 key itself is stored in the KV store encrypted under the master cipher, so the node can survive restarts mid-round without user interaction. When the round finishes (or times out), the E3 key is deleted from the store. From that point on, the encrypted blobs in the commit log are permanently irrecoverable (even if the master passphrase is later leaked).
The master cipher is a key-encryption key (KEK); the per-E3 keys are data-encryption keys (DEKs). The DEKs are ephemeral: they exist only for the lifetime of a round.
Note that previously, a timeout mid-round (DKG, committee formation, compute, decryption) would leave the E3 context and its per-E3 key alive indefinitely; resulting in a resource/memory leak. Now, timeout failures trigger
E3RequestCompletethe same way a successful round does, which tears down the context and purges the E3 key.Summary by CodeRabbit
New Features
Bug Fixes
Tests
Documentation
Other