feat(encryption): Stage 6D-5 - storage-layer cutover toggle (WithStorageEnvelopeGate)#809
Conversation
…ageEnvelopeGate)
Stage 6D-5 lands the §6.2 storage-layer toggle that the
Stage 6D-4 sidecar's StorageEnvelopeActive field gates. The
storage layer's encryptForKey path now consults the new toggle
in addition to the existing activeStorageKeyID closure, so
operators get the §7.1 Phase 0 / Phase 1 split: every node has
the DEK provisioned (Phase 0) before the cluster quiesces the
cutover Raft entry that flips the bit (Phase 1).
Operator-inert until 6D-6 wires the cipher + gate in main.go
and exposes the EnableStorageEnvelope RPC.
## Wire-up (store/encryption_glue.go)
- New StorageEnvelopeActive type: func() bool. Doc spells out
that this is the §7.1 Phase 1 cutover signal, distinct from
ActiveStorageKeyID's "is a DEK provisioned" signal, and that
the gate is consulted on writes only - reads always dispatch
on the per-version encryption_state bit (so mixed cleartext +
encrypted versions for the same key stay readable across the
cutover, per design doc §6.3).
- New WithStorageEnvelopeGate PebbleStoreOption. Nil argument is
a no-op (legacy "encrypt whenever DEK is active" posture)
matching the pre-existing convention from WithEncryption.
- encryptForKey adds a single AND-gate: if the gate is wired
AND returns false, force encStateCleartext even though a DEK
is active. The branch sits AFTER the existing
(cipher == nil || activeStorageKeyID == nil) early-return so
the legacy fast paths are unaffected.
## Tests (store/lsm_store_encryption_test.go)
- gatedFixture + newGatedEncryptedStore helper bundles the gate
wiring with idempotent close-then-on-disk-peek (Pebble panics
on double-close).
- TestStorageEnvelopeGate_PreCutoverForcesCleartext - gate
returns false, cipher + active DEK both wired - on-disk
encryption_state is 0b00 (cleartext).
- TestStorageEnvelopeGate_PostCutoverEmitsEnvelope - gate
returns true, on-disk encryption_state is 0b01 (encrypted).
- TestStorageEnvelopeGate_FlipMidLife - pre-cutover write,
flip gate, post-cutover write; both reads succeed live and
on-disk bits actually differ. Pins the §6.3 mixed-mode-per-key
property.
- TestStorageEnvelopeGate_NilGatePreservesLegacyBehavior -
newEncryptedStoreFixture (no gate) still emits envelopes when
the DEK is active, proving the pre-6D-5 posture is unchanged.
- TestStorageEnvelopeGate_NilArgumentIsNoOp - explicit nil into
WithStorageEnvelopeGate must not nil-deref on the next Put.
## Caller audit (semantic change)
The new option is purely additive; existing WithEncryption call
sites are not touched. encryptForKey's existing fast-path callers
(every store.Put / PutAt) see the gate skipped when the field is
nil (the default), so this PR cannot regress any pre-existing
test or production code path. Audit:
- All current store.WithEncryption call sites are in
store/lsm_store_encryption{,_prop}_test.go: 5 sites, none of
them set WithStorageEnvelopeGate, all retain the legacy
"encrypt when DEK active" posture.
- main.go does NOT currently call store.WithEncryption at all
(storage-layer encryption wiring lands in 6D-6 alongside the
cutover RPC), so this PR is operator-inert.
## Self-review (5 lenses)
1. Data loss - no. encryptForKey's only behavioural change is
one additional path to encStateCleartext when the gate is
wired AND returns false. The Pebble write itself is
unchanged (same encodeValue + Set path).
2. Concurrency / distributed failures - the gate closure is
read once per Put and is plain field-read on the caller's
atomic.Bool / sync.Map view (test uses a *bool with the test's
serial Put pattern). No new locking; no new races. Read path
is untouched. Replicas flip StorageEnvelopeActive
deterministically at FSM apply (proved by the 6D-4 tests), so
no inter-replica skew window.
3. Performance - one extra branch + one closure call per Put
when wired; nil-check skips both when not wired. No
allocations.
4. Data consistency - the §6.3 mixed-mode invariant
(cleartext + encrypted versions for the same key) is
exercised by TestStorageEnvelopeGate_FlipMidLife. The §4.1
AAD binding for envelope writes is unchanged.
5. Test coverage - 5 new tests cover (pre-cutover cleartext,
post-cutover envelope, mid-life flip, nil-gate legacy
posture, nil-argument no-op). Existing TestEncryption_*
tests continue to pass with no modification.
## Verification
- go test -race -timeout 180s ./store/... ./internal/encryption/...
./kv/... - all green.
- golangci-lint --new-from-rev=origin/main - 0 issues.
Refs: design doc §6.2 (Stage 6D-5 row of §7 decomposition table).
|
Warning Review limit reached
Your plan currently allows 1 review/hour. Refill in 26 minutes and 24 seconds. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more review capacity refills, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughImplements a storage-envelope gate ( ChangesStorage Envelope Gate Feature
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request implements the Stage 6D-5 storage-layer toggle for data-at-rest encryption, introducing a StorageEnvelopeActive gate to control the transition between cleartext and encrypted writes. The changes include the definition of the gate, its integration into the encryptForKey logic, and extensive tests to verify correct behavior across various cutover scenarios. Feedback focused on ensuring that errors from Close() methods in test cleanup functions are properly reported rather than silently ignored.
| g := &gatedFixture{mvcc: mvcc, dir: dir} | ||
| t.Cleanup(func() { | ||
| if !g.closed { | ||
| _ = g.mvcc.Close() |
There was a problem hiding this comment.
The error from g.mvcc.Close() is silently ignored in the cleanup function. Per the general rules, errors from Close() methods on resources should not be ignored to ensure that resource leaks or cleanup problems are visible. Even in test code, it is better to log or report these errors using t.Errorf.
if err := g.mvcc.Close(); err != nil {
t.Errorf("cleanup close pebble: %v", err)
}References
- Do not silently ignore errors from Close() methods on resources like network connections. Log them to ensure resource leaks or other cleanup problems are visible.
| if err != nil { | ||
| t.Fatalf("NewPebbleStore: %v", err) | ||
| } | ||
| t.Cleanup(func() { _ = mvcc.Close() }) |
There was a problem hiding this comment.
The error from mvcc.Close() is silently ignored in the cleanup function. Following the general rules, errors from Close() methods should be logged or reported to surface potential resource leaks or cleanup failures.
t.Cleanup(func() {
if err := mvcc.Close(); err != nil {
t.Errorf("cleanup close pebble: %v", err)
}
})References
- Do not silently ignore errors from Close() methods on resources like network connections. Log them to ensure resource leaks or other cleanup problems are visible.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@store/lsm_store_encryption_test.go`:
- Around line 514-520: The test for legacy envelope emission with
WithStorageEnvelopeGate(nil) only checks round-trip data but not that data was
actually stored encrypted; update the test after closing the store to assert the
store's encryption_state equals encStateEncrypted. Specifically, in the test
containing mvcc.PutAt(...) and mustGet(...), add an assertion that reads the
store's encryption_state field (or exposes it via the store API) and compare it
to encStateEncrypted to ensure envelopes were emitted encrypted when using
WithStorageEnvelopeGate(nil).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: fb15cb46-01d6-4138-b396-3233a5ba9daf
📒 Files selected for processing (4)
docs/design/2026_05_18_partial_6d_enable_storage_envelope.mdstore/encryption_glue.gostore/lsm_store.gostore/lsm_store_encryption_test.go
… minor wire-level assertion Three review findings on PR809, all in store/lsm_store_encryption_test.go. ## gemini medium - newGatedEncryptedStore cleanup ignored Close error The fixture's t.Cleanup did `_ = g.mvcc.Close()`, swallowing any Pebble close error. A silent failure would mask a leaked handle or a flush regression. Fix: surface the error via t.Errorf when the test body has not already closed the store. ## gemini medium - TestStorageEnvelopeGate_NilArgumentIsNoOp cleanup ignored Close error Same shape, separate cleanup. Same fix - surface via t.Errorf. Added a `closed` flag so the test body can close the store itself (needed for the on-disk peek - see coderabbit below) without double-closing in cleanup. ## coderabbit minor - TestStorageEnvelopeGate_NilArgumentIsNoOp lacked wire-level assertion The test proved round-trip but not the on-disk encryption_state bits. A regression that wrote cleartext would still round-trip cleanly. Fix: close the store, peek via rawEncStateAt, assert encStateEncrypted. Now symmetric with the other gate tests which already peek the on-disk bits. ## Caller audit Test-cleanup-only changes; no production code semantics touched. The encryptForKey gate logic from the initial PR is unchanged. All 5 gate tests still pass. ## Verification - go test -race -timeout 60s -run TestStorageEnvelopeGate_ ./store/... - all green - golangci-lint --new-from-rev=origin/main - 0 issues
|
@claude review Round-1 (commit gemini medium — close-err ignored in
|
|
Claude finished @bootjp's task in 2m 50s —— View job Review of Round-1 fixes (commit
|
| Site | Line | Note |
|---|---|---|
newEncryptedStoreFixture cleanup |
~93 | _ = f.mvcc.Close() |
newToggleableEncryptedStore cleanup |
~243 | _ = mvcc.Close() |
TestEncryption_SnapshotRestoreAtMaxValueSize |
~832 | _ = dst.Close() |
rawEncStateAt |
~336 | _ = closer.Close() (Pebble iterator) |
These are pre-existing patterns not introduced by this PR. Worth a follow-up pass to align them with the new convention, but not blocking here.
Overall assessment
Round-1 addresses all three reviewer findings cleanly. The close-error suppression is fixed at both new sites with the right guard structure, and the NilArgumentIsNoOp test now proves the legacy-posture at the wire level symmetrically with the other four gate tests. No regression to existing tests or production code paths. LGTM.
…StateAt Claude bot review on PR809 round-1 noted that rawEncStateAt (added by this PR) still swallowed the Pebble iterator's closer.Close error via _ = closer.Close(). Symmetric with the round-1 fixes for the t.Cleanup sites. Test helpers introduced by this PR now uniformly surface close errors via t.Errorf. The three pre-existing close-err swallows flagged by Claude (newEncryptedStoreFixture, newToggleableEncryptedStore, TestEncryption_SnapshotRestoreAtMaxValueSize) are out of scope for PR809 - Claude's review explicitly called them 'worth a follow-up pass, but not blocking here'. Verification: - go test -race -timeout 60s -run TestStorageEnvelopeGate_ ./store/... - all green - golangci-lint --new-from-rev=origin/main - 0 issues
|
@claude review Round-2 (commit
The three pre-existing close-err swallow sites (
|
|
Claude finished @bootjp's task in 1m 6s —— View job Review of Round-2 fixes (commit
|
Summary
StorageEnvelopeActivefield gates.encryptForKeynow consults the newWithStorageEnvelopeGateclosure in addition to the existingactiveStorageKeyID, so operators get the §7.1 Phase 0 / Phase 1 split: every node has the DEK provisioned before the cluster quiesces the cutover Raft entry that flips the bit.StorageEnvelopeActive func() boolandWithStorageEnvelopeGateoption (store/encryption_glue.go). Nil argument is a no-op (legacy "encrypt whenever DEK is active" posture), matching the pre-existingWithEncryptionconvention.encryption_state == 0b01versions always go through the cipher regardless of the gate, so mixed cleartext + encrypted versions for the same key stay readable across the cutover (design doc §6.3).Operator-inert until 6D-6 wires the cipher + gate in
main.goand exposes theEnableStorageEnvelopeRPC.Test plan
go test -race -timeout 180s ./store/... ./internal/encryption/... ./kv/...— all greengolangci-lint --new-from-rev=origin/main— 0 issuesTestStorageEnvelopeGate_PreCutoverForcesCleartext— gate=false with cipher + active DEK wired → on-diskencryption_state = 0b00TestStorageEnvelopeGate_PostCutoverEmitsEnvelope— gate=true → on-diskencryption_state = 0b01TestStorageEnvelopeGate_FlipMidLife— pre-cutover write, flip gate, post-cutover write; both reads succeed live, on-disk bits actually differ (pins §6.3 mixed-mode-per-key)TestStorageEnvelopeGate_NilGatePreservesLegacyBehavior—newEncryptedStoreFixture(no gate) still emits envelopes; pre-6D-5 posture unchangedTestStorageEnvelopeGate_NilArgumentIsNoOp—WithStorageEnvelopeGate(nil)does not nil-deref on the next PutCaller audit (semantic change)
The new option is purely additive; existing
WithEncryptioncall sites are not touched.encryptForKey's existing fast-path callers see the gate skipped when the field is nil (the default), so this PR cannot regress any pre-existing test or production code path.store.WithEncryptioncall sites are instore/lsm_store_encryption{,_prop}_test.go(5 sites). None setWithStorageEnvelopeGate; all retain the legacy posture.main.godoes NOT currently callstore.WithEncryption(storage-layer encryption wiring lands in 6D-6 alongside the cutover RPC), so this PR is operator-inert.Self-review (5 lenses)
encryptForKey's only behavioural change is one additional path toencStateCleartextwhen the gate is wired AND returns false. The Pebble write itself is unchanged (sameencodeValue+Setpath).StorageEnvelopeActivedeterministically at FSM apply (proved by the 6D-4 tests), so no inter-replica skew window.TestStorageEnvelopeGate_FlipMidLife. The §4.1 AAD binding for envelope writes is unchanged.TestEncryption_*tests continue to pass with no modification.Refs: design doc §6.2 (Stage 6D-5 row of §7 decomposition table).
Summary by CodeRabbit
New Features
Documentation
Tests