feat(encrypt): selectable encryption engine + AES-GCM re-encryption migrator#40654
feat(encrypt): selectable encryption engine + AES-GCM re-encryption migrator#40654rusackas wants to merge 9 commits into
Conversation
✅ Deploy Preview for superset-docs-preview ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #40654 +/- ##
==========================================
- Coverage 64.16% 64.05% -0.12%
==========================================
Files 2666 2654 -12
Lines 143935 144090 +155
Branches 33091 33202 +111
==========================================
- Hits 92357 92295 -62
- Misses 49968 50156 +188
- Partials 1610 1639 +29
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Adds a backward-compatible configuration mechanism to select the encryption engine used for app-encrypted SQLAlchemy fields (defaulting to the existing AES-CBC engine), plus a SIP draft describing the broader CBC→GCM migration plan.
Changes:
- Introduces
SQLALCHEMY_ENCRYPTED_FIELD_ENGINE("aes"|"aes-gcm") and wires the defaultSQLAlchemyUtilsAdapterto honor it (while keeping"aes"as the default). - Adds unit tests covering default selection, config-based selection, unknown-value fallback, and explicit
engine=override behavior. - Adds a SIP draft documenting the motivation, phased rollout, and migration considerations for authenticated encryption at rest.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
superset/utils/encrypt.py |
Adds selectable engine mapping and config-driven engine selection in the default adapter. |
superset/config.py |
Documents and defines the new SQLALCHEMY_ENCRYPTED_FIELD_ENGINE config default. |
tests/unit_tests/utils/encrypt_test.py |
Adds unit tests validating engine selection and precedence rules. |
docs/sip/authenticated-encryption-at-rest.md |
Adds SIP draft describing the overall authenticated-encryption proposal and phased plan. |
a2f6b9e to
9d9a205
Compare
There was a problem hiding this comment.
Code Review Agent Run #a466b5
Actionable Suggestions - 1
-
superset/cli/update.py - 1
- CWE-20: Case-Sensitive Choice Validation · Line 118-118
Review Details
-
Files reviewed - 4 · Commit Range:
d480d62..9d9a205- superset/cli/update.py
- superset/config.py
- superset/utils/encrypt.py
- tests/unit_tests/utils/encrypt_test.py
-
Files skipped - 2
- UPDATING.md - Reason: Filter setting
- docs/sip/authenticated-encryption-at-rest.md - Reason: Filter setting
-
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful
Bito Usage Guide
Commands
Type the following command in the pull request comment and save the comment.
-
/review- Manually triggers a full AI review. -
/pause- Pauses automatic reviews on this pull request. -
/resume- Resumes automatic reviews. -
/resolve- Marks all Bito-posted review comments as resolved. -
/abort- Cancels all in-progress reviews.
Refer to the documentation for additional commands.
Configuration
This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.
Documentation & Help
9d9a205 to
9f247e4
Compare
…sitive --engine; warn on unknown engine Address bot review on #40654: - SecretsMigrator now tries the previous key under both the column's configured engine and the historical AES-CBC engine, so SECRET_KEY rotation works for AES-GCM deployments (not just CBC) while still reading CBC source data after an engine-flipped-first migration. - re-encrypt-secrets --engine accepts case-insensitive engine names. - Adapter logs a warning when SQLALCHEMY_ENCRYPTED_FIELD_ENGINE is unrecognized before falling back to AES-CBC. - config.py points operators at the SQLALCHEMY_ENCRYPTED_FIELD_ENGINE knob as the preferred way to switch engines. - Add unit test covering AES-GCM SECRET_KEY rotation. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Code Review Agent Run #20202bActionable Suggestions - 0Additional Suggestions - 2
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
…RAFT/SIP]
App-encrypted fields (DB passwords, SSH tunnel creds, OAuth tokens) use
sqlalchemy_utils EncryptedType, which defaults to AES-CBC — unauthenticated
encryption vulnerable to bit-flipping/tamper of ciphertext at rest. AES-GCM is
authenticated and fails closed on tampering.
Phase 1 (this change, backward compatible): add a configurable encryption
engine via SQLALCHEMY_ENCRYPTED_FIELD_ENGINE ("aes" | "aes-gcm"), defaulting to
"aes" so existing deployments are unchanged. The default SQLAlchemyUtilsAdapter
now honors it; an explicit engine kwarg still wins. New installs can opt into
AES-GCM with one line instead of writing a custom adapter.
Includes docs/sip/authenticated-encryption-at-rest.md describing the full
proposal: Phase 2 (a CBC->GCM re-encryption migrator built on SecretsMigrator)
and Phase 3 (flip default for fresh installs). The instance-wide re-encryption
migration is intentionally NOT included here — it is the subject of the SIP.
DRAFT: flipping the engine on a populated DB without the Phase 2 migrator makes
existing secrets undecryptable; this PR only adds the safe opt-in plumbing for
discussion.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Phase 2 of the authenticated-encryption-at-rest proposal. Teaches the
existing SecretsMigrator an engine-migration mode (decrypt with the source
engine under the current SECRET_KEY, re-encrypt with a target engine) and
exposes it via `superset re-encrypt-secrets --engine aes-gcm`. The run is
transactional and idempotent per column, so existing installs can move from
AES-CBC to authenticated AES-GCM without bricking stored secrets. Default
engine is unchanged ("aes"); behavior is opt-in. Adds UPDATING.md runbook and
updates the SIP to reflect Phases 1–2 shipping here.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…sitive --engine; warn on unknown engine Address bot review on #40654: - SecretsMigrator now tries the previous key under both the column's configured engine and the historical AES-CBC engine, so SECRET_KEY rotation works for AES-GCM deployments (not just CBC) while still reading CBC source data after an engine-flipped-first migration. - re-encrypt-secrets --engine accepts case-insensitive engine names. - Adapter logs a warning when SQLALCHEMY_ENCRYPTED_FIELD_ENGINE is unrecognized before falling back to AES-CBC. - config.py points operators at the SQLALCHEMY_ENCRYPTED_FIELD_ENGINE knob as the preferred way to switch engines. - Add unit test covering AES-GCM SECRET_KEY rotation. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
CodeQL's clear-text-logging query flags interpolating any app-config value into a log because the config also holds SECRET_KEY. The unrecognized-engine warning now lists only the valid engine names, which is enough to diagnose a typo. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
fc28e60 to
731a476
Compare
There was a problem hiding this comment.
Code Review Agent Run #64dc65
Actionable Suggestions - 1
-
superset/cli/update.py - 1
- Missing CLI integration test · Line 113-146
Additional Suggestions - 1
-
superset/utils/encrypt.py - 1
-
Redundant decryptor in source list · Line 255-286`_source_decryptors` always appends all supported engines with `prev_key` to the fallback list, including the column's own engine — even when `prev_key == _secret_key`. This creates duplicate entries `[col_engine, CBC(prev), GCM(prev)]` instead of `[col_engine, other_engine(prev)]`. While the fast path masks the correctness issue in the idempotency test, the redundancy is wasteful and risks re-encrypting already-migrated values in edge cases.
Code suggestion
--- superset/utils/encrypt.py +++ superset/utils/encrypt.py @@ -266,12 +266,19 @@ class SecretsMigrator: decryptors = [encrypted_type] if self._previous_secret_key: # Try the column's own engine first, then any remaining supported - # engines (notably the historical AES-CBC), de-duplicated so we - # never build the same decryptor twice. + # engines (notably the historical AES-CBC). When the previous key + # equals the current key the column's own engine is already in + # ``decryptors[0]``, so skip adding it again here to avoid + # redundant (and potentially incorrect) re-encryption attempts. engines: list[type[Any]] = [type(encrypted_type.engine)] for engine in ENCRYPTION_ENGINES.values(): if engine not in engines: engines.append(engine) + # When prev_key == _secret_key the column engine is the primary + # decryptor; do not add it again as a fallback. + if self._previous_secret_key == self._secret_key: + engines = engines[1:] decryptors.extend( EncryptedType( type_in=encrypted_type.underlying_type,
-
Review Details
-
Files reviewed - 4 · Commit Range:
f748de3..731a476- superset/cli/update.py
- superset/config.py
- superset/utils/encrypt.py
- tests/unit_tests/utils/encrypt_test.py
-
Files skipped - 2
- UPDATING.md - Reason: Filter setting
- docs/sip/authenticated-encryption-at-rest.md - Reason: Filter setting
-
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful
Bito Usage Guide
Commands
Type the following command in the pull request comment and save the comment.
-
/review- Manually triggers a full AI review. -
/pause- Pauses automatic reviews on this pull request. -
/resume- Resumes automatic reviews. -
/resolve- Marks all Bito-posted review comments as resolved. -
/abort- Cancels all in-progress reviews.
Refer to the documentation for additional commands.
Configuration
This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.
Documentation & Help
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Code Review Agent Run #9b1164Actionable Suggestions - 0Filtered by Review RulesBito filtered these suggestions based on rules created automatically for your feedback. Manage rules.
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
Adds a regression test for the reverse engine migration (--engine aes): an AES-GCM value must be re-encrypted back to AES-CBC, not skipped. This guards the idempotency fast-path, which decrypts in the target form first — since the rollback target (AES-CBC) is unauthenticated, the test confirms it does not mis-read GCM ciphertext and wrongly skip the value. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Code Review Agent Run #c82ab8Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
richardfogaca
left a comment
There was a problem hiding this comment.
Posting on Richard's behalf — this is his PR reviewer agent. Forward any pushback to him and he'll loop me back in.
Really solid PR overall — opt-in end to end, genuinely zero-behavior-change by default, and reusing SecretsMigrator instead of growing a parallel tool is the right call. The four items under "Design" are the ones worth a look before merge (mostly fail-open vs fail-closed posture and the operational window); the rest are optional. All line numbers verified against HEAD 6671c90.
Design — worth discussing before merge
-
superset/utils/encrypt.py:86-97An unrecognized
SQLALCHEMY_ENCRYPTED_FIELD_ENGINElogs a warning and silently falls back to AES-CBC. For a crypto-strength selector that feels like the wrong default posture: an operator who typos"aes_gcm"or"AES-GCM"believes they have authenticated encryption but runs unauthenticated CBC indefinitely — and if the typo happens after a GCM migration, new secrets get written CBC into a GCM database while existing ones become unreadable. It's also inconsistent with the CLI, which rejects unknown names viaclick.Choice(case-insensitively) atsuperset/cli/update.py:118.Would it be worth raising at field-creation time instead of warning, so a misconfig fails at startup rather than degrading silently? (Or at minimum case-normalizing to match the CLI.)
-
superset/utils/encrypt.py:343-349The "already in target form" fast path treats any successful decrypt under the target engine as proof a value is already migrated. Forward (CBC→GCM) that's sound — the GCM tag is a real authenticator. In the rollback direction (
--engine aes), CBC decryption is unauthenticated, so a GCM blob can spuriously "decrypt" (length check + padding + UTF-8 all passing by accident) and get counted asskipped— left as GCM ciphertext that bricks once config is flipped back to"aes", with the run still reporting success. Low probability per value, but it's nonzero across a large fleet, and the docstring ontest_engine_migration_gcm_to_cbc_rolls_back(tests/unit_tests/utils/encrypt_test.py:164) suggests you were already thinking about this.Could the skip-check prefer the authenticated interpretation — e.g. when the target is CBC, only treat a value as "already migrated" if the GCM source decrypt fails first? Alternatively documenting rollback as best-effort would also close it.
-
UPDATING.md:73-83Two gaps in the runbook window between step 2 (migrate) and step 4 (restart): (a) a live app keeps writing new secrets as CBC during that window, and those become undecryptable after the flip — the code already handles this fine (re-running the migrator after the restart sweeps them up, per
test_engine_migration_reads_cbc_after_config_already_flipped), but the runbook never says to; (b) in multi-worker deployments there's an unavoidable decrypt-outage window between the migration commit and the last worker restarting with the new config, since runtime reads use only the single configured engine — "transactional" may read as "zero-downtime" to operators when it isn't.Small suggestion: add a final "re-run
re-encrypt-secrets --engine aes-gcmafter restart" step and a sentence about scheduling the cutover in a quiet window. WDYT? -
superset/utils/encrypt.py:204discover_encrypted_fieldsonly matchesisinstance(col.type, EncryptedType), so a deployment using a customSQLALCHEMY_ENCRYPTED_FIELD_TYPE_ADAPTERgets a silent no-op migration — 0 columns discovered, exit 0, "success". Pre-existing for key rotation, but this PR makes the migrator the blessed engine-migration path.Would it be worth detecting a non-default adapter and warning (or refusing)? Happy to keep as-is if that's considered out of scope.
Other suggestions
-
superset/utils/encrypt.py:85app_config.get("SQLALCHEMY_ENCRYPTED_FIELD_ENGINE", "aes")— since the key now has a default insuperset/config.py:309, the house pattern isapp_config["..."]; the default currently lives in three places (config.py, this.get, and theENCRYPTION_ENGINES.get(..., AesEngine)fallback), which could drift when Phase 3 flips it. Totally optional given custom configs may omit the key. -
superset/cli/update.py:114The PR body advertises composability with key rotation (
--previous_secret_key+--enginein one run), but no test exercises the combined mode (old-key CBC → current-key GCM). Could we add one? It's the mode most likely to regress since each mode's tests pin the other's variable. -
Few small nits, take or leave: function-local
AesGcmEngineimports intests/integration_tests/cli_tests.py:375(and again ~402) could move to module top;engines: list[type[Any]]atsuperset/utils/encrypt.py:274could betype[EncryptionDecryptionBaseEngine]since it's already imported from the same module.
Praise
The unit-test suite is unusually thoughtful for crypto code — each test docstring states the operational scenario it protects, and test_engine_migration_reads_cbc_after_config_already_flipped (tests/unit_tests/utils/encrypt_test.py:140) covers the exact "operator did the steps in the wrong order" recovery path most migrators forget.
…back classification Addresses review feedback on the selectable encryption engine PR. Posture: authenticated checks gate decisions; unauthenticated paths never silently win. - Unknown engine name fails closed. An unrecognized SQLALCHEMY_ENCRYPTED_FIELD_ENGINE now raises at field-construction (startup) instead of silently degrading to unauthenticated AES-CBC. The value is normalized (trim/lower/underscores→hyphens) to match the CLI's case-insensitive click.Choice, and the offending value is kept out of the error message (same clear-text-logging rationale as the CodeQL fix). Absent key still defaults to "aes" (no behavior change for existing installs). - Rollback (GCM→CBC) no longer mis-skips. The "already in target form" fast path trusted any successful target decrypt — but unauthenticated CBC can coincidentally decrypt a GCM blob, counting it skipped and leaving GCM ciphertext that bricks on the next config flip while the run reports success. When the target engine is unauthenticated, the value is now first probed under any authenticated engine (current + previous key); an authenticated decrypt wins. Generalized on "is the target authenticated", not engine names. - De-dup the redundant previous-key decryptor when prev_key == SECRET_KEY (the column's own engine is already decryptors[0]). - Runbook: add the post-restart re-run step (sweeps secrets written CBC during the migration window) and a quiet-window note (multi-worker cutover is not zero-downtime). Tests: flip the unknown-engine test to assert the fail-closed raise (and that the value isn't leaked); add a normalization test; add the rollback ordering-invariant test (authenticated probe wins over a forced-spurious CBC skip); add combined key-rotation + engine-migration coverage at both the migrator and CLI levels; hoist the function-local AesGcmEngine imports. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Thanks @richardfogaca — really sharp review. Addressed the design items in 10c49b6 under one posture: authenticated checks gate decisions; unauthenticated paths never silently win. Unknown engine → fail-closed ( Rollback spurious-skip → fixed, not documented ( Runbook ( Smaller items: combined key-rotation + engine-migration is now covered at both the migrator ( Two I left as follow-ups rather than widen this PR, lmk if you'd rather I fold them in:
|
Code Review Agent Run #44d6b7Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
richardfogaca
left a comment
There was a problem hiding this comment.
Posting on Richard's behalf - this is his PR reviewer agent. Forward any pushback to him and he'll loop me back in.
Left one functional note that seems worth fixing before merge: the new fail-closed engine validation still has one empty-config path that falls back to AES-CBC. The docs/style notes are smaller. All line numbers verified against HEAD 10c49b61.
Functional - worth fixing before merge
-
superset/utils/encrypt.py:135The
or DEFAULT_ENCRYPTION_ENGINE_NAMEfallback means a present-but-emptySQLALCHEMY_ENCRYPTED_FIELD_ENGINEskipsresolve_encryption_engine()and silently uses AES-CBC. That seems to undercut the fail-closed behavior documented in this same block: after a GCM migration, an empty env/config value could make new secrets get written as CBC while the operator thinks the app is still in GCM mode.WDYT - could we distinguish an absent key from a present empty value, and add a small test that
""raises instead of falling back?
Other suggestions
-
docs/sip/authenticated-encryption-at-rest.md:111The SIP runbook still says backup -> run migrator -> set config -> restart, while
UPDATING.mdnow adds a post-restart second pass to sweep up CBC writes during the cutover window. Since this file is the proposal/operators may cross-reference it fromUPDATING.md, the two runbooks could leave readers unsure whether the second pass is required.Small suggestion: would it be worth syncing this runbook with the five-step
UPDATING.mdversion? -
tests/integration_tests/cli_tests.py:371The new CLI tests in this block do not include
-> Nonereturn annotations. Existing tests in this file predate that pattern, but the current repo guidance asks for type hints on new Python code/tests, and the unit tests added in this PR already follow it.Nit: could we add
-> Noneto the newtest_re_encrypt_secrets_*functions? Happy to keep as-is if this file intentionally stays untyped.
Praise
-
tests/unit_tests/utils/encrypt_test.py:256Nice coverage on the combined key-rotation plus engine-migration path. The test verifies that the rewritten value decrypts as AES-GCM under the current key, not just that the migrator issued an update.
…ped CLI tests Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Thanks again @richardfogaca — all three caught and fixed in ad613d4. The empty-value one was the real bug: |
…d test params Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Code Review Agent Run #4221e8Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
SUMMARY
App-encrypted fields (database passwords, SSH tunnel credentials, OAuth tokens, …) use
sqlalchemy_utils.EncryptedType, which defaults toAesEngine(AES-CBC) — unauthenticated encryption.AesGcmEngine(AES-GCM) is authenticated and fails closed on tampering. This PR makes the engine selectable and provides a migration path so existing deployments can adopt it safely.Phase 1 — engine selection
SQLALCHEMY_ENCRYPTED_FIELD_ENGINEconfig ("aes"|"aes-gcm"), defaulting to"aes"— zero behavior change for existing installs.SQLAlchemyUtilsAdapterhonors it; an explicitenginekwarg still wins. Unknown values fall back to AES-CBC.Phase 2 — CBC→GCM re-encryption migrator (new in this revision)
SecretsMigrator(previously only used forSECRET_KEYrotation) an engine-migration mode: decrypt each value with the source engine under the currentSECRET_KEY, re-encrypt with the target engine.--engineflag on the existing CLI:superset re-encrypt-secrets --engine aes-gcm.--engine aesrollback) and composable with a concurrent key rotation (--previous_secret_key).UPDATING.mddocuments the operator runbook (backup →re-encrypt-secrets --engine aes-gcm→ set config → restart).The instance-wide default flip (Phase 3) is intentionally not included — it's the subject of the SIP.
TESTING INSTRUCTIONS
Unit tests cover engine selection (default/explicit/fallback) and the migrator: CBC→GCM re-encryption, idempotency/resumability (already-GCM values skipped), reading CBC after the config is flipped first, and failure counting for undecryptable values. Existing
SecretsMigratorkey-rotation integration tests are preserved unchanged (the newprevious_secret_key/target_enginesignature is backward-compatible).ADDITIONAL INFORMATION
🤖 Generated with Claude Code