30 changes: 30 additions & 0 deletions UPDATING.md
@@ -59,6 +59,36 @@ Both default to empty (no behavior change). They apply to both the `LOCAL_EXTENS

The Dynamic Group By chart customization now orders its display values according to the "Sort display control values" toggle: ascending (A–Z), descending (Z–A), or the dataset's source order when the toggle is unset. Previously the dropdown always sorted alphabetically. Existing dashboards where the toggle was never set will show options in source order instead of A–Z; open the customization and enable the toggle to restore alphabetical ordering.

### Selectable encryption engine for app-encrypted fields (AES-GCM)

App-encrypted fields (database passwords, SSH tunnel credentials, OAuth tokens, etc.) can now use authenticated **AES-GCM** encryption instead of the historical unauthenticated **AES-CBC**. A new config selects the engine for the default adapter:

```python
# "aes" (AES-CBC, historical default) | "aes-gcm" (authenticated, recommended for new installs)
SQLALCHEMY_ENCRYPTED_FIELD_ENGINE = "aes"
```

**No action required / no behavior change:** the default remains `"aes"`, so existing installs are unaffected.

**Opting in on an existing install:** flipping the engine on a populated database without re-encrypting first will make stored secrets undecryptable, because the two ciphertext formats are not compatible. A migrator is provided. Recommended runbook:

1. Take a metadata-DB backup.
2. Re-encrypt existing secrets into the new engine (the `SECRET_KEY` is unchanged):
```bash
superset re-encrypt-secrets --engine aes-gcm
```
3. Set `SQLALCHEMY_ENCRYPTED_FIELD_ENGINE = "aes-gcm"` in your config.
4. Restart Superset.
5. Re-run the migrator once more after the restart:
```bash
superset re-encrypt-secrets --engine aes-gcm
```
A live instance keeps writing *new* secrets as AES-CBC during the window between step 2 and the restart in step 4; this second pass sweeps those up (it is idempotent, so already-migrated values are skipped).

Schedule the cutover for a quiet window. At runtime each worker reads with only its single configured engine, so in a multi-worker deployment there is an unavoidable, brief decrypt outage between the migration commit and the moment the last worker restarts with the new config. Each migrator run is transactional, but the fleet-wide cutover is not zero-downtime.

The migration is transactional (all-or-nothing) and idempotent — it can be safely re-run or resumed. Note that AES-GCM, unlike AES-CBC, does not support querying directly over encrypted columns; audit any code that filters on an encrypted column before switching. See the SIP at `docs/sip/authenticated-encryption-at-rest.md` for details.

### Granular Export Controls

A new feature flag `GRANULAR_EXPORT_CONTROLS` introduces three fine-grained permissions that replace the legacy `can_csv` permission:
136 changes: 136 additions & 0 deletions docs/sip/authenticated-encryption-at-rest.md
@@ -0,0 +1,136 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# SIP: Authenticated encryption (AES-GCM) for app-encrypted fields

## [DRAFT — proposal for discussion]

This document is a draft proposal accompanying the code in this PR. It is
intended to seed the formal SIP discussion. The code here ships the
backward-compatible engine selection **and** the re-encryption migrator
(Phases 1–2 below); both are opt-in and change nothing for existing installs by
default. Flipping the default for fresh installs (Phase 3) remains future work.

## Motivation

Superset app-encrypts a number of sensitive fields before persisting them to
the metadata database, including:

- database connection passwords and `encrypted_extra` (`superset/models/core.py`),
- SSH tunnel credentials — password, private key, private-key password
(`superset/databases/ssh_tunnel/models.py`),
- OAuth2 tokens and other secrets stored via `EncryptedType`.

These fields are encrypted with `sqlalchemy_utils.EncryptedType`, which
**defaults to `AesEngine` (AES-CBC)**. AES-CBC provides confidentiality but is
**unauthenticated**: it has no integrity tag. An attacker with write access to
the ciphertext (e.g. direct metadata-DB access, a backup, or a compromised
replica) can perform **bit-flipping / chosen-ciphertext manipulation** to
silently alter the decrypted plaintext of a secret without detection.

`AesGcmEngine` (AES-GCM) is authenticated encryption: tampering causes
decryption to fail loudly rather than yielding attacker-influenced plaintext.
Using authenticated encryption for secrets at rest is an ASVS L1 expectation
(11.3.2 / cryptography best practice).
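
The difference can be illustrated with a small, standard-library-only sketch. It is not AES: Python's standard library has no AES, so a SHA-256 counter keystream stands in for an unauthenticated cipher and an HMAC tag stands in for the GCM integrity tag. The mechanics of real CBC bit-flipping differ (a modified ciphertext block garbles its own plaintext block and flips the matching bits in the next one), but the property shown is the same: without a tag, altered ciphertext still decrypts to attacker-influenced plaintext. All function names here are hypothetical.

```python
import hashlib
import hmac
import os


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy UNAUTHENTICATED cipher (stand-in, not AES): XOR with a SHA-256 keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Toy AUTHENTICATED encryption: nonce || ciphertext || HMAC tag.

    A real design would derive separate encryption and MAC keys; this sketch
    reuses one key for brevity.
    """
    nonce = os.urandom(12)
    body = nonce + keystream_xor(key, nonce, plaintext)
    return body + hmac.new(key, body, hashlib.sha256).digest()


def open_sealed(key: bytes, blob: bytes) -> bytes:
    """Verify the tag before decrypting; any modification raises ValueError."""
    body, tag = blob[:-32], blob[-32:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication tag mismatch")
    return keystream_xor(key, body[:12], body[12:])


def flip(ciphertext: bytes, offset: int, old: bytes, new: bytes) -> bytes:
    """Rewrite known plaintext bytes by XOR-ing the ciphertext, without the key."""
    forged = bytearray(ciphertext)
    for i, (o, n) in enumerate(zip(old, new)):
        forged[offset + i] ^= o ^ n
    return bytes(forged)


if __name__ == "__main__":
    key, nonce = os.urandom(32), os.urandom(12)
    ciphertext = keystream_xor(key, nonce, b"role=guest")
    forged = flip(ciphertext, 5, b"guest", b"admin")
    print(keystream_xor(key, nonce, forged))  # silently altered plaintext
    try:
        open_sealed(key, flip(seal(key, b"role=guest"), 12 + 5, b"guest", b"admin"))
    except ValueError as exc:
        print("rejected:", exc)
```

The unauthenticated path returns the forged value without any error, while the sealed path refuses to decrypt at all, which is the "fail loudly" behaviour the proposal wants from AES-GCM.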

`config.py` already documents that operators *can* switch to GCM by writing a
custom `AbstractEncryptedFieldAdapter`, but:

1. it is opt-in, undocumented as a security recommendation, and easy to miss;
2. there is **no migration path** — flipping the engine on a populated database
makes every existing secret undecryptable, because GCM ciphertext is not
format-compatible with CBC.

## Proposed change

A three-part change, delivered incrementally so existing deployments are never
broken:

### Phase 1 — engine selection (this PR)

- Add a `SQLALCHEMY_ENCRYPTED_FIELD_ENGINE` config (`"aes"` | `"aes-gcm"`),
**defaulting to `"aes"`** (no behavior change for existing installs).
- Teach the default `SQLAlchemyUtilsAdapter` to honor it (an explicit `engine`
kwarg still wins, so the migrator can pin an engine).
- This lets **new** deployments choose AES-GCM from day one with a one-line
config, instead of writing a custom adapter.
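
The precedence described above (explicit `engine` kwarg first, then the config value, then the `"aes"` default) might look roughly like the following sketch. It is an illustration under assumptions, not the adapter's actual code: `resolve_engine` is a hypothetical helper, the two classes are empty placeholders for the real `sqlalchemy_utils` engines, and the contents of `ENCRYPTION_ENGINES` are assumed from the config's allowed values.

```python
from typing import Mapping, Optional


class AesEngine:
    """Placeholder for the real AES-CBC engine class (assumption)."""


class AesGcmEngine:
    """Placeholder for the real AES-GCM engine class (assumption)."""


# Assumed shape of the name -> engine mapping used by the config and the CLI.
ENCRYPTION_ENGINES: Mapping[str, type] = {
    "aes": AesEngine,
    "aes-gcm": AesGcmEngine,
}


def resolve_engine(config: Mapping[str, object], engine: Optional[type] = None) -> type:
    """Pick the engine class: an explicit ``engine`` wins, else the config value."""
    if engine is not None:
        # Lets a caller such as the migrator pin an engine regardless of config.
        return engine
    name = str(config.get("SQLALCHEMY_ENCRYPTED_FIELD_ENGINE", "aes"))
    try:
        return ENCRYPTION_ENGINES[name]
    except KeyError:
        raise ValueError(
            f"Unknown encryption engine {name!r}; "
            f"expected one of {sorted(ENCRYPTION_ENGINES)}"
        ) from None


if __name__ == "__main__":
    print(resolve_engine({}).__name__)
    print(resolve_engine({"SQLALCHEMY_ENCRYPTED_FIELD_ENGINE": "aes-gcm"}).__name__)
```

Failing fast on an unknown name is a design choice of this sketch; it keeps a typo in the config from silently falling back to AES-CBC.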

### Phase 2 — CBC→GCM re-encryption migrator (this PR)

The existing `SecretsMigrator` (previously only used for `SECRET_KEY` rotation)
gains an **engine migration** mode that:

1. discovers every `EncryptedType` column (via `discover_encrypted_fields()`),
2. decrypts each value with the **source** engine (AES-CBC) under the current
`SECRET_KEY`,
3. re-encrypts with the **target** engine (AES-GCM),
4. runs transactionally per the existing all-or-nothing semantics, and is
idempotent per column (already-migrated values are skipped), so a run can be
safely repeated or resumed.

Exposed via a new `--engine` option on the existing CLI command:
`superset re-encrypt-secrets --engine aes-gcm`, runnable by operators with a DB
backup in hand. The `SECRET_KEY` is unchanged; an engine change and a key
rotation can also be combined (pass `--previous_secret_key` as well).
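
The migrator steps above can be sketched as follows. This is a simplified model under stated assumptions, not the `SecretsMigrator` implementation: the `Engine` protocol, `PrefixEngine`, `migrate`, and the `migrated`/`skipped` counters are all hypothetical, and the text does not say how already-migrated values are detected, so the sketch assumes "the target engine can already decrypt it" as the skip test. Staging every new value before writing any of them models the all-or-nothing behaviour.

```python
from typing import Dict, Protocol


class Engine(Protocol):
    def encrypt(self, plaintext: str) -> str: ...
    def decrypt(self, ciphertext: str) -> str: ...


class PrefixEngine:
    """Toy engine (not real cryptography): formats are mutually unreadable."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix

    def encrypt(self, plaintext: str) -> str:
        return f"{self.prefix}:{plaintext[::-1]}"

    def decrypt(self, ciphertext: str) -> str:
        if not ciphertext.startswith(self.prefix + ":"):
            raise ValueError("value is not in this engine's format")
        return ciphertext[len(self.prefix) + 1:][::-1]


def migrate(rows: Dict[int, str], source: Engine, target: Engine) -> Dict[str, int]:
    """Re-encrypt ``rows`` in place from ``source`` to ``target``.

    Idempotent: values the target engine can already read are skipped.
    All-or-nothing: ``rows`` is only updated once every value has been staged,
    so an undecryptable value aborts the run with nothing written.
    """
    stats = {"migrated": 0, "skipped": 0}
    staged: Dict[int, str] = {}
    for row_id, value in rows.items():
        try:
            target.decrypt(value)
        except ValueError:
            staged[row_id] = target.encrypt(source.decrypt(value))
            stats["migrated"] += 1
        else:
            stats["skipped"] += 1
    rows.update(staged)  # the "commit" step of this model
    return stats


if __name__ == "__main__":
    cbc, gcm = PrefixEngine("cbc"), PrefixEngine("gcm")
    table = {1: cbc.encrypt("db-password"), 2: cbc.encrypt("ssh-key")}
    print(migrate(table, cbc, gcm))  # first pass migrates both rows
    print(migrate(table, cbc, gcm))  # second pass skips both rows
```

Running the same function a second time is what the post-restart sweep in the runbook relies on: rows written in the old format during the cutover window are migrated, and everything else is left untouched.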

### Phase 3 — flip the default for new installs

Once the migrator and docs are in place, change the default to `"aes-gcm"` for
**fresh** installs only (e.g. keyed off an empty metadata DB / documented in
`UPDATING.md`), keeping existing installs on `"aes"` until they run Phase 2.

## New or changed public interfaces

- New config: `SQLALCHEMY_ENCRYPTED_FIELD_ENGINE: Literal["aes", "aes-gcm"]`.
- New (Phase 2) option on the existing CLI command: `superset re-encrypt-secrets --engine <name>`.
- No schema changes; ciphertext format changes per migrated column.

## Migration plan and compatibility

- **Backward compatible by default.** Phase 1 changes nothing unless the
operator opts in.
- Switching an existing deployment to `"aes-gcm"` **without** running the Phase
2 migrator will make existing secrets undecryptable — this is called out in
the config comment and must be in `UPDATING.md`.
- Recommended operator runbook: take a metadata-DB backup → run
`re-encrypt-secrets --engine aes-gcm` → set
`SQLALCHEMY_ENCRYPTED_FIELD_ENGINE = "aes-gcm"` → restart → re-run
`re-encrypt-secrets --engine aes-gcm` once more to sweep up any secrets a live
instance wrote as AES-CBC during the cutover window. The canonical, more
detailed version of this runbook lives in `UPDATING.md`; this is a summary.
- `AesEngine` allows queryability over encrypted fields; AES-GCM does not.
Any code that filters/queries on an encrypted column directly must be audited
before Phase 3 (none is expected, but it must be verified).

## Rejected alternatives

- **Flip the default immediately.** Rejected: bricks every existing
deployment's secrets with no migration path.
- **Document-only (custom adapter).** Status quo; high friction and no
migration tooling — most operators will never do it.

## Open questions

- GCM→CBC rollback (for operators who need queryability) already works via the
same command (`re-encrypt-secrets --engine aes`), since the migrator is
engine-symmetric. Should rollback be documented as a supported path or
discouraged?
- The migrator already supports a concurrent `SECRET_KEY` rotation + engine
change in a single pass (pass `--previous_secret_key` alongside `--engine`).
Is that combination worth calling out in the operator docs, or kept advanced?
38 changes: 33 additions & 5 deletions superset/cli/update.py
@@ -31,7 +31,7 @@

 import superset.utils.database as database_utils
 from superset.utils.decorators import transaction
-from superset.utils.encrypt import SecretsMigrator
+from superset.utils.encrypt import ENCRYPTION_ENGINES, SecretsMigrator

 logger = logging.getLogger(__name__)

@@ -110,17 +110,45 @@ def update_api_docs() -> None:
     help="An optional previous secret key, if PREVIOUS_SECRET_KEY "
     "is not set on the config",
 )
-def re_encrypt_secrets(previous_secret_key: Optional[str] = None) -> None:
+@click.option(
+    "--engine",
+    "-e",
+    "target_engine_name",
+    required=False,
+    type=click.Choice(sorted(ENCRYPTION_ENGINES), case_sensitive=False),
+    help="Re-encrypt all app-encrypted fields with this encryption engine "
+    "(e.g. 'aes-gcm' for authenticated encryption). The SECRET_KEY is "
+    "unchanged. Take a metadata-DB backup first, then set "
+    "SQLALCHEMY_ENCRYPTED_FIELD_ENGINE to the same value and restart.",
+)
+def re_encrypt_secrets(
+    previous_secret_key: Optional[str] = None,
+    target_engine_name: Optional[str] = None,
+) -> None:
+    """Re-encrypt every app-encrypted field via :class:`SecretsMigrator`.
+
+    Supports key rotation (``previous_secret_key``, falling back to the
+    ``PREVIOUS_SECRET_KEY`` config) and engine migration (``target_engine_name``,
+    a case-insensitive ``ENCRYPTION_ENGINES`` key such as ``aes-gcm``); the two
+    can combine. With neither provided the command is a no-op. Exits non-zero on
+    failure.
+    """
     previous_secret_key = previous_secret_key or current_app.config.get(
         "PREVIOUS_SECRET_KEY"
     )
-    if previous_secret_key is None:
+    target_engine = (
+        ENCRYPTION_ENGINES[target_engine_name] if target_engine_name else None
+    )
+    if previous_secret_key is None and target_engine is None:
         click.secho(
-            "No previous secret key provided; nothing to re-encrypt.",
+            "No previous secret key or target engine provided; nothing to re-encrypt.",
             fg="yellow",
         )
         return
-    secrets_migrator = SecretsMigrator(previous_secret_key=previous_secret_key)
+    secrets_migrator = SecretsMigrator(
+        previous_secret_key=previous_secret_key,
+        target_engine=target_engine,
+    )
     try:
         stats = secrets_migrator.run()
     except Exception as exc:  # pylint: disable=broad-except
15 changes: 14 additions & 1 deletion superset/config.py
@@ -270,7 +270,10 @@ def _try_json_readsha(filepath: str, length: int) -> str | None:
 # as key material. Do note that AesEngine allows for queryability over the
 # encrypted fields.
 #
-# To change the default engine you need to define your own adapter:
+# To switch the engine used by the default adapter, prefer the
+# ``SQLALCHEMY_ENCRYPTED_FIELD_ENGINE`` knob below (e.g. "aes-gcm"). Defining a
+# custom adapter, as shown next, is only needed for behaviour the built-in
+# engines do not cover:
 #
 # e.g.:
 #
@@ -295,6 +298,16 @@ def _try_json_readsha(filepath: str, length: int) -> str | None:
     SQLAlchemyUtilsAdapter
 )

+# Encryption engine used by the default SQLAlchemyUtilsAdapter for app-encrypted
+# fields. Options:
+#   "aes"     - AES-CBC (historical default; unauthenticated, queryable)
+#   "aes-gcm" - AES-GCM (authenticated encryption; recommended for NEW installs)
+# WARNING: changing this on a database that already holds encrypted secrets
+# (database passwords, SSH tunnel credentials, OAuth tokens, ...) will make
+# those values undecryptable unless they are re-encrypted first. See the
+# authenticated-encryption SIP/migration before switching an existing install.
+SQLALCHEMY_ENCRYPTED_FIELD_ENGINE: Literal["aes", "aes-gcm"] = "aes"
+
 # Extends the default SQLGlot dialects with additional dialects
 SQLGLOT_DIALECTS_EXTENSIONS: DialectExtensions | Callable[[], DialectExtensions] = {}
