Skip to content

feat(datasets): soft-delete and restore#40130

Open
mikebridge wants to merge 44 commits into
apache:masterfrom
mikebridge:sc-106191-datasets-soft-delete
Open

feat(datasets): soft-delete and restore#40130
mikebridge wants to merge 44 commits into
apache:masterfrom
mikebridge:sc-106191-datasets-soft-delete

Conversation

@mikebridge

@mikebridge mikebridge commented May 14, 2026

Copy link
Copy Markdown
Contributor

SUMMARY

Wire datasets to the soft-delete infrastructure shipped in #39977. One of three entity rollouts (charts / dashboards / datasets) decomposing the soft-delete work per SIP-208 (passed 2026-05-08).

#39977 (b2320820b4) merged on 2026-05-29; this PR consumes the abstractions (SoftDeleteMixin, BaseRestoreCommand, BaseDeletedStateFilter, SoftDeleteApiMixin, find_existing_for_import) and wires them to SqlaTable.

What this PR ships

Migration (superset/migrations/versions/2026-05-08_12-10_3a8e6f2c1b95_add_deleted_at_to_tables.py)

Model + DAO

  • SqlaTable inherits SoftDeleteMixin (superset/connectors/sqla/models.py). The do_orm_execute listener from the infrastructure PR begins filtering SqlaTable queries (including relationship lazy loads, via with_loader_criteria(..., propagate_to_loaders=True)).
  • DeleteDatasetCommand routes through BaseDAO.delete() — which now detects the mixin and dispatches to soft_delete() — so no source change to the delete command is needed.
  • DatasetDAO.validate_uniqueness and validate_update_uniqueness (the create/update logical-uniqueness checks) run with execution_options(SKIP_VISIBILITY_FILTER_CLASSES={SqlaTable}) so a soft-deleted row still blocks creation of a new dataset at the same (database_id, catalog, schema, table_name). This converts what would have been an opaque 500 IntegrityError from the DB-level constraint into a clean 422 at the application layer.
  • superset/daos/datasource.py:build_dataset_query uses a Core select(...) rather than an ORM query, so the do_orm_execute listener didn't apply. Added an explicit ds_q = ds_q.where(ds_table.c.deleted_at.is_(None)) so soft-deleted datasets don't count toward the combined GET /api/v1/datasource/ totals or pagination slots.

API

  • RestoreDatasetCommand (superset/commands/dataset/restore.py) — 4 ClassVar declarations subclassing BaseRestoreCommand[SqlaTable], plus a validate() override that calls _has_active_logical_duplicate(model) and raises DatasetLogicalDuplicateError (HTTP 422) when another active dataset already references the same physical table.
  • POST /api/v1/dataset/<uuid>/restore endpoint (superset/datasets/api.py) with the standard decorator stack (@protect, @safe, @statsd_metrics, @event_logger.log_this_with_context). Permission model mirrors delete exactly: endpoint-level can_write (via "restore": "write" in method_permission_name), resource-level raise_for_ownership. UUIDs in the path per the new-endpoints convention. Maps DatasetLogicalDuplicateError to a 422 response.
  • DatasetDeletedStateFilter (superset/datasets/filters.py) — 2-line subclass of BaseDeletedStateFilter setting arg_name = "dataset_deleted_state" and model = SqlaTable. Wired into search_filters on DatasetRestApi.
  • DatasetRestApi gains SoftDeleteApiMixin and "restore" is added to include_route_methods.
  • DatasetRestoreFailedError exception type added (superset/commands/dataset/exceptions.py), inheriting UpdateFailedError for parity with the sibling failure exceptions in the module. DatasetLogicalDuplicateError added alongside it (HTTP 422).

Logical-duplicate restore guard

SqlaTable carries DB-level unique constraints on (database_id, schema, table_name) (legacy _customer_location_uc from 2016) and (database_id, catalog, schema, table_name) (the newer model-level constraint), so the database itself enforces the physical-identity invariant. The application-level guards in this PR are defense-in-depth — they convert what would otherwise be opaque 500 IntegrityError responses into clean 422s with operator-actionable messages:

  • Restore pathRestoreDatasetCommand._has_active_logical_duplicate() runs before clearing deleted_at. Surfaces the conflict as DatasetLogicalDuplicateError (422) with a message naming the conflict, rather than letting the constraint violation bubble at flush time.
  • Create pathDatasetDAO.validate_uniqueness bypasses the visibility filter so a soft-deleted row blocks creation of a new dataset with the same logical identity. The user gets the existing DatasetExistsValidationError (422) instead of a 500 IntegrityError.
  • Importer pathimport_dataset performs the same check before clearing deleted_at, raising ImportFailedError if a live duplicate exists.

Both Richard Fogaça and Amin Ghadersohi independently flagged the underlying scenario (delete A → create B at the same logical identity → restore A producing two live rows for one physical table). The DB constraint makes the dual-live state unreachable today, but the application-level guards keep the error model clean and survive any future relaxation of the DB constraint (the legacy _customer_location_uc was removed once before via df3d7e2eb9a4 and may be removed cleanly across all installations in the future).

Import pipeline (superset/commands/dataset/importers/v1/utils.py)

  • Updates import_dataset to follow the Option C permission model: re-importing a UUID that matches a soft-deleted dataset is treated as an implicit restore-and-update. For an owner (or admin) with can_write, the importer clears deleted_at on the existing row and routes import_from_dict through UPDATE, preserving the PK and the chart back-reference and table_columns / sql_metrics children. Non-owners get ImportFailedError. Callers without can_write also get ImportFailedError rather than silently receiving the soft-deleted row (Case B — protects callers from quietly reattaching to a deleted dataset). The full matrix is documented in the docstring on import_dataset.
  • An explicit db.session.flush() after clearing deleted_at makes the restore-then-UPDATE chain independent of session autoflush (import_from_dict re-queries by UUID; the listener filters deleted_at IS NULL; the explicit flush guarantees the cleared value is visible before that query).
  • The implicit-restore path forces sync = ["columns", "metrics"] so re-import is a clean replacement (orphan columns/metrics present in the live DB but absent from the upload are removed), matching what an explicit overwrite=True would do.
  • MultipleResultsFound fallback (legacy NULL-schema duplicate path): when this fires on the soft-deleted-restore branch, existing.deleted_at is rolled back to its snapshotted value and an ImportFailedError is raised. Returning the row unmodified would have produced a half-restored state (deleted_at cleared but upload contents dropped). The non-soft-deleted overwrite path keeps the legacy "return existing unmodified" contract from the original 2018 fix.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

No UI changes in this PR — REST endpoints only. The frontend "Restore" button is a follow-up PR.

Endpoint Before After
DELETE /api/v1/dataset/<pk> Hard-deletes the row Sets deleted_at; row remains but is hidden from default queries
GET /api/v1/dataset/?...&dataset_deleted_state=only Filter doesn't exist (400) Returns soft-deleted datasets with deleted_at populated
GET /api/v1/dataset/?...&dataset_deleted_state=include Filter doesn't exist (400) Returns live + soft-deleted; deleted_at populated on deleted rows
POST /api/v1/dataset/<uuid>/restore 404 (no such endpoint) Clears deleted_at on the soft-deleted row, or 422 with DatasetLogicalDuplicateError if another active dataset references the same physical table
POST /api/v1/dataset/ with logical identity claimed by a soft-deleted row 500 IntegrityError from the DB-level constraint Clean 422 DatasetExistsValidationError from the DAO-level check
GET /api/v1/datasource/ (combined endpoint) Counted soft-deleted datasets in pagination totals Excludes them via explicit deleted_at IS NULL predicate on the Core select

TESTING INSTRUCTIONS

# Unit tests
pytest tests/unit_tests/commands/dataset/restore_test.py -v

# Integration tests
pytest tests/integration_tests/datasets/soft_delete_tests.py -v

The integration suite covers:

  • DELETE sets deleted_at instead of removing the row
  • Soft-deleted datasets excluded from default list and from individual GET (404)
  • dataset_deleted_state=only returns soft-deleted with deleted_at populated
  • dataset_deleted_state=include returns both, response augmented
  • POST restore clears deleted_at (admin and owner paths)
  • Restore returns 403 for non-owners (via raise_for_ownership)
  • Restore returns 404 for non-existent or already-live datasets
  • Restore returns 200 from a non-admin owner with can_write_Dataset (pins the method_permission_name = {..., "restore": "write"} contract — without the mapping, FAB falls back to can_restore_Dataset and 403s every non-admin)
  • validate_uniqueness bypasses the visibility filter so a soft-deleted row blocks new-dataset creation at the application layer with a clean 422 (test_create_blocked_by_soft_deleted_logical_duplicate)
  • Re-import of a soft-deleted UUID via the v1 importer succeeds (and only after permission check)
  • The MultipleResultsFound legacy-duplicate fallback path resolves correctly when the duplicate is soft-deleted (regression coverage for the visibility-filter bypass in the importer)
  • Charts referencing a soft-deleted dataset render an error at chart-load time (V1 documented behaviour)

The restore-side _has_active_logical_duplicate guard is covered at the unit level (tests/unit_tests/commands/dataset/restore_test.py::test_restore_dataset_logical_duplicate_raises, mocked) rather than as an integration test. The "active twin coexists with soft-deleted original" scenario the integration test would require can't be set up against a real DB: both DB-level unique constraints on tables reject step 2 (the create) before the test can stage the restore step. Comment block in the integration test documents this.

ADDITIONAL INFORMATION

  • Has associated issue: SIP-208 #39464
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible (nullable column + index; downgrade reverses both)
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided — additive only: nullable column with no default + a single-column index on the tables table. Instant on Postgres/MySQL/SQLite for any realistic deployment size; no lock-holding.
  • Introduces new feature or API — soft-delete behaviour, dataset_deleted_state rison filter, POST /api/v1/dataset/<uuid>/restore endpoint, deleted_at field on list responses when augmentation is opted into, DatasetLogicalDuplicateError (422) on restore conflict
  • Removes existing feature or API

Behaviour changes worth flagging

  • DELETE /api/v1/dataset/<pk> is now soft. Existing API consumers that called DELETE expecting permanent removal will see the dataset reappear if it's restored. The row is no longer findable via the default API after DELETE (the listener hides it), so most downstream behaviour is preserved — but anyone doing direct DB queries on the tables table will see the row.
  • Cascade behaviour (V1, per SIP-208): soft-delete does not cascade from a dataset to its charts. Charts that reference a soft-deleted dataset render an error at chart-load time — the existing behaviour for any unfound dataset. A future ticket may introduce a "degraded chart" state; that's out of scope here. The integration tests cover this case explicitly so the behaviour can't regress unnoticed.
  • Restore now refuses a logical duplicate with HTTP 422 (DatasetLogicalDuplicateError) rather than letting an opaque IntegrityError bubble. Conversely, creating a dataset at a logical identity owned by a soft-deleted row also refuses with 422 at the DAO layer instead of a 500 from the DB constraint.
  • Combined GET /api/v1/datasource/ pagination no longer counts soft-deleted datasets. Total counts and page contents are aligned with the visibility-filter contract for ORM queries.

Depends on

Coordination with sibling PRs

This PR's migration shares down_revision = 31dae2559c05 with:

The first of the three to merge (dashboards) keeps its down_revision; each later PR re-points its down_revision to the just-merged migration before merging — a plain rebase would create an alembic multi-head, which the Check DB migration conflict CI job will catch.

@github-actions github-actions Bot added risk:db-migration PRs that require a DB migration api Related to the REST API labels May 14, 2026
@netlify

netlify Bot commented May 14, 2026

Copy link
Copy Markdown

Deploy Preview for superset-docs-preview ready!

Name Link
🔨 Latest commit 9a501c5
🔍 Latest deploy log https://app.netlify.com/projects/superset-docs-preview/deploys/6a2af3c727de460008364401
😎 Deploy Preview https://deploy-preview-40130--superset-docs-preview.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.
🤖 Make changes Run an agent on this branch

To edit notification comments on pull requests, go to your Netlify project configuration.

@codecov

codecov Bot commented May 14, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 78.26087% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.90%. Comparing base (74845ea) to head (2f57b0b).

Files with missing lines Patch % Lines
superset/commands/dataset/importers/v1/utils.py 33.33% 19 Missing and 3 partials ⚠️
superset/datasets/api.py 86.95% 3 Missing ⚠️
superset/commands/dataset/restore.py 85.71% 1 Missing and 1 partial ⚠️
superset/daos/dataset.py 93.93% 1 Missing and 1 partial ⚠️
superset/daos/datasource.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #40130      +/-   ##
==========================================
- Coverage   64.30%   59.90%   -4.40%     
==========================================
  Files        2657      797    -1860     
  Lines      144060    67014   -77046     
  Branches    33216     8288   -24928     
==========================================
- Hits        92641    40148   -52493     
+ Misses      49797    25238   -24559     
- Partials     1622     1628       +6     
Flag Coverage Δ
hive 39.47% <31.88%> (+0.02%) ⬆️
javascript ?
mysql 58.35% <78.26%> (+0.15%) ⬆️
postgres 58.41% <78.26%> (+0.15%) ⬆️
presto 41.06% <31.88%> (+0.02%) ⬆️
python 59.88% <78.26%> (+0.15%) ⬆️
sqlite 58.04% <78.26%> (+0.16%) ⬆️
unit 100.00% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@mikebridge mikebridge force-pushed the sc-106191-datasets-soft-delete branch from fefb486 to ec98d74 Compare May 18, 2026 17:03
mikebridge pushed a commit to mikebridge/superset that referenced this pull request May 20, 2026
Eight items from the 2026-05-20 review on the infrastructure PR:

* **H1 — subclass walk cached via ``__init_subclass__``.** The previous
  ``_all_soft_delete_subclasses()`` walked ``SoftDeleteMixin.__subclasses__()``
  on every primary ORM SELECT. With one adopter today the cost is
  negligible, but the listener fires on every query in the app and the
  walk grows linearly with each adopted entity. ``SoftDeleteMixin`` now
  maintains a sorted ``_registered_subclasses`` list updated by
  ``__init_subclass__`` at class-definition time; the listener reads
  the cached list with no walk. Sort happens once per registration
  (rare event) rather than per query.

* **M3 — ``SoftDeleteMixin`` import in ``BaseDAO.delete`` moved to
  top.** ``SKIP_VISIBILITY_FILTER_CLASSES`` from the same module was
  already top-level; ``SoftDeleteMixin`` had no real reason to be
  inline. Cleaner.

* **M4 — pylint-disable in ``raise_for_ownership`` documented.** A
  top-level import here actually does cause a circular
  (security ↔ models.core ↔ superset's lazy
  ``feature_flag_manager``). Kept the inline import but added a
  comment naming the cycle so future maintainers don't try to "fix"
  it and re-trip the issue.

* **M5 — ``BaseDeletedStateFilter.model`` typed as
  ``ClassVar[type[SoftDeleteMixin]]``.** Previously ``Any``, which
  meant a subclass accidentally binding ``model`` to a
  non-soft-deletable entity would crash at runtime on
  ``.deleted_at`` rather than failing mypy.

* **M6 — listener placement documented.** ``setup_soft_delete_listener``
  is called outside the ``with app_context()`` block because the
  ``do_orm_execute`` hook attaches to the ``Session`` class
  (process-wide), not to a Session instance, so it doesn't need
  Flask app context. Added an explaining comment in
  ``init_app_in_ctx``'s caller; ``setup_db()`` already ran earlier
  so the Session import is initialised.

* **M7 — unit tests for ``BaseDeletedStateFilter`` +
  ``SoftDeleteApiMixin``.** New
  ``tests/unit_tests/views/test_soft_delete_filter.py`` (10 tests):
  filter no-op / include / only / case-insensitivity / unknown value;
  mixin no-op-without-flag / inject-with-flag / consume-flag-on-call;
  ``_serialize_deleted_at`` None / datetime handling.

* **Nit — explicit ``orig_resource is None`` guard in
  ``raise_for_ownership``.** Previously fell through to
  ``hasattr(None, "owners")`` returning ``False`` → ``owners = []`` →
  generic ownership error. Now raises the proper "resource removed
  before ownership could be verified" error so the cause is clear in
  the rare race-with-hard-delete case.

* **Nit — unit tests for ``find_existing_for_import`` /
  ``clear_soft_deleted_for_import``.** New
  ``tests/unit_tests/commands/importers/v1/test_find_existing_for_import.py``
  (5 tests): no-match-returns-None, live-row-as-is,
  soft-deleted-row-as-is (pure lookup, no side effect),
  ``clear`` hard-deletes via ORM (listener-firing path), and the
  composed find-then-clear contract that's the documented caller
  sequence.

Deferred with explanation in the PR-comment reply (not in this
commit):

* **H2 — non-restorable / GDPR hook**: V2 concern. Designing the
  interface without a concrete use case risks getting it wrong;
  follow-up ticket if compliance work is ever scoped.

* **M8 — ``deleted_at`` schema mutation**: documented in the entity-
  rollout PR bodies (apache#40128/apache#40129/apache#40130) rather than blocking
  infrastructure.

40 unit tests pass (was 26; added 14 covering filter, mixin, importer
helpers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mikebridge mikebridge force-pushed the sc-106191-datasets-soft-delete branch 5 times, most recently from 5995e2d to 8da8ce4 Compare May 20, 2026 22:36
mikebridge pushed a commit to mikebridge/superset that referenced this pull request May 20, 2026
The listener's loader_criteria was passing a concrete SQL expression
(``cls.where_not_deleted()``) which rendered as the raw table name
(``slices.deleted_at``) regardless of how the table was aliased in
the statement. When the same soft-deletable model appeared under an
alias in a JOIN — e.g., FAB's outer/inner reconstruction or
``report_schedule.chart`` aliasing ``slices AS chart`` — the JOIN's
ON clause produced ``Unknown column 'slices.deleted_at' in 'on
clause'`` and broke the entire query.

Surfaced as CI failures on the entity-rollout PRs (apache#40128 / apache#40129 /
apache#40130) — specifically pre-existing report-schedule tests that join
to ``slices AS chart`` started failing once ``Slice`` inherited
``SoftDeleteMixin``. The infrastructure PR itself doesn't exercise
this because no entity has the mixin yet.

Fix: switch the criteria to a lambda (the form Bayer's canonical
pattern uses). SQLAlchemy invokes the lambda per occurrence and
adapts the column reference to the alias at each site. The lambda's
body is trivial (``c.deleted_at.is_(None)``) so the
``DeferredLambdaElement`` parser handles it without issue — we
originally moved AWAY from a lambda because of an
``if cls in bypass_classes`` conditional that DeferredLambdaElement
mis-parsed as a SQL ``IN`` operator, but the conditional now lives
outside the lambda (in the per-class iteration), so the lambda body
itself is clean.

Test added:
test_listener_adapts_criteria_to_aliased_table_in_joins —
reproduces the aliased-join shape using the synthetic
``_SoftDeletableChild`` model. Without the fix, this query raises
``OperationalError: no such column: ..._soft_deletable_child.deleted_at``
because the criteria renders as the raw table name. With the fix,
the query runs cleanly.

45 unit tests pass (was 44 + 1 regression test).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mikebridge mikebridge force-pushed the sc-106191-datasets-soft-delete branch 2 times, most recently from 1d78910 to 8cb25e1 Compare May 21, 2026 15:44
mikebridge pushed a commit to mikebridge/superset that referenced this pull request May 25, 2026
Eight items from the 2026-05-20 review on the infrastructure PR:

* **H1 — subclass walk cached via ``__init_subclass__``.** The previous
  ``_all_soft_delete_subclasses()`` walked ``SoftDeleteMixin.__subclasses__()``
  on every primary ORM SELECT. With one adopter today the cost is
  negligible, but the listener fires on every query in the app and the
  walk grows linearly with each adopted entity. ``SoftDeleteMixin`` now
  maintains a sorted ``_registered_subclasses`` list updated by
  ``__init_subclass__`` at class-definition time; the listener reads
  the cached list with no walk. Sort happens once per registration
  (rare event) rather than per query.

* **M3 — ``SoftDeleteMixin`` import in ``BaseDAO.delete`` moved to
  top.** ``SKIP_VISIBILITY_FILTER_CLASSES`` from the same module was
  already top-level; ``SoftDeleteMixin`` had no real reason to be
  inline. Cleaner.

* **M4 — pylint-disable in ``raise_for_ownership`` documented.** A
  top-level import here actually does cause a circular
  (security ↔ models.core ↔ superset's lazy
  ``feature_flag_manager``). Kept the inline import but added a
  comment naming the cycle so future maintainers don't try to "fix"
  it and re-trip the issue.

* **M5 — ``BaseDeletedStateFilter.model`` typed as
  ``ClassVar[type[SoftDeleteMixin]]``.** Previously ``Any``, which
  meant a subclass accidentally binding ``model`` to a
  non-soft-deletable entity would crash at runtime on
  ``.deleted_at`` rather than failing mypy.

* **M6 — listener placement documented.** ``setup_soft_delete_listener``
  is called outside the ``with app_context()`` block because the
  ``do_orm_execute`` hook attaches to the ``Session`` class
  (process-wide), not to a Session instance, so it doesn't need
  Flask app context. Added an explaining comment in
  ``init_app_in_ctx``'s caller; ``setup_db()`` already ran earlier
  so the Session import is initialised.

* **M7 — unit tests for ``BaseDeletedStateFilter`` +
  ``SoftDeleteApiMixin``.** New
  ``tests/unit_tests/views/test_soft_delete_filter.py`` (10 tests):
  filter no-op / include / only / case-insensitivity / unknown value;
  mixin no-op-without-flag / inject-with-flag / consume-flag-on-call;
  ``_serialize_deleted_at`` None / datetime handling.

* **Nit — explicit ``orig_resource is None`` guard in
  ``raise_for_ownership``.** Previously fell through to
  ``hasattr(None, "owners")`` returning ``False`` → ``owners = []`` →
  generic ownership error. Now raises the proper "resource removed
  before ownership could be verified" error so the cause is clear in
  the rare race-with-hard-delete case.

* **Nit — unit tests for ``find_existing_for_import`` /
  ``clear_soft_deleted_for_import``.** New
  ``tests/unit_tests/commands/importers/v1/test_find_existing_for_import.py``
  (5 tests): no-match-returns-None, live-row-as-is,
  soft-deleted-row-as-is (pure lookup, no side effect),
  ``clear`` hard-deletes via ORM (listener-firing path), and the
  composed find-then-clear contract that's the documented caller
  sequence.

Deferred with explanation in the PR-comment reply (not in this
commit):

* **H2 — non-restorable / GDPR hook**: V2 concern. Designing the
  interface without a concrete use case risks getting it wrong;
  follow-up ticket if compliance work is ever scoped.

* **M8 — ``deleted_at`` schema mutation**: documented in the entity-
  rollout PR bodies (apache#40128/apache#40129/apache#40130) rather than blocking
  infrastructure.

40 unit tests pass (was 26; added 14 covering filter, mixin, importer
helpers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mikebridge pushed a commit to mikebridge/superset that referenced this pull request May 25, 2026
The listener's loader_criteria was passing a concrete SQL expression
(``cls.where_not_deleted()``) which rendered as the raw table name
(``slices.deleted_at``) regardless of how the table was aliased in
the statement. When the same soft-deletable model appeared under an
alias in a JOIN — e.g., FAB's outer/inner reconstruction or
``report_schedule.chart`` aliasing ``slices AS chart`` — the JOIN's
ON clause produced ``Unknown column 'slices.deleted_at' in 'on
clause'`` and broke the entire query.

Surfaced as CI failures on the entity-rollout PRs (apache#40128 / apache#40129 /
apache#40130) — specifically pre-existing report-schedule tests that join
to ``slices AS chart`` started failing once ``Slice`` inherited
``SoftDeleteMixin``. The infrastructure PR itself doesn't exercise
this because no entity has the mixin yet.

Fix: switch the criteria to a lambda (the form Bayer's canonical
pattern uses). SQLAlchemy invokes the lambda per occurrence and
adapts the column reference to the alias at each site. The lambda's
body is trivial (``c.deleted_at.is_(None)``) so the
``DeferredLambdaElement`` parser handles it without issue — we
originally moved AWAY from a lambda because of an
``if cls in bypass_classes`` conditional that DeferredLambdaElement
mis-parsed as a SQL ``IN`` operator, but the conditional now lives
outside the lambda (in the per-class iteration), so the lambda body
itself is clean.

Test added:
test_listener_adapts_criteria_to_aliased_table_in_joins —
reproduces the aliased-join shape using the synthetic
``_SoftDeletableChild`` model. Without the fix, this query raises
``OperationalError: no such column: ..._soft_deletable_child.deleted_at``
because the criteria renders as the raw table name. With the fix,
the query runs cleanly.

45 unit tests pass (was 44 + 1 regression test).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mikebridge mikebridge force-pushed the sc-106191-datasets-soft-delete branch 2 times, most recently from 5cb3b5f to bf87b03 Compare May 25, 2026 19:41
mikebridge pushed a commit to mikebridge/superset that referenced this pull request May 25, 2026
CI test-postgres (current) on PR apache#40130 cascade-failed with:

  psycopg2.errors.UniqueViolation: duplicate key value violates unique
  constraint "_customer_location_uc"
  DETAIL: Key (database_id, schema, table_name)=(1, public, birth_names)
          already exists.

Root cause: dashboard_utils.get_table queries SqlaTable through the
ORM, which the soft-delete listener now filters. When a prior test in
the same session soft-deletes the birth_names row (any test exercising
the new soft-delete DELETE behavior on datasets), the row stays in the
DB but is hidden from the helper. create_table_metadata's "doesn't
exist, INSERT" path then collides with the underlying (database_id,
schema, table_name) unique constraint that survives soft-delete (the
constraint is enforced even though the SQLAlchemy comment claims it's
not physical — Postgres test environments enforce it via a named
constraint that an old migration tried to drop).

Two-part fix in tests/integration_tests/dashboard_utils.py:

1. get_table attaches a per-query SKIP_VISIBILITY_FILTER_CLASSES
   execution option so it sees soft-deleted leftovers. Scoped to the
   single query — no leak to anything else in the session.

2. create_table_metadata restores any soft-deleted row it finds before
   mutating it for the new test's setup. Without this, the test would
   succeed in finding the row but then operate on a row whose
   `deleted_at` is still set, which the listener would hide from any
   subsequent API queries inside the same test.

Mirrors the pattern the dashboards branch (sc-106190) uses for
insert_dashboard's defensive cleanup, scoped narrowly to the test
helper and limited to SqlaTable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mikebridge pushed a commit to mikebridge/superset that referenced this pull request May 28, 2026
CI test-postgres (current) on PR apache#40130 cascade-failed with:

  psycopg2.errors.UniqueViolation: duplicate key value violates unique
  constraint "_customer_location_uc"
  DETAIL: Key (database_id, schema, table_name)=(1, public, birth_names)
          already exists.

Root cause: dashboard_utils.get_table queries SqlaTable through the
ORM, which the soft-delete listener now filters. When a prior test in
the same session soft-deletes the birth_names row (any test exercising
the new soft-delete DELETE behavior on datasets), the row stays in the
DB but is hidden from the helper. create_table_metadata's "doesn't
exist, INSERT" path then collides with the underlying (database_id,
schema, table_name) unique constraint that survives soft-delete (the
constraint is enforced even though the SQLAlchemy comment claims it's
not physical — Postgres test environments enforce it via a named
constraint that an old migration tried to drop).

Two-part fix in tests/integration_tests/dashboard_utils.py:

1. get_table attaches a per-query SKIP_VISIBILITY_FILTER_CLASSES
   execution option so it sees soft-deleted leftovers. Scoped to the
   single query — no leak to anything else in the session.

2. create_table_metadata restores any soft-deleted row it finds before
   mutating it for the new test's setup. Without this, the test would
   succeed in finding the row but then operate on a row whose
   `deleted_at` is still set, which the listener would hide from any
   subsequent API queries inside the same test.

Mirrors the pattern the dashboards branch (sc-106190) uses for
insert_dashboard's defensive cleanup, scoped narrowly to the test
helper and limited to SqlaTable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mikebridge mikebridge force-pushed the sc-106191-datasets-soft-delete branch from 6a9640a to c9c9011 Compare May 28, 2026 19:59
mikebridge pushed a commit to mikebridge/superset that referenced this pull request May 28, 2026
Eight items from the 2026-05-20 review on the infrastructure PR:

* **H1 — subclass walk cached via ``__init_subclass__``.** The previous
  ``_all_soft_delete_subclasses()`` walked ``SoftDeleteMixin.__subclasses__()``
  on every primary ORM SELECT. With one adopter today the cost is
  negligible, but the listener fires on every query in the app and the
  walk grows linearly with each adopted entity. ``SoftDeleteMixin`` now
  maintains a sorted ``_registered_subclasses`` list updated by
  ``__init_subclass__`` at class-definition time; the listener reads
  the cached list with no walk. Sort happens once per registration
  (rare event) rather than per query.

* **M3 — ``SoftDeleteMixin`` import in ``BaseDAO.delete`` moved to
  top.** ``SKIP_VISIBILITY_FILTER_CLASSES`` from the same module was
  already top-level; ``SoftDeleteMixin`` had no real reason to be
  inline. Cleaner.

* **M4 — pylint-disable in ``raise_for_ownership`` documented.** A
  top-level import here actually does cause a circular
  (security ↔ models.core ↔ superset's lazy
  ``feature_flag_manager``). Kept the inline import but added a
  comment naming the cycle so future maintainers don't try to "fix"
  it and re-trip the issue.

* **M5 — ``BaseDeletedStateFilter.model`` typed as
  ``ClassVar[type[SoftDeleteMixin]]``.** Previously ``Any``, which
  meant a subclass accidentally binding ``model`` to a
  non-soft-deletable entity would crash at runtime on
  ``.deleted_at`` rather than failing mypy.

* **M6 — listener placement documented.** ``setup_soft_delete_listener``
  is called outside the ``with app_context()`` block because the
  ``do_orm_execute`` hook attaches to the ``Session`` class
  (process-wide), not to a Session instance, so it doesn't need
  Flask app context. Added an explaining comment in
  ``init_app_in_ctx``'s caller; ``setup_db()`` already ran earlier
  so the Session import is initialised.

* **M7 — unit tests for ``BaseDeletedStateFilter`` +
  ``SoftDeleteApiMixin``.** New
  ``tests/unit_tests/views/test_soft_delete_filter.py`` (10 tests):
  filter no-op / include / only / case-insensitivity / unknown value;
  mixin no-op-without-flag / inject-with-flag / consume-flag-on-call;
  ``_serialize_deleted_at`` None / datetime handling.

* **Nit — explicit ``orig_resource is None`` guard in
  ``raise_for_ownership``.** Previously fell through to
  ``hasattr(None, "owners")`` returning ``False`` → ``owners = []`` →
  generic ownership error. Now raises the proper "resource removed
  before ownership could be verified" error so the cause is clear in
  the rare race-with-hard-delete case.

* **Nit — unit tests for ``find_existing_for_import`` /
  ``clear_soft_deleted_for_import``.** New
  ``tests/unit_tests/commands/importers/v1/test_find_existing_for_import.py``
  (5 tests): no-match-returns-None, live-row-as-is,
  soft-deleted-row-as-is (pure lookup, no side effect),
  ``clear`` hard-deletes via ORM (listener-firing path), and the
  composed find-then-clear contract that's the documented caller
  sequence.

Deferred with explanation in the PR-comment reply (not in this
commit):

* **H2 — non-restorable / GDPR hook**: V2 concern. Designing the
  interface without a concrete use case risks getting it wrong;
  follow-up ticket if compliance work is ever scoped.

* **M8 — ``deleted_at`` schema mutation**: documented in the entity-
  rollout PR bodies (apache#40128/apache#40129/apache#40130) rather than blocking
  infrastructure.

40 unit tests pass (was 26; added 14 covering filter, mixin, importer
helpers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mikebridge pushed a commit to mikebridge/superset that referenced this pull request May 28, 2026
The listener's loader_criteria was passing a concrete SQL expression
(``cls.where_not_deleted()``) which rendered as the raw table name
(``slices.deleted_at``) regardless of how the table was aliased in
the statement. When the same soft-deletable model appeared under an
alias in a JOIN — e.g., FAB's outer/inner reconstruction or
``report_schedule.chart`` aliasing ``slices AS chart`` — the JOIN's
ON clause produced ``Unknown column 'slices.deleted_at' in 'on
clause'`` and broke the entire query.

Surfaced as CI failures on the entity-rollout PRs (apache#40128 / apache#40129 /
apache#40130) — specifically pre-existing report-schedule tests that join
to ``slices AS chart`` started failing once ``Slice`` inherited
``SoftDeleteMixin``. The infrastructure PR itself doesn't exercise
this because no entity has the mixin yet.

Fix: switch the criteria to a lambda (the form Bayer's canonical
pattern uses). SQLAlchemy invokes the lambda per occurrence and
adapts the column reference to the alias at each site. The lambda's
body is trivial (``c.deleted_at.is_(None)``) so the
``DeferredLambdaElement`` parser handles it without issue — we
originally moved AWAY from a lambda because of an
``if cls in bypass_classes`` conditional that DeferredLambdaElement
mis-parsed as a SQL ``IN`` operator, but the conditional now lives
outside the lambda (in the per-class iteration), so the lambda body
itself is clean.

Test added:
test_listener_adapts_criteria_to_aliased_table_in_joins —
reproduces the aliased-join shape using the synthetic
``_SoftDeletableChild`` model. Without the fix, this query raises
``OperationalError: no such column: ..._soft_deletable_child.deleted_at``
because the criteria renders as the raw table name. With the fix,
the query runs cleanly.

45 unit tests pass (was 44 + 1 regression test).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mikebridge pushed a commit to mikebridge/superset that referenced this pull request May 28, 2026
CI test-postgres (current) on PR apache#40130 cascade-failed with:

  psycopg2.errors.UniqueViolation: duplicate key value violates unique
  constraint "_customer_location_uc"
  DETAIL: Key (database_id, schema, table_name)=(1, public, birth_names)
          already exists.

Root cause: dashboard_utils.get_table queries SqlaTable through the
ORM, which the soft-delete listener now filters. When a prior test in
the same session soft-deletes the birth_names row (any test exercising
the new soft-delete DELETE behavior on datasets), the row stays in the
DB but is hidden from the helper. create_table_metadata's "doesn't
exist, INSERT" path then collides with the underlying (database_id,
schema, table_name) unique constraint that survives soft-delete (the
constraint is enforced even though the SQLAlchemy comment claims it's
not physical — Postgres test environments enforce it via a named
constraint that an old migration tried to drop).

Two-part fix in tests/integration_tests/dashboard_utils.py:

1. get_table attaches a per-query SKIP_VISIBILITY_FILTER_CLASSES
   execution option so it sees soft-deleted leftovers. Scoped to the
   single query — no leak to anything else in the session.

2. create_table_metadata restores any soft-deleted row it finds before
   mutating it for the new test's setup. Without this, the test would
   succeed in finding the row but then operate on a row whose
   `deleted_at` is still set, which the listener would hide from any
   subsequent API queries inside the same test.

Mirrors the pattern the dashboards branch (sc-106190) uses for
insert_dashboard's defensive cleanup, scoped narrowly to the test
helper and limited to SqlaTable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mikebridge mikebridge force-pushed the sc-106191-datasets-soft-delete branch from c9c9011 to 83445b3 Compare May 28, 2026 20:08
mikebridge pushed a commit to mikebridge/superset that referenced this pull request May 28, 2026
Three SQLAlchemy-review follow-ups in import_dataset, mirroring the
charts (16b5ae9) and dashboards (adf511b) passes:

1. **Explicit flush on restore-and-update path + rewritten comment.**
   The implicit-restore flow relied on autoflush to push the deleted_at
   = NULL update before SqlaTable.import_from_dict's internal query-by-
   uuid lookup ran. Without it, the listener would have filtered the
   row out, import_from_dict would have taken the "new object" branch,
   and the insert would have hit the uuid unique constraint at flush
   time. Adding db.session.flush() after clearing deleted_at makes the
   contract independent of autoflush. The inline comment is rewritten
   to describe the actual mechanism (the previous wording claimed
   config["id"] "routes import_from_dict through UPDATE", but
   ImportExportMixin strips non-export_fields keys — so config["id"]
   is defensive and the real bind happens via uuid uniqueness inside
   import_from_dict).

2. **Force sync on the implicit-restore path.** Previously
   `sync = ["columns", "metrics"] if overwrite else []`, so re-importing
   a soft-deleted dataset hit `sync = []` and ImportExportMixin would
   upsert children by UUID without removing rows present in the live
   DB but absent from the upload. Net effect: a stale-then-restored
   dataset carried orphan columns from before the soft-delete — a
   surprising merge of two states. Now ``sync = ["columns", "metrics"]
   if (overwrite or is_soft_deleted_match) else []`` so re-import of a
   soft-deleted UUID is treated as a clean replacement, matching what
   an explicit overwrite would do anyway.

   New test test_import_soft_deleted_dataset_restore_removes_orphan_children
   pins this contract.

3. **MultipleResultsFound fallback bypass.** The post-exception query
   `db.session.query(SqlaTable).filter_by(uuid=...).one()` didn't
   bypass the soft-delete listener, so a soft-deleted duplicate would
   be hidden and `.one()` would raise NoResultFound, masking the
   original MultipleResultsFound. Added SKIP_VISIBILITY_FILTER_CLASSES
   execution option to the fallback so it can locate either state.

All three items raised by the SQLAlchemy lens review on PR apache#40130.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mikebridge mikebridge force-pushed the sc-106191-datasets-soft-delete branch from acc8884 to 72049a9 Compare June 1, 2026 18:50
Mike Bridge and others added 26 commits June 11, 2026 11:42
The migration ``down_revision`` was ``33d7e0e21daa`` (the
``add_semantic_layers_and_views`` migration from 2025-11-04), but
master has advanced seven revisions to ``a1b2c3d4e5f6``
(``add_granular_export_permissions`` from 2026-03-02). Without this
update, ``alembic upgrade head`` fails with "Multiple head
revisions present" once this migration lands on a database that
has already applied master's chain.

The sibling charts (apache#40129) and dashboards (apache#40128) PRs need the
same update. Cross-PR merge-order coordination still applies —
whichever lands first stays at ``a1b2c3d4e5f6``, the others rebase
onto the newly-merged revision.

Also updated the ``Revises:`` docstring line on the migration's
module header for consistency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ic head)

CI failed across all three soft-delete PRs with "Multiple head
revisions present" after the recent down_revision update. Root
cause: I misread the alembic chain order based on file dates
rather than the actual ``down_revision`` linkage.

Tracing the chain on master:

  4b2a8c9d3e1f → a1b2c3d4e5f6 → ce6bd21901ab → 33d7e0e21daa

Nothing chains off ``33d7e0e21daa``; ``ce6bd21901ab`` chains off
``a1b2c3d4e5f6``. The true alembic head is ``33d7e0e21daa``, not
``a1b2c3d4e5f6``. File-creation dates are misleading because
``33d7e0e21daa`` (created 2025-11-04 on a feature branch) was
merged after the 2026-03-02 batch via a different branch — file
dates do not reflect alembic chain order in the codebase.

Reverting both ``down_revision`` and the matching ``Revises:``
docstring line back to ``33d7e0e21daa``.

The sibling charts (apache#40129) and dashboards (apache#40128) PRs get the
same revert in companion commits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per bito's review on apache#40130 comment-4606035962 — adds one-line
docstrings to ``upgrade()`` and ``downgrade()`` in the datasets
soft-delete migration. The functions are short and self-evident
from their body, but the docstrings give IDE tooltips and Sphinx-
style migration listings a per-function summary that doesn't
require reading the body.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The ``test_restore_blocked_by_active_logical_duplicate`` integration
test was failing on test-sqlite and test-postgres with
``IntegrityError: UNIQUE constraint failed`` (or the equivalent
``_customer_location_uc`` violation on Postgres). Root cause: the
``tables`` table carries two DB-level unique constraints on the
logical identity:

1. ``UniqueConstraint("database_id", "catalog", "schema",
   "table_name")`` declared in the SqlaTable model.
2. ``_customer_location_uc`` on ``(database_id, schema,
   table_name)`` from the 2016 ``b4456560d4f3`` migration, still
   present on test databases (the 2024 ``df3d7e2eb9a4`` removal
   migration runs only on a fresh upgrade chain).

Either constraint rejects the test's ``db.session.add(twin) +
db.session.commit()`` step before the application-level restore
check can be exercised, so the scenario the test was written for
(soft-deleted A coexisting with active B at the same logical
identity) is unreachable in practice.

Richard's review concern at ``api.py:963`` and Amin's matching
HIGH finding at ``restore.py:46`` were about the *application
layer* uniqueness check being filtered (visibility filter hides
soft-deleted rows from the DAO's ``validate_uniqueness``). The
DB-level constraint masks this in production: a delete -> create
sequence is rejected by the DB with an opaque IntegrityError
rather than a clean 422.

The application-level fixes are kept as defense-in-depth:

- ``DatasetDAO.validate_uniqueness`` /
  ``validate_update_uniqueness`` use
  ``execution_options(SKIP_VISIBILITY_FILTER_CLASSES={SqlaTable})``
  so the application returns a clean 422 instead of a 500
  IntegrityError on the create path. This is exercised by the
  surviving ``test_create_blocked_by_soft_deleted_logical_duplicate``
  integration test, which goes through the API and produces a 422
  before the DB constraint fires.
- ``RestoreDatasetCommand._has_active_logical_duplicate`` is
  retained for symmetry — fires cleanly if the DB constraint is
  ever relaxed (it has been removed once before per
  ``df3d7e2eb9a4``).
- Unit-level coverage of the restore-side check is preserved in
  ``tests/unit_tests/commands/dataset/restore_test.py::
  test_restore_dataset_logical_duplicate_raises`` (mocked
  ``_has_active_logical_duplicate``).

Replaced the deleted integration test with a comment block at the
same location documenting why the test is unreachable, so a future
contributor doesn't re-introduce it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ise on multiples

CI was failing on test-mysql with ``MultipleResultsFound`` during
fixture setup for ``TestEmbeddedDashboardDAO.test_get_by_uuid`` (and
cascading through many downstream tests in the same suite). Root
cause: the ``get_table`` test helper in
``tests/integration_tests/dashboard_utils.py`` used
``.one_or_none()`` after the soft-delete-bypass execution_option,
and the bypass exposes *both* a live row and a soft-deleted row when
they coexist with the same ``(database_id, schema, table_name)`` but
different ``catalog`` (e.g. ``NULL`` vs ``"public"``). The filter
in this helper doesn't include catalog, so the bypass query
legitimately matches more than one row — and ``.one_or_none()`` is
intolerant of that.

Postgres and SQLite happened to run the suite in an order that didn't
materialize the two-row state for this query; MySQL did, so the
failure showed up there first.

Replaced ``.one_or_none()`` with
``.order_by(SqlaTable.deleted_at.is_(None).desc(), SqlaTable.id).first()``.
The live row sorts first; the soft-deleted row is the fallback. The
helper's downstream contract ("return the row if one exists, else
``None``") is preserved without raising on the multi-row case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ring cleanup

Two follow-ups from the latest reviews on apache#40130:

1. (Richard) Restore-via-import: roll back ``deleted_at`` when the
   legacy ``MultipleResultsFound`` fallback fires on the
   soft-deleted path.

   Prior flow on the soft-deleted-restore path:
   - clear ``existing.deleted_at = None``
   - flush
   - call ``SqlaTable.import_from_dict(config, ...)``
   - if it raises ``MultipleResultsFound`` (legacy NULL-schema
     ambiguity), the fallback returns the row unmodified

   The unmodified return is wrong in the soft-deleted case: the
   row's ``deleted_at`` was already cleared, but the upload
   contents were not applied. The operator gets a "restore" they
   didn't ask for, with no warning that the upload was dropped.

   Fix: snapshot ``existing.deleted_at`` before the clear (capturing
   it in a local). When ``MultipleResultsFound`` fires AND
   ``is_soft_deleted_match`` is True, restore the snapshotted
   ``deleted_at`` value, flush, and raise ``ImportFailedError``
   with a message explaining the legacy-ambiguity cause. The
   operator can then resolve the duplicate manually (rename one of
   the conflicting rows) and retry. The non-soft-deleted overwrite
   path keeps the legacy "return unmodified" contract since the
   semantics there match the original 2018 fix.

2. (bito) Docstring on
   ``test_create_blocked_by_soft_deleted_logical_duplicate``
   referenced ``test_restore_blocked_by_active_logical_duplicate``,
   which was removed in c4560c1 (unreachable due to DB-level
   unique constraint). Rewrote the docstring to describe the
   actual contract the test now pins: ``validate_uniqueness``
   bypasses the visibility filter, so a soft-deleted twin blocks
   creates at the application layer with a clean 422 instead of an
   opaque 500 IntegrityError from the DB-level constraint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…" in all .po

`pybabel update` fuzzy-matches the new exception string against the existing
"Dataset could not be created."/"updated." entries in every catalog, which
trips the translation-regression CI check.

Pre-seeding an empty `msgstr ""` for the new msgid keeps pybabel from
inventing a fuzzy match — translators can fill in the real string later
through the normal flow.

Affects all 26 active catalogs.
… guard

When ``find_existing_for_import`` resolves a soft-deleted row by UUID and
``import_from_dict`` subsequently hits the legacy NULL-schema ambiguity
(``MultipleResultsFound``), the importer now rolls back the ``deleted_at``
clear and raises ``ImportFailedError`` — see commit 65d0e5b.

The pre-existing test was written for the older behaviour (return the
soft-deleted row from the fallback ``.one()`` via the visibility-filter
bypass). With the guard in place that path is no longer reached: the
soft-delete branch raises before the fallback runs.

Rewrite the assertion to pin the new contract:

  1. ``ImportFailedError`` is raised with the "ambiguous legacy duplicate"
     message.
  2. The soft-deleted row's ``deleted_at`` is unchanged after the call —
     no half-restored state.
``DatasetDAO.validate_uniqueness`` and ``validate_update_uniqueness``
build an inner SqlaTable query, set
``execution_options(SKIP_VISIBILITY_FILTER_CLASSES=...)`` on it, and
then wrap it in ``db.session.query(inner.exists()).scalar()``.

The per-query bypass never reaches the soft-delete listener: the
OUTER query is what executes, and its ``execution_options`` are
independent of the inner Query's. The listener sees no bypass and
attaches ``with_loader_criteria(SqlaTable, deleted_at IS NULL,
include_aliases=True)`` to the outer statement, which propagates
the filter into the EXISTS subquery and hides any soft-deleted row.

Repro confirmed by attaching a real ``do_orm_execute`` listener to a
SQLite session: with the per-query bypass, the listener fires with
``per_query=None, per_session=None``, applies the filter, and EXISTS
returns False (soft-deleted row excluded). The compiled SQL is
identical on Postgres, MySQL, and SQLite — so this is broken on all
three dialects in principle. In CI, Postgres and SQLite happen to
pass anyway (likely a session-info leak from an earlier test), but
MySQL trips ``test_create_blocked_by_soft_deleted_logical_duplicate``
deterministically: the create returns 201 instead of 422, leaving a
duplicate row that cascades into
``test_restore_soft_deleted_dataset``.

Fix: wrap the EXISTS scalar() in the documented
``skip_visibility_filter(db.session, SqlaTable)`` context manager —
session.info is read by the listener regardless of which Query
execute fires it (the canonical pattern documented in helpers.py for
exactly this FAB-style inner/outer reconstruction case).

Drop the now-dead ``SKIP_VISIBILITY_FILTER_CLASSES`` import as it is
no longer referenced from this module.
…resent

Adds the first test coverage for migration ``3a8e6f2c1b95_add_deleted_at_to_tables``:

* ``upgrade()`` and ``downgrade()`` are exercised end-to-end against an
  in-memory SQLite engine with a real Alembic ``Operations`` context —
  not just the helper functions in isolation.
* The downgrade-with-soft-deleted-rows-present case is pinned as a
  contract: rows that had ``deleted_at`` set before downgrade survive
  the migration and become indistinguishable from live rows. This
  matches the "Rollback note" already documented in ``UPDATING.md``;
  the test makes the behaviour regression-proof.
* Idempotency of both directions is covered: the migration helpers
  (``add_columns``, ``create_index``, ``drop_columns``, ``drop_index``)
  are skip-if-exists / skip-if-not-exists, and running either direction
  twice must not raise. Future helper-implementation changes that
  silently break this property would fail the test.

The test uses a minimal pre-seeded ``tables`` stub (id + table_name)
rather than the real model — the migration only touches ``deleted_at``
and its index, so only the columns that participate need to exist.
…test

Superset is on SQLAlchemy 1.4; ``Connection`` doesn't have a ``commit()``
method until 2.0. The implicit transaction over the single ``with
engine.connect() as conn:`` block keeps reads-after-writes visible
within the same connection.
Address review findings on the datasets soft-delete/restore work:

- Extract the active-logical-duplicate query duplicated across
  RestoreDatasetCommand and the v1 importer into a single
  DatasetDAO.has_active_logical_duplicate(model) helper, and normalize
  the catalog (catalog or database default) the same way
  validate_uniqueness does so create/update/restore/import agree on
  what counts as the same physical table.
- Make _escape_like coerce non-string filter values to str, closing an
  AttributeError server-error path for LIKE-family operators whose
  ColumnOperator.value is typed Any.
- Tag the deferred superset.models.helpers imports in DatasetDAO with
  the constitution's "avoid app-init regression" rationale (PR apache#40573).
- Repoint restore unit tests at the DAO helper and add DAO-level tests
  pinning the catalog-normalization behaviour.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…docstring

Follow-up from the local multi-lens review of the soft-delete remediation:

- Add unit tests pinning _escape_like's non-string/None coercion (the fix
  closing the AttributeError LIKE path had shipped without a test), plus an
  operator-level guard that a numeric LIKE value no longer raises.
- Sharpen DatasetDAO.has_active_logical_duplicate's docstring: state the
  one-sided catalog normalization honestly (the symmetric stored-NULL case is
  not caught), restore the dropped warning against adding skip_visibility_filter
  here, and note the session-attached model.database assumption.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…aset_test.py

Move the has_active_logical_duplicate catalog-normalization tests from the
stray tests/unit_tests/daos/test_dataset_dao.py into the existing
tests/unit_tests/dao/dataset_test.py beside the other DatasetDAO tests, and
update the breadcrumb in soft_delete_tests.py to the new path. No behaviour
change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ry/finally cleanup

- get_table (dashboard_utils) now includes the normalized catalog in the
  lookup so a fixture can't bind to a different catalog variant when rows
  share (database_id, schema, table_name) (codeant).
- Wrap the five TestDatasetSoftDelete methods that soft-delete-then-restore
  in try/finally via a _restore_dataset helper, so a failed assertion can't
  strand a soft-deleted row for later tests (eschutho).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… default

Follow-up to the catalog-determinism change: filtering get_table on only
database.get_default_catalog() regressed find-or-create for the example
datasets, which are seeded with catalog=NULL. Under the Postgres examples
DB used in CI, get_default_catalog() returns a non-NULL string, so the
strict equality matched nothing and create_table_metadata wrongly took the
create branch, colliding with the surviving unique constraint.

Match catalog IS NULL OR catalog == default (the way validate_uniqueness's
'table.catalog or default' treats them), which keeps the NULL-seeded rows
findable on every dialect while still excluding an unrelated third catalog.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e msgid to .pot

- Assert the soft-delete DELETE returns 200 in the list/filter/no-cascade
  tests (bito CWE-483: the status was previously unchecked there).
- Add 'Dataset could not be restored.' to messages.pot (the .po placeholders
  were already seeded across all languages; the template was missing it).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Four review fixes (apache#40130) for paths that must still account for soft-deleted
SqlaTable rows now that datasets join the visibility filter:

- importer (utils): key Case B on is_soft_deleted (not the fused
  needs_mutation) so an active dataset re-imported with overwrite=True but no
  can_write returns the existing row; and block the create path when a
  soft-deleted dataset already claims the same physical table (new UUID would
  otherwise create an active twin of a hidden row), mirroring REST create's
  validate_uniqueness. Adds DatasetDAO.find_soft_deleted_logical_duplicate.

- database delete: count datasets with the visibility filter bypassed, so a
  database whose datasets are all soft-deleted still blocks hard deletion
  instead of looking empty.

- security manager: the database-rename perm maintenance enumerates datasets
  with the visibility filter bypassed, so soft-deleted datasets' perm strings
  (and their charts') are rewritten too — no stale perms resurrected on restore.

- DAO uniqueness: null-aware catalog predicate (_catalog_identity_filter)
  shared by validate_uniqueness/validate_update_uniqueness/
  has_active_logical_duplicate, so a default-catalog twin is caught whether
  either row stored catalog as the default value or NULL.

Adds unit coverage (DAO null-aware + twin finder, importer Case B + twin block,
database delete) and an integration test for soft-deleted rename perm rewrite.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
DeleteDatabaseCommand now counts soft-deleted datasets (bypassing the
visibility filter) so they still block a database delete. Document the
consequence: with soft-delete and no purge in v1, a database that has had
datasets can't be deleted via the API once they're soft-deleted, until the
planned purge capability lands.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… migration

Two pre-merge fixes (the migration one is the active cause of CI's multi-head
cascade):

- DatasetDeletedStateFilter now restricts soft-deleted rows to the restore
  audience (owners/admins), mirroring Dashboard/Chart DeletedStateFilter and
  RestoreDatasetCommand's raise_for_ownership — a read-access non-owner can no
  longer enumerate soft-deleted datasets they could never restore. Live rows
  keep normal DatasourceFilter visibility. Adds integration coverage (owner sees
  own deleted; gamma with datasource access does not).

- Re-point the add_deleted_at_to_tables migration's down_revision from
  33d7e0e21daa to 31dae2559c05 (master's current head) so it no longer creates
  an alembic multi-head against current master (which was failing docker-build
  and all integration jobs). Will be re-pointed onto the charts migration during
  the charts->dashboards->datasets merge sequence.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mirror of the dashboards/charts split: import_dataset no longer fuses
overwrite and restore behind needs_mutation. A soft-deleted match is a
RESTORE (own can_write Case-B gate + ownership check + active-logical-
duplicate guard + snapshot/clear deleted_at); an alive match is an
OVERWRITE (return existing unless overwrite+can_write, then ownership
check). The MultipleResultsFound rollback (original_deleted_at) and the
create-path soft-deleted-duplicate guard are preserved unchanged.
Behaviour unchanged; importer matrix tests still pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Parallel to the dashboards restore-failure test: assert that an error
raised inside RestoreDatasetCommand.run surfaces as a clean 422 via the
DatasetRestoreFailedError handler rather than an unhandled 500.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address review nit: the restore-failure test's cleanup block duplicated
the _restore_dataset() helper. Call the helper instead (DRY; also safer —
it uses one_or_none()).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…all broke CI

2fc212e called self._restore_dataset from TestDatasetRestore, but the
helper was defined on TestDatasetSoftDelete (no inheritance), failing all
backend CI jobs with AttributeError. Hoist the helper to module level and
keep a thin class-method shim for the existing TestDatasetSoftDelete call
sites. Verified against the local Postgres test DB.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…raint docs

Review-panel findings (multi-lens review of this branch):

- DuplicateDatasetCommand: replace the name-only, visibility-filtered
  find_one_or_none pre-check with the shared DatasetDAO.validate_uniqueness
  (scoped to the base model's database/schema, catalog-NULL-aware, bypasses
  the soft-delete filter). A soft-deleted twin was invisible to the old
  check, so the duplicate proceeded into an IntegrityError — or an active
  twin that permanently blocks restore where no DB constraint applies.
  New session-backed unit test pins the refusal.

- Correct the constraint story in comments/docstrings (restore.py,
  restore_test.py, soft_delete_tests.py): DB-level enforcement is
  inconsistent across schema builds. create_all materializes the
  metadata-only 4-column model constraint; migration-built databases still
  carry the legacy 3-column _customer_location_uc because the 2024
  df3d7e2eb9a4 drop migration is a silent no-op (passes a list to
  generic_find_uq_constraint_name, which compares against a set — never
  matches; verified empirically on Postgres). The application-level checks
  give a clean 422 either way.

- importer: use existing.restore() instead of writing deleted_at directly;
  chain the MultipleResultsFound cause (from ex, not from None) so the
  ambiguous-legacy-duplicate error keeps its traceback.

- filters.py: drop docstring references to sibling filter classes that do
  not exist in this tree yet.

- New test: build_dataset_query excludes soft-deleted rows via Core
  execution — the one path the do_orm_execute listener cannot cover; the
  explicit deleted_at IS NULL clause now has a failing test if removed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@bito-code-review bito-code-review Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Agent Run #0a8946

Actionable Suggestions - 4
  • tests/integration_tests/security_tests.py - 1
  • superset/datasets/api.py - 1
  • tests/integration_tests/datasets/soft_delete_tests.py - 1
  • superset/translations/zh_TW/LC_MESSAGES/messages.po - 1
Additional Suggestions - 7
  • superset/daos/dataset.py - 2
    • Missing catalog tests for validate_update_uniqueness · Line 232-263
      `validate_update_uniqueness` now calls `database.get_default_catalog()` and `_catalog_identity_filter` for null-aware catalog matching, but the existing test at line 32 only exercises schema differences. Two catalog-coverage patterns exist for `has_active_logical_duplicate` (lines 286–302 and 308–322) and should be replicated here: (1) passing `catalog=None` with a mocked default must still block a twin, and (2) an explicit non-default catalog must not falsely match a `NULL` row.
    • Missing unit tests for catalog predicate · Line 158-181
      `_catalog_identity_filter` is tested indirectly via `has_active_logical_duplicate` and `find_soft_deleted_logical_duplicate`, but `validate_uniqueness` and `validate_update_uniqueness` — the two call sites added in this diff — have no direct unit tests covering the null-aware catalog predicate. Add tests in `tests/unit_tests/dao/dataset_test.py` following the pattern in `test_logical_duplicate_catalog_predicate_is_null_aware` (line 90).
  • superset/translations/ar/LC_MESSAGES/messages.po - 1
    • Missing Arabic translation · Line 4697-4698
      The `msgstr` is empty. Arabic users with this locale will see the untranslated English string "Dataset could not be restored." instead of the proper Arabic translation. Compare with the pattern used for similar messages in the same file (e.g., "تعذر إنشاء مجموعة بيانات." at line 4692).
  • superset/translations/it/LC_MESSAGES/messages.po - 1
    • Missing Italian translation · Line 4602-4603
      The new `msgid "Dataset could not be restored."` has an empty `msgstr ""`. Italian users will see untranslated English text when this error occurs. Other similar 'Dataset could not be X' messages in this file use the translation 'La tua query non può essere salvata' — consider following the same pattern.
  • superset/commands/dataset/importers/v1/utils.py - 1
    • Consistency: Import Source Mismatch · Line 39-39
      The `SKIP_VISIBILITY_FILTER_CLASSES` constant is imported from `superset.constants` here, but the same file's sibling module `superset/commands/importers/v1/utils.py` imports it from `superset.models.helpers`. The canonical source is `superset.constants`; other files re-export via `superset.models.helpers`. For consistency with the broader import pattern in this command layer, prefer `superset.models.helpers`.
  • tests/integration_tests/datasets/soft_delete_tests.py - 2
    • Semantic duplication in cleanup block · Line 379-391
      The `finally` block re-implements the visibility-filter bypass + conditional restore pattern that the module-level `_restore_dataset` helper (lines 36-52) already provides. Extract the restore logic to avoid duplication.
    • Identical code to module helper · Line 455-462
      The finally block is an exact duplicate of the module-level `_restore_dataset` helper (lines 36-52). Replace with a call to the helper.
Review Details
  • Files reviewed - 52 · Commit Range: 40ecaa8..9a501c5
    • superset/commands/database/delete.py
    • superset/commands/dataset/duplicate.py
    • superset/commands/dataset/exceptions.py
    • superset/commands/dataset/importers/v1/utils.py
    • superset/commands/dataset/restore.py
    • superset/connectors/sqla/models.py
    • superset/daos/base.py
    • superset/daos/dataset.py
    • superset/daos/datasource.py
    • superset/datasets/api.py
    • superset/datasets/filters.py
    • superset/migrations/versions/2026-05-08_12-10_3a8e6f2c1b95_add_deleted_at_to_tables.py
    • superset/security/manager.py
    • superset/translations/ar/LC_MESSAGES/messages.po
    • superset/translations/ca/LC_MESSAGES/messages.po
    • superset/translations/cs/LC_MESSAGES/messages.po
    • superset/translations/de/LC_MESSAGES/messages.po
    • superset/translations/en/LC_MESSAGES/messages.po
    • superset/translations/es/LC_MESSAGES/messages.po
    • superset/translations/fa/LC_MESSAGES/messages.po
    • superset/translations/fi/LC_MESSAGES/messages.po
    • superset/translations/fr/LC_MESSAGES/messages.po
    • superset/translations/it/LC_MESSAGES/messages.po
    • superset/translations/ja/LC_MESSAGES/messages.po
    • superset/translations/ko/LC_MESSAGES/messages.po
    • superset/translations/lv/LC_MESSAGES/messages.po
    • superset/translations/messages.pot
    • superset/translations/mi/LC_MESSAGES/messages.po
    • superset/translations/nl/LC_MESSAGES/messages.po
    • superset/translations/pl/LC_MESSAGES/messages.po
    • superset/translations/pt/LC_MESSAGES/messages.po
    • superset/translations/pt_BR/LC_MESSAGES/messages.po
    • superset/translations/ru/LC_MESSAGES/messages.po
    • superset/translations/sk/LC_MESSAGES/messages.po
    • superset/translations/sl/LC_MESSAGES/messages.po
    • superset/translations/th/LC_MESSAGES/messages.po
    • superset/translations/tr/LC_MESSAGES/messages.po
    • superset/translations/uk/LC_MESSAGES/messages.po
    • superset/translations/zh/LC_MESSAGES/messages.po
    • superset/translations/zh_TW/LC_MESSAGES/messages.po
    • tests/integration_tests/dashboard_utils.py
    • tests/integration_tests/datasets/api_tests.py
    • tests/integration_tests/datasets/soft_delete_tests.py
    • tests/integration_tests/security_tests.py
    • tests/unit_tests/commands/dataset/restore_test.py
    • tests/unit_tests/commands/dataset/test_duplicate.py
    • tests/unit_tests/dao/base_dao_test.py
    • tests/unit_tests/dao/dataset_test.py
    • tests/unit_tests/dao/datasource_test.py
    • tests/unit_tests/databases/commands/delete_test.py
    • tests/unit_tests/datasets/commands/importers/v1/import_test.py
    • tests/unit_tests/migrations/test_add_deleted_at_to_tables.py
  • Files skipped - 1
    • UPDATING.md - Reason: Filter setting
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

Comment thread tests/integration_tests/security_tests.py
Comment thread superset/datasets/api.py
Comment thread tests/integration_tests/datasets/soft_delete_tests.py
Comment thread superset/translations/zh_TW/LC_MESSAGES/messages.po
…NG.md

Panel-review docs parity with dashboards/charts: DELETE and bulk-delete
OpenAPI docstrings now describe soft-delete semantics and the restore
path; UPDATING.md notes bulk delete is also soft and that the
deleted-state filter is owner-scoped for non-admins.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mikebridge

Copy link
Copy Markdown
Contributor Author

Final triage sweep of the open bot threads (codeant + bito), regenerated by the recent pushes — resolving as not actionable:

  • Docstring suggestions (migration upgrade/downgrade fns, exception classes, test helpers): not this repo's convention — migration functions and test helpers are consistently undocstringed across the codebase; the substantive helpers added by this PR carry docstrings where they earn them.
  • Empty msgstr entries: by design — new source strings ship with empty msgstr (gettext falls back to the English msgid); translations land via the project translation workflow. Documented in earlier triage comments on this PR.
  • Migration filename hyphens: the YYYY-MM-DD_HH-MM_hash_description.py pattern is Superset's documented migration naming convention (CLAUDE.md); alembic loads migrations by path, not import. The file compiles and CI is green.
  • Test-hygiene nits (try/finally placement, mock verification): cosmetic; suites are green and the integration DB is rebuilt per branch.
  • "Missing integration test" for the restore-failure 422 handler: it exists — tests/integration_tests/datasets/soft_delete_tests.py::TestDatasetRestore::test_restore_failure_returns_422 patches the command to raise and asserts the 422 mapping.

@bito-code-review

bito-code-review Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Code Review Agent Run #8c0fd2

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: 9a501c5..2f57b0b
    • superset/datasets/api.py
  • Files skipped - 1
    • UPDATING.md - Reason: Filter setting
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api Related to the REST API change:backend Requires changing the backend data:dataset Related to dataset configurations i18n:brazilian i18n:chinese Translation related to Chinese language i18n:czech i18n:dutch i18n:french Translation related to French language i18n:italian Translation related to Italian language i18n:japanese Translation related to Japanese language i18n:korean Translation related to Korean language i18n:latvian i18n:persian i18n:portuguese i18n:russian Translation related to Russian language i18n:slovak i18n:spanish Translation related to Spanish language i18n:traditional-chinese i18n:ukrainian i18n Namespace | Anything related to localization risk:db-migration PRs that require a DB migration size/XXL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants