feat(datasets): soft-delete and restore#40130
Conversation
✅ Deploy Preview for superset-docs-preview ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #40130 +/- ##
==========================================
- Coverage 64.30% 59.90% -4.40%
==========================================
Files 2657 797 -1860
Lines 144060 67014 -77046
Branches 33216 8288 -24928
==========================================
- Hits 92641 40148 -52493
+ Misses 49797 25238 -24559
- Partials 1622 1628 +6
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
fefb486 to
ec98d74
Compare
Eight items from the 2026-05-20 review on the infrastructure PR: * **H1 — subclass walk cached via ``__init_subclass__``.** The previous ``_all_soft_delete_subclasses()`` walked ``SoftDeleteMixin.__subclasses__()`` on every primary ORM SELECT. With one adopter today the cost is negligible, but the listener fires on every query in the app and the walk grows linearly with each adopted entity. ``SoftDeleteMixin`` now maintains a sorted ``_registered_subclasses`` list updated by ``__init_subclass__`` at class-definition time; the listener reads the cached list with no walk. Sort happens once per registration (rare event) rather than per query. * **M3 — ``SoftDeleteMixin`` import in ``BaseDAO.delete`` moved to top.** ``SKIP_VISIBILITY_FILTER_CLASSES`` from the same module was already top-level; ``SoftDeleteMixin`` had no real reason to be inline. Cleaner. * **M4 — pylint-disable in ``raise_for_ownership`` documented.** A top-level import here actually does cause a circular (security ↔ models.core ↔ superset's lazy ``feature_flag_manager``). Kept the inline import but added a comment naming the cycle so future maintainers don't try to "fix" it and re-trip the issue. * **M5 — ``BaseDeletedStateFilter.model`` typed as ``ClassVar[type[SoftDeleteMixin]]``.** Previously ``Any``, which meant a subclass accidentally binding ``model`` to a non-soft-deletable entity would crash at runtime on ``.deleted_at`` rather than failing mypy. * **M6 — listener placement documented.** ``setup_soft_delete_listener`` is called outside the ``with app_context()`` block because the ``do_orm_execute`` hook attaches to the ``Session`` class (process-wide), not to a Session instance, so it doesn't need Flask app context. Added an explaining comment in ``init_app_in_ctx``'s caller; ``setup_db()`` already ran earlier so the Session import is initialised. * **M7 — unit tests for ``BaseDeletedStateFilter`` + ``SoftDeleteApiMixin``.** New ``tests/unit_tests/views/test_soft_delete_filter.py`` (10 tests): filter no-op / include / only / case-insensitivity / unknown value; mixin no-op-without-flag / inject-with-flag / consume-flag-on-call; ``_serialize_deleted_at`` None / datetime handling. * **Nit — explicit ``orig_resource is None`` guard in ``raise_for_ownership``.** Previously fell through to ``hasattr(None, "owners")`` returning ``False`` → ``owners = []`` → generic ownership error. Now raises the proper "resource removed before ownership could be verified" error so the cause is clear in the rare race-with-hard-delete case. * **Nit — unit tests for ``find_existing_for_import`` / ``clear_soft_deleted_for_import``.** New ``tests/unit_tests/commands/importers/v1/test_find_existing_for_import.py`` (5 tests): no-match-returns-None, live-row-as-is, soft-deleted-row-as-is (pure lookup, no side effect), ``clear`` hard-deletes via ORM (listener-firing path), and the composed find-then-clear contract that's the documented caller sequence. Deferred with explanation in the PR-comment reply (not in this commit): * **H2 — non-restorable / GDPR hook**: V2 concern. Designing the interface without a concrete use case risks getting it wrong; follow-up ticket if compliance work is ever scoped. * **M8 — ``deleted_at`` schema mutation**: documented in the entity- rollout PR bodies (apache#40128/apache#40129/apache#40130) rather than blocking infrastructure. 40 unit tests pass (was 26; added 14 covering filter, mixin, importer helpers). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5995e2d to
8da8ce4
Compare
The listener's loader_criteria was passing a concrete SQL expression (``cls.where_not_deleted()``) which rendered as the raw table name (``slices.deleted_at``) regardless of how the table was aliased in the statement. When the same soft-deletable model appeared under an alias in a JOIN — e.g., FAB's outer/inner reconstruction or ``report_schedule.chart`` aliasing ``slices AS chart`` — the JOIN's ON clause produced ``Unknown column 'slices.deleted_at' in 'on clause'`` and broke the entire query. Surfaced as CI failures on the entity-rollout PRs (apache#40128 / apache#40129 / apache#40130) — specifically pre-existing report-schedule tests that join to ``slices AS chart`` started failing once ``Slice`` inherited ``SoftDeleteMixin``. The infrastructure PR itself doesn't exercise this because no entity has the mixin yet. Fix: switch the criteria to a lambda (the form Bayer's canonical pattern uses). SQLAlchemy invokes the lambda per occurrence and adapts the column reference to the alias at each site. The lambda's body is trivial (``c.deleted_at.is_(None)``) so the ``DeferredLambdaElement`` parser handles it without issue — we originally moved AWAY from a lambda because of an ``if cls in bypass_classes`` conditional that DeferredLambdaElement mis-parsed as a SQL ``IN`` operator, but the conditional now lives outside the lambda (in the per-class iteration), so the lambda body itself is clean. Test added: test_listener_adapts_criteria_to_aliased_table_in_joins — reproduces the aliased-join shape using the synthetic ``_SoftDeletableChild`` model. Without the fix, this query raises ``OperationalError: no such column: ..._soft_deletable_child.deleted_at`` because the criteria renders as the raw table name. With the fix, the query runs cleanly. 45 unit tests pass (was 44 + 1 regression test). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1d78910 to
8cb25e1
Compare
Eight items from the 2026-05-20 review on the infrastructure PR: * **H1 — subclass walk cached via ``__init_subclass__``.** The previous ``_all_soft_delete_subclasses()`` walked ``SoftDeleteMixin.__subclasses__()`` on every primary ORM SELECT. With one adopter today the cost is negligible, but the listener fires on every query in the app and the walk grows linearly with each adopted entity. ``SoftDeleteMixin`` now maintains a sorted ``_registered_subclasses`` list updated by ``__init_subclass__`` at class-definition time; the listener reads the cached list with no walk. Sort happens once per registration (rare event) rather than per query. * **M3 — ``SoftDeleteMixin`` import in ``BaseDAO.delete`` moved to top.** ``SKIP_VISIBILITY_FILTER_CLASSES`` from the same module was already top-level; ``SoftDeleteMixin`` had no real reason to be inline. Cleaner. * **M4 — pylint-disable in ``raise_for_ownership`` documented.** A top-level import here actually does cause a circular (security ↔ models.core ↔ superset's lazy ``feature_flag_manager``). Kept the inline import but added a comment naming the cycle so future maintainers don't try to "fix" it and re-trip the issue. * **M5 — ``BaseDeletedStateFilter.model`` typed as ``ClassVar[type[SoftDeleteMixin]]``.** Previously ``Any``, which meant a subclass accidentally binding ``model`` to a non-soft-deletable entity would crash at runtime on ``.deleted_at`` rather than failing mypy. * **M6 — listener placement documented.** ``setup_soft_delete_listener`` is called outside the ``with app_context()`` block because the ``do_orm_execute`` hook attaches to the ``Session`` class (process-wide), not to a Session instance, so it doesn't need Flask app context. Added an explaining comment in ``init_app_in_ctx``'s caller; ``setup_db()`` already ran earlier so the Session import is initialised. * **M7 — unit tests for ``BaseDeletedStateFilter`` + ``SoftDeleteApiMixin``.** New ``tests/unit_tests/views/test_soft_delete_filter.py`` (10 tests): filter no-op / include / only / case-insensitivity / unknown value; mixin no-op-without-flag / inject-with-flag / consume-flag-on-call; ``_serialize_deleted_at`` None / datetime handling. * **Nit — explicit ``orig_resource is None`` guard in ``raise_for_ownership``.** Previously fell through to ``hasattr(None, "owners")`` returning ``False`` → ``owners = []`` → generic ownership error. Now raises the proper "resource removed before ownership could be verified" error so the cause is clear in the rare race-with-hard-delete case. * **Nit — unit tests for ``find_existing_for_import`` / ``clear_soft_deleted_for_import``.** New ``tests/unit_tests/commands/importers/v1/test_find_existing_for_import.py`` (5 tests): no-match-returns-None, live-row-as-is, soft-deleted-row-as-is (pure lookup, no side effect), ``clear`` hard-deletes via ORM (listener-firing path), and the composed find-then-clear contract that's the documented caller sequence. Deferred with explanation in the PR-comment reply (not in this commit): * **H2 — non-restorable / GDPR hook**: V2 concern. Designing the interface without a concrete use case risks getting it wrong; follow-up ticket if compliance work is ever scoped. * **M8 — ``deleted_at`` schema mutation**: documented in the entity- rollout PR bodies (apache#40128/apache#40129/apache#40130) rather than blocking infrastructure. 40 unit tests pass (was 26; added 14 covering filter, mixin, importer helpers). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The listener's loader_criteria was passing a concrete SQL expression (``cls.where_not_deleted()``) which rendered as the raw table name (``slices.deleted_at``) regardless of how the table was aliased in the statement. When the same soft-deletable model appeared under an alias in a JOIN — e.g., FAB's outer/inner reconstruction or ``report_schedule.chart`` aliasing ``slices AS chart`` — the JOIN's ON clause produced ``Unknown column 'slices.deleted_at' in 'on clause'`` and broke the entire query. Surfaced as CI failures on the entity-rollout PRs (apache#40128 / apache#40129 / apache#40130) — specifically pre-existing report-schedule tests that join to ``slices AS chart`` started failing once ``Slice`` inherited ``SoftDeleteMixin``. The infrastructure PR itself doesn't exercise this because no entity has the mixin yet. Fix: switch the criteria to a lambda (the form Bayer's canonical pattern uses). SQLAlchemy invokes the lambda per occurrence and adapts the column reference to the alias at each site. The lambda's body is trivial (``c.deleted_at.is_(None)``) so the ``DeferredLambdaElement`` parser handles it without issue — we originally moved AWAY from a lambda because of an ``if cls in bypass_classes`` conditional that DeferredLambdaElement mis-parsed as a SQL ``IN`` operator, but the conditional now lives outside the lambda (in the per-class iteration), so the lambda body itself is clean. Test added: test_listener_adapts_criteria_to_aliased_table_in_joins — reproduces the aliased-join shape using the synthetic ``_SoftDeletableChild`` model. Without the fix, this query raises ``OperationalError: no such column: ..._soft_deletable_child.deleted_at`` because the criteria renders as the raw table name. With the fix, the query runs cleanly. 45 unit tests pass (was 44 + 1 regression test). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5cb3b5f to
bf87b03
Compare
CI test-postgres (current) on PR apache#40130 cascade-failed with: psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "_customer_location_uc" DETAIL: Key (database_id, schema, table_name)=(1, public, birth_names) already exists. Root cause: dashboard_utils.get_table queries SqlaTable through the ORM, which the soft-delete listener now filters. When a prior test in the same session soft-deletes the birth_names row (any test exercising the new soft-delete DELETE behavior on datasets), the row stays in the DB but is hidden from the helper. create_table_metadata's "doesn't exist, INSERT" path then collides with the underlying (database_id, schema, table_name) unique constraint that survives soft-delete (the constraint is enforced even though the SQLAlchemy comment claims it's not physical — Postgres test environments enforce it via a named constraint that an old migration tried to drop). Two-part fix in tests/integration_tests/dashboard_utils.py: 1. get_table attaches a per-query SKIP_VISIBILITY_FILTER_CLASSES execution option so it sees soft-deleted leftovers. Scoped to the single query — no leak to anything else in the session. 2. create_table_metadata restores any soft-deleted row it finds before mutating it for the new test's setup. Without this, the test would succeed in finding the row but then operate on a row whose `deleted_at` is still set, which the listener would hide from any subsequent API queries inside the same test. Mirrors the pattern the dashboards branch (sc-106190) uses for insert_dashboard's defensive cleanup, scoped narrowly to the test helper and limited to SqlaTable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI test-postgres (current) on PR apache#40130 cascade-failed with: psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "_customer_location_uc" DETAIL: Key (database_id, schema, table_name)=(1, public, birth_names) already exists. Root cause: dashboard_utils.get_table queries SqlaTable through the ORM, which the soft-delete listener now filters. When a prior test in the same session soft-deletes the birth_names row (any test exercising the new soft-delete DELETE behavior on datasets), the row stays in the DB but is hidden from the helper. create_table_metadata's "doesn't exist, INSERT" path then collides with the underlying (database_id, schema, table_name) unique constraint that survives soft-delete (the constraint is enforced even though the SQLAlchemy comment claims it's not physical — Postgres test environments enforce it via a named constraint that an old migration tried to drop). Two-part fix in tests/integration_tests/dashboard_utils.py: 1. get_table attaches a per-query SKIP_VISIBILITY_FILTER_CLASSES execution option so it sees soft-deleted leftovers. Scoped to the single query — no leak to anything else in the session. 2. create_table_metadata restores any soft-deleted row it finds before mutating it for the new test's setup. Without this, the test would succeed in finding the row but then operate on a row whose `deleted_at` is still set, which the listener would hide from any subsequent API queries inside the same test. Mirrors the pattern the dashboards branch (sc-106190) uses for insert_dashboard's defensive cleanup, scoped narrowly to the test helper and limited to SqlaTable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
6a9640a to
c9c9011
Compare
Eight items from the 2026-05-20 review on the infrastructure PR: * **H1 — subclass walk cached via ``__init_subclass__``.** The previous ``_all_soft_delete_subclasses()`` walked ``SoftDeleteMixin.__subclasses__()`` on every primary ORM SELECT. With one adopter today the cost is negligible, but the listener fires on every query in the app and the walk grows linearly with each adopted entity. ``SoftDeleteMixin`` now maintains a sorted ``_registered_subclasses`` list updated by ``__init_subclass__`` at class-definition time; the listener reads the cached list with no walk. Sort happens once per registration (rare event) rather than per query. * **M3 — ``SoftDeleteMixin`` import in ``BaseDAO.delete`` moved to top.** ``SKIP_VISIBILITY_FILTER_CLASSES`` from the same module was already top-level; ``SoftDeleteMixin`` had no real reason to be inline. Cleaner. * **M4 — pylint-disable in ``raise_for_ownership`` documented.** A top-level import here actually does cause a circular (security ↔ models.core ↔ superset's lazy ``feature_flag_manager``). Kept the inline import but added a comment naming the cycle so future maintainers don't try to "fix" it and re-trip the issue. * **M5 — ``BaseDeletedStateFilter.model`` typed as ``ClassVar[type[SoftDeleteMixin]]``.** Previously ``Any``, which meant a subclass accidentally binding ``model`` to a non-soft-deletable entity would crash at runtime on ``.deleted_at`` rather than failing mypy. * **M6 — listener placement documented.** ``setup_soft_delete_listener`` is called outside the ``with app_context()`` block because the ``do_orm_execute`` hook attaches to the ``Session`` class (process-wide), not to a Session instance, so it doesn't need Flask app context. Added an explaining comment in ``init_app_in_ctx``'s caller; ``setup_db()`` already ran earlier so the Session import is initialised. * **M7 — unit tests for ``BaseDeletedStateFilter`` + ``SoftDeleteApiMixin``.** New ``tests/unit_tests/views/test_soft_delete_filter.py`` (10 tests): filter no-op / include / only / case-insensitivity / unknown value; mixin no-op-without-flag / inject-with-flag / consume-flag-on-call; ``_serialize_deleted_at`` None / datetime handling. * **Nit — explicit ``orig_resource is None`` guard in ``raise_for_ownership``.** Previously fell through to ``hasattr(None, "owners")`` returning ``False`` → ``owners = []`` → generic ownership error. Now raises the proper "resource removed before ownership could be verified" error so the cause is clear in the rare race-with-hard-delete case. * **Nit — unit tests for ``find_existing_for_import`` / ``clear_soft_deleted_for_import``.** New ``tests/unit_tests/commands/importers/v1/test_find_existing_for_import.py`` (5 tests): no-match-returns-None, live-row-as-is, soft-deleted-row-as-is (pure lookup, no side effect), ``clear`` hard-deletes via ORM (listener-firing path), and the composed find-then-clear contract that's the documented caller sequence. Deferred with explanation in the PR-comment reply (not in this commit): * **H2 — non-restorable / GDPR hook**: V2 concern. Designing the interface without a concrete use case risks getting it wrong; follow-up ticket if compliance work is ever scoped. * **M8 — ``deleted_at`` schema mutation**: documented in the entity- rollout PR bodies (apache#40128/apache#40129/apache#40130) rather than blocking infrastructure. 40 unit tests pass (was 26; added 14 covering filter, mixin, importer helpers). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The listener's loader_criteria was passing a concrete SQL expression (``cls.where_not_deleted()``) which rendered as the raw table name (``slices.deleted_at``) regardless of how the table was aliased in the statement. When the same soft-deletable model appeared under an alias in a JOIN — e.g., FAB's outer/inner reconstruction or ``report_schedule.chart`` aliasing ``slices AS chart`` — the JOIN's ON clause produced ``Unknown column 'slices.deleted_at' in 'on clause'`` and broke the entire query. Surfaced as CI failures on the entity-rollout PRs (apache#40128 / apache#40129 / apache#40130) — specifically pre-existing report-schedule tests that join to ``slices AS chart`` started failing once ``Slice`` inherited ``SoftDeleteMixin``. The infrastructure PR itself doesn't exercise this because no entity has the mixin yet. Fix: switch the criteria to a lambda (the form Bayer's canonical pattern uses). SQLAlchemy invokes the lambda per occurrence and adapts the column reference to the alias at each site. The lambda's body is trivial (``c.deleted_at.is_(None)``) so the ``DeferredLambdaElement`` parser handles it without issue — we originally moved AWAY from a lambda because of an ``if cls in bypass_classes`` conditional that DeferredLambdaElement mis-parsed as a SQL ``IN`` operator, but the conditional now lives outside the lambda (in the per-class iteration), so the lambda body itself is clean. Test added: test_listener_adapts_criteria_to_aliased_table_in_joins — reproduces the aliased-join shape using the synthetic ``_SoftDeletableChild`` model. Without the fix, this query raises ``OperationalError: no such column: ..._soft_deletable_child.deleted_at`` because the criteria renders as the raw table name. With the fix, the query runs cleanly. 45 unit tests pass (was 44 + 1 regression test). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI test-postgres (current) on PR apache#40130 cascade-failed with: psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "_customer_location_uc" DETAIL: Key (database_id, schema, table_name)=(1, public, birth_names) already exists. Root cause: dashboard_utils.get_table queries SqlaTable through the ORM, which the soft-delete listener now filters. When a prior test in the same session soft-deletes the birth_names row (any test exercising the new soft-delete DELETE behavior on datasets), the row stays in the DB but is hidden from the helper. create_table_metadata's "doesn't exist, INSERT" path then collides with the underlying (database_id, schema, table_name) unique constraint that survives soft-delete (the constraint is enforced even though the SQLAlchemy comment claims it's not physical — Postgres test environments enforce it via a named constraint that an old migration tried to drop). Two-part fix in tests/integration_tests/dashboard_utils.py: 1. get_table attaches a per-query SKIP_VISIBILITY_FILTER_CLASSES execution option so it sees soft-deleted leftovers. Scoped to the single query — no leak to anything else in the session. 2. create_table_metadata restores any soft-deleted row it finds before mutating it for the new test's setup. Without this, the test would succeed in finding the row but then operate on a row whose `deleted_at` is still set, which the listener would hide from any subsequent API queries inside the same test. Mirrors the pattern the dashboards branch (sc-106190) uses for insert_dashboard's defensive cleanup, scoped narrowly to the test helper and limited to SqlaTable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
c9c9011 to
83445b3
Compare
Three SQLAlchemy-review follow-ups in import_dataset, mirroring the charts (16b5ae9) and dashboards (adf511b) passes: 1. **Explicit flush on restore-and-update path + rewritten comment.** The implicit-restore flow relied on autoflush to push the deleted_at = NULL update before SqlaTable.import_from_dict's internal query-by- uuid lookup ran. Without it, the listener would have filtered the row out, import_from_dict would have taken the "new object" branch, and the insert would have hit the uuid unique constraint at flush time. Adding db.session.flush() after clearing deleted_at makes the contract independent of autoflush. The inline comment is rewritten to describe the actual mechanism (the previous wording claimed config["id"] "routes import_from_dict through UPDATE", but ImportExportMixin strips non-export_fields keys — so config["id"] is defensive and the real bind happens via uuid uniqueness inside import_from_dict). 2. **Force sync on the implicit-restore path.** Previously `sync = ["columns", "metrics"] if overwrite else []`, so re-importing a soft-deleted dataset hit `sync = []` and ImportExportMixin would upsert children by UUID without removing rows present in the live DB but absent from the upload. Net effect: a stale-then-restored dataset carried orphan columns from before the soft-delete — a surprising merge of two states. Now ``sync = ["columns", "metrics"] if (overwrite or is_soft_deleted_match) else []`` so re-import of a soft-deleted UUID is treated as a clean replacement, matching what an explicit overwrite would do anyway. New test test_import_soft_deleted_dataset_restore_removes_orphan_children pins this contract. 3. **MultipleResultsFound fallback bypass.** The post-exception query `db.session.query(SqlaTable).filter_by(uuid=...).one()` didn't bypass the soft-delete listener, so a soft-deleted duplicate would be hidden and `.one()` would raise NoResultFound, masking the original MultipleResultsFound. Added SKIP_VISIBILITY_FILTER_CLASSES execution option to the fallback so it can locate either state. All three items raised by the SQLAlchemy lens review on PR apache#40130. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
acc8884 to
72049a9
Compare
The migration ``down_revision`` was ``33d7e0e21daa`` (the ``add_semantic_layers_and_views`` migration from 2025-11-04), but master has advanced seven revisions to ``a1b2c3d4e5f6`` (``add_granular_export_permissions`` from 2026-03-02). Without this update, ``alembic upgrade head`` fails with "Multiple head revisions present" once this migration lands on a database that has already applied master's chain. The sibling charts (apache#40129) and dashboards (apache#40128) PRs need the same update. Cross-PR merge-order coordination still applies — whichever lands first stays at ``a1b2c3d4e5f6``, the others rebase onto the newly-merged revision. Also updated the ``Revises:`` docstring line on the migration's module header for consistency. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ic head) CI failed across all three soft-delete PRs with "Multiple head revisions present" after the recent down_revision update. Root cause: I misread the alembic chain order based on file dates rather than the actual ``down_revision`` linkage. Tracing the chain on master: 4b2a8c9d3e1f → a1b2c3d4e5f6 → ce6bd21901ab → 33d7e0e21daa Nothing chains off ``33d7e0e21daa``; ``ce6bd21901ab`` chains off ``a1b2c3d4e5f6``. The true alembic head is ``33d7e0e21daa``, not ``a1b2c3d4e5f6``. File-creation dates are misleading because ``33d7e0e21daa`` (created 2025-11-04 on a feature branch) was merged after the 2026-03-02 batch via a different branch — file dates do not reflect alembic chain order in the codebase. Reverting both ``down_revision`` and the matching ``Revises:`` docstring line back to ``33d7e0e21daa``. The sibling charts (apache#40129) and dashboards (apache#40128) PRs get the same revert in companion commits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per bito's review on apache#40130 comment-4606035962 — adds one-line docstrings to ``upgrade()`` and ``downgrade()`` in the datasets soft-delete migration. The functions are short and self-evident from their body, but the docstrings give IDE tooltips and Sphinx- style migration listings a per-function summary that doesn't require reading the body. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The ``test_restore_blocked_by_active_logical_duplicate`` integration
test was failing on test-sqlite and test-postgres with
``IntegrityError: UNIQUE constraint failed`` (or the equivalent
``_customer_location_uc`` violation on Postgres). Root cause: the
``tables`` table carries two DB-level unique constraints on the
logical identity:
1. ``UniqueConstraint("database_id", "catalog", "schema",
"table_name")`` declared in the SqlaTable model.
2. ``_customer_location_uc`` on ``(database_id, schema,
table_name)`` from the 2016 ``b4456560d4f3`` migration, still
present on test databases (the 2024 ``df3d7e2eb9a4`` removal
migration runs only on a fresh upgrade chain).
Either constraint rejects the test's ``db.session.add(twin) +
db.session.commit()`` step before the application-level restore
check can be exercised, so the scenario the test was written for
(soft-deleted A coexisting with active B at the same logical
identity) is unreachable in practice.
Richard's review concern at ``api.py:963`` and Amin's matching
HIGH finding at ``restore.py:46`` were about the *application
layer* uniqueness check being filtered (visibility filter hides
soft-deleted rows from the DAO's ``validate_uniqueness``). The
DB-level constraint masks this in production: a delete -> create
sequence is rejected by the DB with an opaque IntegrityError
rather than a clean 422.
The application-level fixes are kept as defense-in-depth:
- ``DatasetDAO.validate_uniqueness`` /
``validate_update_uniqueness`` use
``execution_options(SKIP_VISIBILITY_FILTER_CLASSES={SqlaTable})``
so the application returns a clean 422 instead of a 500
IntegrityError on the create path. This is exercised by the
surviving ``test_create_blocked_by_soft_deleted_logical_duplicate``
integration test, which goes through the API and produces a 422
before the DB constraint fires.
- ``RestoreDatasetCommand._has_active_logical_duplicate`` is
retained for symmetry — fires cleanly if the DB constraint is
ever relaxed (it has been removed once before per
``df3d7e2eb9a4``).
- Unit-level coverage of the restore-side check is preserved in
``tests/unit_tests/commands/dataset/restore_test.py::
test_restore_dataset_logical_duplicate_raises`` (mocked
``_has_active_logical_duplicate``).
Replaced the deleted integration test with a comment block at the
same location documenting why the test is unreachable, so a future
contributor doesn't re-introduce it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ise on multiples
CI was failing on test-mysql with ``MultipleResultsFound`` during
fixture setup for ``TestEmbeddedDashboardDAO.test_get_by_uuid`` (and
cascading through many downstream tests in the same suite). Root
cause: the ``get_table`` test helper in
``tests/integration_tests/dashboard_utils.py`` used
``.one_or_none()`` after the soft-delete-bypass execution_option,
and the bypass exposes *both* a live row and a soft-deleted row when
they coexist with the same ``(database_id, schema, table_name)`` but
different ``catalog`` (e.g. ``NULL`` vs ``"public"``). The filter
in this helper doesn't include catalog, so the bypass query
legitimately matches more than one row — and ``.one_or_none()`` is
intolerant of that.
Postgres and SQLite happened to run the suite in an order that didn't
materialize the two-row state for this query; MySQL did, so the
failure showed up there first.
Replaced ``.one_or_none()`` with
``.order_by(SqlaTable.deleted_at.is_(None).desc(), SqlaTable.id).first()``.
The live row sorts first; the soft-deleted row is the fallback. The
helper's downstream contract ("return the row if one exists, else
``None``") is preserved without raising on the multi-row case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ring cleanup Two follow-ups from the latest reviews on apache#40130: 1. (Richard) Restore-via-import: roll back ``deleted_at`` when the legacy ``MultipleResultsFound`` fallback fires on the soft-deleted path. Prior flow on the soft-deleted-restore path: - clear ``existing.deleted_at = None`` - flush - call ``SqlaTable.import_from_dict(config, ...)`` - if it raises ``MultipleResultsFound`` (legacy NULL-schema ambiguity), the fallback returns the row unmodified The unmodified return is wrong in the soft-deleted case: the row's ``deleted_at`` was already cleared, but the upload contents were not applied. The operator gets a "restore" they didn't ask for, with no warning that the upload was dropped. Fix: snapshot ``existing.deleted_at`` before the clear (capturing it in a local). When ``MultipleResultsFound`` fires AND ``is_soft_deleted_match`` is True, restore the snapshotted ``deleted_at`` value, flush, and raise ``ImportFailedError`` with a message explaining the legacy-ambiguity cause. The operator can then resolve the duplicate manually (rename one of the conflicting rows) and retry. The non-soft-deleted overwrite path keeps the legacy "return unmodified" contract since the semantics there match the original 2018 fix. 2. (bito) Docstring on ``test_create_blocked_by_soft_deleted_logical_duplicate`` referenced ``test_restore_blocked_by_active_logical_duplicate``, which was removed in c4560c1 (unreachable due to DB-level unique constraint). Rewrote the docstring to describe the actual contract the test now pins: ``validate_uniqueness`` bypasses the visibility filter, so a soft-deleted twin blocks creates at the application layer with a clean 422 instead of an opaque 500 IntegrityError from the DB-level constraint. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…" in all .po `pybabel update` fuzzy-matches the new exception string against the existing "Dataset could not be created."/"updated." entries in every catalog, which trips the translation-regression CI check. Pre-seeding an empty `msgstr ""` for the new msgid keeps pybabel from inventing a fuzzy match — translators can fill in the real string later through the normal flow. Affects all 26 active catalogs.
… guard When ``find_existing_for_import`` resolves a soft-deleted row by UUID and ``import_from_dict`` subsequently hits the legacy NULL-schema ambiguity (``MultipleResultsFound``), the importer now rolls back the ``deleted_at`` clear and raises ``ImportFailedError`` — see commit 65d0e5b. The pre-existing test was written for the older behaviour (return the soft-deleted row from the fallback ``.one()`` via the visibility-filter bypass). With the guard in place that path is no longer reached: the soft-delete branch raises before the fallback runs. Rewrite the assertion to pin the new contract: 1. ``ImportFailedError`` is raised with the "ambiguous legacy duplicate" message. 2. The soft-deleted row's ``deleted_at`` is unchanged after the call — no half-restored state.
``DatasetDAO.validate_uniqueness`` and ``validate_update_uniqueness`` build an inner SqlaTable query, set ``execution_options(SKIP_VISIBILITY_FILTER_CLASSES=...)`` on it, and then wrap it in ``db.session.query(inner.exists()).scalar()``. The per-query bypass never reaches the soft-delete listener: the OUTER query is what executes, and its ``execution_options`` are independent of the inner Query's. The listener sees no bypass and attaches ``with_loader_criteria(SqlaTable, deleted_at IS NULL, include_aliases=True)`` to the outer statement, which propagates the filter into the EXISTS subquery and hides any soft-deleted row. Repro confirmed by attaching a real ``do_orm_execute`` listener to a SQLite session: with the per-query bypass, the listener fires with ``per_query=None, per_session=None``, applies the filter, and EXISTS returns False (soft-deleted row excluded). The compiled SQL is identical on Postgres, MySQL, and SQLite — so this is broken on all three dialects in principle. In CI, Postgres and SQLite happen to pass anyway (likely a session-info leak from an earlier test), but MySQL trips ``test_create_blocked_by_soft_deleted_logical_duplicate`` deterministically: the create returns 201 instead of 422, leaving a duplicate row that cascades into ``test_restore_soft_deleted_dataset``. Fix: wrap the EXISTS scalar() in the documented ``skip_visibility_filter(db.session, SqlaTable)`` context manager — session.info is read by the listener regardless of which Query execute fires it (the canonical pattern documented in helpers.py for exactly this FAB-style inner/outer reconstruction case). Drop the now-dead ``SKIP_VISIBILITY_FILTER_CLASSES`` import as it is no longer referenced from this module.
…resent Adds the first test coverage for migration ``3a8e6f2c1b95_add_deleted_at_to_tables``: * ``upgrade()`` and ``downgrade()`` are exercised end-to-end against an in-memory SQLite engine with a real Alembic ``Operations`` context — not just the helper functions in isolation. * The downgrade-with-soft-deleted-rows-present case is pinned as a contract: rows that had ``deleted_at`` set before downgrade survive the migration and become indistinguishable from live rows. This matches the "Rollback note" already documented in ``UPDATING.md``; the test makes the behaviour regression-proof. * Idempotency of both directions is covered: the migration helpers (``add_columns``, ``create_index``, ``drop_columns``, ``drop_index``) are skip-if-exists / skip-if-not-exists, and running either direction twice must not raise. Future helper-implementation changes that silently break this property would fail the test. The test uses a minimal pre-seeded ``tables`` stub (id + table_name) rather than the real model — the migration only touches ``deleted_at`` and its index, so only the columns that participate need to exist.
…test Superset is on SQLAlchemy 1.4; ``Connection`` doesn't have a ``commit()`` method until 2.0. The implicit transaction over the single ``with engine.connect() as conn:`` block keeps reads-after-writes visible within the same connection.
Address review findings on the datasets soft-delete/restore work: - Extract the active-logical-duplicate query duplicated across RestoreDatasetCommand and the v1 importer into a single DatasetDAO.has_active_logical_duplicate(model) helper, and normalize the catalog (catalog or database default) the same way validate_uniqueness does so create/update/restore/import agree on what counts as the same physical table. - Make _escape_like coerce non-string filter values to str, closing an AttributeError server-error path for LIKE-family operators whose ColumnOperator.value is typed Any. - Tag the deferred superset.models.helpers imports in DatasetDAO with the constitution's "avoid app-init regression" rationale (PR apache#40573). - Repoint restore unit tests at the DAO helper and add DAO-level tests pinning the catalog-normalization behaviour. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…docstring Follow-up from the local multi-lens review of the soft-delete remediation: - Add unit tests pinning _escape_like's non-string/None coercion (the fix closing the AttributeError LIKE path had shipped without a test), plus an operator-level guard that a numeric LIKE value no longer raises. - Sharpen DatasetDAO.has_active_logical_duplicate's docstring: state the one-sided catalog normalization honestly (the symmetric stored-NULL case is not caught), restore the dropped warning against adding skip_visibility_filter here, and note the session-attached model.database assumption. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…aset_test.py Move the has_active_logical_duplicate catalog-normalization tests from the stray tests/unit_tests/daos/test_dataset_dao.py into the existing tests/unit_tests/dao/dataset_test.py beside the other DatasetDAO tests, and update the breadcrumb in soft_delete_tests.py to the new path. No behaviour change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ry/finally cleanup - get_table (dashboard_utils) now includes the normalized catalog in the lookup so a fixture can't bind to a different catalog variant when rows share (database_id, schema, table_name) (codeant). - Wrap the five TestDatasetSoftDelete methods that soft-delete-then-restore in try/finally via a _restore_dataset helper, so a failed assertion can't strand a soft-deleted row for later tests (eschutho). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… default Follow-up to the catalog-determinism change: filtering get_table on only database.get_default_catalog() regressed find-or-create for the example datasets, which are seeded with catalog=NULL. Under the Postgres examples DB used in CI, get_default_catalog() returns a non-NULL string, so the strict equality matched nothing and create_table_metadata wrongly took the create branch, colliding with the surviving unique constraint. Match catalog IS NULL OR catalog == default (the way validate_uniqueness's 'table.catalog or default' treats them), which keeps the NULL-seeded rows findable on every dialect while still excluding an unrelated third catalog. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e msgid to .pot - Assert the soft-delete DELETE returns 200 in the list/filter/no-cascade tests (bito CWE-483: the status was previously unchecked there). - Add 'Dataset could not be restored.' to messages.pot (the .po placeholders were already seeded across all languages; the template was missing it). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Four review fixes (apache#40130) for paths that must still account for soft-deleted SqlaTable rows now that datasets join the visibility filter: - importer (utils): key Case B on is_soft_deleted (not the fused needs_mutation) so an active dataset re-imported with overwrite=True but no can_write returns the existing row; and block the create path when a soft-deleted dataset already claims the same physical table (new UUID would otherwise create an active twin of a hidden row), mirroring REST create's validate_uniqueness. Adds DatasetDAO.find_soft_deleted_logical_duplicate. - database delete: count datasets with the visibility filter bypassed, so a database whose datasets are all soft-deleted still blocks hard deletion instead of looking empty. - security manager: the database-rename perm maintenance enumerates datasets with the visibility filter bypassed, so soft-deleted datasets' perm strings (and their charts') are rewritten too — no stale perms resurrected on restore. - DAO uniqueness: null-aware catalog predicate (_catalog_identity_filter) shared by validate_uniqueness/validate_update_uniqueness/ has_active_logical_duplicate, so a default-catalog twin is caught whether either row stored catalog as the default value or NULL. Adds unit coverage (DAO null-aware + twin finder, importer Case B + twin block, database delete) and an integration test for soft-deleted rename perm rewrite. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
DeleteDatabaseCommand now counts soft-deleted datasets (bypassing the visibility filter) so they still block a database delete. Document the consequence: with soft-delete and no purge in v1, a database that has had datasets can't be deleted via the API once they're soft-deleted, until the planned purge capability lands. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… migration Two pre-merge fixes (the migration one is the active cause of CI's multi-head cascade): - DatasetDeletedStateFilter now restricts soft-deleted rows to the restore audience (owners/admins), mirroring Dashboard/Chart DeletedStateFilter and RestoreDatasetCommand's raise_for_ownership — a read-access non-owner can no longer enumerate soft-deleted datasets they could never restore. Live rows keep normal DatasourceFilter visibility. Adds integration coverage (owner sees own deleted; gamma with datasource access does not). - Re-point the add_deleted_at_to_tables migration's down_revision from 33d7e0e21daa to 31dae2559c05 (master's current head) so it no longer creates an alembic multi-head against current master (which was failing docker-build and all integration jobs). Will be re-pointed onto the charts migration during the charts->dashboards->datasets merge sequence. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mirror of the dashboards/charts split: import_dataset no longer fuses overwrite and restore behind needs_mutation. A soft-deleted match is a RESTORE (own can_write Case-B gate + ownership check + active-logical- duplicate guard + snapshot/clear deleted_at); an alive match is an OVERWRITE (return existing unless overwrite+can_write, then ownership check). The MultipleResultsFound rollback (original_deleted_at) and the create-path soft-deleted-duplicate guard are preserved unchanged. Behaviour unchanged; importer matrix tests still pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Parallel to the dashboards restore-failure test: assert that an error raised inside RestoreDatasetCommand.run surfaces as a clean 422 via the DatasetRestoreFailedError handler rather than an unhandled 500. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address review nit: the restore-failure test's cleanup block duplicated the _restore_dataset() helper. Call the helper instead (DRY; also safer — it uses one_or_none()). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…all broke CI 2fc212e called self._restore_dataset from TestDatasetRestore, but the helper was defined on TestDatasetSoftDelete (no inheritance), failing all backend CI jobs with AttributeError. Hoist the helper to module level and keep a thin class-method shim for the existing TestDatasetSoftDelete call sites. Verified against the local Postgres test DB. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…raint docs Review-panel findings (multi-lens review of this branch): - DuplicateDatasetCommand: replace the name-only, visibility-filtered find_one_or_none pre-check with the shared DatasetDAO.validate_uniqueness (scoped to the base model's database/schema, catalog-NULL-aware, bypasses the soft-delete filter). A soft-deleted twin was invisible to the old check, so the duplicate proceeded into an IntegrityError — or an active twin that permanently blocks restore where no DB constraint applies. New session-backed unit test pins the refusal. - Correct the constraint story in comments/docstrings (restore.py, restore_test.py, soft_delete_tests.py): DB-level enforcement is inconsistent across schema builds. create_all materializes the metadata-only 4-column model constraint; migration-built databases still carry the legacy 3-column _customer_location_uc because the 2024 df3d7e2eb9a4 drop migration is a silent no-op (passes a list to generic_find_uq_constraint_name, which compares against a set — never matches; verified empirically on Postgres). The application-level checks give a clean 422 either way. - importer: use existing.restore() instead of writing deleted_at directly; chain the MultipleResultsFound cause (from ex, not from None) so the ambiguous-legacy-duplicate error keeps its traceback. - filters.py: drop docstring references to sibling filter classes that do not exist in this tree yet. - New test: build_dataset_query excludes soft-deleted rows via Core execution — the one path the do_orm_execute listener cannot cover; the explicit deleted_at IS NULL clause now has a failing test if removed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review Agent Run #0a8946
Actionable Suggestions - 4
-
tests/integration_tests/security_tests.py - 1
- Mock setup without verification · Line 574-574
-
superset/datasets/api.py - 1
- Missing integration tests for error paths · Line 933-989
-
tests/integration_tests/datasets/soft_delete_tests.py - 1
- Missing finally block in test cleanup · Line 337-340
-
superset/translations/zh_TW/LC_MESSAGES/messages.po - 1
- Missing zh_TW translation · Line 4643-4644
Additional Suggestions - 7
-
superset/daos/dataset.py - 2
-
Missing catalog tests for validate_update_uniqueness · Line 232-263`validate_update_uniqueness` now calls `database.get_default_catalog()` and `_catalog_identity_filter` for null-aware catalog matching, but the existing test at line 32 only exercises schema differences. Two catalog-coverage patterns exist for `has_active_logical_duplicate` (lines 286–302 and 308–322) and should be replicated here: (1) passing `catalog=None` with a mocked default must still block a twin, and (2) an explicit non-default catalog must not falsely match a `NULL` row.
-
Missing unit tests for catalog predicate · Line 158-181`_catalog_identity_filter` is tested indirectly via `has_active_logical_duplicate` and `find_soft_deleted_logical_duplicate`, but `validate_uniqueness` and `validate_update_uniqueness` — the two call sites added in this diff — have no direct unit tests covering the null-aware catalog predicate. Add tests in `tests/unit_tests/dao/dataset_test.py` following the pattern in `test_logical_duplicate_catalog_predicate_is_null_aware` (line 90).
-
-
superset/translations/ar/LC_MESSAGES/messages.po - 1
-
Missing Arabic translation · Line 4697-4698The `msgstr` is empty. Arabic users with this locale will see the untranslated English string "Dataset could not be restored." instead of the proper Arabic translation. Compare with the pattern used for similar messages in the same file (e.g., "تعذر إنشاء مجموعة بيانات." at line 4692).
-
-
superset/translations/it/LC_MESSAGES/messages.po - 1
-
Missing Italian translation · Line 4602-4603The new `msgid "Dataset could not be restored."` has an empty `msgstr ""`. Italian users will see untranslated English text when this error occurs. Other similar 'Dataset could not be X' messages in this file use the translation 'La tua query non può essere salvata' — consider following the same pattern.
-
-
superset/commands/dataset/importers/v1/utils.py - 1
-
Consistency: Import Source Mismatch · Line 39-39The `SKIP_VISIBILITY_FILTER_CLASSES` constant is imported from `superset.constants` here, but the same file's sibling module `superset/commands/importers/v1/utils.py` imports it from `superset.models.helpers`. The canonical source is `superset.constants`; other files re-export via `superset.models.helpers`. For consistency with the broader import pattern in this command layer, prefer `superset.models.helpers`.
-
-
tests/integration_tests/datasets/soft_delete_tests.py - 2
-
Semantic duplication in cleanup block · Line 379-391The `finally` block re-implements the visibility-filter bypass + conditional restore pattern that the module-level `_restore_dataset` helper (lines 36-52) already provides. Extract the restore logic to avoid duplication.
-
Identical code to module helper · Line 455-462The finally block is an exact duplicate of the module-level `_restore_dataset` helper (lines 36-52). Replace with a call to the helper.
-
Review Details
-
Files reviewed - 52 · Commit Range:
40ecaa8..9a501c5- superset/commands/database/delete.py
- superset/commands/dataset/duplicate.py
- superset/commands/dataset/exceptions.py
- superset/commands/dataset/importers/v1/utils.py
- superset/commands/dataset/restore.py
- superset/connectors/sqla/models.py
- superset/daos/base.py
- superset/daos/dataset.py
- superset/daos/datasource.py
- superset/datasets/api.py
- superset/datasets/filters.py
- superset/migrations/versions/2026-05-08_12-10_3a8e6f2c1b95_add_deleted_at_to_tables.py
- superset/security/manager.py
- superset/translations/ar/LC_MESSAGES/messages.po
- superset/translations/ca/LC_MESSAGES/messages.po
- superset/translations/cs/LC_MESSAGES/messages.po
- superset/translations/de/LC_MESSAGES/messages.po
- superset/translations/en/LC_MESSAGES/messages.po
- superset/translations/es/LC_MESSAGES/messages.po
- superset/translations/fa/LC_MESSAGES/messages.po
- superset/translations/fi/LC_MESSAGES/messages.po
- superset/translations/fr/LC_MESSAGES/messages.po
- superset/translations/it/LC_MESSAGES/messages.po
- superset/translations/ja/LC_MESSAGES/messages.po
- superset/translations/ko/LC_MESSAGES/messages.po
- superset/translations/lv/LC_MESSAGES/messages.po
- superset/translations/messages.pot
- superset/translations/mi/LC_MESSAGES/messages.po
- superset/translations/nl/LC_MESSAGES/messages.po
- superset/translations/pl/LC_MESSAGES/messages.po
- superset/translations/pt/LC_MESSAGES/messages.po
- superset/translations/pt_BR/LC_MESSAGES/messages.po
- superset/translations/ru/LC_MESSAGES/messages.po
- superset/translations/sk/LC_MESSAGES/messages.po
- superset/translations/sl/LC_MESSAGES/messages.po
- superset/translations/th/LC_MESSAGES/messages.po
- superset/translations/tr/LC_MESSAGES/messages.po
- superset/translations/uk/LC_MESSAGES/messages.po
- superset/translations/zh/LC_MESSAGES/messages.po
- superset/translations/zh_TW/LC_MESSAGES/messages.po
- tests/integration_tests/dashboard_utils.py
- tests/integration_tests/datasets/api_tests.py
- tests/integration_tests/datasets/soft_delete_tests.py
- tests/integration_tests/security_tests.py
- tests/unit_tests/commands/dataset/restore_test.py
- tests/unit_tests/commands/dataset/test_duplicate.py
- tests/unit_tests/dao/base_dao_test.py
- tests/unit_tests/dao/dataset_test.py
- tests/unit_tests/dao/datasource_test.py
- tests/unit_tests/databases/commands/delete_test.py
- tests/unit_tests/datasets/commands/importers/v1/import_test.py
- tests/unit_tests/migrations/test_add_deleted_at_to_tables.py
-
Files skipped - 1
- UPDATING.md - Reason: Filter setting
-
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful
Bito Usage Guide
Commands
Type the following command in the pull request comment and save the comment.
-
/review- Manually triggers a full AI review. -
/pause- Pauses automatic reviews on this pull request. -
/resume- Resumes automatic reviews. -
/resolve- Marks all Bito-posted review comments as resolved. -
/abort- Cancels all in-progress reviews.
Refer to the documentation for additional commands.
Configuration
This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.
Documentation & Help
…NG.md Panel-review docs parity with dashboards/charts: DELETE and bulk-delete OpenAPI docstrings now describe soft-delete semantics and the restore path; UPDATING.md notes bulk delete is also soft and that the deleted-state filter is owner-scoped for non-admins. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Final triage sweep of the open bot threads (codeant + bito), regenerated by the recent pushes — resolving as not actionable:
|
Code Review Agent Run #8c0fd2Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
SUMMARY
Wire datasets to the soft-delete infrastructure shipped in #39977. One of three entity rollouts (charts / dashboards / datasets) decomposing the soft-delete work per SIP-208 (passed 2026-05-08).
#39977 (
b2320820b4) merged on 2026-05-29; this PR consumes the abstractions (SoftDeleteMixin,BaseRestoreCommand,BaseDeletedStateFilter,SoftDeleteApiMixin,find_existing_for_import) and wires them toSqlaTable.What this PR ships
Migration (
superset/migrations/versions/2026-05-08_12-10_3a8e6f2c1b95_add_deleted_at_to_tables.py)revision = 3a8e6f2c1b95,down_revision = 31dae2559c05. Adds a nullabledeleted_atcolumn andix_tables_deleted_atindex totables. Reversible. Usessuperset.migrations.shared.utilshelpers so the migration is idempotent.down_revision = 31dae2559c05with the charts (feat(charts): soft-delete and restore #40129) and dashboards (feat(dashboards): soft-delete and restore #40128) sibling PRs. Merge sequence: dashboards (feat(dashboards): soft-delete and restore #40128) → charts (feat(charts): soft-delete and restore #40129) → datasets (feat(datasets): soft-delete and restore #40130); each later PR re-points itsdown_revisionto the just-merged migration before merging (a plain rebase would create an alembic multi-head).Model + DAO
SqlaTableinheritsSoftDeleteMixin(superset/connectors/sqla/models.py). Thedo_orm_executelistener from the infrastructure PR begins filteringSqlaTablequeries (including relationship lazy loads, viawith_loader_criteria(..., propagate_to_loaders=True)).DeleteDatasetCommandroutes throughBaseDAO.delete()— which now detects the mixin and dispatches tosoft_delete()— so no source change to the delete command is needed.DatasetDAO.validate_uniquenessandvalidate_update_uniqueness(the create/update logical-uniqueness checks) run withexecution_options(SKIP_VISIBILITY_FILTER_CLASSES={SqlaTable})so a soft-deleted row still blocks creation of a new dataset at the same(database_id, catalog, schema, table_name). This converts what would have been an opaque 500IntegrityErrorfrom the DB-level constraint into a clean 422 at the application layer.superset/daos/datasource.py:build_dataset_queryuses a Coreselect(...)rather than an ORM query, so thedo_orm_executelistener didn't apply. Added an explicitds_q = ds_q.where(ds_table.c.deleted_at.is_(None))so soft-deleted datasets don't count toward the combinedGET /api/v1/datasource/totals or pagination slots.API
RestoreDatasetCommand(superset/commands/dataset/restore.py) — 4 ClassVar declarations subclassingBaseRestoreCommand[SqlaTable], plus avalidate()override that calls_has_active_logical_duplicate(model)and raisesDatasetLogicalDuplicateError(HTTP 422) when another active dataset already references the same physical table.POST /api/v1/dataset/<uuid>/restoreendpoint (superset/datasets/api.py) with the standard decorator stack (@protect,@safe,@statsd_metrics,@event_logger.log_this_with_context). Permission model mirrors delete exactly: endpoint-levelcan_write(via"restore": "write"inmethod_permission_name), resource-levelraise_for_ownership. UUIDs in the path per the new-endpoints convention. MapsDatasetLogicalDuplicateErrorto a 422 response.DatasetDeletedStateFilter(superset/datasets/filters.py) — 2-line subclass ofBaseDeletedStateFiltersettingarg_name = "dataset_deleted_state"andmodel = SqlaTable. Wired intosearch_filtersonDatasetRestApi.DatasetRestApigainsSoftDeleteApiMixinand"restore"is added toinclude_route_methods.DatasetRestoreFailedErrorexception type added (superset/commands/dataset/exceptions.py), inheritingUpdateFailedErrorfor parity with the sibling failure exceptions in the module.DatasetLogicalDuplicateErroradded alongside it (HTTP 422).Logical-duplicate restore guard
SqlaTablecarries DB-level unique constraints on(database_id, schema, table_name)(legacy_customer_location_ucfrom 2016) and(database_id, catalog, schema, table_name)(the newer model-level constraint), so the database itself enforces the physical-identity invariant. The application-level guards in this PR are defense-in-depth — they convert what would otherwise be opaque 500IntegrityErrorresponses into clean 422s with operator-actionable messages:RestoreDatasetCommand._has_active_logical_duplicate()runs before clearingdeleted_at. Surfaces the conflict asDatasetLogicalDuplicateError(422) with a message naming the conflict, rather than letting the constraint violation bubble at flush time.DatasetDAO.validate_uniquenessbypasses the visibility filter so a soft-deleted row blocks creation of a new dataset with the same logical identity. The user gets the existingDatasetExistsValidationError(422) instead of a 500 IntegrityError.import_datasetperforms the same check before clearingdeleted_at, raisingImportFailedErrorif a live duplicate exists.Both Richard Fogaça and Amin Ghadersohi independently flagged the underlying scenario (delete A → create B at the same logical identity → restore A producing two live rows for one physical table). The DB constraint makes the dual-live state unreachable today, but the application-level guards keep the error model clean and survive any future relaxation of the DB constraint (the legacy
_customer_location_ucwas removed once before viadf3d7e2eb9a4and may be removed cleanly across all installations in the future).Import pipeline (
superset/commands/dataset/importers/v1/utils.py)import_datasetto follow the Option C permission model: re-importing a UUID that matches a soft-deleted dataset is treated as an implicit restore-and-update. For an owner (or admin) withcan_write, the importer clearsdeleted_aton the existing row and routesimport_from_dictthrough UPDATE, preserving the PK and the chart back-reference andtable_columns/sql_metricschildren. Non-owners getImportFailedError. Callers withoutcan_writealso getImportFailedErrorrather than silently receiving the soft-deleted row (Case B — protects callers from quietly reattaching to a deleted dataset). The full matrix is documented in the docstring onimport_dataset.db.session.flush()after clearingdeleted_atmakes the restore-then-UPDATE chain independent of session autoflush (import_from_dictre-queries by UUID; the listener filtersdeleted_at IS NULL; the explicit flush guarantees the cleared value is visible before that query).sync = ["columns", "metrics"]so re-import is a clean replacement (orphan columns/metrics present in the live DB but absent from the upload are removed), matching what an explicitoverwrite=Truewould do.MultipleResultsFoundfallback (legacy NULL-schema duplicate path): when this fires on the soft-deleted-restore branch,existing.deleted_atis rolled back to its snapshotted value and anImportFailedErroris raised. Returning the row unmodified would have produced a half-restored state (deleted_atcleared but upload contents dropped). The non-soft-deleted overwrite path keeps the legacy "return existing unmodified" contract from the original 2018 fix.BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
No UI changes in this PR — REST endpoints only. The frontend "Restore" button is a follow-up PR.
DELETE /api/v1/dataset/<pk>deleted_at; row remains but is hidden from default queriesGET /api/v1/dataset/?...&dataset_deleted_state=onlydeleted_atpopulatedGET /api/v1/dataset/?...&dataset_deleted_state=includedeleted_atpopulated on deleted rowsPOST /api/v1/dataset/<uuid>/restoredeleted_aton the soft-deleted row, or 422 withDatasetLogicalDuplicateErrorif another active dataset references the same physical tablePOST /api/v1/dataset/with logical identity claimed by a soft-deleted rowDatasetExistsValidationErrorfrom the DAO-level checkGET /api/v1/datasource/(combined endpoint)deleted_at IS NULLpredicate on the CoreselectTESTING INSTRUCTIONS
The integration suite covers:
deleted_atinstead of removing the rowdataset_deleted_state=onlyreturns soft-deleted withdeleted_atpopulateddataset_deleted_state=includereturns both, response augmenteddeleted_at(admin and owner paths)raise_for_ownership)can_write_Dataset(pins themethod_permission_name = {..., "restore": "write"}contract — without the mapping, FAB falls back tocan_restore_Datasetand 403s every non-admin)validate_uniquenessbypasses the visibility filter so a soft-deleted row blocks new-dataset creation at the application layer with a clean 422 (test_create_blocked_by_soft_deleted_logical_duplicate)MultipleResultsFoundlegacy-duplicate fallback path resolves correctly when the duplicate is soft-deleted (regression coverage for the visibility-filter bypass in the importer)The restore-side
_has_active_logical_duplicateguard is covered at the unit level (tests/unit_tests/commands/dataset/restore_test.py::test_restore_dataset_logical_duplicate_raises, mocked) rather than as an integration test. The "active twin coexists with soft-deleted original" scenario the integration test would require can't be set up against a real DB: both DB-level unique constraints ontablesreject step 2 (the create) before the test can stage the restore step. Comment block in the integration test documents this.ADDITIONAL INFORMATION
tablestable. Instant on Postgres/MySQL/SQLite for any realistic deployment size; no lock-holding.dataset_deleted_staterison filter,POST /api/v1/dataset/<uuid>/restoreendpoint,deleted_atfield on list responses when augmentation is opted into,DatasetLogicalDuplicateError(422) on restore conflictBehaviour changes worth flagging
DELETE /api/v1/dataset/<pk>is now soft. Existing API consumers that called DELETE expecting permanent removal will see the dataset reappear if it's restored. The row is no longer findable via the default API after DELETE (the listener hides it), so most downstream behaviour is preserved — but anyone doing direct DB queries on thetablestable will see the row.DatasetLogicalDuplicateError) rather than letting an opaqueIntegrityErrorbubble. Conversely, creating a dataset at a logical identity owned by a soft-deleted row also refuses with 422 at the DAO layer instead of a 500 from the DB constraint.GET /api/v1/datasource/pagination no longer counts soft-deleted datasets. Total counts and page contents are aligned with the visibility-filter contract for ORM queries.Depends on
Coordination with sibling PRs
This PR's migration shares
down_revision = 31dae2559c05with:7c4a8d09ca379e1f3b8c4d2aThe first of the three to merge (dashboards) keeps its
down_revision; each later PR re-points itsdown_revisionto the just-merged migration before merging — a plain rebase would create an alembic multi-head, which theCheck DB migration conflictCI job will catch.