refactor(db): composite PK on M2M association tables (sc-105349)#39859
refactor(db): composite PK on M2M association tables (sc-105349)#39859mikebridge wants to merge 22 commits into
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #39859 +/- ##
==========================================
- Coverage 64.30% 63.72% -0.59%
==========================================
Files 2657 2657
Lines 144060 144060
Branches 33216 33216
==========================================
- Hits 92641 91803 -838
- Misses 49797 50633 +836
- Partials 1622 1624 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
| """ | ||
| # Identifiers come from the AFFECTED_TABLES whitelist, not user input. | ||
| sql = sa.text( | ||
| f"DELETE FROM {t.name} WHERE {t.fk1} IS NULL OR {t.fk2} IS NULL" # noqa: S608 |
There was a problem hiding this comment.
The combination of sa.text() and f" makes me nervous. Let's build a proper query here using SQLAlchemy core/ORM instead of building the SQL with string interpolation.
| sample_sql = sa.text( | ||
| f"SELECT {t.fk1}, {t.fk2}, id FROM {t.name} WHERE id NOT IN (" # noqa: S608 | ||
| f" SELECT keep_id FROM (" | ||
| f" SELECT MIN(id) AS keep_id FROM {t.name} " | ||
| f"GROUP BY {t.fk1}, {t.fk2}" | ||
| f" ) AS s" | ||
| f") LIMIT 10" | ||
| ) | ||
| sample = list(conn.execute(sample_sql)) | ||
| sql = sa.text( | ||
| f"DELETE FROM {t.name} WHERE id NOT IN (" # noqa: S608 | ||
| f" SELECT keep_id FROM (" | ||
| f" SELECT MIN(id) AS keep_id FROM {t.name} " | ||
| f"GROUP BY {t.fk1}, {t.fk2}" | ||
| f" ) AS s" | ||
| f")" | ||
| ) |
There was a problem hiding this comment.
Same here, let's use a programmatic query instead of building the string.
| f"SELECT COUNT(*) FROM (" # noqa: S608 | ||
| f" SELECT 1 FROM {t.name} GROUP BY {t.fk1}, {t.fk2} HAVING COUNT(*) > 1" | ||
| f") AS s" |
There was a problem hiding this comment.
Same here. Also, not that it's usually much more legible to simply use triple quotes if we wanted to build the query as a string:
sql = f"""
SELECT COUNT(*) FROM (
SELECT 1 FROM {t.name}
GROUP BY {t.fk1}, {t.fk2}
HAVING COUNT(*) > 1
) AS s
"""There was a problem hiding this comment.
These are updated---I am also running the down migrations locally against sqlite, postgres and mysql to make sure they do something sensible.
Address Beto's review comments on apache#39859: replace ``sa.text(f"...")`` SQL construction in the three pre-flight helpers (``_delete_null_fk_rows``, ``_dedupe_by_min_id``, ``_assert_no_duplicates``) with SQLAlchemy core constructs (``sa.delete``, ``sa.select``, ``sa.func``, ``.subquery()``, ``.notin_()``). A small ``_table_clause()`` helper builds a lightweight ``TableClause`` exposing the columns the queries reference; the three helpers consume it. Removes all ``# noqa: S608`` comments — they are no longer needed because there is no string-interpolated SQL. Verified the compiled SQL is identical on Postgres, MySQL, and SQLite, including the MySQL ERROR 1093 workaround (the inner aggregation is wrapped in a derived table via ``.subquery()``, producing ``... NOT IN (SELECT keep_id FROM (SELECT min(id) ...) AS keep_min)``). Also drops the redundant ``f`` prefix on the two non-interpolating lines of the ``_check_no_external_fks_to_id`` error message. 44 migration tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address Beto's review comments on apache#39859: replace ``sa.text(f"...")`` SQL construction in the three pre-flight helpers (``_delete_null_fk_rows``, ``_dedupe_by_min_id``, ``_assert_no_duplicates``) with SQLAlchemy core constructs (``sa.delete``, ``sa.select``, ``sa.func``, ``.subquery()``, ``.notin_()``). A small ``_table_clause()`` helper builds a lightweight ``TableClause`` exposing the columns the queries reference; the three helpers consume it. Removes all ``# noqa: S608`` comments — they are no longer needed because there is no string-interpolated SQL. Verified the compiled SQL is identical on Postgres, MySQL, and SQLite, including the MySQL ERROR 1093 workaround (the inner aggregation is wrapped in a derived table via ``.subquery()``, producing ``... NOT IN (SELECT keep_id FROM (SELECT min(id) ...) AS keep_min)``). Also drops the redundant ``f`` prefix on the two non-interpolating lines of the ``_check_no_external_fks_to_id`` error message. 44 migration tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dd4d48d to
0e5380d
Compare
✅ Deploy Preview for superset-docs-preview ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
✅ Deploy Preview for superset-docs-preview ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
Address Beto's review comments on apache#39859: replace ``sa.text(f"...")`` SQL construction in the three pre-flight helpers (``_delete_null_fk_rows``, ``_dedupe_by_min_id``, ``_assert_no_duplicates``) with SQLAlchemy core constructs (``sa.delete``, ``sa.select``, ``sa.func``, ``.subquery()``, ``.notin_()``). A small ``_table_clause()`` helper builds a lightweight ``TableClause`` exposing the columns the queries reference; the three helpers consume it. Removes all ``# noqa: S608`` comments — they are no longer needed because there is no string-interpolated SQL. Verified the compiled SQL is identical on Postgres, MySQL, and SQLite, including the MySQL ERROR 1093 workaround (the inner aggregation is wrapped in a derived table via ``.subquery()``, producing ``... NOT IN (SELECT keep_id FROM (SELECT min(id) ...) AS keep_min)``). Also drops the redundant ``f`` prefix on the two non-interpolating lines of the ``_check_no_external_fks_to_id`` error message. 44 migration tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3870647 to
7ae35b4
Compare
|
Hi folks. I'm attending an onsite and don't have much availability this week. At Airbnb, we have tables like |
Hey @michael-s-molina! Yes, this will only touch the join tables that join to users/dashboards/charts/roles so the cardinality should be quite a bit lower than for something like logging. We can touch base when you have some time and maybe I can get some data from you that will help me gauge impact. My hypothesis is that the worst-case scenario is under two minutes. One big side-issue here is fixing the data-integrity vulnerabilities with the missing Anyway, a subset of this R is to remove a blocker for Versioning, but it's not strictly necessary to fix every table in this iteration. Ideally I'd like to fix all eight tables at once though. |
…105349) Add a "Sizing the maintenance window on PostgreSQL" sub-section to the operator runbook. The simple per-table COUNT/duplicate/NULL queries that were already there are dialect-portable but only count rows; operators on PostgreSQL with large deployments need to characterize the migration's runtime cost before scheduling it. Adds four diagnostic queries: - Per-table size, row count (from pg_class.reltuples), and which migration path each table will take (recreate-rewrite vs direct ALTER). Sizes the work concretely. - Aggregated duplicate-row roll-up: dup_groups + total rows_dropped per table. Replaces eight separate per-table queries with one consolidated result for audit/dump-before-apply decisions. - External-FK pre-flight check (the same one the migration runs at upgrade time and aborts on). Lets operators surface any blocking external reference ahead of the maintenance window. Should be empty on a stock install. - Lock-window estimate for the two full-rewrite tables, using pg_relation_size and a conservative 100 MB/s rewrite throughput assumption. The other six use direct ALTER and are dominated by composite-index build time (seconds for low-millions-of-rows tables). Prompted by reviewer feedback on apache#39859 from a large deployment asking how to size the maintenance window. The original pre-flight queries are kept for cross-dialect operators (MySQL, SQLite) since the new queries use PostgreSQL-specific catalog views. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
This should help to assess the impact on postgres: -- =====================================================================
-- 1. Row counts and on-disk size per affected table
-- Tells us how much work the migration will do per table.
-- The two tables marked WITH UNIQUE go through the slower
-- "full table rewrite" path; the other six are direct ALTER.
-- =====================================================================
WITH affected(name, has_unique) AS (
VALUES
('dashboard_roles', false),
('dashboard_slices', true), -- full rewrite
('dashboard_user', false),
('report_schedule_user', true), -- full rewrite
('rls_filter_roles', false),
('rls_filter_tables', false),
('slice_user', false),
('sqlatable_user', false)
)
SELECT
a.name AS table_name,
CASE WHEN a.has_unique THEN 'recreate (full rewrite)'
ELSE 'direct ALTER' END AS migration_path,
c.reltuples::bigint AS estimated_rows,
pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,
pg_size_pretty(pg_relation_size(c.oid)) AS heap_size,
pg_size_pretty(pg_indexes_size(c.oid)) AS index_size
FROM affected a
JOIN pg_class c ON c.relname = a.name AND c.relkind = 'r'
ORDER BY pg_total_relation_size(c.oid) DESC; ... and these will give some more details on some of the other data integrity issues: -- =====================================================================
-- 2. Exact duplicate-row counts per table.
-- The migration deletes duplicates (keeping MIN(id) per group).
-- If any of these counts are nonzero, audit-sensitive operators
-- should dump the affected rows BEFORE applying the migration.
-- =====================================================================
SELECT 'dashboard_roles' AS t, COUNT(*) AS dup_groups, SUM(c) - COUNT(*) AS rows_dropped
FROM (SELECT COUNT(*) c FROM dashboard_roles GROUP BY dashboard_id, role_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'dashboard_slices', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM dashboard_slices GROUP BY dashboard_id, slice_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'dashboard_user', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM dashboard_user GROUP BY user_id, dashboard_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'report_schedule_user',COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM report_schedule_user GROUP BY user_id, report_schedule_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'rls_filter_roles', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM rls_filter_roles GROUP BY role_id, rls_filter_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'rls_filter_tables', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM rls_filter_tables GROUP BY table_id, rls_filter_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'slice_user', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM slice_user GROUP BY user_id, slice_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'sqlatable_user', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM sqlatable_user GROUP BY user_id, table_id HAVING COUNT(*) > 1) g
ORDER BY rows_dropped DESC NULLS LAST; -- =====================================================================
-- 3. NULL-FK row counts.
-- The migration deletes any row with NULL in either FK column
-- (since PK columns must be NOT NULL). Should normally be zero;
-- nonzero means application bugs or manual SQL produced bad rows.
-- =====================================================================
SELECT 'dashboard_roles' AS t, COUNT(*) FILTER (WHERE dashboard_id IS NULL OR role_id IS NULL) AS null_fk_rows FROM dashboard_roles
UNION ALL SELECT 'dashboard_slices', COUNT(*) FILTER (WHERE dashboard_id IS NULL OR slice_id IS NULL) FROM dashboard_slices
UNION ALL SELECT 'dashboard_user', COUNT(*) FILTER (WHERE user_id IS NULL OR dashboard_id IS NULL) FROM dashboard_user
UNION ALL SELECT 'report_schedule_user', COUNT(*) FILTER (WHERE user_id IS NULL OR report_schedule_id IS NULL) FROM report_schedule_user
UNION ALL SELECT 'rls_filter_roles', COUNT(*) FILTER (WHERE role_id IS NULL OR rls_filter_id IS NULL) FROM rls_filter_roles
UNION ALL SELECT 'rls_filter_tables', COUNT(*) FILTER (WHERE table_id IS NULL OR rls_filter_id IS NULL) FROM rls_filter_tables
UNION ALL SELECT 'slice_user', COUNT(*) FILTER (WHERE user_id IS NULL OR slice_id IS NULL) FROM slice_user
UNION ALL SELECT 'sqlatable_user', COUNT(*) FILTER (WHERE user_id IS NULL OR table_id IS NULL) FROM sqlatable_user
ORDER BY null_fk_rows DESC; If this returns any rows it'd really be a bad thing: -- =====================================================================
-- 4. External FK references to the soon-to-be-removed `id` columns.
-- The migration runs this same check as a pre-flight assertion and
-- aborts if anything is found. Run it ahead of time so you know
-- what (if anything) needs to be migrated/dropped first. On a
-- standard Superset deployment this should return zero rows.
-- (Default schema only; multi-schema deployments need to broaden.)
-- =====================================================================
SELECT
rc.constraint_name,
kcu.table_schema || '.' || kcu.table_name AS referencing_table,
kcu.column_name AS referencing_column,
ccu.table_name AS referenced_table,
ccu.column_name AS referenced_column
FROM information_schema.referential_constraints rc
JOIN information_schema.key_column_usage kcu
ON kcu.constraint_name = rc.constraint_name
AND kcu.constraint_schema = rc.constraint_schema
JOIN information_schema.constraint_column_usage ccu
ON ccu.constraint_name = rc.constraint_name
AND ccu.constraint_schema = rc.constraint_schema
WHERE ccu.table_name IN (
'dashboard_roles','dashboard_slices','dashboard_user',
'report_schedule_user','rls_filter_roles','rls_filter_tables',
'slice_user','sqlatable_user')
AND ccu.column_name = 'id'; -- =====================================================================
-- 5. Lock-window sizing for the recreate path on the two heaviest
-- tables. The "full table rewrite" path takes an ACCESS EXCLUSIVE
-- lock on the table for the duration. Estimate the rewrite time
-- by combining heap size with your hardware's sequential write
-- rate (≈100–200 MB/s on commodity SSD; faster on NVMe).
-- =====================================================================
SELECT
c.relname AS table_name,
pg_size_pretty(pg_relation_size(c.oid)) AS heap_size,
pg_relation_size(c.oid) / 1024 / 1024 AS heap_size_mb,
-- conservative estimate: 100 MB/s effective rewrite throughput
ROUND(pg_relation_size(c.oid) / 1024 / 1024 / 100.0, 1) AS est_rewrite_seconds_at_100mbs
FROM pg_class c
WHERE c.relname IN ('dashboard_slices', 'report_schedule_user'); |
|
Here's the mysql equivalent: 1. Per-table size, row count, and which migration path each takes. Note that on InnoDB, ADD/DROP PRIMARY KEY rebuilds the clustered index, so all eight tables undergo a full rebuild — not just the two that go through the explicit SELECT
TABLE_NAME AS table_name,
CASE WHEN TABLE_NAME IN ('dashboard_slices', 'report_schedule_user')
THEN 'recreate (explicit, drops UNIQUE)'
ELSE 'direct ALTER (still rebuilds InnoDB clustered index)'
END AS migration_path,
TABLE_ROWS AS estimated_rows,
CONCAT(ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024, 1), ' MB') AS total_size,
CONCAT(ROUND(DATA_LENGTH / 1024 / 1024, 1), ' MB') AS heap_size,
CONCAT(ROUND(INDEX_LENGTH / 1024 / 1024, 1), ' MB') AS index_size
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
AND TABLE_NAME IN (
'dashboard_roles', 'dashboard_slices', 'dashboard_user',
'report_schedule_user', 'rls_filter_roles', 'rls_filter_tables',
'slice_user', 'sqlatable_user'
)
ORDER BY (DATA_LENGTH + INDEX_LENGTH) DESC;
SELECT 'dashboard_roles' AS t, COUNT(*) AS dup_groups, SUM(c) - COUNT(*) AS rows_dropped
FROM (SELECT COUNT(*) c FROM dashboard_roles GROUP BY dashboard_id, role_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'dashboard_slices', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM dashboard_slices GROUP BY dashboard_id, slice_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'dashboard_user', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM dashboard_user GROUP BY user_id, dashboard_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'report_schedule_user',COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM report_schedule_user GROUP BY user_id, report_schedule_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'rls_filter_roles', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM rls_filter_roles GROUP BY role_id, rls_filter_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'rls_filter_tables', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM rls_filter_tables GROUP BY table_id, rls_filter_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'slice_user', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM slice_user GROUP BY user_id, slice_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'sqlatable_user', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM sqlatable_user GROUP BY user_id, table_id HAVING COUNT(*) > 1) g
ORDER BY rows_dropped DESC;
SELECT
CONSTRAINT_NAME,
CONCAT(TABLE_SCHEMA, '.', TABLE_NAME) AS referencing_table,
COLUMN_NAME AS referencing_column,
REFERENCED_TABLE_NAME AS referenced_table,
REFERENCED_COLUMN_NAME AS referenced_column
FROM information_schema.KEY_COLUMN_USAGE
WHERE TABLE_SCHEMA = DATABASE()
AND REFERENCED_TABLE_NAME IN (
'dashboard_roles', 'dashboard_slices', 'dashboard_user',
'report_schedule_user', 'rls_filter_roles', 'rls_filter_tables',
'slice_user', 'sqlatable_user'
)
AND REFERENCED_COLUMN_NAME = 'id';
SELECT
TABLE_NAME AS table_name,
CONCAT(ROUND(DATA_LENGTH / 1024 / 1024, 1), ' MB') AS heap_size,
ROUND(DATA_LENGTH / 1024 / 1024, 1) AS heap_size_mb,
ROUND(DATA_LENGTH / 1024 / 1024 / 100.0, 1) AS est_rewrite_seconds_at_100mbs
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
AND TABLE_NAME IN (
'dashboard_roles', 'dashboard_slices', 'dashboard_user',
'report_schedule_user', 'rls_filter_roles', 'rls_filter_tables',
'slice_user', 'sqlatable_user'
)
ORDER BY DATA_LENGTH DESC; |
…5349) Justin Park (@justinpark) reported on apache#39859: MySQLdb.OperationalError: (1832, "Cannot change column 'dashboard_id': used in a foreign key constraint 'fk_dashboard_roles_dashboard_id_dashboards'") Root cause: ``batch_op.alter_column(fk1, nullable=False)`` for the six non-UNIQUE association tables emits ``ALTER COLUMN`` on a column that participates in an FK constraint. MySQL 8 rejects this with ERROR 1832 when the table has data — even when the change is just ``NULL`` → ``NOT NULL`` and the column is already part of a freshly-added composite primary key (which InnoDB has just made implicitly NOT NULL anyway). The error fires on populated tables only; CI's ``test-mysql`` shard runs against empty tables and so didn't catch this, while a real production-shaped install does. The ``alter_column`` was only ever needed for SQLite, where composite ``PRIMARY KEY`` does not promote constituent columns to ``NOT NULL`` (a long-standing SQLite quirk — only ``INTEGER PRIMARY KEY`` does). PostgreSQL and MySQL implicitly promote PK columns to ``NOT NULL`` as part of ``ADD PRIMARY KEY``, so the explicit step is unnecessary on both — and on MySQL it's actively broken on populated tables. Fix: extract the ``alter_column`` pair into a helper ``_enforce_not_null_for_sqlite()`` that no-ops on Postgres and MySQL. Both branches of the per-table upgrade (the ``recreate="always"`` path for the two UNIQUE-bearing tables, and the direct-ALTER path for the other six) now call the helper instead of inlining the ``alter_column``. Verified end-to-end: downgrade-then-upgrade against MySQL with ~12M total junction rows (10M dashboard_slices + 1M each slice_user/dashboard_user + 100K dashboard_roles) completes in 1m 39s with no ERROR 1832. The 44 in-memory SQLite tests still pass. Considered Justin's alternative (drop FKs on MySQL across all eight tables, unifying the two branches) but rejected as more invasive — it would require capturing FK metadata and explicitly re-creating the FKs for the six non-recreate tables, since they don't go through the ``copy_from`` path that re-creates FKs automatically. The SQLite-only approach is more targeted: it removes the operation that MySQL rejects rather than working around the rejection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three improvements from @aminghadersohi's review on apache#39859: 1. **`fk["name"]` unguarded in ``_downgrade_mysql_table`` re-add loop** The drop loop gates on ``if fk_name := fk.get("name"):`` but the re-add loop accessed ``fk["name"]`` unconditionally in an f-string. MySQL/InnoDB always assigns FK names, so this branch was defensive, but the asymmetry was confusing. Symmetrized via ``continue`` at the top of the re-add loop. 2. **``ondelete`` whitelist before raw-SQL interpolation** The value comes from MySQL's ``information_schema`` (not user input), but interpolating a reflected string into raw SQL without a guard left a "what if an unexpected value appears" footgun. Added ``_VALID_ONDELETE_ACTIONS`` (the four SQL-standard actions) and a ``RuntimeError`` when an unexpected value is reflected. 3. **Direct ALTER on PostgreSQL for tables with pre-existing UNIQUE** ``recreate="always"`` is dialect-agnostic — on PostgreSQL it triggers ``CREATE TABLE AS SELECT → DROP → RENAME`` holding ``ACCESS EXCLUSIVE`` for the full table-copy duration. For a multi-million-row ``dashboard_slices``, that lock window can be noticeable. The reflected UNIQUE constraint has a stable name on PostgreSQL (default ``<table>_<cols>_key`` convention), so dropping it directly and then running structural change as direct ALTER avoids the copy entirely. The reflected UNIQUE name is wrapped in a new ``_drop_redundant_unique_by_name()`` helper. Postgres takes the direct path; MySQL keeps ``recreate="always"`` because InnoDB binds FKs to the UNIQUE's underlying index for back-reference (``DROP CONSTRAINT`` on the UNIQUE there raises ``ERROR 1553``); SQLite keeps ``recreate="always"`` because unnamed UNIQUEs reflect with ``name=None`` and can't be dropped by name. Verified end-to-end: downgrade-then-upgrade against MySQL with ~12M total junction rows seeded completes in ~1m 41s (within the range of the prior measurements). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
887d49c to
e69c731
Compare
Address Beto's review comments on apache#39859: replace ``sa.text(f"...")`` SQL construction in the three pre-flight helpers (``_delete_null_fk_rows``, ``_dedupe_by_min_id``, ``_assert_no_duplicates``) with SQLAlchemy core constructs (``sa.delete``, ``sa.select``, ``sa.func``, ``.subquery()``, ``.notin_()``). A small ``_table_clause()`` helper builds a lightweight ``TableClause`` exposing the columns the queries reference; the three helpers consume it. Removes all ``# noqa: S608`` comments — they are no longer needed because there is no string-interpolated SQL. Verified the compiled SQL is identical on Postgres, MySQL, and SQLite, including the MySQL ERROR 1093 workaround (the inner aggregation is wrapped in a derived table via ``.subquery()``, producing ``... NOT IN (SELECT keep_id FROM (SELECT min(id) ...) AS keep_min)``). Also drops the redundant ``f`` prefix on the two non-interpolating lines of the ``_check_no_external_fks_to_id`` error message. 44 migration tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…105349) Add a "Sizing the maintenance window on PostgreSQL" sub-section to the operator runbook. The simple per-table COUNT/duplicate/NULL queries that were already there are dialect-portable but only count rows; operators on PostgreSQL with large deployments need to characterize the migration's runtime cost before scheduling it. Adds four diagnostic queries: - Per-table size, row count (from pg_class.reltuples), and which migration path each table will take (recreate-rewrite vs direct ALTER). Sizes the work concretely. - Aggregated duplicate-row roll-up: dup_groups + total rows_dropped per table. Replaces eight separate per-table queries with one consolidated result for audit/dump-before-apply decisions. - External-FK pre-flight check (the same one the migration runs at upgrade time and aborts on). Lets operators surface any blocking external reference ahead of the maintenance window. Should be empty on a stock install. - Lock-window estimate for the two full-rewrite tables, using pg_relation_size and a conservative 100 MB/s rewrite throughput assumption. The other six use direct ALTER and are dominated by composite-index build time (seconds for low-millions-of-rows tables). Prompted by reviewer feedback on apache#39859 from a large deployment asking how to size the maintenance window. The original pre-flight queries are kept for cross-dialect operators (MySQL, SQLite) since the new queries use PostgreSQL-specific catalog views. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…5349) Justin Park (@justinpark) reported on apache#39859: MySQLdb.OperationalError: (1832, "Cannot change column 'dashboard_id': used in a foreign key constraint 'fk_dashboard_roles_dashboard_id_dashboards'") Root cause: ``batch_op.alter_column(fk1, nullable=False)`` for the six non-UNIQUE association tables emits ``ALTER COLUMN`` on a column that participates in an FK constraint. MySQL 8 rejects this with ERROR 1832 when the table has data — even when the change is just ``NULL`` → ``NOT NULL`` and the column is already part of a freshly-added composite primary key (which InnoDB has just made implicitly NOT NULL anyway). The error fires on populated tables only; CI's ``test-mysql`` shard runs against empty tables and so didn't catch this, while a real production-shaped install does. The ``alter_column`` was only ever needed for SQLite, where composite ``PRIMARY KEY`` does not promote constituent columns to ``NOT NULL`` (a long-standing SQLite quirk — only ``INTEGER PRIMARY KEY`` does). PostgreSQL and MySQL implicitly promote PK columns to ``NOT NULL`` as part of ``ADD PRIMARY KEY``, so the explicit step is unnecessary on both — and on MySQL it's actively broken on populated tables. Fix: extract the ``alter_column`` pair into a helper ``_enforce_not_null_for_sqlite()`` that no-ops on Postgres and MySQL. Both branches of the per-table upgrade (the ``recreate="always"`` path for the two UNIQUE-bearing tables, and the direct-ALTER path for the other six) now call the helper instead of inlining the ``alter_column``. Verified end-to-end: downgrade-then-upgrade against MySQL with ~12M total junction rows (10M dashboard_slices + 1M each slice_user/dashboard_user + 100K dashboard_roles) completes in 1m 39s with no ERROR 1832. The 44 in-memory SQLite tests still pass. Considered Justin's alternative (drop FKs on MySQL across all eight tables, unifying the two branches) but rejected as more invasive — it would require capturing FK metadata and explicitly re-creating the FKs for the six non-recreate tables, since they don't go through the ``copy_from`` path that re-creates FKs automatically. The SQLite-only approach is more targeted: it removes the operation that MySQL rejects rather than working around the rejection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three improvements from @aminghadersohi's review on apache#39859: 1. **`fk["name"]` unguarded in ``_downgrade_mysql_table`` re-add loop** The drop loop gates on ``if fk_name := fk.get("name"):`` but the re-add loop accessed ``fk["name"]`` unconditionally in an f-string. MySQL/InnoDB always assigns FK names, so this branch was defensive, but the asymmetry was confusing. Symmetrized via ``continue`` at the top of the re-add loop. 2. **``ondelete`` whitelist before raw-SQL interpolation** The value comes from MySQL's ``information_schema`` (not user input), but interpolating a reflected string into raw SQL without a guard left a "what if an unexpected value appears" footgun. Added ``_VALID_ONDELETE_ACTIONS`` (the four SQL-standard actions) and a ``RuntimeError`` when an unexpected value is reflected. 3. **Direct ALTER on PostgreSQL for tables with pre-existing UNIQUE** ``recreate="always"`` is dialect-agnostic — on PostgreSQL it triggers ``CREATE TABLE AS SELECT → DROP → RENAME`` holding ``ACCESS EXCLUSIVE`` for the full table-copy duration. For a multi-million-row ``dashboard_slices``, that lock window can be noticeable. The reflected UNIQUE constraint has a stable name on PostgreSQL (default ``<table>_<cols>_key`` convention), so dropping it directly and then running structural change as direct ALTER avoids the copy entirely. The reflected UNIQUE name is wrapped in a new ``_drop_redundant_unique_by_name()`` helper. Postgres takes the direct path; MySQL keeps ``recreate="always"`` because InnoDB binds FKs to the UNIQUE's underlying index for back-reference (``DROP CONSTRAINT`` on the UNIQUE there raises ``ERROR 1553``); SQLite keeps ``recreate="always"`` because unnamed UNIQUEs reflect with ``name=None`` and can't be dropped by name. Verified end-to-end: downgrade-then-upgrade against MySQL with ~12M total junction rows seeded completes in ~1m 41s (within the range of the prior measurements). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace synthetic id INTEGER PRIMARY KEY with composite PRIMARY KEY (fk1, fk2) on the eight pure-junction tables: dashboard_roles, dashboard_slices, dashboard_user, report_schedule_user, rls_filter_roles, rls_filter_tables, slice_user, sqlatable_user. The redundant UNIQUE(fk1, fk2) on dashboard_slices and report_schedule_user is dropped (subsumed by the new PK). Migration handles dialect quirks: copy_from for tables with pre-existing UNIQUE (so SQLite's anonymous-constraint reflection doesn't matter), wrapped- subquery dedupe for MySQL (ERROR 1093), sa.Identity(always=False) on downgrade to backfill the restored id column without NOT NULL violations, and distinct PK names per direction (pk_<table> on upgrade, <table>_pkey on downgrade) to avoid round-trip index-name collisions on Postgres. ORM Table() definitions updated to match. UPDATING.md entry added with operator runbook (BI-tool impact, pre-flight inventory queries, dedupe-row- loss notice, pg_dump workaround, FK-NOT-NULL downgrade asymmetry note). Tests: 8 schema-shape assertions (post-upgrade), 8 duplicate-rejection unit tests, 8 distinct-pair sanity tests, 1 round-trip + idempotency test (in-memory SQLite via Alembic MigrationContext). Continuum-restore verification against the new shape is out of scope for this PR; it is the responsibility of the versioning epic (sc-103156). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two cleanups from PR review: 1. ``dashboard_roles.dashboard_id`` was created nullable in revision e11ccdd12658 but was missing from ``TABLES_WITH_NULLABLE_FKS``. A production database with a stray NULL ``dashboard_id`` row would have failed the PK-add with a cryptic constraint violation. Fix by running the NULL-FK cleanup on every affected table — it is a no-op DELETE on tables whose FK columns are already NOT NULL, and it eliminates the risk of further drift in the hardcoded set. ``dashboard_roles`` is added to the documentation set; the runtime now does not consult it. 2. The unit-test parent-table name for ``rls_filter_roles`` and ``rls_filter_tables`` was ``rls_filter`` (does not exist) instead of the real parent ``row_level_security_filters``. Test passes either way (the in-memory FK is self-consistent), but the parameter is now accurate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four operator-experience improvements from the second review pass: 1. ``TABLES_WITH_NULLABLE_FKS`` is now explicitly documented as an informational set that is not consulted at runtime; the comment explains the previous ``dashboard_roles`` omission was the bug that motivated the always-run cleanup. 2. ``_delete_null_fk_rows`` docstring updated to match the "always run" semantics (was still claiming "called only on tables in TABLES_WITH_NULLABLE_FKS"). 3. ``_check_no_external_fks_to_id`` now documents its scope limitation: ``Inspector.get_table_names()`` returns the default schema only, so cross-schema FKs in non-standard multi-schema PostgreSQL deployments would not be caught. The single-schema case (Superset's documented deployment) is fully covered. 4. ``_dedupe_by_min_id`` now logs a sample of up to 10 discarded ``(fk1, fk2, id)`` tuples at WARN before deletion, so operators can audit which rows the ``MIN(id)`` policy drops. The keep- original policy is correct in practice but discards later re-grants on ownership tables; the sample makes that visible. 5. ``UPDATING.md`` documents the upgrade/downgrade primary-key name divergence (``pk_<table>`` vs ``<table>_pkey``) so operators using schema-comparison tools don't mistake it for migration drift. No schema or runtime-behaviour changes. All 44 migration tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address Beto's review comments on apache#39859: replace ``sa.text(f"...")`` SQL construction in the three pre-flight helpers (``_delete_null_fk_rows``, ``_dedupe_by_min_id``, ``_assert_no_duplicates``) with SQLAlchemy core constructs (``sa.delete``, ``sa.select``, ``sa.func``, ``.subquery()``, ``.notin_()``). A small ``_table_clause()`` helper builds a lightweight ``TableClause`` exposing the columns the queries reference; the three helpers consume it. Removes all ``# noqa: S608`` comments — they are no longer needed because there is no string-interpolated SQL. Verified the compiled SQL is identical on Postgres, MySQL, and SQLite, including the MySQL ERROR 1093 workaround (the inner aggregation is wrapped in a derived table via ``.subquery()``, producing ``... NOT IN (SELECT keep_id FROM (SELECT min(id) ...) AS keep_min)``). Also drops the redundant ``f`` prefix on the two non-interpolating lines of the ``_check_no_external_fks_to_id`` error message. 44 migration tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI test-mysql failed with: MySQLdb.OperationalError: (1826, "Duplicate foreign key constraint name 'fk_dashboard_slices_slice_id_slices'") Root cause: MySQL scopes foreign-key constraint names per-database, not per-table (PostgreSQL and SQLite scope per-table). The ``batch_alter_table(... recreate="always", copy_from=...)`` path used for ``dashboard_slices`` and ``report_schedule_user`` builds ``_alembic_tmp_<table>`` carrying the original FK names from ``copy_from`` while the original table still holds those names — MySQL rejects the temp-table creation with ERROR 1826. Fix: on MySQL only, drop the original FK constraints by name before the ``batch_alter_table`` runs. The ``copy_from`` re-creates them on the rebuilt table with their original names, so the post-migration shape is unchanged. On PostgreSQL and SQLite the original code path still runs unchanged. Local SQLite tests (44 passed, 1 skipped) still pass; CI will validate on MySQL. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two MySQL-only failures in the downgrade path, found by running the full migration history against a fresh MySQL 8 container: 1. ``MySQLdb.OperationalError: (1553, "Cannot drop index 'PRIMARY': needed in a foreign key constraint")``. InnoDB uses the composite PK index to back the FK on the leftmost column. The downgrade tried to drop the composite PK before dropping the FKs, orphaning the FK's backing index. PostgreSQL and SQLite create separate indexes for FK columns and don't trip on this. 2. ``Field 'id' doesn't have a default value`` on subsequent INSERT. ``sa.Identity(always=False)`` only emits ``AUTO_INCREMENT`` on MySQL when the column is created with ``primary_key=True`` — our portable path adds the column first then creates the PK separately, so MySQL leaves the column without auto-generation. Existing rows would all collide on id=0; future inserts fail because no default. Postgres' ``GENERATED BY DEFAULT AS IDENTITY`` and SQLite's ``INTEGER PRIMARY KEY`` rowid alias don't have this gap. Fix: extract ``_downgrade_mysql_table()`` that emits the canonical MySQL idiom — drop FKs, then a single ALTER combining ``DROP PRIMARY KEY, ADD COLUMN id INT NOT NULL AUTO_INCREMENT, ADD PRIMARY KEY (id)`` (which backfills existing rows with sequential ids and preserves AUTO_INCREMENT), restore the redundant UNIQUE on the 2 tables that originally had it, and re-add the FKs with their original names. Postgres and SQLite keep the existing portable ``batch_alter_table`` path. Raw SQL is unavoidable for the combined-ALTER form; per the constitution it's allowed for dialect-specific DDL with no SQLA equivalent, with triple-quoted strings for legibility. Verified end-to-end: upgrade → downgrade → upgrade against a fresh MySQL 8 container with INSERT-without-id sanity check showing the restored ``id`` column auto-increments correctly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Found by running fresh-install + round-trip against a real SQLite DB: 6 of the 8 affected tables had FK columns that were originally declared nullable. PostgreSQL and MySQL implicitly promote the constituent columns of an ``ALTER TABLE ... ADD PRIMARY KEY`` to ``NOT NULL``; SQLite does not (it's a long-standing SQLite quirk — only ``INTEGER PRIMARY KEY`` enforces NOT NULL on a composite-PK column). Result: a fresh SQLite install would accept ``INSERT INTO dashboard_slices (NULL, 5)`` despite both columns being part of the composite PK. Our integration tests previously masked this: the test fixture seeds columns with ``nullable=False``, so the post-upgrade NOT NULL assertion passed regardless of whether the migration enforced it. Fix: add explicit ``batch_op.alter_column(fk, nullable=False)`` for both FK columns inside the per-table batch_alter_table block. On PostgreSQL and MySQL this is a no-op (PK already implies NOT NULL); on SQLite it adds the missing NOT NULL declaration so a fresh install matches the data-model.md "After" contract. Verified end-to-end: - Postgres + MySQL: column shape unchanged (still NOT NULL) - SQLite fresh install + round-trip: all 8 tables now have NOT NULL on FK columns, ``INSERT (NULL, 5)`` correctly rejected with IntegrityError on dashboard_slices, dashboard_user, sqlatable_user Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI cypress + playwright shards were red with: ERROR [flask_migrate] Error: Multiple head revisions are present for given argument 'head' The recent rebase onto master pulled in ``33d7e0e21daa_add_semantic_layers_and_views.py`` (from PR apache#37815, "semantic layer extension"), which had been authored against ``ce6bd21901ab`` as its parent — the same parent our migration referenced. After the rebase both migrations point at ``ce6bd21901ab``, producing two heads and breaking ``flask db upgrade head`` for any downstream consumer (CI's Cypress / Playwright shards spin up a real Superset instance via ``superset db upgrade``, which is why those shards failed first; the integration shards run against a precomputed schema and didn't surface this). Fix: chain our migration after the semantic-layer migration by pointing ``down_revision`` at ``33d7e0e21daa``. The chain is now linear: ... → ce6bd21901ab → 33d7e0e21daa (semantic layers) → 2bee73611e32 (composite PK, this PR) Verified with ``superset db heads`` (returns single head ``2bee73611e32``) and the local migration test suite (44 passed, 1 skipped). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…105349) Add a "Sizing the maintenance window on PostgreSQL" sub-section to the operator runbook. The simple per-table COUNT/duplicate/NULL queries that were already there are dialect-portable but only count rows; operators on PostgreSQL with large deployments need to characterize the migration's runtime cost before scheduling it. Adds four diagnostic queries: - Per-table size, row count (from pg_class.reltuples), and which migration path each table will take (recreate-rewrite vs direct ALTER). Sizes the work concretely. - Aggregated duplicate-row roll-up: dup_groups + total rows_dropped per table. Replaces eight separate per-table queries with one consolidated result for audit/dump-before-apply decisions. - External-FK pre-flight check (the same one the migration runs at upgrade time and aborts on). Lets operators surface any blocking external reference ahead of the maintenance window. Should be empty on a stock install. - Lock-window estimate for the two full-rewrite tables, using pg_relation_size and a conservative 100 MB/s rewrite throughput assumption. The other six use direct ALTER and are dominated by composite-index build time (seconds for low-millions-of-rows tables). Prompted by reviewer feedback on apache#39859 from a large deployment asking how to size the maintenance window. The original pre-flight queries are kept for cross-dialect operators (MySQL, SQLite) since the new queries use PostgreSQL-specific catalog views. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…349) Mirror of the PostgreSQL diagnostic queries added in 1114877, adapted for MySQL/InnoDB. One important difference: InnoDB rebuilds the clustered index on every PK change, so all eight tables undergo a full table rebuild on MySQL — not just the two that go through the explicit ``recreate="always"`` path. The lock-window estimate query is updated to cover all eight rather than just two, and the "migration_path" column makes the rebuild expectation explicit ("direct ALTER (still rebuilds InnoDB clustered index)"). Other notes: - ``information_schema.TABLES.TABLE_ROWS`` is an InnoDB estimate, analogous to PostgreSQL's ``reltuples``; documented inline. - ``KEY_COLUMN_USAGE`` carries both sides of the FK in a single row on MySQL, so the external-FK pre-flight check is simpler than the PostgreSQL version (no joins between three views). - The aggregated dedupe query is portable standard SQL; included verbatim for copy-paste convenience. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds ``docker-compose-mysql.yml``, a compose-override file that swaps the default Postgres metadata DB for MySQL 8 with one extra ``-f`` flag: docker compose -f docker-compose.yml -f docker-compose-mysql.yml up Useful for evaluating dialect-specific behaviour (e.g., the runtime cost of DDL migrations on a deployment whose production metadata DB is MySQL — the question raised by review feedback on this PR). Mirrors the connection settings used by CI's ``test-mysql`` shard: ``mysql+mysqldb`` dialect, charset ``utf8mb4`` with binary_prefix. Host port defaults to 13306 (configurable via ``DATABASE_PORT_MYSQL``) to avoid colliding with a native MySQL install on 3306. A separate volume (``db_home_mysql``) keeps MySQL data isolated from the Postgres ``db_home`` volume, so switching between the two with ``-f`` flag toggles doesn't corrupt either side. The Postgres-specific init scripts under ``docker/docker-entrypoint-initdb.d/`` are not mounted on the MySQL service (they are postgres-only). Examples / cypress fixtures still load via ``superset-init``'s post-startup steps, which run ``superset load-examples`` against whichever metadata DB is in use. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix two follow-on issues reported when starting the dev stack with docker-compose-mysql.yml: 1. ``superset-init`` step 4 (load-examples) fails with ``MySQLdb.OperationalError: (2002, "Can't connect to server on 'db'")`` because the analytics-examples DB connection inherits ``EXAMPLES_PORT=5432`` (Postgres port) from ``docker/.env``. The override flipped ``DATABASE_DIALECT`` to ``mysql+mysqldb`` but left the EXAMPLES_* group on Postgres defaults, so the URI became ``mysql+mysqldb://examples:examples@db:5432/examples`` — MySQL container has no listener on 5432. Fix: add ``EXAMPLES_HOST/PORT/DB/USER/PASSWORD`` and a complete ``SUPERSET__SQLALCHEMY_EXAMPLES_URI`` to the ``mysql-env`` anchor. 2. The Postgres init scripts under ``docker/docker-entrypoint-initdb.d/`` (``cypress-init.sh``, ``examples-init.sh``) get mounted on the MySQL container too — compose merges volume lists. They invoke ``psql`` which doesn't exist in the MySQL image, abort with ``psql: command not found``, and prevent the ``examples`` DB from being created. Fix: add a MySQL-specific init script ``docker/mysql-init/examples-init.sql`` that creates the ``examples`` database and user, and mount it at ``/docker-entrypoint-initdb.d`` in the override. Compose's later-takes-precedence rule on duplicate volume targets displaces the Postgres init dir, so the MySQL container only sees the MySQL-compatible script. (Used a plain duplicate-target mount rather than the ``!override`` tag because pre-commit's ``check-yaml`` doesn't recognize Compose's custom YAML tags.) Recovery for an existing failed MySQL stack: ``docker compose -f docker-compose.yml -f docker-compose-mysql.yml down``, then ``docker volume rm superset_db_home_mysql`` (so the new init script runs on the next fresh boot), then ``up`` again. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add ``scripts/seed_junction_load.py``, a backend-agnostic script that
bulk-inserts synthetic parent rows (dashboards, slices, users, roles,
tables, dbs) and many-to-many junction rows for the four largest
association tables targeted by the composite-PK migration:
``dashboard_slices``, ``slice_user``, ``dashboard_user``,
``dashboard_roles``.
Designed for measuring migration runtime at varying scales — run with
a series of size flags (100K / 1M / 5M / 10M for the target table)
and time the migration at each scale to verify the predicted
``O(N log N)`` extrapolation against real numbers.
Properties:
- **Reproducible**: deterministic cross-product walk through parent IDs
produces a stable pair sequence; re-running is replayable.
- **Idempotent**: re-running with the same target is a no-op; with a
higher target, only new rows are added.
- **Backend-agnostic**: connects via Superset's standard ``DATABASE_*``
env vars (or ``SUPERSET__SQLALCHEMY_DATABASE_URI``). Branches on
dialect for ``BINARY(16)`` vs ``UUID`` vs TEXT/BLOB UUID columns.
- **Batched**: bulk INSERT 10K rows per statement.
- **Per-phase timing**: logs elapsed wall time for the parents phase,
the junctions phase as a whole, and per junction-table.
- **Avoidance set**: loads existing junction pairs into a Python set
so re-runs on top of pre-existing data don't collide on the
uniqueness constraint.
Usage (inside the Superset container):
docker exec superset-superset-1 \\
/app/.venv/bin/python /app/scripts/seed_junction_load.py \\
--dashboard-slices 1000000
Defaults target a "large multi-team install" shape: 1M
``dashboard_slices``, 100K each ``slice_user`` / ``dashboard_user``,
10K ``dashboard_roles``. Override per-table via flags.
Tested locally on MySQL (the user's current eval stack):
- 200/100/100/50 row mini-run produced expected counts.
- Re-running at the same target is a no-op (idempotent).
- ``--dry-run`` plans without writing.
Junction tables not yet covered (``sqlatable_user``, ``rls_filter_*``,
``report_schedule_user``) are typically small in production and
require additional parent seeding (RLS filters, report schedules)
that wasn't worth the scope here. Adding them is straightforward by
extending ``JUNCTIONS`` and writing the corresponding parent seeder.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends the stress-test seed script with an optional duplicate-row
injection step, used to measure the empirical cost of the migration's
``_dedupe_by_min_id`` phase.
Usage: after running the normal seed at a given scale, add
``--dirty-duplicates-pct 5`` (or any non-zero value) to inject that
percentage of duplicate ``(fk1, fk2)`` rows into each non-UNIQUE
junction (slice_user, dashboard_user, dashboard_roles —
dashboard_slices is skipped because its UNIQUE constraint, present
both pre- and post-migration, rejects duplicates).
Pre-condition: requires the DB to be at the pre-migration revision
(33d7e0e21daa). The post-migration composite PK rejects duplicates,
so attempting to inject on the upgraded schema errors out.
Empirical result on MySQL @ 10M dashboard_slices + ~2.1M other
junction rows + 105K injected duplicates (5% on the 3 non-UNIQUE
tables):
Upgrade time: 1m 36s vs clean baseline 1m 37s
→ dedupe cost is within measurement noise; the table-scan that
the migration already performs dominates whether or not
duplicates exist.
This empirically confirms what the cost-model predicted: the
``_dedupe_by_min_id`` GROUP BY scan is the dominant cost of that
phase, and the actual per-duplicate DELETE is negligible.
NULL-FK injection deliberately skipped — would require altering the
six non-UNIQUE FK columns from NOT NULL back to nullable (the
migration's downgrade keeps them NOT NULL by design), which adds
per-backend ALTER complexity for a code path that's structurally
identical in cost shape (DELETE WHERE col IS NULL is the same scan
shape as the dedupe scan).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…5349) Justin Park (@justinpark) reported on apache#39859: MySQLdb.OperationalError: (1832, "Cannot change column 'dashboard_id': used in a foreign key constraint 'fk_dashboard_roles_dashboard_id_dashboards'") Root cause: ``batch_op.alter_column(fk1, nullable=False)`` for the six non-UNIQUE association tables emits ``ALTER COLUMN`` on a column that participates in an FK constraint. MySQL 8 rejects this with ERROR 1832 when the table has data — even when the change is just ``NULL`` → ``NOT NULL`` and the column is already part of a freshly-added composite primary key (which InnoDB has just made implicitly NOT NULL anyway). The error fires on populated tables only; CI's ``test-mysql`` shard runs against empty tables and so didn't catch this, while a real production-shaped install does. The ``alter_column`` was only ever needed for SQLite, where composite ``PRIMARY KEY`` does not promote constituent columns to ``NOT NULL`` (a long-standing SQLite quirk — only ``INTEGER PRIMARY KEY`` does). PostgreSQL and MySQL implicitly promote PK columns to ``NOT NULL`` as part of ``ADD PRIMARY KEY``, so the explicit step is unnecessary on both — and on MySQL it's actively broken on populated tables. Fix: extract the ``alter_column`` pair into a helper ``_enforce_not_null_for_sqlite()`` that no-ops on Postgres and MySQL. Both branches of the per-table upgrade (the ``recreate="always"`` path for the two UNIQUE-bearing tables, and the direct-ALTER path for the other six) now call the helper instead of inlining the ``alter_column``. Verified end-to-end: downgrade-then-upgrade against MySQL with ~12M total junction rows (10M dashboard_slices + 1M each slice_user/dashboard_user + 100K dashboard_roles) completes in 1m 39s with no ERROR 1832. The 44 in-memory SQLite tests still pass. Considered Justin's alternative (drop FKs on MySQL across all eight tables, unifying the two branches) but rejected as more invasive — it would require capturing FK metadata and explicitly re-creating the FKs for the six non-recreate tables, since they don't go through the ``copy_from`` path that re-creates FKs automatically. The SQLite-only approach is more targeted: it removes the operation that MySQL rejects rather than working around the rejection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three improvements from @aminghadersohi's review on apache#39859: 1. **`fk["name"]` unguarded in ``_downgrade_mysql_table`` re-add loop** The drop loop gates on ``if fk_name := fk.get("name"):`` but the re-add loop accessed ``fk["name"]`` unconditionally in an f-string. MySQL/InnoDB always assigns FK names, so this branch was defensive, but the asymmetry was confusing. Symmetrized via ``continue`` at the top of the re-add loop. 2. **``ondelete`` whitelist before raw-SQL interpolation** The value comes from MySQL's ``information_schema`` (not user input), but interpolating a reflected string into raw SQL without a guard left a "what if an unexpected value appears" footgun. Added ``_VALID_ONDELETE_ACTIONS`` (the four SQL-standard actions) and a ``RuntimeError`` when an unexpected value is reflected. 3. **Direct ALTER on PostgreSQL for tables with pre-existing UNIQUE** ``recreate="always"`` is dialect-agnostic — on PostgreSQL it triggers ``CREATE TABLE AS SELECT → DROP → RENAME`` holding ``ACCESS EXCLUSIVE`` for the full table-copy duration. For a multi-million-row ``dashboard_slices``, that lock window can be noticeable. The reflected UNIQUE constraint has a stable name on PostgreSQL (default ``<table>_<cols>_key`` convention), so dropping it directly and then running structural change as direct ALTER avoids the copy entirely. The reflected UNIQUE name is wrapped in a new ``_drop_redundant_unique_by_name()`` helper. Postgres takes the direct path; MySQL keeps ``recreate="always"`` because InnoDB binds FKs to the UNIQUE's underlying index for back-reference (``DROP CONSTRAINT`` on the UNIQUE there raises ``ERROR 1553``); SQLite keeps ``recreate="always"`` because unnamed UNIQUEs reflect with ``name=None`` and can't be dropped by name. Verified end-to-end: downgrade-then-upgrade against MySQL with ~12M total junction rows seeded completes in ~1m 41s (within the range of the prior measurements). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Belt-and-braces invariant: ``t.name`` is interpolated as a
backtick-quoted identifier into the ALTER statements emitted by
``_downgrade_mysql_table``. The values originate from
``AFFECTED_TABLES`` (a module-level literal), so SQL injection is
already structurally precluded at the call site. Adding an explicit
``allowed = {a.name for a in AFFECTED_TABLES}`` membership check
makes that invariant load-bearing rather than implicit — a future
refactor that loosens the call-site can't slip past review.
Surfaced during a downstream SQLAlchemy review on the entity-versioning
branch that stacks on top of this one; lifted onto sc-105349 because
the patch is properly scoped to this branch's composite-PK migration.
After rebasing onto master, 2bee73611e32 and master's 31dae2559c05 both revised 33d7e0e21daa, forking the alembic chain into two heads ('superset db upgrade' refuses to run). Re-point down_revision at 31dae2559c05 so the versioning chain extends the real head.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The MySQL branch dropped the live FK constraints and then re-reflected them for the copy_from table — which only returned the pre-drop list via the Inspector's per-instance info_cache, an implementation detail. Capture the list before dropping and pass it through explicitly (the downgrade path already did this). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
e69c731 to
b394df8
Compare
Two fixes from a 4-lens review pass:
- Resumability guard: on MySQL every DDL statement auto-commits, so a failure at table N of 8 left tables 1..N-1 converted with alembic_version un-stamped — re-running failed at table 1 (drop_column('id') on a converted table) and downgrade couldn't run either. Skip tables whose id column is already gone, making re-runs safe on every dialect.
- The down_revision re-point left two stale 33d7e0e21daa references: the migration docstring header, and — operationally worse — the seed script's --dirty-duplicates-pct help text, which instructed a downgrade that would unwind every migration since 2025-11. The help text now points at the migration's down_revision instead of hardcoding a hash.
Also: drop the never-called required_parent_count helper and trim report_schedule_user from JUNCTIONS_WITH_UNIQUE (the script never seeds that table; the entry implied coverage that doesn't exist).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The migration's riskiest half — _delete_null_fk_rows / _dedupe_by_min_id / _assert_no_duplicates — had zero coverage: the fixtures seeded no rows, and both test schema builders created FK columns NOT NULL, diverging from the real pre-migration shape (six of eight tables allowed NULLs), so test_fk_columns_not_null passed trivially. - Build the pre-migration schema with historically-accurate nullable FKs (keyed on the migration's TABLES_WITH_NULLABLE_FKS, giving that documentation set a load-bearing consumer). - Add test_upgrade_scrubs_null_fks_and_duplicates: seeds NULL-FK rows and duplicate pairs, runs upgrade, asserts exactly the distinct non-NULL pairs survive. Verified deletable-detectable: commenting out _dedupe_by_min_id makes it fail. - Delete the permanently-skipped placeholder test and the captured-but-never-asserted pre_shape; replace spec-kit references (T034a/tasks.md/quickstart.md) with self-contained prose. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Four corrections to the maintenance-window guidance: - PostgreSQL takes the direct-ALTER path for ALL eight tables (the redundant UNIQUEs are dropped by name); the doc described a recreate='always' full rewrite the code deliberately avoids, and sized only those two tables. The lock-window query now covers all eight. - State the cumulative-lock property: Alembic runs the upgrade in one transaction on Postgres, so ACCESS EXCLUSIVE locks are held until commit — total unavailability is the sum of per-table windows; quiesce the app. - MySQL: DROP COLUMN and ADD PRIMARY KEY are separate ALTERs, so most tables pay the InnoDB clustered-index rebuild twice — budget ~2x the single-rebuild estimate. - Downgrade is a comparable maintenance window in its own right, not a quick undo. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
SUMMARY
Replace synthetic
id INTEGER PRIMARY KEYwith compositePRIMARY KEY (fk1, fk2)on the eight pure-junction tables. The redundantUNIQUE(fk1, fk2)on the two tables that previously carried one is dropped (subsumed by the new PK). All eight tables are M:N association tables — there is no semantic load on the surrogateid.Affected tables and their composite PK pair:
dashboard_roles(dashboard_id, role_id)dashboard_slices(dashboard_id, slice_id)dashboard_user(user_id, dashboard_id)report_schedule_user(user_id, report_schedule_id)rls_filter_roles(role_id, rls_filter_id)rls_filter_tables(table_id, rls_filter_id)slice_user(user_id, slice_id)sqlatable_user(user_id, table_id)Why now: 6 of the 8 tables had no UNIQUE constraint at all, so duplicate
(fk1, fk2)rows could (and did) accumulate in production data. The migration deduplicates byMIN(id)before the PK promotion. This change also lifts a precondition for the entity-versioning epic (sc-103156) — SQLAlchemy-Continuum's M:N restore replays the version table; surrogate-PK junctions interact poorly with the bulk reassign-and-replace pattern (Continuum issue #129). Continuum verification against the new shape is not part of this PR — it is the responsibility of the versioning epic.BEFORE/AFTER
dashboard_slicesis the canonical example (the other seven follow the same pattern, withdashboard_user,dashboard_roles,slice_user,sqlatable_user,rls_filter_*having no pre-existing UNIQUE):Before:
After:
The redundant
UNIQUE (dashboard_id, slice_id)is dropped (subsumed by the PK).TESTING INSTRUCTIONS
Reproduces locally per
quickstart.mdsteps 4–8:superset db upgrade. Two tables with pre-existing UNIQUE (dashboard_slices,report_schedule_user) are batch-recreated viacopy_from; the other six loseidand gain a composite PK directly. Migration logs the duplicate-row count and NULL-FK row count for each affected table.INSERTa(fk1, fk2)pair, thenINSERTthe same pair a second time. The second insert must raiseIntegrityError(PK violation). Same pair under different IDs is no longer possible.pytest tests/unit_tests/migrations/composite_pk_association_tables_test.py(16 parametrized assertions: 8 duplicate-rejection, 8 distinct-pair).pytest tests/integration_tests/migrations/composite_pk_association_tables__tests.py tests/integration_tests/migrations/composite_pk_round_trip__tests.py(28 schema-shape assertions plus a round-trip + idempotency test against in-memory SQLite viaMigrationContext).superset db downgrade <prior-revision>. Theidcolumn is restored (backfilled viasa.Identity(always=False)on Postgres/MySQL), and the originalUNIQUE(fk1, fk2)is re-added on the two tables that originally had it. Intentional asymmetry: FK columns remainNOT NULLafter downgrade — under SQLAlchemysecondary=semantics,NULL-FK junction rows are meaningless, so we don't restore the original nullable state.superset db upgradeafter downgrade returns the post-upgrade shape.Cross-DB matrix (SC-005)
Verified end-to-end on all three backends locally — fresh full-history install (
superset db upgradefrom scratch) plus round-trip (upgrade → downgrade → upgrade) — and CI re-runs the migration during integration-test setup.test-postgres (current/next/previous),test-postgres-hive,test-postgres-prestoINSERT-without-idsanity check; CItest-mysqlINSERT (NULL, 5)-rejection sanity check; CItest-sqlite; in-memory unit/integration tests (44 passed, 1 skip)Bugs found and fixed during local cross-backend verification
The dedicated unit and integration tests run against in-memory SQLite, and CI's per-backend integration shards exercise the migration only as part of the suite's setup phase (which catches crashes but not subtle behavioural disparities). Doing fresh-install + round-trip locally on each real backend — plus reviewer feedback from @justinpark on a populated DB — surfaced six dialect-specific issues that were masked by both layers:
Duplicate foreign key constraint name) duringrecreate="always"— InnoDB scopes FK constraint names per-database, not per-table. The temp table created by the recreate path collides with the original. CI's setup phase did catch this (thetest-mysqlshard turned red on first push); fix: drop FKs by name before the recreate on MySQL.Cannot drop index 'PRIMARY': needed in a foreign key constraint) on downgrade — InnoDB uses the composite PK index to back the FK on the leftmost column. Dropping the PK without first dropping the FKs orphans the index. Only manifests on a real round-trip — CI's setup-only check would never see it; fix: drop FKs before the PK swap on MySQL, re-add them after.Cannot change column: used in a foreign key constraint) on populated tables —batch_op.alter_column(fk, nullable=False)(added to enforce NOT NULL on SQLite, see below) emitsALTER COLUMNon a column that participates in an FK, which MySQL 8 rejects when the table has data. CI'stest-mysqlshard runs against empty tables and so didn't catch this; @justinpark caught it on a populated install. Fix: extract thealter_column nullable=Falseinto a_enforce_not_null_for_sqlite()helper that no-ops on Postgres (whereADD PRIMARY KEYalready promotes columns implicitly) and on MySQL (same plus avoids 1832).AUTO_INCREMENTon the restoredidcolumn on MySQL —sa.Identity(always=False)only emitsAUTO_INCREMENTwhen the column hasprimary_key=Trueat create time, but our portable path adds the column then creates the PK separately. Existing rows would all collide onid=0; subsequentINSERTs would fail withField 'id' doesn't have a default value. Found by an explicitINSERT-without-idtest against the post-downgrade DB; fix: combinedDROP PRIMARY KEY, ADD COLUMN AUTO_INCREMENT, ADD PRIMARY KEYALTER on MySQL.NOT NULLon constituent columns (long-standing SQLite quirk — onlyINTEGER PRIMARY KEYdoes). PostgreSQL and MySQL implicitly promote PK columns toNOT NULL; SQLite leaves them nullable for compound keys. A fresh SQLite install would have acceptedINSERT (NULL, 5)despite both columns being part of the PK. The integration tests masked this because the test fixture seeds columns withnullable=False. Fix:_enforce_not_null_for_sqlite()helper — runs only on SQLite where the explicit step is required.recreate="always"is used to drop the redundant UNIQUE (per @aminghadersohi's review).recreate="always"is dialect-agnostic — on Postgres it triggersCREATE TABLE AS SELECT → DROP → RENAME, holdingACCESS EXCLUSIVEfor the entire copy duration. The reflected UNIQUE has a stable name on Postgres (default<table>_<cols>_key), so dropping it directly viaDROP CONSTRAINTand then running structural change as direct ALTER avoids the copy entirely. MySQL and SQLite keep therecreate="always"path (InnoDB binds FKs to the UNIQUE index → ERROR 1553 on direct drop; SQLite reflects unnamed UNIQUEs withname=None→ can't drop by name).The MySQL-specific fixes use raw triple-quoted SQL (no SQLA-core equivalent for the dialect-specific combined ALTER); the constitution allows raw SQL for dialect-specific DDL with no programmatic equivalent.
Migration runtime — empirical numbers
Measured by seeding synthetic data with
scripts/seed_junction_load.py(committed in this PR) on a MySQL 8 container, then timing downgrade + upgrade at each scale. Numbers below are MySQL on Docker on macOS — production Postgres on dedicated hardware will be at least as fast, likely faster:dashboard_slicesrowsLinear scaling at ~8–9 µs/row through 10M rows; no memory cliff observed. For typical large-enterprise self-hosted scale (1–5M
dashboard_slices) the upgrade is tens of seconds. Multi-tenant-SaaS scale (50M+) extrapolates to ~7–8 min. Dirty-data overhead (5% duplicates injected on the 3 non-UNIQUE junctions) added < 1s — the dedupe phase is dominated by the table scan that runs whether duplicates exist or not.Operators can self-size with the diagnostic queries in
UPDATING.md("Sizing the maintenance window on PostgreSQL / MySQL"): per-table row counts, on-disk size, aggregated duplicate roll-up, external-FK pre-flight check, lock-window estimate.Operator runbook
Operators should run the pre-flight inventory queries from
UPDATING.md("Composite primary keys on many-to-many association tables") before applying — they show how many duplicate(fk1, fk2)rows and how many NULL-FK rows exist in their database. The migration deletes both classes (keepingMIN(id)per group for duplicates). The full operator-facing runbook is inUPDATING.mdand includes:idpg_dumpagainst the new schemaADDITIONAL INFORMATION
copy_fromrecreate path isO(n)table-rewrite fordashboard_slicesandreport_schedule_user; the other six are direct ALTER. Expected to complete in seconds-to-tens-of-seconds.Continuum verification deferral
Verification of SQLAlchemy-Continuum's M:N restore behaviour against the new schema is the responsibility of the versioning epic (sc-103156) and is not part of this PR. This PR delivers the schema precondition; the versioning work picks up
sqlalchemy-continuumas a dependency and verifies the restore behaviour there.