Skip to content

Stop pgjsonb purge / archive from orphaning recent salt_returns rows#69061

Open
co-cy wants to merge 2 commits intosaltstack:3006.xfrom
co-cy:pgjsonb-fix-orphan-returns
Open

Stop pgjsonb purge / archive from orphaning recent salt_returns rows#69061
co-cy wants to merge 2 commits intosaltstack:3006.xfrom
co-cy:pgjsonb-fix-orphan-returns

Conversation

@co-cy
Copy link
Copy Markdown

@co-cy co-cy commented May 5, 2026

Depends on #69049. This branch is layered on top of that PR; the diff currently shows both commits and will rebase to a single commit once #69049 is merged.

What does this PR do?

_purge_jobs and _archive_jobs decided which jids rows to handle with this predicate:

delete from jids where jid in (
    select distinct jid from salt_returns where alter_time < %s
)

The subquery returned a jid as soon as any single salt_returns row for that jid was older than the cutoff. For long-running jobs whose minions answer at staggered times — slow orchestrate runs, SSH commands that take hours, syndic batches — the predicate fired on the first old return and deleted the parent jids row, but the recent salt_returns rows for the same jid stayed behind. They became orphans: salt-run jobs.lookup_jid could no longer rebuild the load, the archive held parent rows without their recent returns, and any code joining salt_returns to jids saw a broken parent edge.

This PR rewrites both predicates as an EXISTS / NOT EXISTS antijoin so a jids row is removed only when every salt_returns row for that jid is older than the cutoff:

delete from jids j where exists (
    select 1 from salt_returns r where r.jid = j.jid
) and not exists (
    select 1 from salt_returns r
    where r.jid = j.jid and r.alter_time >= %s
)

EXISTS is preferred over IN (SELECT distinct ...): it short-circuits on the first hit, uses idx_salt_returns_jid for per-row probes, and avoids NOT IN's NULL gotcha. Since jids is typically much smaller than salt_returns, the planner picks an antijoin via the jid index — at least as fast as the original seq scan under the old predicate.

What issues does this PR fix or reference?

Fixes #69060

Previous Behavior

A long-running job whose first minion answers at t0 and last minion answers at t0 + keep_jobs_seconds + ε loses its jids row at the first maintenance tick after t0 + keep_jobs_seconds, while the late return is still being inserted into salt_returns. Subsequent jobs.lookup_jid calls cannot reconstruct the load.

New Behavior

The parent jids row is kept while any recent salt_returns row for that jid exists. Once every return is older than the cutoff, the parent and the returns are purged together, and the archive captures both consistently.

Scope of this PR

Fixes orphan returns only. There is a related but separate slower leak: jobs published against an offline target leave a jids row with no salt_returns rows at all, which the maintenance loop never touches because the subquery cannot match on missing data. Closing that needs either a jids.created_at schema migration with backwards-compat detection in the code, or jid-format parsing in Python. Both warrant their own discussion and will be tracked as a follow-up.

Merge requirements satisfied?

  • Docs (no schema or user-facing config change; not required)
  • Changelog — changelog/69060.fixed.md
  • Tests written/updated — see tests/pytests/unit/returners/test_pgjsonb.py:
    • test__purge_jobs_keeps_jids_with_any_recent_salt_returns_row
    • test__archive_jobs_keeps_jids_with_any_recent_salt_returns_row

Commits signed with GPG?

No.

co-cy added 2 commits May 4, 2026 22:23
`salt.returners.pgjsonb` had eight error-handling sites in `_get_serv`,
`_purge_jobs` and `_archive_jobs` that wrote `psycopg2.DatabaseError`
messages directly to `sys.stderr` and re-raised:

    except psycopg2.DatabaseError as err:
        error = err.args
        sys.stderr.write(str(error))
        cursor.execute("ROLLBACK")
        raise err

On a daemonized master `sys.stderr` is captured by systemd's journal
(or worse, /dev/null), bypassing the configured `log_file` /
`log_level_logfile` and syslog. Operators that scrape Salt's log file
never see these errors, even though the rest of the file already uses
`log.error` (e.g. `clean_old_jobs`).

The local `error = err.args` assignment is dead -- only `str(err.args)`
is written, then the variable is unused. Also drop `import sys`,
which becomes unused after the migration.

Replace each call with `log.exception(...)` carrying a description of
the operation that failed, so the traceback is preserved in the log.
Behaviour (rollback, re-raise) is unchanged. The asymmetric
`except Exception` block on `_archive_jobs` jids-insert is updated to
the same pattern but kept in place (its scope is PR-7, not this PR).

Add three behavioural tests with `caplog`: one each for `_get_serv`,
`_purge_jobs` and `_archive_jobs` confirming the error reaches Salt's
logger, the transaction is rolled back, and the exception propagates.

Refs: saltstack#69048
`_purge_jobs` and `_archive_jobs` selected jids using:

    delete from jids where jid in (
        select distinct jid from salt_returns where alter_time < %s
    )

The subquery returned the jid as soon as ANY single salt_returns row
for that jid was older than the cutoff. For long-running jobs whose
minions answer at staggered times -- a slow orchestrate run, an SSH
target with a 2-hour command, a syndic batch -- the predicate fired
on the first old return and dropped the parent jids row, but the
recent salt_returns rows for the same jid stayed behind in the source
table. They were now orphans:

* `salt-run jobs.lookup_jid` could no longer reconstruct the load.
* The archive copy held parent rows without their recent returns.
* Any code that JOINs salt_returns to jids saw a broken parent edge.

Rewrite both predicates as an `EXISTS` / `NOT EXISTS` antijoin:

    delete from jids j where exists (
        select 1 from salt_returns r where r.jid = j.jid
    ) and not exists (
        select 1 from salt_returns r
        where r.jid = j.jid and r.alter_time >= %s
    )

A jids row is now removed only when every salt_returns row for that
jid is older than the cutoff, so partial-return jobs keep their
parent until they fully expire. Same change in `_archive_jobs` for
the jids_archive insert.

`EXISTS` is preferred over `IN (SELECT distinct ...)`: it short-
circuits on the first hit, uses `idx_salt_returns_jid` for per-row
probes, and avoids `NOT IN`'s NULL gotcha. Since `jids` is typically
much smaller than `salt_returns`, the planner picks an antijoin via
the jid index, which is at least as fast as the original seq scan
under the old predicate.

This PR fixes the data-integrity issue (orphan returns) only. It
does NOT address the slower leak of jids rows that never received
any salt_returns at all -- a job published against an offline
target leaves an immortal jids row -- because closing that needs
either a `jids.created_at` schema migration with backwards-compat
detection or jid-format parsing in Python. Both warrant their own
discussion; tracked as a separate follow-up.

Add two behavioural tests asserting the rewritten predicate is
issued (`NOT EXISTS` and `alter_time >= %s`, with a guard against
regressing to the old `alter_time < %s` form) for both `_purge_jobs`
and `_archive_jobs`.

Depends on saltstack#69049 (which routed pgjsonb errors through Salt's
logger and structured the catch blocks the new code re-uses); this
PR builds on that branch and should be merged after it.

Refs: saltstack#69060
@co-cy co-cy requested a review from a team as a code owner May 5, 2026 15:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant