Fix MULTI_PROCESS parallel mode (QueueRead peer disconnect, lock groups)#183

Open
vlad-lesin wants to merge 2 commits into ossc-db:master from vlad-lesin:lock_groups

@vlad-lesin

Summary

Improves MULTI_PROCESS / WRITER=PARALLEL reliability when the reader and writer backends exit out of order, and removes the schema-change window between reader and writer on PostgreSQL 9.6+.

  • 6081b40 — Detect when the reader has detached from the shared queue without sending EOF; stop the writer from blocking forever in QueueRead() (Unix).
  • 767c8dc — Use PostgreSQL lock groups so the reader keeps AccessShareLock while the writer takes AccessExclusiveLock (9.6+); document parallel locking and related upstream ProcKill() fixes.

Commits

1. Detect peer disconnect in QueueRead (6081b40)

Problem: If the reader fails before the load succeeds (e.g. a bad INFILE in load_function with MULTI_PROCESS=YES), cleanup calls ParallelWriterClose(onError=true), which runs PQcancel() and PQfinish() but never sends the queue EOF. The writer may not honor the cancel (FE/BE timing around DoingCommandRead) and then sits in QueueRead() holding AccessExclusiveLock: an orphaned backend and a blocked table.

Fix: After each wait in QueueRead(), call shmctl(IPC_STAT) on Unix; if shm_nattch <= 1, the reader has detached from the segment without sending EOF, so raise an error and let the writer exit.

Scope: lib/pgut/pgut-ipc.c only. No Windows equivalent yet (#ifndef WIN32).

2. Use lock groups for MULTI_PROCESS reader and writer (767c8dc)

Problem: The reader released AccessShareLock before the writer acquired AccessExclusiveLock, leaving a window in which concurrent DDL could change the table, so the reader and writer could disagree on its definition.

Fix (PG 9.6+, #if PG_VERSION_NUM >= 90600):

| Backend | Role |
|---|---|
| Reader | BecomeLockGroupLeader(), AccessShareLock, publish leader in QueueHeader |
| Writer | BecomeLockGroupMember() before first table lock, then AccessExclusiveLock |

Older PG (< 9.6): Previous UnlockRelation(rel, AccessShareLock) path unchanged.
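The Reader/Writer split above relies on PostgreSQL treating table locks held by members of one lock group as non-conflicting. The toy model below is not PostgreSQL's lock manager: it models only AccessShareLock and AccessExclusiveLock, and `LockMode`, `HeldLock`, `modes_conflict()` and `can_grant()` are hypothetical names. It shows why the writer's AccessExclusiveLock becomes grantable only once the writer joins the reader's group, while an unrelated backend's DDL lock is still refused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: only two of PostgreSQL's eight table lock modes. */
typedef enum { ACCESS_SHARE, ACCESS_EXCLUSIVE } LockMode;

typedef struct {
    int      pid;    /* backend holding the lock */
    int      group;  /* PID of its lock-group leader, or 0 if not in a group */
    LockMode mode;
} HeldLock;

/* AccessExclusiveLock conflicts with every mode, AccessShareLock included. */
static bool modes_conflict(LockMode a, LockMode b)
{
    return a == ACCESS_EXCLUSIVE || b == ACCESS_EXCLUSIVE;
}

/* Can backend `pid` (in lock group `group`, 0 = none) be granted `mode`? */
static bool can_grant(const HeldLock *held, size_t n, int pid, int group,
                      LockMode mode)
{
    assert(held != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (held[i].pid == pid)
            continue;                 /* a backend never conflicts with itself */
        if (group != 0 && held[i].group == group)
            continue;                 /* lock-group members do not conflict */
        if (modes_conflict(held[i].mode, mode))
            return false;
    }
    return true;
}
```

With the reader (PID 100, leader of group 100) holding AccessShareLock, `can_grant(held, 1, 200, 0, ACCESS_EXCLUSIVE)` is false for a writer outside the group and true when the writer passes `group == 100`. Once the writer also holds AccessExclusiveLock, a third backend cannot get any lock on the table.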

Also:

  • PG_CONFIG ?= pg_config in all PGXS Makefiles (honor PG_CONFIG from the environment when building against multiple PostgreSQL versions).
  • Docs updated (EN/JA): parallel locking, version-specific schema guidance.

Upstream note: this change relies on the same lock-group machinery that two PostgreSQL ProcKill() fixes address:

  1. Fix race conditions in ProcKill()'s lock-group freelist handling (double push / follower leak)
  2. Fix procLatch ownership race in ProcKill() ("latch already owned by PID ...")

All versions since 9.6 are affected; the fixes are backpatched to REL_14_STABLE through REL_18_STABLE (not yet in 14.23; expected in the next 14.x minor). pg_bulkload MULTI_PROCESS on 9.6+ is exposed to those races until the server includes both commits; the lock groups used here do not replace them.

MULTI_PROCESS=YES uses two backends: the main backend (e.g.
TYPE=FUNCTION) reads the load source and pushes heap tuples into a
shared queue; a parallel writer backend (TYPE=TUPLE, WRITER=DIRECT
over libpq) pops them from the queue and writes them to storage. On
success (onError is false), ParallelWriterClose() in the main backend
sends a zero-length queue message, the EOF marker, via
write_queue(NULL, 0).
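The zero-length EOF convention can be sketched with a toy in-memory queue. The length-prefixed framing, the fixed-size buffer without wraparound, and the names `ToyQueue`, `write_msg()` and `read_msg()` are assumptions for illustration, not pg_bulkload's shared-memory queue layout. The point is only that the consumer (here, the parallel writer backend) treats a zero-length message as end of data, so a consumer that is never sent one keeps waiting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_QUEUE_SIZE 256

/* Toy single-producer/single-consumer byte queue (no wraparound). */
typedef struct {
    unsigned char data[TOY_QUEUE_SIZE];
    size_t        head;   /* next byte the consumer reads */
    size_t        tail;   /* next byte the producer writes */
} ToyQueue;

/* Producer: append a 32-bit length followed by the payload.
 * write_msg(q, NULL, 0) appends just a zero length: the EOF marker. */
static bool write_msg(ToyQueue *q, const void *buf, uint32_t len)
{
    assert(buf != NULL || len == 0);
    if (q->tail + sizeof len + len > TOY_QUEUE_SIZE)
        return false;                      /* toy queue is full */
    memcpy(q->data + q->tail, &len, sizeof len);
    q->tail += sizeof len;
    if (len > 0) {
        memcpy(q->data + q->tail, buf, len);
        q->tail += len;
    }
    return true;
}

/* Consumer: returns the payload length (> 0), 0 on EOF, -1 if the queue is
 * empty (a real consumer would wait here), or -2 if `out` is too small. */
static long read_msg(ToyQueue *q, void *out, size_t cap)
{
    uint32_t len;
    if (q->tail - q->head < sizeof len)
        return -1;
    memcpy(&len, q->data + q->head, sizeof len);
    if (len > cap)
        return -2;
    q->head += sizeof len;
    if (len > 0) {
        memcpy(out, q->data + q->head, len);
        q->head += len;
    }
    return (long) len;
}
```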

If the reader backend fails first (e.g. ParserInit() on invalid INFILE
in the load_function test), cleanup runs before any queue EOF is sent:

  pg_bulkload()
    → WriterInit() before ParserInit() in PG_TRY
    → ParserInit() fails
    → PG_CATCH → WriterClose(wt, true)
      → ParallelWriterClose(onError=true)
          → PQgetCancel() / PQcancel() if PQisBusy
          → PQfinish()   (no write_queue(NULL, 0))

The postmaster still delivers the cancel request (PQcancel →
ProcessCancelRequestPacket() → SendCancelRequest() → SIGINT →
StatementCancelHandler() sets QueryCancelPending)
[PostgreSQL: backend_startup.c, procsignal.c,
postgres.c].

Depending on timing, ProcessInterrupts() in the writer may see
DoingCommandRead=true while PostgresMain() is between extended-protocol
messages (in ReadCommand() after Bind, before Execute), and then clear
QueryCancelPending without raising "ERROR: canceling statement due to
user request" [postgres.c: ProcessInterrupts(), main-loop comment at
ReadCommand].

The writer then goes on to run pg_bulkload(), may hold
AccessExclusiveLock [DirectWriterInit() / table_open in writer_direct.c],
and blocks in QueueRead() waiting for tuples the main backend will
never send [TupleParserRead() in parser_tuple.c].

Symptom: an orphaned writer backend (TYPE=TUPLE in pg_stat_activity)
keeps its lock on the target relation, so a following pg_bulkload run
or installcheck appears to hang.

After each sleep in QueueRead(), call shmctl(IPC_STAT) on the queue
segment (Unix). If shm_nattch <= 1, only this backend remains attached:
the main backend has detached without sending EOF. Raise an error so
the writer backend exits and releases its locks instead of waiting
forever.
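A minimal sketch of this check follows. `peer_detached()` uses only the standard System V `shmctl(IPC_STAT)` call; `wait_for_data()`, the `data_ready` flag standing in for the queue's real state, and the polling interval are hypothetical and are not the code in pgut-ipc.c. One detail this sketch adds as an assumption: when the peer is seen as detached, it looks at the queue once more before failing, because the main backend may have written its last data (or EOF) just before detaching.

```c
#define _XOPEN_SOURCE 700   /* expose nanosleep() and the System V shm API */
#include <assert.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <time.h>

/* 1 if at most one process is attached (the peer has left), 0 if a peer is
 * still attached, -1 if shmctl() fails (errno is set). */
static int peer_detached(int shmid)
{
    struct shmid_ds ds;
    if (shmctl(shmid, IPC_STAT, &ds) != 0)
        return -1;
    return ds.shm_nattch <= 1 ? 1 : 0;
}

/* Hypothetical consumer wait loop. `data_ready` stands in for "the queue has
 * something to read"; a real queue needs proper memory barriers rather than
 * a volatile flag. Returns 1 when data is available, 0 if the peer detached
 * without providing any (the caller should raise an error), -1 on failure. */
static int wait_for_data(int shmid, const volatile int *data_ready,
                         long sleep_ms)
{
    assert(sleep_ms >= 0);
    struct timespec ts;
    ts.tv_sec = sleep_ms / 1000;
    ts.tv_nsec = (sleep_ms % 1000) * 1000000L;

    for (;;) {
        if (*data_ready)
            return 1;
        nanosleep(&ts, NULL);            /* early wakeup by a signal is harmless */
        int r = peer_detached(shmid);
        if (r < 0)
            return -1;
        if (r == 1)
            return *data_ready ? 1 : 0;  /* final look before giving up */
    }
}
```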

No equivalent check exists on Windows (#ifndef WIN32); a separate
mechanism would be needed there.

On PostgreSQL 9.6+, keep the reader's AccessShareLock and let the
writer join the same lock group before taking AccessExclusiveLock,
instead of releasing the reader lock before starting the writer.
The new path is guarded by #if PG_VERSION_NUM >= 90600; PG < 9.6
keeps the old UnlockRelation path.

Reader: BecomeLockGroupLeader(), publish leader PGPROC/PID in the
shared queue header, keep AccessShareLock while the writer runs.

Writer: QueueOpen, BecomeLockGroupMember() before the first table
lock, then direct write with AccessExclusiveLock.
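One way to hand the leader's identity from reader to writer through the queue header is a release/acquire publication, sketched below with C11 atomics. `ProcSketch` stands in for PGPROC, and `QueueHeaderSketch`, its fields, `publish_leader()` and `read_leader()` are hypothetical; the PR's QueueHeader may use a different layout or synchronization. Publishing the PID alongside the PGPROC pointer matters because PostgreSQL's `BecomeLockGroupMember(leader, pid)` returns false unless that PGPROC still belongs to `pid` and is still a lock-group leader, which protects the writer against a recycled PGPROC slot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the leader's PGPROC (which lives in server shared memory). */
typedef struct { int pid; } ProcSketch;

/* Hypothetical subset of the shared queue header. */
typedef struct {
    ProcSketch  *leader_proc;  /* written before leader_pid is published */
    _Atomic int  leader_pid;   /* 0 until the reader has become group leader */
} QueueHeaderSketch;

/* Reader: call after BecomeLockGroupLeader(), before starting the writer. */
static void publish_leader(QueueHeaderSketch *hdr, ProcSketch *me)
{
    assert(me != NULL && me->pid != 0);   /* pid 0 means "not published" */
    hdr->leader_proc = me;
    atomic_store_explicit(&hdr->leader_pid, me->pid, memory_order_release);
}

/* Writer: on success the caller would pass *proc and *pid to
 * BecomeLockGroupMember() before taking its first table lock. */
static bool read_leader(QueueHeaderSketch *hdr, ProcSketch **proc, int *pid)
{
    int p = atomic_load_explicit(&hdr->leader_pid, memory_order_acquire);
    if (p == 0)
        return false;                     /* reader has not published yet */
    *proc = hdr->leader_proc;
    *pid = p;
    return true;
}
```

The release store on `leader_pid` paired with the acquire load guarantees that a writer which sees a non-zero PID also sees the `leader_proc` value written before it.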

Related PostgreSQL core fixes (lock-group ProcKill bugs):

  https://www.postgresql.org/message-id/flat/d2983796-2603-41b7-a66e-fc8489ddb954%40gmail.com

  [PATCH] Fix ProcKill lock-group vs procLatch recycle race

  All PostgreSQL versions since 9.6 are affected (lock groups were
  added in 9.6).  Upstream addresses the problem in two commits on
  REL_14_STABLE through REL_18_STABLE (not in released minors yet as
  of 14.23; expected in the next 14.x minor, e.g. 14.24).  Not
  backpatched to 13 or older:

  1) Fix race conditions in ProcKill()'s lock-group freelist handling
     Refactor lock-group teardown so freelist updates are coordinated
     under leader_lwlock and a single freeProcsLock pass.  Fixes a
     double push of the leader's PGPROC onto the freelist and a leak
     of the last follower's slot when leader and member exit
     concurrently.

  2) Fix procLatch ownership race in ProcKill()
     Call SwitchBackToLocalLatch() and DisownLatch() before any
     PGPROC can return to the freelist.  Fixes "latch already owned
     by PID ..." when a recycled slot still has an owned procLatch
     (e.g. follower pushes the leader's PGPROC before the leader
     reaches DisownLatch).
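The ordering rule in fix 2 can be illustrated with a toy model in which a PGPROC slot is reduced to its latch owner and the freelist to a small stack. This is not the server's ProcKill(); `SlotSketch`, `FreeListSketch` and the helper names are hypothetical. `own_latch()` mirrors the check behind "latch already owned by PID ...": a slot that reaches the freelist while its latch is still owned cannot be claimed by the next backend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int latch_owner;            /* 0 = unowned, otherwise the owning PID */
} SlotSketch;

#define FREELIST_CAP 8
typedef struct {
    SlotSketch *items[FREELIST_CAP];
    size_t      n;
} FreeListSketch;

/* Like OwnLatch(): fails if another process still owns the latch. */
static bool own_latch(SlotSketch *s, int pid)
{
    if (s->latch_owner != 0)
        return false;           /* server: "latch already owned by PID ..." */
    s->latch_owner = pid;
    return true;
}

static void disown_latch(SlotSketch *s) { s->latch_owner = 0; }

static void push_free(FreeListSketch *fl, SlotSketch *s)
{
    assert(fl->n < FREELIST_CAP);
    fl->items[fl->n++] = s;
}

static SlotSketch *pop_free(FreeListSketch *fl)
{
    return fl->n > 0 ? fl->items[--fl->n] : NULL;
}

/* Fixed ordering: the latch is disowned before the slot becomes reusable. */
static void release_slot(FreeListSketch *fl, SlotSketch *s)
{
    disown_latch(s);
    push_free(fl, s);
}
```

Calling `push_free()` on a slot whose latch is still owned makes the next `own_latch()` on that slot fail; `release_slot()` keeps the disown-before-freelist order that the upstream fix enforces.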

  This pg_bulkload change is exposed to the same mechanism:
  MULTI_PROCESS on PG 9.6+ forms a lock group between the reader and
  writer backends, so both ProcKill() issues can theoretically surface
  when those backends shut down together (error/cancel paths).
  Using lock groups here does not replace those server-side fixes;
  run PostgreSQL builds that include both commits (or wait for the
  corresponding 14+ minor releases).