Fix MULTI_PROCESS parallel mode (QueueRead peer disconnect, lock groups) #183
Open
vlad-lesin wants to merge 2 commits into
MULTI_PROCESS=YES uses two backends: the main backend (e.g.
TYPE=FUNCTION) reads the load source and pushes heap tuples into a
shared queue; a parallel writer backend (TYPE=TUPLE, WRITER=DIRECT
over libpq) pops them from the queue and writes to storage. On success
the main backend sends a zero-length queue message via
write_queue(NULL, 0) in ParallelWriterClose() when onError is false.
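The EOF convention described above can be illustrated with a minimal, self-contained sketch. This is not pg_bulkload's queue code: the `ToyQueue` type and the `toy_write()` / `toy_read()` helpers are hypothetical, and the flat length-prefixed buffer is an assumption made for illustration. Only the idea that a zero-length message marks the end of the data comes from the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy queue: a flat buffer of length-prefixed messages. */
typedef struct ToyQueue
{
    unsigned char buf[256];
    size_t        wpos;     /* next write offset */
    size_t        rpos;     /* next read offset */
} ToyQueue;

/*
 * Append one message. len == 0 (data may be NULL) is the EOF marker.
 * Returns 0 on success, -1 if the buffer is full.
 */
static int
toy_write(ToyQueue *q, const void *data, uint32_t len)
{
    if (q->wpos + sizeof(len) + len > sizeof(q->buf))
        return -1;
    memcpy(q->buf + q->wpos, &len, sizeof(len));
    q->wpos += sizeof(len);
    if (len > 0)
    {
        memcpy(q->buf + q->wpos, data, len);
        q->wpos += len;
    }
    return 0;
}

/*
 * Read one message into out.
 * Returns the payload length (> 0), 0 for the EOF marker,
 * -1 if nothing is available yet, -2 if out is too small.
 */
static int
toy_read(ToyQueue *q, void *out, size_t outsize)
{
    uint32_t len;

    if (q->rpos + sizeof(len) > q->wpos)
        return -1;          /* a real consumer would sleep and retry here */
    memcpy(&len, q->buf + q->rpos, sizeof(len));
    if (len > outsize)
        return -2;
    q->rpos += sizeof(len);
    if (len > 0)
    {
        memcpy(out, q->buf + q->rpos, len);
        q->rpos += len;
    }
    return (int) len;
}
```

In this model the `-1` branch is where a consumer keeps waiting. If the producer goes away without ever writing the zero-length message, nothing in the queue itself tells the consumer to stop.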
If the reader backend fails first (e.g. ParserInit() on invalid INFILE
in the load_function test), cleanup runs before any queue EOF is sent:
pg_bulkload()
→ WriterInit() before ParserInit() in PG_TRY
→ ParserInit() fails
→ PG_CATCH → WriterClose(wt, true)
→ ParallelWriterClose(onError=true)
→ PQgetCancel() / PQcancel() if PQisBusy
→ PQfinish() (no write_queue(NULL, 0))
The postmaster still delivers cancel (PQcancel →
ProcessCancelRequestPacket() → SendCancelRequest() → SIGINT →
StatementCancelHandler() sets QueryCancelPending)
[PostgreSQL: backend_startup.c, procsignal.c,
postgres.c].
Depending on timing, ProcessInterrupts() in the writer may see
DoingCommandRead=true while PostgresMain() is between extended-protocol
messages (ReadCommand() after Bind, before Execute) and clear
QueryCancelPending without ERROR: canceling statement due to user
request [postgres.c: ProcessInterrupts(), main-loop comment at
ReadCommand].
Then the writer continues pg_bulkload(), may hold AccessExclusiveLock
[DirectWriterInit() / table_open in writer_direct.c], and blocks in
QueueRead() waiting for tuples the main backend will never send
[TupleParserRead() in parser_tuple.c].
Symptom: an orphaned writer backend (TYPE=TUPLE in pg_stat_activity) that
holds a lock on the target relation, so that subsequent pg_bulkload or
installcheck runs appear to hang.
After each sleep in QueueRead(), call shmctl(IPC_STAT) on the queue
segment (Unix). If shm_nattch <= 1, only this backend remains attached:
the main backend has detached without sending EOF. Raise an error so
the writer backend exits and releases its locks instead of waiting
forever.
No equivalent check exists on Windows (#ifndef WIN32); a separate
mechanism would be needed there.
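A minimal sketch of the attach-count check, assuming a System V shared memory segment identified by `shmid`. The helper name `peer_has_detached()` is hypothetical and is not taken from the patch; the patch raises an error inside QueueRead(), whereas this sketch only returns a flag so the idea can be tried in isolation.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Hypothetical helper: report whether this process is the only one
 * still attached to the segment.
 * Returns 1 if shm_nattch <= 1, 0 if another attachment exists,
 * and -1 if shmctl(IPC_STAT) fails.
 */
static int
peer_has_detached(int shmid)
{
    struct shmid_ds ds;

    if (shmctl(shmid, IPC_STAT, &ds) < 0)
        return -1;
    return ds.shm_nattch <= 1 ? 1 : 0;
}
```

A waiting loop would call such a helper after each sleep and stop waiting once it returns 1. Note that `shm_nattch` counts attachments, not processes, so the check only identifies the peer when each backend attaches the segment exactly once.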
On PostgreSQL 9.6+, keep the reader's AccessShareLock and let the writer
join the same lock group before taking AccessExclusiveLock, instead of
releasing the reader lock before starting the writer. PG < 9.6 keeps the
old UnlockRelation path (#if PG_VERSION_NUM >= 90600).

Reader: BecomeLockGroupLeader(), publish leader PGPROC/PID in the shared
queue header, keep AccessShareLock while the writer runs.
Writer: QueueOpen, BecomeLockGroupMember() before the first table lock,
then direct write with AccessExclusiveLock.

Related PostgreSQL core fixes (lock-group ProcKill bugs):
https://www.postgresql.org/message-id/flat/d2983796-2603-41b7-a66e-fc8489ddb954%40gmail.com
[PATCH] Fix ProcKill lock-group vs procLatch recycle race

All PostgreSQL versions since 9.6 are affected (lock groups were added in
9.6). Upstream addresses the problem in two commits on REL_14_STABLE
through REL_18_STABLE (not in released minors yet as of 14.23; expected in
the next 14.x minor, e.g. 14.24). Not backpatched to 13 or older:

1) Fix race conditions in ProcKill()'s lock-group freelist handling
   Refactor lock-group teardown so freelist updates are coordinated under
   leader_lwlock and a single freeProcsLock pass. Fixes a double push of
   the leader's PGPROC onto the freelist and a leak of the last follower's
   slot when leader and member exit concurrently.

2) Fix procLatch ownership race in ProcKill()
   Call SwitchBackToLocalLatch() and DisownLatch() before any PGPROC can
   return to the freelist. Fixes "latch already owned by PID ..." when a
   recycled slot still has an owned procLatch (e.g. follower pushes the
   leader's PGPROC before the leader reaches DisownLatch).

This pg_bulkload change is affected by the same mechanism: MULTI_PROCESS on
PG 9.6+ forms a lock group between the reader and writer backends, so both
ProcKill() issues can theoretically surface when those backends shut down
together (error/cancel paths).
Using lock groups here does not replace those server-side fixes; run
PostgreSQL builds that include both commits (or wait for the corresponding
14+ minor releases).
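Why joining the reader's lock group lets the writer take AccessExclusiveLock while the reader still holds AccessShareLock can be shown with a toy model of the conflict check. This is a sketch under stated assumptions, not PostgreSQL's lock manager: the `ToyHolder` type and the `toy_*` functions are hypothetical, only two lock modes are modelled, and the single rule illustrated is that locks held by members of the requester's own lock group are not counted as conflicts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-mode model; the real lock manager has eight modes. */
typedef enum { TOY_ACCESS_SHARE, TOY_ACCESS_EXCLUSIVE } ToyLockMode;

typedef struct ToyHolder
{
    int         pid;
    int         group_leader_pid;   /* 0 = not in any lock group */
    ToyLockMode mode;
} ToyHolder;

/* With only these two modes, a conflict needs one side to be exclusive. */
static int
toy_modes_conflict(ToyLockMode a, ToyLockMode b)
{
    return a == TOY_ACCESS_EXCLUSIVE || b == TOY_ACCESS_EXCLUSIVE;
}

static int
toy_same_group(int leader_a, int leader_b)
{
    return leader_a != 0 && leader_a == leader_b;
}

/* Returns 1 if the request would have to wait, 0 if it can be granted. */
static int
toy_must_wait(const ToyHolder *held, size_t n,
              int pid, int group_leader_pid, ToyLockMode want)
{
    for (size_t i = 0; i < n; i++)
    {
        if (held[i].pid == pid)
            continue;       /* a backend never blocks on its own locks */
        if (toy_same_group(held[i].group_leader_pid, group_leader_pid))
            continue;       /* group members do not conflict with each other */
        if (toy_modes_conflict(held[i].mode, want))
            return 1;
    }
    return 0;
}
```

With the reader (pid 100, leader of its own group) holding the share lock, a writer that joined group 100 is granted the exclusive lock, while an unrelated backend requesting the exclusive lock for DDL has to wait. That waiting DDL is the schema-change window being closed.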
Summary
Improves MULTI_PROCESS / WRITER=PARALLEL reliability when the reader and writer backends exit out of order, and removes the schema-change window between reader and writer on PostgreSQL 9.6+.
6081b40: Detect when the reader has detached from the shared queue without sending EOF; stop the writer from blocking forever in QueueRead() (Unix).
767c8dc: Use PostgreSQL lock groups so the reader keeps AccessShareLock while the writer takes AccessExclusiveLock (9.6+); document parallel locking and related upstream ProcKill() fixes.
Commits
1. Detect peer disconnect in QueueRead (6081b40)
Problem: On reader failure before a successful load (e.g. bad INFILE in load_function with MULTI_PROCESS=YES), cleanup calls ParallelWriterClose(onError=true): PQcancel() + PQfinish() but no queue EOF. The writer may not honor the cancel (FE/BE timing around DoingCommandRead) and then sits in QueueRead() holding AccessExclusiveLock, leaving an orphaned backend and a blocked table.
Fix: After each wait in QueueRead(), on Unix call shmctl(IPC_STAT); if shm_nattch <= 1, the reader has left the segment without EOF, so raise an error and exit the writer.
Scope: lib/pgut/pgut-ipc.c only. No Windows equivalent yet (#ifndef WIN32).
2. Use lock groups for MULTI_PROCESS reader and writer (767c8dc)
Problem: Reader released AccessShareLock before the writer acquired AccessExclusiveLock, leaving a window where DDL could change the table and reader/writer could disagree on the definition.
Fix (PG 9.6+, #if PG_VERSION_NUM >= 90600):
Reader: BecomeLockGroupLeader(), AccessShareLock, publish leader in QueueHeader
Writer: BecomeLockGroupMember() before first table lock, then AccessExclusiveLock
Older PG (< 9.6): Previous UnlockRelation(rel, AccessShareLock) path unchanged.
Also: PG_CONFIG ?= pg_config in all PGXS Makefiles (honor PG_CONFIG from the environment when building against multiple PostgreSQL versions).
Upstream note: Same lock-group machinery as the PostgreSQL ProcKill lock-group fixes. Affected since 9.6; fixes are on REL_14_STABLE through REL_18_STABLE (not in 14.23 yet; expected in the next 14.x minor). pg_bulkload MULTI_PROCESS on 9.6+ is exposed to those races until the server includes both commits; lock groups here do not replace them.