feat: Single-pass WAL streaming for LOG_BASED replication by bdewilde · Pull Request #772 · MeltanoLabs/tap-postgres

bdewilde · 2026-04-27T20:05:10Z

problem

PostgresLogBasedStream.get_records() opens its own LogicalReplicationConnection per selected stream. With N LOG_BASED streams the tap runs N sequential WAL scans -- each rereads the same segments, with add-tables discarding most records server-side. End-to-end sync time scales ~linearly in N. For pipelines with multiple LOG_BASED streams against a large backlog, this dominates run-time.

changes

A new SingleConnectionWALReader opens one logical replication connection with add-tables covering all selected LOG_BASED tables, scans the WAL once, and dispatches each parsed wal2json message inline to the owning stream's new emit_record() method for immediate Singer RECORD emission. STATE flushes every 30s, and the slot is advanced to the WAL tip on idle/max-run exit.

new modules: _wal_helpers.py (FQN/escaping/parsing helpers) and wal_reader.py (the reader and read loop).
client.py gains an emit_record() method and a config-flag branch in get_records()
tap.py adds the log_based_single_connection config setting (default False!) and _sync_log_based_streams_shared orchestration
new tests: tests/test_wal_helpers.py, tests/test_wal_reader.py, tests/test_consume.py
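
To make the single-pass idea concrete, here is a minimal sketch of the dispatch step. It is not the code in wal_reader.py: FakeStream and dispatch_messages are hypothetical names, and the message shape assumes wal2json format-version 2 output (with "schema" and "table" keys). One scan of the WAL feeds every selected stream, and messages for unselected tables are skipped.

```python
from dataclasses import dataclass, field


@dataclass
class FakeStream:
    """Hypothetical stand-in for a LOG_BASED stream; not the tap's real class."""

    fqn: str
    emitted: list = field(default_factory=list)

    def emit_record(self, payload: dict) -> None:
        # The real method would write a Singer RECORD message immediately;
        # this stand-in only collects the payload.
        self.emitted.append(payload)


def dispatch_messages(messages, streams):
    """Route each message from a single WAL scan to the stream owning its table."""
    by_fqn = {stream.fqn: stream for stream in streams}
    dispatched = 0
    for message in messages:
        # Assumption: messages look like wal2json format-version 2 output.
        fqn = f"{message['schema']}.{message['table']}"
        owner = by_fqn.get(fqn)
        if owner is None:
            continue  # table is not selected, so there is nothing to emit
        owner.emit_record(message)
        dispatched += 1
    return dispatched


orders = FakeStream("public.orders")
users = FakeStream("public.users")
wal = [
    {"action": "I", "schema": "public", "table": "orders", "columns": []},
    {"action": "U", "schema": "public", "table": "users", "columns": []},
    {"action": "I", "schema": "public", "table": "audit", "columns": []},
]
count = dispatch_messages(wal, [orders, users])
```

With N streams the loop still runs once over the messages, which is the property the PR is after; the per-stream version would repeat the scan N times.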

Full disclosure, I had Claude Code implement those three test modules, and then I iterated a bit. If it's still excessive / not testing usefully -- something Claude is known to do, sigh -- just let me know, and I will take a hatchet to it.

This is a very belated follow-up to PR #667 and Issue #587.

constraints

Tap.sync_all is @typing.final, so dispatch can't be restructured at the SDK boundary -- this was a bummer. My next best option was to trigger the shared run at the first LOG_BASED stream's get_records() call, gated by a _shared_wal_run_completed flag on the tap so that sibling streams' calls become no-ops.
SCHEMA-before-RECORD across streams: _sync_log_based_streams_shared pre-writes every stream's schema before the reader runs. Since the SDK's Stream.sync() later calls _write_schema_message() again, the override on PostgresLogBasedStream is idempotent. (Without that, every SCHEMA would be emitted twice, which is not great.)
Per-stream LSN filter: Replication opens at min(start_lsn) across all streams, so each stream's own bookmark is captured at construction and used to drop messages that it's already past.
I had to dip into private SDK calls -- _write_record_message() and _increment_stream_state() -- for this to work. I consolidated them in one place -- emit_record() -- so SDK renames hit one method. Not sure how stable the API is here...
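
The per-stream LSN filter above can be sketched as follows. This is an illustration under assumptions, not the PR's helper code: lsn_to_int, shared_start_lsn and should_emit are hypothetical names, and whether the real comparison is strict (>) or inclusive (>=) is not stated in this description.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual Postgres LSN such as '0/16B3748' to an integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def shared_start_lsn(bookmarks: dict[str, int]) -> int:
    """Replication opens at the smallest bookmark across all streams."""
    return min(bookmarks.values())


def should_emit(stream_bookmark: int, message_lsn: int) -> bool:
    """Drop messages that the stream is already past.

    Assumption: a strict comparison; the real filter may differ.
    """
    return message_lsn > stream_bookmark


bookmarks = {
    "public.orders": lsn_to_int("0/2000"),
    "public.users": lsn_to_int("0/5000"),
}
start = shared_start_lsn(bookmarks)
message_lsn = lsn_to_int("0/3000")
emit_for_orders = should_emit(bookmarks["public.orders"], message_lsn)
emit_for_users = should_emit(bookmarks["public.users"], message_lsn)
```

Here the scan starts at the orders bookmark, so a message at 0/3000 is new for orders but already covered for users and is dropped for that stream.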

questions

_write_schema_message idempotency: I made the smallest fix I could for the duplicate-SCHEMA bug given the @final constraint on sync_all. Is there another / cleaner approach?
emit_record() uses internal SDK calls. Is this going to be an issue? Is there a safer / "public" equivalent?
get_records() as trigger: The "first stream's get_records fires the shared reader" pattern is a not-great workaround for sync_all being final. Is it okay as documented, or is there a cleaner SDK hook?
replication_max_run_seconds / replication_idle_exit_seconds now bound the whole LOG_BASED batch instead of each stream. To me that feels like an improvement, but I don't know the whole system / downstream use cases. For example, does anything downstream assume per-stream bounds?
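
For the first question, the idempotency guard can be pictured like this. It is a stand-alone sketch, not the override on PostgresLogBasedStream: SchemaOnceStream and its _schema_written attribute are hypothetical, and a real override would presumably delegate to the SDK's own _write_schema_message() instead of appending to a list.

```python
class SchemaOnceStream:
    """Hypothetical stand-in showing a write-once SCHEMA guard."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._schema_written = False  # hypothetical guard attribute
        self.messages: list[dict] = []

    def _write_schema_message(self) -> None:
        # The second and later calls are no-ops, so the pre-write by the
        # shared orchestration and the later call from sync() emit one SCHEMA.
        if self._schema_written:
            return
        self.messages.append({"type": "SCHEMA", "stream": self.name})
        self._schema_written = True


stream = SchemaOnceStream("public-orders")
stream._write_schema_message()  # pre-write before the reader runs
stream._write_schema_message()  # later call from the normal sync path
```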

edgarrmondragon · 2026-05-04T14:06:02Z

Thanks @bdewilde!

There are some typing errors. I might take a longer look later in the week.

bdewilde · 2026-05-04T14:08:49Z

> Thanks @bdewilde!
>
> There are some typing errors. I might take a longer look later in the week.

Hi Edgar, thanks in advance for giving it a longer look! My bad for the typing errors -- if you can point me at them (without having to do a whole review ;), I can try to fix them sometime this week. And apologies for the test failures. I wasn't ever able to get the full unit test suite running green in my local dev, but all of the new tests I added did pass for me.


bdewilde · 2026-05-28T16:31:13Z

Hi again @edgarrmondragon 🙂 Just swinging by to check on this. Is there anything I can do to nudge this forward? I'm happy to make changes / go back to the drawing board, I just need some direction from you, since I'm not deeply familiar with the SDK.

edgarrmondragon

Thanks @bdewilde and sorry for the delay!

I started reviewing and managed to take a look at 2 of the files. I'll continue tomorrow, but in the meantime, some questions, nits and suggestions.

edgarrmondragon · 2026-05-30T15:55:17Z

+}
+
+
+# TODO: should this be a shared fixture/function in conftest?


Yeah, we could have an instance of this as a fixture if the dummy config is also the same.


edgarrmondragon

Thanks @bdewilde, LGTM!

bdewilde · 2026-06-01T19:58:04Z

@edgarrmondragon Revisiting this with fresh eyes today, a couple notable questions came to mind:

What happens when a user adds a new stream to an existing set with log-based replication? In SingleConnectionWALReader.run(), we compute the "global" start LSN across all selected streams -- does a new stream cause the global min to be the very first WAL entry, which would force an iteration over the full WAL? Per-stream bookmarks ensure the replicated data is correct, but a full iteration could be quite slow. Not great! In meltano, iirc a new stream with log-based replication does effectively a full-table replication for its first run only, but I couldn't find any logic to confirm that or dig into it more deeply. Probably you know this off the top of your head... 🙏
When exiting out of the run loop, we call SingleConnectionWALReader._advance_slot_and_state_all(), which sets every stream's bookmark to the current WAL tip. I think this is a problem if we exit due to max_run_seconds before the WAL's backlog has been fully read, which would discard the not-yet-emitted messages and then they'd be silently skipped over on the next run. Right? iirc I was imitating a pattern from the original per-stream logic, so maybe this issue already exists, it's just that the blast radius is smaller (per-stream rather than all-streams). Am I understanding this correctly? If so, could we maybe just pass max_lsn_seen out of _run_loop(), and then send that into _advance_slot_and_state_all() as an arg?


sicarul · 2026-06-05T02:35:21Z

I understand your main motivation with this PR is speed; however, in some scenarios this should also improve accuracy. I've stumbled upon a scenario in which the current implementation advances the WAL because of table A, and then table B loses its reference because the replication slot has advanced past the updates we needed for table B. I'll test the PR out.

bdewilde · 2026-06-12T18:18:55Z

Hi @sicarul , just following up! Did this set of changes work for you, from both a perf and accuracy perspective? :)

@edgarrmondragon , to address my second question above, I just pushed changes to prevent (afaict) a silent data-loss risk in how the single-pass WAL reader advances replication state on exit. Previously, both idle and timeout exit paths advanced every stream's bookmark and the slot to the current WAL tip. That's correct and desirable behavior on an idle exit -- the backlog has been drained! -- but on a max_run_seconds timeout with messages still unread, it skips past records between the max LSN actually dispatched and the tip — silently losing those records. Does that make sense, or have I seriously misunderstood this behavior? (Sorry, I'm not an expert on this...) The fix has _run_loop() return (max_lsn_seen, caught_up), and run() picks the safe target to advance to: the tip when "caught up", otherwise max_lsn_seen. This preserves WAL-retention safety while never bookmarking past unread changes. Let me know if you have thoughts on this!
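
The exit logic described above can be sketched as a small pure function. choose_advance_target is a hypothetical name, LSNs are plain integers here, and returning None when nothing was dispatched before a timeout is an assumption about what "safe" should mean in that case, not something confirmed in this thread.

```python
from typing import Optional


def choose_advance_target(
    max_lsn_seen: Optional[int],
    caught_up: bool,
    wal_tip: int,
) -> Optional[int]:
    """Pick the LSN to which bookmarks and the slot may safely advance."""
    if caught_up:
        # Idle exit: the backlog has been drained, so the tip is safe.
        return wal_tip
    # Timeout exit: never move past what was actually dispatched.
    # Assumption: if nothing was dispatched, do not advance at all.
    return max_lsn_seen


idle_target = choose_advance_target(max_lsn_seen=100, caught_up=True, wal_tip=500)
timeout_target = choose_advance_target(max_lsn_seen=100, caught_up=False, wal_tip=500)
empty_timeout_target = choose_advance_target(max_lsn_seen=None, caught_up=False, wal_tip=500)
```

On the timeout path the unread range between 100 and 500 stays ahead of the bookmark, so the next run can pick it up.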

sicarul · 2026-06-12T18:19:56Z

Yes it's working fine for us so far

edgarrmondragon · 2026-06-12T22:04:45Z

Thanks @bdewilde and @sicarul!

I'll take another look at this next week and try to answer your question about losing records when the run timeout is reached (I'm no expert in this niche either 😅)

bdewilde added 17 commits April 24, 2026 15:36

refactor: prep restructure for single-pass wal

edbcb8c

refactor: define wal consts at module level

bd4ad1c

add client adapter to isolate internal sdk calls

1b31e96

add helper funcs for wal replication

b5a21e3

test: add unit tests for wal helper funcs

692a9b7

add single-conn wal reader class

cb06454

add cfg setting, integrate wal reader into tap

204da7e

fix: pivot to new override to avoid typing.final

9f6a3e5

test: add wal reader test suite

04d0f70

refactor: move some wal logic into helper funcs

3357409

fix: don't emit duplicate schema messages

a9243bf

fix: minor issues w/ wal reader logic

e454b64

improve observability of wal reader

06b23de

test: add more wal reader tests

79eb68f

test: add cases for parsing helpers and consume

16897e1

fix: don't enable new wal reader by default

cb6245c

docs: slim down some of the new docstrings

a33d99a

bdewilde requested a review from edgarrmondragon as a code owner April 27, 2026 20:05

Merge branch 'main' into single-pass-wal-streaming

359d13d

edgarrmondragon changed the title ~~Single-pass WAL streaming for LOG_BASED replication?~~ feat: Single-pass WAL streaming for LOG_BASED replication May 4, 2026

edgarrmondragon self-assigned this May 4, 2026

edgarrmondragon added the enhancement New feature or request label May 4, 2026

edgarrmondragon and others added 3 commits May 12, 2026 11:13

Merge branch 'main' into single-pass-wal-streaming

9efd0e2

Fix lint issues

88af98a

Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>

Merge branch 'main' into single-pass-wal-streaming

57e4486

Merge branch 'main' into single-pass-wal-streaming

e952138

edgarrmondragon requested changes May 29, 2026


edgarrmondragon assigned bdewilde May 29, 2026

edgarrmondragon reviewed May 30, 2026


bdewilde added 6 commits June 1, 2026 12:46

tweak log/error severity in non-ideal cases

ec23100

fix: use time.monotonic for perf timing

9c49de3

refactor: use suppress ctx mgr not try/except

bc7e836

test: check deleted_at str format

0b4c55a

refactor: consolidate+share wal lsn query logic

a0c3c58

refactor: avoid extra loop over streams

0ae649e

bdewilde requested a review from edgarrmondragon June 1, 2026 18:43

edgarrmondragon and others added 3 commits June 1, 2026 13:02

Merge branch 'main' into single-pass-wal-streaming

c80cdcd

fix: update test for runtimeerror -> log error

1e83b91

Merge branch 'single-pass-wal-streaming' of ssh://github.com/bdewilde…

73ed5a5

…/tap-postgres into single-pass-wal-streaming

edgarrmondragon approved these changes Jun 1, 2026


sicarul mentioned this pull request Jun 5, 2026

feat: single-pass WAL streaming for LOG_BASED replication (MeltanoLabs#772) pulumi/tap-postgres#1

Merged

sicarul added a commit to pulumi/tap-postgres that referenced this pull request Jun 5, 2026

Merge pull request #1 from pulumi/pulumi-single-pass-wal-772

5c603a8

feat: single-pass WAL streaming for LOG_BASED replication (MeltanoLabs#772)

use max lsn seen on timeout exit so no data loss

5f89be2

Merge branch 'main' into single-pass-wal-streaming

95725a4
