Skip to content

perf: dedup redundant endReadableNT teardown ticks#558

Draft
jeswr wants to merge 1 commit into
nodejs:mainfrom
jeswr:perf/endreadable-tick-dedup
Draft

perf: dedup redundant endReadableNT teardown ticks#558
jeswr wants to merge 1 commit into
nodejs:mainfrom
jeswr:perf/endreadable-tick-dedup

Conversation

@jeswr

@jeswr jeswr commented Jul 4, 2026

Copy link
Copy Markdown
Contributor

Draft — agent-generated. This PR was opened by @jeswr's AI agent as part of an ecosystem streams-performance investigation. @jeswr will review, finalize, and tag the maintainers when it's ready. Please don't treat it as ready-for-review yet. Numbers and reproduction are below so they can be checked independently.

What

Deduplicate the redundant endReadableNT teardown nextTicks on the short-lived flowing path.

Ending a readable schedules endReadableNT via process.nextTick. On the flowing path endReadable() gets called several times after end — once from flow() and again from two resume_() sites — even though endReadableNT only ever emits 'end' once (it bails on endEmitted). The extra ticks are pure waste, and the teardown tick cascade is the dominant per-stream teardown cost for short-lived streams.

The fix adds a kEndScheduled state bit (exposed as endScheduled, following the existing makeBitMapDescriptor pattern):

  • endReadable() schedules the tick only when neither endEmitted nor endScheduled is set, and sets endScheduled when it does.
  • endReadableNT() clears endScheduled on entry, so if the tick bails out (e.g. a last-moment unshift raised state.length) a later endReadable() can schedule a fresh tick.

'end' still fires exactly once, on the same tick it otherwise would — no observable ordering change. finish → end → close ordering is untouched. This is the safe subset of "collapse the teardown ticks"; it deliberately does not attempt the semver-major fusing of finish → end → close.

Reproduction & measured numbers

Benchmarks are added under benchmark/ (not published — the package files field only ships lib, LICENSE, README.md). The only tracked source edit is lib/…/readable.js, so it stashes cleanly while the untracked benchmark/ dir stays put:

# tick count for one flowing PassThrough -> Transform unit (objectMode, 4 chunks)
git stash push -- lib/internal/streams/readable.js
node benchmark/endReadable-ticks.js   # BASE
git stash pop
node benchmark/endReadable-ticks.js   # PATCHED
total nextTick hops / unit endReadableNT() scheduled
base 15 (3× from flow(), 2× from resume_())
patched 12

CPU: a paired A/B benchmark (base and patched loaded side-by-side in one process, alternating rep-by-rep so shared-box drift cancels), N=8000 units × 4 chunks, measured a median ~4.5–7.7 % CPU reduction with the patched build faster in the large majority of paired reps on a 2-core machine shared with other load. The magnitude tracks the ~20 % reduction in teardown ticks. benchmark/endReadable-cpu.js is the single-version form for git stash before/after comparison; see benchmark/README.md.

Tests

Full suite stays green with the change:

npm test   # tap test/parallel/test-*.js test/ours/test-*.js
# => 189 test files, 0 failures

Note on the mirror / build model (for @jeswr + maintainers to decide)

lib/ here is a mechanical mirror of Node core (README: "a mirror of the streams implementations in Node.js 18.19.0"), regenerated by npm run build from the Node tarball + build/replacements.mjs. So a raw lib/ edit would be lost on the next re-sync. Two ways to persist it, for the maintainers to pick:

  1. Land it in Node core first (preferred) and pick it up on the next sync. A companion core patch (same change against nodejs/node master's lib/internal/streams/readable.js, which uses the raw state[kState] bitfield idiom) is prepared and applies cleanly (git apply --check passes). @jeswr is holding it to submit to core directly.

  2. Carry it as a build/replacements.mjs entry. I verified the following entry, applied to the real Node v18.19.0 source and run through this repo's own babel+prettier pipeline, reproduces the committed lib/ edit byte-for-byte:

    'lib/internal/streams/readable.js': [
      // …existing entries…
      ['const kDataEmitted = 1 << 18;',
       "const kDataEmitted = 1 << 18;\n// Set while an endReadableNT() nextTick is already pending, so repeated\n// read()/resume() calls after end don't schedule duplicate ticks.\nconst kEndScheduled = 1 << 19;"],
      ['  dataEmitted: makeBitMapDescriptor\\(kDataEmitted\\),',
       '  dataEmitted: makeBitMapDescriptor(kDataEmitted),\n  // If true, an endReadableNT tick is already pending.\n  endScheduled: makeBitMapDescriptor(kEndScheduled),'],
      ["  debug\\('endReadable', state.endEmitted\\);\n  if \\(!state.endEmitted\\) \\{\n    state.ended = true;",
       "  debug('endReadable', state.endEmitted);\n  // On the flowing short-lived path read()/resume() can call endReadable()\n  // several times after end; endReadableNT only ever emits 'end' once, so\n  // skip scheduling a duplicate tick when one is already pending.\n  if (!state.endEmitted && !state.endScheduled) {\n    state.ended = true;\n    state.endScheduled = true;"],
      ["  debug\\('endReadableNT', state.endEmitted, state.length\\);\n",
       "  debug('endReadableNT', state.endEmitted, state.length);\n\n  // Clear the scheduled flag now that the tick is running, so that if this\n  // tick bails out below (e.g. a last-moment unshift raised state.length),\n  // a later endReadable() can schedule a fresh tick.\n  state.endScheduled = false;\n"],
    ],

The commit here edits lib/ directly so the change can be reviewed, tested, and benchmarked in userland now; @jeswr will coordinate the persistence path with the maintainers.

Scope note

A second candidate from the same investigation — hoisting the per-chunk arrow closure in Transform.prototype._write — was evaluated and intentionally left out. It is not present in current Node core either, was never prototyped/measured in this investigation, is described in the analysis as noise-level, and touches a re-entrant hot path (the existing this[kCallback] slot). It would need its own measured net-win benchmark before it's worth proposing, so it's kept out of this focused, measured change.


🤖 Generated with Claude Code

On the short-lived flowing path, ending a readable can call endReadable()
several times after end (from flow() and two resume_() sites). Each call
schedules an endReadableNT() nextTick, but endReadableNT only ever emits
'end' once (it bails on kEndEmitted), so the extra ticks are pure waste,
and the teardown tick cascade is the dominant per-stream teardown cost.

Add a kEndScheduled state bit: endReadable() only schedules when neither
kEndEmitted nor kEndScheduled is set; endReadableNT() clears it on entry so
a later endReadable() can still reschedule if this tick bails out (e.g. a
last-moment unshift raised state.length). 'end' still fires exactly once,
on the same tick it otherwise would - no observable ordering change.

For one flowing PassThrough -> Transform unit (objectMode, 4 chunks):
15 -> 12 nextTick hops/unit (endReadableNT scheduled 5x -> 2x). A paired
A/B CPU benchmark (base vs patched in one process) shows a median ~4.5-7.7%
CPU reduction on this shape. Benchmarks added under benchmark/.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant