perf: dedup redundant endReadableNT teardown ticks by jeswr · Pull Request #558 · nodejs/readable-stream

jeswr · 2026-07-04T00:53:11Z

Draft — agent-generated. This PR was opened by @jeswr's AI agent as part of an ecosystem streams-performance investigation. @jeswr will review, finalize, and tag the maintainers when it's ready. Please don't treat it as ready-for-review yet. Numbers and reproduction are below so they can be checked independently.

What

Deduplicate the redundant endReadableNT teardown nextTicks on the short-lived flowing path.

Ending a readable schedules endReadableNT via process.nextTick. On the flowing path endReadable() gets called several times after end — once from flow() and again from two resume_() sites — even though endReadableNT only ever emits 'end' once (it bails on endEmitted). The extra ticks are pure waste, and the teardown tick cascade is the dominant per-stream teardown cost for short-lived streams.

The fix adds a kEndScheduled state bit (exposed as endScheduled, following the existing makeBitMapDescriptor pattern):

endReadable() schedules the tick only when neither endEmitted nor endScheduled is set, and sets endScheduled when it does.
endReadableNT() clears endScheduled on entry, so if the tick bails out (e.g. a last-moment unshift raised state.length) a later endReadable() can schedule a fresh tick.

'end' still fires exactly once, on the same tick it otherwise would — no observable ordering change. finish → end → close ordering is untouched. This is the safe subset of "collapse the teardown ticks"; it deliberately does not attempt the semver-major fusing of finish → end → close.

Reproduction & measured numbers

Benchmarks are added under benchmark/ (not published — the package files field only ships lib, LICENSE, README.md). The only tracked source edit is lib/…/readable.js, so it stashes cleanly while the untracked benchmark/ dir stays put:

# tick count for one flowing PassThrough -> Transform unit (objectMode, 4 chunks)
git stash push -- lib/internal/streams/readable.js
node benchmark/endReadable-ticks.js   # BASE
git stash pop
node benchmark/endReadable-ticks.js   # PATCHED

	total `nextTick` hops / unit	`endReadableNT()` scheduled
base	15	5× (3× from `flow()`, 2× from `resume_()`)
patched	12	2×

CPU: a paired A/B benchmark (base and patched loaded side-by-side in one process, alternating rep-by-rep so shared-box drift cancels), N=8000 units × 4 chunks, measured a median ~4.5–7.7 % CPU reduction with the patched build faster in the large majority of paired reps on a 2-core machine shared with other load. The magnitude tracks the ~20 % reduction in teardown ticks. benchmark/endReadable-cpu.js is the single-version form for git stash before/after comparison; see benchmark/README.md.

Tests

Full suite stays green with the change:

npm test   # tap test/parallel/test-*.js test/ours/test-*.js
# => 189 test files, 0 failures

Note on the mirror / build model (for @jeswr + maintainers to decide)

lib/ here is a mechanical mirror of Node core (README: "a mirror of the streams implementations in Node.js 18.19.0"), regenerated by npm run build from the Node tarball + build/replacements.mjs. So a raw lib/ edit would be lost on the next re-sync. Two ways to persist it, for the maintainers to pick:

Land it in Node core first (preferred) and pick it up on the next sync. A companion core patch (same change against nodejs/node master's lib/internal/streams/readable.js, which uses the raw state[kState] bitfield idiom) is prepared and applies cleanly (git apply --check passes). @jeswr is holding it to submit to core directly.

Carry it as a build/replacements.mjs entry. I verified the following entry, applied to the real Node v18.19.0 source and run through this repo's own babel+prettier pipeline, reproduces the committed lib/ edit byte-for-byte:

'lib/internal/streams/readable.js': [
  // …existing entries…
  ['const kDataEmitted = 1 << 18;',
   "const kDataEmitted = 1 << 18;\n// Set while an endReadableNT() nextTick is already pending, so repeated\n// read()/resume() calls after end don't schedule duplicate ticks.\nconst kEndScheduled = 1 << 19;"],
  ['  dataEmitted: makeBitMapDescriptor\\(kDataEmitted\\),',
   '  dataEmitted: makeBitMapDescriptor(kDataEmitted),\n  // If true, an endReadableNT tick is already pending.\n  endScheduled: makeBitMapDescriptor(kEndScheduled),'],
  ["  debug\\('endReadable', state.endEmitted\\);\n  if \\(!state.endEmitted\\) \\{\n    state.ended = true;",
   "  debug('endReadable', state.endEmitted);\n  // On the flowing short-lived path read()/resume() can call endReadable()\n  // several times after end; endReadableNT only ever emits 'end' once, so\n  // skip scheduling a duplicate tick when one is already pending.\n  if (!state.endEmitted && !state.endScheduled) {\n    state.ended = true;\n    state.endScheduled = true;"],
  ["  debug\\('endReadableNT', state.endEmitted, state.length\\);\n",
   "  debug('endReadableNT', state.endEmitted, state.length);\n\n  // Clear the scheduled flag now that the tick is running, so that if this\n  // tick bails out below (e.g. a last-moment unshift raised state.length),\n  // a later endReadable() can schedule a fresh tick.\n  state.endScheduled = false;\n"],
],

The commit here edits lib/ directly so the change can be reviewed, tested, and benchmarked in userland now; @jeswr will coordinate the persistence path with the maintainers.

Scope note

A second candidate from the same investigation — hoisting the per-chunk arrow closure in Transform.prototype._write — was evaluated and intentionally left out. It is not present in current Node core either, was never prototyped/measured in this investigation, is described in the analysis as noise-level, and touches a re-entrant hot path (the existing this[kCallback] slot). It would need its own measured net-win benchmark before it's worth proposing, so it's kept out of this focused, measured change.

🤖 Generated with Claude Code

On the short-lived flowing path, ending a readable can call endReadable() several times after end (from flow() and two resume_() sites). Each call schedules an endReadableNT() nextTick, but endReadableNT only ever emits 'end' once (it bails on kEndEmitted), so the extra ticks are pure waste, and the teardown tick cascade is the dominant per-stream teardown cost. Add a kEndScheduled state bit: endReadable() only schedules when neither kEndEmitted nor kEndScheduled is set; endReadableNT() clears it on entry so a later endReadable() can still reschedule if this tick bails out (e.g. a last-moment unshift raised state.length). 'end' still fires exactly once, on the same tick it otherwise would - no observable ordering change. For one flowing PassThrough -> Transform unit (objectMode, 4 chunks): 15 -> 12 nextTick hops/unit (endReadableNT scheduled 5x -> 2x). A paired A/B CPU benchmark (base vs patched in one process) shows a median ~4.5-7.7% CPU reduction on this shape. Benchmarks added under benchmark/. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

perf: dedup redundant endReadableNT teardown ticks#558

perf: dedup redundant endReadableNT teardown ticks#558
jeswr wants to merge 1 commit into
nodejs:mainfrom
jeswr:perf/endreadable-tick-dedup

jeswr commented Jul 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Uh oh!

Conversation

jeswr commented Jul 4, 2026

What

Reproduction & measured numbers

Tests

Note on the mirror / build model (for @jeswr + maintainers to decide)

Scope note

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant