perf: dedup redundant endReadableNT teardown ticks#558
Draft
jeswr wants to merge 1 commit into
Draft
Conversation
On the short-lived flowing path, ending a readable can call endReadable() several times after end (from flow() and two resume_() sites). Each call schedules an endReadableNT() nextTick, but endReadableNT only ever emits 'end' once (it bails on kEndEmitted), so the extra ticks are pure waste, and the teardown tick cascade is the dominant per-stream teardown cost. Add a kEndScheduled state bit: endReadable() only schedules when neither kEndEmitted nor kEndScheduled is set; endReadableNT() clears it on entry so a later endReadable() can still reschedule if this tick bails out (e.g. a last-moment unshift raised state.length). 'end' still fires exactly once, on the same tick it otherwise would - no observable ordering change. For one flowing PassThrough -> Transform unit (objectMode, 4 chunks): 15 -> 12 nextTick hops/unit (endReadableNT scheduled 5x -> 2x). A paired A/B CPU benchmark (base vs patched in one process) shows a median ~4.5-7.7% CPU reduction on this shape. Benchmarks added under benchmark/. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Deduplicate the redundant
endReadableNTteardownnextTicks on the short-lived flowing path.Ending a readable schedules
endReadableNTviaprocess.nextTick. On the flowing pathendReadable()gets called several times after end — once fromflow()and again from tworesume_()sites — even thoughendReadableNTonly ever emits'end'once (it bails onendEmitted). The extra ticks are pure waste, and the teardown tick cascade is the dominant per-stream teardown cost for short-lived streams.The fix adds a
kEndScheduledstate bit (exposed asendScheduled, following the existingmakeBitMapDescriptorpattern):endReadable()schedules the tick only when neitherendEmittednorendScheduledis set, and setsendScheduledwhen it does.endReadableNT()clearsendScheduledon entry, so if the tick bails out (e.g. a last-momentunshiftraisedstate.length) a laterendReadable()can schedule a fresh tick.'end'still fires exactly once, on the same tick it otherwise would — no observable ordering change.finish → end → closeordering is untouched. This is the safe subset of "collapse the teardown ticks"; it deliberately does not attempt the semver-major fusing offinish → end → close.Reproduction & measured numbers
Benchmarks are added under
benchmark/(not published — the packagefilesfield only shipslib,LICENSE,README.md). The only tracked source edit islib/…/readable.js, so it stashes cleanly while the untrackedbenchmark/dir stays put:nextTickhops / unitendReadableNT()scheduledflow(), 2× fromresume_())CPU: a paired A/B benchmark (base and patched loaded side-by-side in one process, alternating rep-by-rep so shared-box drift cancels), N=8000 units × 4 chunks, measured a median ~4.5–7.7 % CPU reduction with the patched build faster in the large majority of paired reps on a 2-core machine shared with other load. The magnitude tracks the ~20 % reduction in teardown ticks.
benchmark/endReadable-cpu.jsis the single-version form forgit stashbefore/after comparison; seebenchmark/README.md.Tests
Full suite stays green with the change:
Note on the mirror / build model (for @jeswr + maintainers to decide)
lib/here is a mechanical mirror of Node core (README: "a mirror of the streams implementations in Node.js 18.19.0"), regenerated bynpm run buildfrom the Node tarball +build/replacements.mjs. So a rawlib/edit would be lost on the next re-sync. Two ways to persist it, for the maintainers to pick:Land it in Node core first (preferred) and pick it up on the next sync. A companion core patch (same change against
nodejs/nodemaster'slib/internal/streams/readable.js, which uses the rawstate[kState]bitfield idiom) is prepared and applies cleanly (git apply --checkpasses). @jeswr is holding it to submit to core directly.Carry it as a
build/replacements.mjsentry. I verified the following entry, applied to the real Node v18.19.0 source and run through this repo's own babel+prettier pipeline, reproduces the committedlib/edit byte-for-byte:The commit here edits
lib/directly so the change can be reviewed, tested, and benchmarked in userland now; @jeswr will coordinate the persistence path with the maintainers.Scope note
A second candidate from the same investigation — hoisting the per-chunk arrow closure in
Transform.prototype._write— was evaluated and intentionally left out. It is not present in current Node core either, was never prototyped/measured in this investigation, is described in the analysis as noise-level, and touches a re-entrant hot path (the existingthis[kCallback]slot). It would need its own measured net-win benchmark before it's worth proposing, so it's kept out of this focused, measured change.🤖 Generated with Claude Code