Fix input-write deadlock in UserTerminalHandler on large input bursts#765
Open
Kronuz wants to merge 1 commit into
Open
Fix input-write deadlock in UserTerminalHandler on large input bursts#765Kronuz wants to merge 1 commit into
Kronuz wants to merge 1 commit into
Conversation
3d071a1 to
7c8c5d4
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #765 +/- ##
==========================================
+ Coverage 86.70% 87.06% +0.36%
==========================================
Files 72 72
Lines 5460 5574 +114
Branches 508 525 +17
==========================================
+ Hits 4734 4853 +119
+ Misses 726 721 -5 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
The per-session terminal pump (UserTerminalHandler::runUserTerminal) is a single-threaded select loop that wrote client input to the pty master with a blocking RawSocketUtils::writeAll(). A burst of input larger than ~one pty buffer deadlocks the session: the input echoes back, fills the pty output buffer, stalls the shell, the shell stops reading input, and the blocking write never returns -- so the loop also stops draining output. Pasting a multi-KB heredoc, for example, wedges the session (reproducible at ~1KB on macOS, ~8-10KB on Linux, where one pty buffer is larger). The read path already tolerated EAGAIN, i.e. it was written expecting a non-blocking master, but nothing ever made the master non-blocking. Set O_NONBLOCK where the fd is created -- in UserTerminal::setup (PseudoUserTerminal, and the test terminals), now part of that interface's contract -- and rework the pump to never block: buffer pending input, drain it to the master via the select write set, and keep reading output every iteration. When the input buffer fills, stop reading more from the router so backpressure reaches the client, symmetric to the existing output gating on routerWritable. Add a regression test (TerminalTest "LargeInputNoDeadlock") that drives a real pty (cat in raw mode) through the real handler and requires an 8KB input to echo back within a deadline. It deadlocks (times out) on the old code and passes with the fix. The existing tests never caught this because they all use FakeUserTerminal (a socket pair), which has no pty buffer semantics. Server-side only; no protocol, client, or proto changes.
7c8c5d4 to
12c4a90
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The bug
The per-session terminal pump,
UserTerminalHandler::runUserTerminal, is a single-threadedselectloop that wrote client input to the pty master with a blockingRawSocketUtils::writeAll(). A burst of input larger than ~one pty buffer deadlocks the session:The whole session wedges. It's reproducible by pasting a multi-KB heredoc: ~1KB on macOS, ~8-10KB on Linux (where one pty buffer is larger).
Related is issue 682.
The fix
The read path already tolerated
EAGAIN-- it was written expecting a non-blocking master -- but nothing ever made the master non-blocking. So:O_NONBLOCKwhere the fd is created, inUserTerminal::setup(PseudoUserTerminaland the test terminals). This is now part of that interface's contract.selectwrite set, and keep reading output every iteration.routerWritable.Testing
A new regression test,
TerminalTest "LargeInputNoDeadlock", drives a real pty (catin raw mode) through the real handler. It first does a small warmup round-trip to confirm the full client -> pty -> client echo path is live, then pushes an 8KB payload (well past one pty buffer) and requires every byte to echo back within a generous deadline. The old code deadlocks and the read never completes, so the test times out and fails; with the fix it round-trips and passes. The warmup keeps the test reliable on a slow build (a sanitizer run underctest --parallel), where it would otherwise start typing before the pipeline was wired; it is far smaller than one pty buffer, so it round-trips even on the buggy code and cannot mask the regression. The existing tests never caught this because they all useFakeUserTerminal(a socket pair), which has no pty buffer semantics.The full
et-testsuite is green, and the fix has been validated live: a 56KB one-shot input that previously wedged the session now round-trips byte-exact against a realetserver.Why it's low-risk
Server-side only -- no protocol, client, or proto changes. The change is confined to the terminal pump and the
UserTerminal::setupcontract; the wire format and everything around it are untouched. A non-blocking master is what the read path already assumed, so this makes the code match its own expectations.Asking for guidance
Setting
O_NONBLOCKinsetup(rather than inside the pump) made the contract explicit and kept the fake terminals honest, but I'm happy to move it if you'd rather. Is the approach right?This PR was prepared with AI assistance; I've reviewed and stand behind it.