Skip to content

Fix input-write deadlock in UserTerminalHandler on large input bursts#765

Open
Kronuz wants to merge 1 commit into
MisterTea:masterfrom
Kronuz:fix-input-write-deadlock
Open

Fix input-write deadlock in UserTerminalHandler on large input bursts#765
Kronuz wants to merge 1 commit into
MisterTea:masterfrom
Kronuz:fix-input-write-deadlock

Conversation

@Kronuz

@Kronuz Kronuz commented Jun 18, 2026

Copy link
Copy Markdown

The bug

The per-session terminal pump, UserTerminalHandler::runUserTerminal, is a single-threaded select loop that wrote client input to the pty master with a blocking RawSocketUtils::writeAll(). A burst of input larger than ~one pty buffer deadlocks the session:

  1. The input echoes back and fills the pty's output buffer.
  2. With its output buffer full, the shell stalls and stops reading input.
  3. The blocking write to the master never drains, so it never returns.
  4. Stuck in that write, the loop also stops draining output -- the very thing that would unstick the shell.

The whole session wedges. It's reproducible by pasting a multi-KB heredoc: ~1KB on macOS, ~8-10KB on Linux (where one pty buffer is larger).

Related is issue 682.

The fix

The read path already tolerated EAGAIN -- it was written expecting a non-blocking master -- but nothing ever made the master non-blocking. So:

  • Set O_NONBLOCK where the fd is created, in UserTerminal::setup (PseudoUserTerminal and the test terminals). This is now part of that interface's contract.
  • Rework the pump to never block: buffer pending input, drain it to the master through the select write set, and keep reading output every iteration.
  • Add backpressure: when the input buffer fills, stop reading more from the router so backpressure reaches the client -- symmetric to the existing output gating on routerWritable.

Testing

A new regression test, TerminalTest "LargeInputNoDeadlock", drives a real pty (cat in raw mode) through the real handler. It first does a small warmup round-trip to confirm the full client -> pty -> client echo path is live, then pushes an 8KB payload (well past one pty buffer) and requires every byte to echo back within a generous deadline. The old code deadlocks and the read never completes, so the test times out and fails; with the fix it round-trips and passes. The warmup keeps the test reliable on a slow build (a sanitizer run under ctest --parallel), where it would otherwise start typing before the pipeline was wired; it is far smaller than one pty buffer, so it round-trips even on the buggy code and cannot mask the regression. The existing tests never caught this because they all use FakeUserTerminal (a socket pair), which has no pty buffer semantics.

The full et-test suite is green, and the fix has been validated live: a 56KB one-shot input that previously wedged the session now round-trips byte-exact against a real etserver.

Why it's low-risk

Server-side only -- no protocol, client, or proto changes. The change is confined to the terminal pump and the UserTerminal::setup contract; the wire format and everything around it are untouched. A non-blocking master is what the read path already assumed, so this makes the code match its own expectations.

Asking for guidance

Setting O_NONBLOCK in setup (rather than inside the pump) made the contract explicit and kept the fake terminals honest, but I'm happy to move it if you'd rather. Is the approach right?


This PR was prepared with AI assistance; I've reviewed and stand behind it.

@Kronuz Kronuz force-pushed the fix-input-write-deadlock branch from 3d071a1 to 7c8c5d4 Compare June 18, 2026 01:40
@codecov

codecov Bot commented Jun 18, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 86.55462% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.06%. Comparing base (6368584) to head (12c4a90).

Files with missing lines Patch % Lines
src/terminal/UserTerminalHandler.cpp 59.09% 9 Missing ⚠️
test/integration_tests/TerminalTest.cpp 92.55% 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #765      +/-   ##
==========================================
+ Coverage   86.70%   87.06%   +0.36%     
==========================================
  Files          72       72              
  Lines        5460     5574     +114     
  Branches      508      525      +17     
==========================================
+ Hits         4734     4853     +119     
+ Misses        726      721       -5     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

The per-session terminal pump (UserTerminalHandler::runUserTerminal) is a
single-threaded select loop that wrote client input to the pty master with a
blocking RawSocketUtils::writeAll().  A burst of input larger than ~one pty
buffer deadlocks the session: the input echoes back, fills the pty output
buffer, stalls the shell, the shell stops reading input, and the blocking write
never returns -- so the loop also stops draining output.  Pasting a multi-KB
heredoc, for example, wedges the session (reproducible at ~1KB on macOS, ~8-10KB
on Linux, where one pty buffer is larger).

The read path already tolerated EAGAIN, i.e. it was written expecting a
non-blocking master, but nothing ever made the master non-blocking.  Set
O_NONBLOCK where the fd is created -- in UserTerminal::setup (PseudoUserTerminal,
and the test terminals), now part of that interface's contract -- and rework the
pump to never block: buffer pending input, drain it to the master via the select
write set, and keep reading output every iteration.  When the input buffer fills,
stop reading more from the router so backpressure reaches the client, symmetric
to the existing output gating on routerWritable.

Add a regression test (TerminalTest "LargeInputNoDeadlock") that drives a real
pty (cat in raw mode) through the real handler and requires an 8KB input to echo
back within a deadline.  It deadlocks (times out) on the old code and passes with
the fix.  The existing tests never caught this because they all use
FakeUserTerminal (a socket pair), which has no pty buffer semantics.

Server-side only; no protocol, client, or proto changes.
@Kronuz Kronuz force-pushed the fix-input-write-deadlock branch from 7c8c5d4 to 12c4a90 Compare June 18, 2026 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant