fix(adapter): surface error on agent EOF without final response#1198
fix(adapter): surface error on agent EOF without final response#1198howie wants to merge 2 commits into
Conversation
|
LGTM ✅ — Minimal, correct defensive fix for the silent-EOF-on-bridged-agent-crash path. What This PR DoesWhen a bridged (non-ACP-native) agent crashes on a backend error (e.g. HTTP 500 / quota exhausted), it exits without emitting an ACP error notification. Previously this fell through to the confusing How It WorksIn the recv loop's
Findings
Finding Details🟢 F1: Correct guard logicThe dual-guard ( 🟢 F2: Clear inline documentationThe comment block above the new code explains exactly when this path triggers and why, making it self-documenting for future maintainers. 🟢 F3: Thorough PR descriptionThe PR body walks through the full failure chain, documents the three termination paths, explains alternatives considered, and notes the intentional partial-text trade-off. Exemplary. Baseline Check
CI Status
What's Good (🟢)
|
The ACP recv loop's EOF branch (rx.recv() -> None) broke out of the loop without setting response_error. When a bridged agent crashes on a backend error (e.g. Gemini HTTP 500 / quota exhausted) it exits without ever emitting an ACP error notification, so the pipe closes, recv() returns None, and final_content falls through to the "_(no response)_" sentinel -- hiding the real failure from the user. Mark unexpected termination explicitly on EOF when no error was recorded and no text was streamed, so the user sees a concrete error instead of a vague sentinel. The text_buf.is_empty() guard preserves any partial text that did arrive before the connection closed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01X6Vv8eNtc4T9ESSSzZ2KgM
dc9dfee to
513c523
Compare
Mob review (Codex + Claude multi-finder) found the original text_buf.is_empty() guard predicates error surfacing on buffer contents, which is not a reliable proxy for turn completion. Three crash classes still slipped through silently: - session-reset turns pre-seed text_buf with the expiry notice, so the guard is always false and a crash is masked; - send-once mode slices off inter-tool narration before delivery, so a crash after a tool (non-empty buffer, empty delivered body) shows no error; - native-streaming partial-text crashes were shown as complete answers. Drop the text_buf.is_empty() conjunct: on EOF, set response_error whenever none was recorded. This is safe because a successful turn is always signalled by the id-bearing JSON-RPC response to session/prompt, which breaks the loop before EOF — so the EOF arm is reachable only on abnormal termination, regardless of buffer contents. When partial text was streamed, final_content prepends the warning to it, preserving the output while flagging the truncation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01X6Vv8eNtc4T9ESSSzZ2KgM
Final Aggregated Review — PR #1198Modegroup-review (2/2 voices active: Claude 8-finder/14-verified workflow + Codex) Outcome (after fix + re-review)
Consensus Critical (must fix)None. Consensus Important (must fix)Both voices converge on a single root cause: the
Unifying fix (resolves all three): drop the The original spec's Disputed / scope decision (user decides)
Actionable NIT
Voices unavailable
Out-of-PR note
|
|
Caution This PR is missing a Discord Discussion URL in the body. All PRs must reference a prior Discord discussion to ensure community alignment before implementation. Please edit the PR description to include a link like: |
What problem does this solve?
When a bridged (non-ACP-native) agent crashes on a backend error, OpenAB shows the user a vague
_(no response)_(or, worse, a partially-streamed buffer presented as a complete answer) instead of the real failure reason.agy does not natively speak ACP — we bridge it. So when an AI backend (e.g. Gemini) returns HTTP 500 or a quota-exhausted error, agy can crash/exit without first emitting an ACP final response carrying that error. The failure chain:
rx.recv() -> Nonebranch breaks out, butresponse_erroris stillNone.final_contentis empty → falls through to the_(no response)_sentinel.The user is left with no idea their request hit a quota limit or server error.
This was surfaced in the OpenAB community discussion. Because agy itself doesn't support ACP, the bridging layer (OpenAB side) must supply a sensible error on pipe EOF rather than letting it fall through.
Closes #
Discord Discussion URL:
At a Glance
Prior Art & Industry Research
Not applicable — defensive bug fix on the ACP bridge. Adds error surfacing on an existing
breakpath; no new architecture, dependency, or protocol. It makes the EOF branch consistent with the
sibling
!conn.alive()branch, which already setsresponse_error = "Agent process died".Proposed Solution
In
crates/openab-core/src/adapter.rs, the recv loop EOF branch records an explicit error when thepipe closes without a final response and no error was already recorded:
The downstream
final_contentbuilder already rendersresponse_erroras⚠️ {err}(standalonewhen there is no body, or prepended to any partial content), so no other change is needed.
Why this approach?
crashing, so the bridge detects the abnormal EOF itself, mirroring the existing
!conn.alive()branch.text_buf.is_empty(). Multi-modelmob review (Codex + a Claude multi-finder review) showed that guard is unsound: it is defeated by
the session-reset pre-seed (
text_bufalready holds the expiry notice), by send-once mode (thedelivered body is a post-
answer_startslice, not the full buffer), and by native streaming(partial text shown as a complete answer). The guard is also unnecessary: a successful turn is
always signalled by the id-bearing JSON-RPC response to
session/prompt, which breaks the recvloop before EOF — so the EOF branch is reachable only on abnormal termination, regardless of
buffer contents. When partial text was streamed,
final_contentprepends the warning to it,preserving the output while flagging the truncation.
Alternatives Considered
text_buf.is_empty()(earlier revision): rejected after mob review — masks crashes onreset turns, send-once tool turns, and native-streaming partials (see above).
signals the pipe closed; a timeout only adds latency for the same outcome.
emit an error before crashing, so the bridge must be robust to silent EOF. A complementary agy-side
fix is still worthwhile but out of scope for this repo.
Known Issues / Follow-ups
response_errordrives the
⚠️message banner but notdelivery_failed, so dispatch still reacts with success(🆗) rather than error (❌). This gap is shared with the existing
Agent process died/hard-timeout / coded-error branches — this PR only joins the pattern. Recommended as a separate PR
that wires
response_error → ❌uniformly for all error branches.Validation
Rust changes:
cargo build -p openab-corepassescargo test -p openab-core --lib adapter— 43 passed, 0 failedadapter.rsclippy-cleantext_buf.is_empty()guard was droppedCI note:
checkjob (cargo clippy --workspace -- -D warnings) is currently RED oncrates/openab-core/src/pre_seed.rs:289(manual_is_multiple_of, rust-1.96 lint) — pre-existingon main (landed via feat: add pre_seed phase — S3 zip download before pre_boot #1189/feat(pre_seed): support .tar.gz/.tgz archives #1196/build: make pre-seed a default feature #1197), not part of this diff. The merge gate will stay red until
main is fixed; flagging for maintainers.
All PRs:
confirm the user sees
⚠️ Agent process exited unexpectedly(with any partial text preserved)instead of
_(no response)_. No automated test seam exists for this deeply-nested recv loop;the sibling error paths are likewise covered only by manual repro against a live agent.