Fix intermittent assistant no-response issue by making run triggering resilient and runtime
failures observable
This change addresses the intermittent cases where the assistant appeared to stop responding,
sometimes after a few tool calls and sometimes even after the first user message.
Primary fix: prevent queued work from getting stranded
The assistant runtime uses a per-run lock so only one processing loop runs for a conversation
at a time. Previously, if a new trigger arrived while that lock was held, the trigger was
dropped after logging “unable to acquire lock.” That is fine only if the active loop naturally
discovers all new work before it exits. In practice, there are edge cases around run shutdown,
permission/human-input continuations, and queued user messages where work can be enqueued while
the active loop is near completion or paused, then never picked up again.
The runtime now records that a rerun was requested when it cannot acquire the lock. Once the
active run releases the lock, it checks that flag and immediately triggers the run again. This
makes trigger delivery edge-triggered plus level-triggered: even if the immediate trigger loses
the lock race, the queued message or continuation is processed after the current cycle ends.
This directly addresses the “assistant does not respond” symptom because messages and
continuation events are no longer silently stranded behind a missed trigger.
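The lock-plus-rerun behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from this change: the class name RunTrigger, the processRun callback, and the two sets are hypothetical stand-ins for the real runtime, lock, and flag.

```typescript
// Hypothetical sketch: RunTrigger and processRun are illustrative names only.
type ProcessRun = (runId: string) => Promise<void>;

class RunTrigger {
  // Runs that currently have an active processing loop (the per-run lock).
  private locked = new Set<string>();
  // Runs whose trigger arrived while the lock was held.
  private rerunRequested = new Set<string>();
  private processRun: ProcessRun;

  constructor(processRun: ProcessRun) {
    this.processRun = processRun;
  }

  async trigger(runId: string): Promise<void> {
    if (this.locked.has(runId)) {
      // Old behavior: log "unable to acquire lock" and drop the trigger.
      // New behavior: remember that a rerun was requested.
      this.rerunRequested.add(runId);
      return;
    }
    this.locked.add(runId);
    try {
      await this.processRun(runId);
    } finally {
      // Release the lock first, then honor any trigger that lost the race.
      this.locked.delete(runId);
      if (this.rerunRequested.delete(runId)) {
        await this.trigger(runId);
      }
    }
  }
}
```

In this sketch the flag is a single bit per run, so any number of missed triggers collapse into one rerun. That is enough as long as each processing loop drains all pending work for the conversation.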
Secondary fix: surface runtime failures instead of ending silently
Previously, runs:createMessage fired runtime.trigger(runId) in the background and returned
success to the renderer. If the background runtime later threw outside the model stream’s own
error-event path, the UI could receive only run-processing-end without any error event or
assistant message. From the user’s perspective, the assistant simply stopped.
The runtime now catches non-abort failures at the top level, logs them, persists a run error
event, publishes that event to the renderer, and still emits run-processing-end for cleanup.
That means model setup failures, provider/client exceptions, run-state issues, serialization
failures, and other unexpected runtime exceptions are visible in the chat instead of looking
like a silent no-op.
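A rough sketch of that top-level handling is below. The event shapes, the RunSinks interface, and the AbortError name check are assumptions made for illustration. Only the event names error and run-processing-end come from this description.

```typescript
// Hypothetical event shapes; only the two event names come from the description.
type RunEvent =
  | { type: "error"; runId: string; message: string }
  | { type: "run-processing-end"; runId: string };

// Hypothetical stand-ins for the event store and the renderer channel.
interface RunSinks {
  persist(event: RunEvent): Promise<void>;
  publish(event: RunEvent): void;
}

// Assumption: aborts are recognizable by the conventional "AbortError" name.
function isAbort(err: unknown): boolean {
  return err instanceof Error && err.name === "AbortError";
}

async function runWithErrorSurface(
  runId: string,
  body: () => Promise<void>,
  sinks: RunSinks,
): Promise<void> {
  try {
    await body();
  } catch (err) {
    if (!isAbort(err)) {
      const message = err instanceof Error ? err.message : String(err);
      console.error(`run ${runId} failed:`, err);
      const event: RunEvent = { type: "error", runId, message };
      await sinks.persist(event); // keep the failure in run history
      sinks.publish(event);       // show it in the chat
    }
  } finally {
    // Always emitted so the UI can clean up, whether or not the run failed.
    sinks.publish({ type: "run-processing-end", runId });
  }
}
```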
Additional robustness: tool and model stream errors recover more cleanly
Tool execution exceptions are now converted into tool-result error payloads when they are not
aborts. This lets the model observe that a tool failed and continue or explain the issue,
instead of the entire run dying mid-tool-call.
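As an illustration of the conversion, assuming a simple result shape: the ToolResult type, executeToolSafely, and the AbortError name check below are hypothetical, not the runtime's actual API.

```typescript
// Hypothetical tool-result payload; the real payload shape may differ.
type ToolResult =
  | { ok: true; toolCallId: string; output: unknown }
  | { ok: false; toolCallId: string; error: string };

// Assumption: aborts are recognizable by the conventional "AbortError" name.
function isAbortError(err: unknown): boolean {
  return err instanceof Error && err.name === "AbortError";
}

async function executeToolSafely(
  toolCallId: string,
  run: () => Promise<unknown>,
): Promise<ToolResult> {
  try {
    return { ok: true, toolCallId, output: await run() };
  } catch (err) {
    // Aborts still propagate so cancellation keeps working.
    if (isAbortError(err)) throw err;
    // Any other failure becomes data the model can read and react to.
    const error = err instanceof Error ? err.message : String(err);
    return { ok: false, toolCallId, error };
  }
}
```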
Model stream setup and iteration errors are also converted into model stream error events.
The existing stream error handling can then produce a run error event and show it in the UI.
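One way to picture this is an async generator that wraps the provider stream. This is a sketch under assumed names: StreamEvent, withStreamErrors, and the open callback are hypothetical, and abort handling is left out for brevity.

```typescript
// Hypothetical stream event shape; the real stream carries more event types.
type StreamEvent =
  | { type: "text-delta"; text: string }
  | { type: "error"; message: string };

type OpenStream = () =>
  | AsyncIterable<StreamEvent>
  | Promise<AsyncIterable<StreamEvent>>;

async function* withStreamErrors(open: OpenStream): AsyncGenerator<StreamEvent> {
  try {
    // Setup failures (bad config, client construction) are caught here...
    const stream = await open();
    // ...and so are failures thrown while iterating the stream.
    for await (const event of stream) {
      yield event;
    }
  } catch (err) {
    // Emit the failure as an ordinary stream event instead of throwing.
    const message = err instanceof Error ? err.message : String(err);
    yield { type: "error", message };
  }
}
```

Downstream code that already handles error events from the stream then needs no separate path for thrown exceptions.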
Small cleanup
The call sites that intentionally fire runtime.trigger(runId) in the background now use
void runtime.trigger(runId) to make that behavior explicit.
Net effect
The main behavior change is that assistant work is no longer dropped when a trigger races with
an active run lock. If something still fails, the failure is now surfaced as a visible run
error rather than leaving the user with an indefinitely silent or completed-looking
conversation.