
fix runtime issue #516

Open
arkml wants to merge 1 commit into dev from fix_runtime2


@arkml arkml commented Apr 21, 2026

Fix intermittent assistant no-response issue by making run triggering resilient and runtime
failures observable

This change addresses the intermittent cases where the assistant appeared to stop responding,
sometimes after a few tool calls and sometimes even after the first user message.

Primary fix: prevent queued work from getting stranded

The assistant runtime uses a per-run lock so only one processing loop runs for a conversation
at a time. Previously, if a new trigger arrived while that lock was held, the trigger was
dropped after logging “unable to acquire lock.” That is fine only if the active loop naturally
discovers all new work before it exits. In practice, there are edge cases around run shutdown,
permission/human-input continuations, and queued user messages where work can be enqueued while
the active loop is near completion or paused, then never picked up again.

The runtime now records that a rerun was requested when it cannot acquire the lock. Once the
active run releases the lock, it checks that flag and immediately triggers the run again. This
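makes trigger delivery both edge-triggered and level-triggered: the trigger call is the edge, and
the recorded flag acts as a level that persists until it is serviced. Even if the immediate
trigger loses the lock race, the queued message or continuation is processed after the current
cycle ends.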
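The mechanism described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `RunCoordinator`, `processRun`, `locked`, and `rerunRequested` are hypothetical names, and the real runtime's lock is assumed to behave like a simple per-run set.

```typescript
// Hypothetical sketch: a per-run lock with a "rerun requested" latch.
type ProcessRun = (runId: string) => Promise<void>;

class RunCoordinator {
  private locked = new Set<string>();
  private rerunRequested = new Set<string>();
  private processRun: ProcessRun;

  constructor(processRun: ProcessRun) {
    this.processRun = processRun;
  }

  async trigger(runId: string): Promise<void> {
    if (this.locked.has(runId)) {
      // Lock is held: record the request instead of dropping it.
      this.rerunRequested.add(runId);
      return;
    }
    this.locked.add(runId);
    try {
      await this.processRun(runId);
    } finally {
      this.locked.delete(runId);
      // After releasing the lock, service any trigger that lost the race.
      if (this.rerunRequested.delete(runId)) {
        await this.trigger(runId);
      }
    }
  }
}
```

Multiple contended triggers collapse into a single rerun, which is sufficient as long as each processing cycle picks up all pending work for the run.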

This directly addresses the “assistant does not respond” symptom because messages and
continuation events are no longer silently stranded behind a missed trigger.

Secondary fix: surface runtime failures instead of ending silently

Previously, runs:createMessage fired runtime.trigger(runId) in the background and returned
success to the renderer. If the background runtime later threw outside the model stream’s own
error-event path, the UI could receive only run-processing-end without any error event or
assistant message. From the user’s perspective, the assistant simply stopped.

The runtime now catches non-abort failures at the top level, logs them, persists a run error
event, publishes that event to the renderer, and still emits run-processing-end for cleanup.
That means model setup failures, provider/client exceptions, run-state issues, serialization
failures, and other unexpected runtime exceptions are visible in the chat instead of looking
like a silent no-op.
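A rough sketch of this top-level handling is shown below. The names `RunEvent`, `RunSink`, `runWithErrorSurface`, and `isAbortError` are hypothetical, and detecting aborts by the `AbortError` name is an assumption; the actual runtime may identify aborts differently.

```typescript
// Hypothetical event and sink shapes, for illustration only.
type RunEvent =
  | { type: "error"; runId: string; message: string }
  | { type: "run-processing-end"; runId: string };

interface RunSink {
  persist(event: RunEvent): Promise<void>;
  publish(event: RunEvent): void;
}

// Assumption: aborts surface as errors whose name is "AbortError".
function isAbortError(err: unknown): boolean {
  return typeof err === "object" && err !== null &&
    (err as { name?: unknown }).name === "AbortError";
}

async function runWithErrorSurface(
  runId: string,
  body: () => Promise<void>,
  sink: RunSink,
): Promise<void> {
  try {
    await body();
  } catch (err) {
    if (!isAbortError(err)) {
      console.error(`run ${runId} failed`, err);
      const message = err instanceof Error ? err.message : String(err);
      const event: RunEvent = { type: "error", runId, message };
      await sink.persist(event); // keep the failure in run history
      sink.publish(event);       // make it visible in the chat
    }
  } finally {
    // Always emitted so the renderer can clean up its processing state.
    sink.publish({ type: "run-processing-end", runId });
  }
}
```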

Additional robustness: tool and model stream errors recover more cleanly

Tool execution exceptions are now converted into tool-result error payloads when they are not
aborts. This lets the model observe that a tool failed and continue or explain the issue,
instead of the entire run dying mid-tool-call.
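As an illustration of the idea, a tool call wrapper might look like the sketch below. `ToolResult`, `executeToolSafely`, and `isAbortError` are hypothetical names, and the payload shape is an assumption rather than the format used in this PR.

```typescript
// Hypothetical tool-result shape, for illustration only.
type ToolResult =
  | { toolCallId: string; isError: false; output: unknown }
  | { toolCallId: string; isError: true; output: { error: string } };

// Assumption: aborts surface as errors whose name is "AbortError".
function isAbortError(err: unknown): boolean {
  return typeof err === "object" && err !== null &&
    (err as { name?: unknown }).name === "AbortError";
}

async function executeToolSafely(
  toolCallId: string,
  run: () => Promise<unknown>,
): Promise<ToolResult> {
  try {
    return { toolCallId, isError: false, output: await run() };
  } catch (err) {
    // Aborts still propagate so the run can stop as requested.
    if (isAbortError(err)) throw err;
    // Any other failure becomes a result the model can read and react to.
    const message = err instanceof Error ? err.message : String(err);
    return { toolCallId, isError: true, output: { error: message } };
  }
}
```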

Model stream setup and iteration errors are also converted into model stream error events.
The existing stream error handling can then produce a run error event and show it in the UI.
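One way to express this conversion is an async generator that wraps the stream, sketched below. `StreamEvent`, `withStreamErrors`, and the `open` callback are hypothetical names, and this sketch does not special-case aborts; the real runtime may treat them separately.

```typescript
// Hypothetical stream event shape, for illustration only.
type StreamEvent =
  | { type: "text-delta"; text: string }
  | { type: "error"; error: string };

type OpenStream = () => AsyncIterable<StreamEvent> | Promise<AsyncIterable<StreamEvent>>;

async function* withStreamErrors(open: OpenStream): AsyncGenerator<StreamEvent> {
  try {
    // Setup failures (for example a bad provider config) are caught here...
    const stream = await open();
    // ...and so are failures raised while iterating the stream.
    for await (const event of stream) {
      yield event;
    }
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    // Downstream handling sees an ordinary error event instead of a throw.
    yield { type: "error", error };
  }
}
```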

Small cleanup

The call sites that intentionally fire runtime.trigger(runId) in the background now use void runtime.trigger(runId) to make that behavior explicit.
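For readers unfamiliar with the pattern, here is a small sketch of what a fire-and-forget call site looks like. `handleCreateMessage` and `Trigger` are hypothetical names used only for illustration.

```typescript
type Trigger = (runId: string) => Promise<void>;

// Hypothetical handler: starts processing and returns without waiting for it.
function handleCreateMessage(runId: string, trigger: Trigger): { ok: true } {
  // `void` discards the promise on purpose, signalling to readers and
  // linters that it is intentionally not awaited.
  void trigger(runId);
  return { ok: true };
}
```

Note that `void` does not handle rejections, so this pattern relies on the triggered function catching its own failures.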

Net effect

The main behavior change is that assistant work is no longer dropped when a trigger races with
an active run lock. If something still fails, the failure is now surfaced as a visible run
error rather than leaving the user with a conversation that stays silent indefinitely or looks
complete.
