fix: multi-pass response extraction, schema sanitization, workspace safety, and SandboxJS try-catch #583

Open
buger wants to merge 12 commits into main from fix/workflow-output-unwrapping

buger commented Jun 24, 2026

Summary

  • track-execution: Replace naive single-pass response extraction with a multi-pass approach that filters out ProbeAgent tool output (├─┬, [task:), unwraps JSON-wrapped text ({"text":"..."}, handling embedded control characters), and falls back through generate-response entries before defaulting to "Execution completed". Fixes tasks that showed raw JSON or tool-tree output instead of the actual AI response.

  • mcp-custom-sse-server: Add a recursive fixRequiredFields sanitizer that strips required entries referencing non-existent properties (Gemini rejects such schemas). Also apply normalizeInputSchema to external MCP tools (regularTools); previously only workflow/HTTP/UTCP tools were normalized.

  • workflow-tool-executor: Change the argsOverrides filter from a truthiness check (!argsOverrides[r]) to a key-existence check (!(r in argsOverrides)) so that falsy override values such as 0, "", or false are handled correctly.

  • workspace-manager: Add safety guards to cleanupStale(): skip directories that are real git repos (they have a .git directory rather than a worktree .git file), verify that worktree gitdir paths actually point into .git/worktrees/, and refuse to remove worktrees whose parent repo lies outside the workspace basePath. This prevents accidental deletion of user repositories.

  • SandboxJS: Bump @nyariv/sandboxjs to probelabs/SandboxJS@d0d8c8a, which fixes three bugs in try-catch handling with async/await: catch variable extraction (regex group 2→3), an ExecReturn leak that prevented code after a try-catch from executing, and try-finally without catch silently swallowing errors.

  • assistant.yaml: Add session continuation prompt guidance for workflow tools.
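The multi-pass extraction in track-execution can be sketched as follows. This is a minimal illustration, not the code in this PR: the `HistoryEntry` shape and the exact pass order are assumptions. Only the helper names (`unwrapJsonText`, `isToolOutput`, `extractBestResponseText`), the tool-output markers, the generate-response fallback, and the "Execution completed" default come from the description.

```typescript
// Hypothetical shape of one execution-history entry.
interface HistoryEntry {
  checkName: string;
  text: string;
}

const DEFAULT_RESPONSE = "Execution completed";

// If raw is a JSON object like {"text":"..."}, return the inner text.
// Tries a plain parse first, then retries with raw control characters
// escaped, because JSON.parse rejects literal newlines inside strings.
function unwrapJsonText(raw: string): string {
  const trimmed = raw.trim();
  if (!trimmed.startsWith("{")) return raw;
  const attempts = [
    trimmed,
    trimmed.replace(
      /[\u0000-\u001f]/g,
      (c) => "\\u" + c.charCodeAt(0).toString(16).padStart(4, "0"),
    ),
  ];
  for (const candidate of attempts) {
    try {
      const parsed: unknown = JSON.parse(candidate);
      const text = (parsed as { text?: unknown } | null)?.text;
      return typeof text === "string" ? text : raw;
    } catch {
      // Not parseable in this form; try the next one.
    }
  }
  return raw;
}

// ProbeAgent tool trees and task markers are not user-facing responses.
function isToolOutput(text: string): boolean {
  return text.includes("├─┬") || text.includes("[task:");
}

// Unwrap, then reject empty strings and tool output.
function usableText(raw: string | undefined): string | undefined {
  if (raw === undefined) return undefined;
  const text = unwrapJsonText(raw).trim();
  return text !== "" && !isToolOutput(text) ? text : undefined;
}

function extractBestResponseText(history: HistoryEntry[]): string {
  // Pass 1: the newest entry, if it holds a real response.
  const latest = usableText(history[history.length - 1]?.text);
  if (latest !== undefined) return latest;
  // Pass 2: walk generate-response entries from newest to oldest.
  for (let i = history.length - 1; i >= 0; i--) {
    if (history[i].checkName !== "generate-response") continue;
    const candidate = usableText(history[i].text);
    if (candidate !== undefined) return candidate;
  }
  // Pass 3: nothing usable was found.
  return DEFAULT_RESPONSE;
}
```

With a history whose last entry is a tool tree, pass 1 rejects it and pass 2 recovers the earlier generate-response text, even when that text is a JSON wrapper containing a raw newline.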

Test plan

  • 15 new tests for multi-pass response extraction (JSON unwrapping, tool output detection, generate-response fallback, empty history, production-like scenarios)
  • Updated workspace-manager test to match new safety guards
  • All 3444 existing unit tests pass
  • 56 new tests in SandboxJS for async try-catch scenarios
  • All 230 SandboxJS tests pass (174 existing + 56 new)
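For reference, the async try-catch behavior the SandboxJS fix restores is ordinary JavaScript semantics. The sketch below runs natively (it does not call SandboxJS, whose API is not shown here) and exercises the three fixed cases in one place: the catch variable is bound, code after the try-catch runs, and a try-finally without catch rethrows after running its finally block.

```typescript
// Native JavaScript semantics that a sandboxed interpreter should reproduce.
async function tryCatchScenarios(): Promise<string[]> {
  const log: string[] = [];
  try {
    try {
      await Promise.reject(new Error("boom"));
    } finally {
      // try-finally without catch: runs, then the error keeps propagating.
      log.push("finally ran");
    }
    log.push("unreachable");
  } catch (err) {
    // The catch variable must be bound to the rejected error.
    log.push(`caught: ${(err as Error).message}`);
  }
  // Code after a try-catch must still execute.
  log.push("after try-catch");
  return log;
}

tryCatchScenarios().then((log) => console.log(log.join(" | ")));
// prints: finally ran | caught: boom | after try-catch
```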

🤖 Generated with Claude Code
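The argsOverrides change in workflow-tool-executor comes down to the difference between a truthiness check and the `in` operator. A minimal sketch, assuming the filter runs over a list of required parameter names; `remainingRequiredTruthy` and `remainingRequired` are hypothetical names used only to contrast the two checks.

```typescript
// Old check (hypothetical name): a falsy override (0, "", false) looks "missing".
function remainingRequiredTruthy(required: string[], argsOverrides: Record<string, unknown>): string[] {
  return required.filter((r) => !argsOverrides[r]);
}

// Fixed check (hypothetical name): only the presence of the key matters.
function remainingRequired(required: string[], argsOverrides: Record<string, unknown>): string[] {
  return required.filter((r) => !(r in argsOverrides));
}

const overrides = { limit: 0, dryRun: false };
remainingRequiredTruthy(["limit", "dryRun", "query"], overrides); // -> ["limit", "dryRun", "query"]
remainingRequired(["limit", "dryRun", "query"], overrides); // -> ["query"]
```

Note that `in` also sees inherited keys (`"toString" in {}` is `true`); `Object.hasOwn` would be stricter if overrides can come from arbitrary objects.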

buger and others added 12 commits March 26, 2026 06:34
…to SSE server

Two bugs fixed:

1. Workflow output value_js received raw ReviewSummary wrappers instead of
   unwrapped step outputs. This caused every workflow tool (slack-search,
   slack-read-thread, discourse-read-thread, discourse-reply) to return
   "Unknown error" because outputs['step'].success was undefined (actual
   data was nested in .output). Script steps already unwrapped correctly
   via buildProviderTemplateContext, but workflow-executor's value_js,
   if conditions, and Liquid contexts did not.

2. executeHttpClientTool in mcp-custom-sse-server only handled bearer auth,
   not oauth2_client_credentials. When the AI called http_client tools with
   oauth2 auth (e.g. MongoDB Atlas), no token exchange happened and requests
   went out without Authorization headers, causing 401 errors.
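The first fix amounts to unwrapping each step's ReviewSummary before exposing `outputs` to value_js, if conditions, and Liquid contexts. A minimal sketch, assuming a wrapper is any object with an `output` key; `ReviewSummaryLike`, `isWrapped`, and `unwrapStepOutputs` are hypothetical names, and the real ReviewSummary type carries more fields.

```typescript
// Hypothetical subset of a ReviewSummary wrapper.
interface ReviewSummaryLike {
  issues?: unknown[];
  output?: unknown;
}

function isWrapped(value: unknown): value is ReviewSummaryLike & { output: unknown } {
  return typeof value === "object" && value !== null && "output" in value;
}

// Replace each wrapped step result with its .output so expressions such as
// outputs['step'].success see the step's own data.
function unwrapStepOutputs(raw: Record<string, unknown>): Record<string, unknown> {
  const unwrapped: Record<string, unknown> = {};
  for (const [step, value] of Object.entries(raw)) {
    unwrapped[step] = isWrapped(value) ? value.output : value;
  }
  return unwrapped;
}

const raw = { search: { issues: [], output: { success: true, messages: [] } } };
// Before the fix, value_js saw raw.search, whose .success is undefined.
const outputs = unwrapStepOutputs(raw) as Record<string, { success?: boolean }>;
console.log(outputs["search"].success); // true
```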

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
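The second fix requires an OAuth2 client credentials exchange (RFC 6749, section 4.4) before the tool request is sent. The sketch below is an assumption about the shape of that step, not the code in mcp-custom-sse-server: the config field names are hypothetical, `fetch` is injected so the sketch runs offline, and the client credentials go in the form body (RFC 6749 section 2.3.1 lets servers accept this but recommends HTTP Basic, which some providers require).

```typescript
// Hypothetical config shape for an oauth2_client_credentials auth block.
interface ClientCredentialsConfig {
  tokenUrl: string;
  clientId: string;
  clientSecret: string;
  scope?: string;
}

// Minimal fetch-compatible signature so the sketch can run offline.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function fetchAccessToken(cfg: ClientCredentialsConfig, fetchImpl: FetchLike): Promise<string> {
  const form = new URLSearchParams({
    grant_type: "client_credentials",
    client_id: cfg.clientId,
    client_secret: cfg.clientSecret,
  });
  if (cfg.scope) form.set("scope", cfg.scope);
  const res = await fetchImpl(cfg.tokenUrl, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded", Accept: "application/json" },
    body: form.toString(),
  });
  if (!res.ok) throw new Error(`token request failed: HTTP ${res.status}`);
  const data = (await res.json()) as { access_token?: unknown };
  if (typeof data.access_token !== "string") throw new Error("token response has no access_token");
  return data.access_token;
}

// Headers to attach to the actual http_client tool request.
async function oauth2Headers(cfg: ClientCredentialsConfig, fetchImpl: FetchLike): Promise<Record<string, string>> {
  return { Authorization: `Bearer ${await fetchAccessToken(cfg, fetchImpl)}` };
}
```

In real use `fetchImpl` would wrap the global `fetch`, and the token would be cached until its `expires_in` elapses rather than exchanged on every call.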
- Fix on_message trigger dispatch to match the normal message path:
  seed setFirstMessage so human_input checks auto-resolve and the
  full intent-router → build-config → generate-response chain runs
  with proper tool loading (Jira MCP, Slack, etc.)
- Inject trigger.inputs.text as the AI message with original Slack
  message appended, so triggers can give specific instructions
- Fix live update race condition: serialize publish() calls via a
  promise queue in SlackTaskLiveUpdateSink to prevent duplicate
  Slack messages when tick() and complete() run concurrently
- Track inflightTick promise so complete()/fail() await in-flight
  ticks before publishing the final update
- Fix self-bot message detection for bot_message subtypes by also
  checking ev.bot_id against the bot's own bot_id from auth.test
- Add resolveChannelName() to SlackClient for #channel-name support
  in scheduler output targets via conversations.list with caching
- Allow cron jobs without a workflow (inputs.text is used as the user message)
- Make StaticCronJob.workflow optional in types
- Fix workflow output warning to only fire for undefined (not null)
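The live-update race fix serializes publish() calls through a promise chain and makes complete() wait for any in-flight tick. A minimal sketch of that pattern; `SerialQueue` and `LiveUpdateSinkSketch` are hypothetical stand-ins for SlackTaskLiveUpdateSink, and `publishImpl` stands in for the actual Slack call.

```typescript
class SerialQueue {
  private tail: Promise<void> = Promise.resolve();

  // Run task only after every previously queued task has settled.
  run<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Swallow errors on the chain itself so one failure does not block later tasks.
    this.tail = result.then(() => undefined, () => undefined);
    return result;
  }
}

class LiveUpdateSinkSketch {
  private queue = new SerialQueue();
  private inflightTick: Promise<void> | null = null;
  private publishImpl: (text: string) => Promise<void>;

  constructor(publishImpl: (text: string) => Promise<void>) {
    this.publishImpl = publishImpl;
  }

  private publish(text: string): Promise<void> {
    return this.queue.run(() => this.publishImpl(text));
  }

  tick(text: string): Promise<void> {
    const p: Promise<void> = this.publish(text).finally(() => {
      if (this.inflightTick === p) this.inflightTick = null;
    });
    this.inflightTick = p;
    return p;
  }

  // fail() would follow the same pattern.
  async complete(text: string): Promise<void> {
    if (this.inflightTick) await this.inflightTick.catch(() => undefined);
    await this.publish(text);
  }
}
```

Because every publish goes through the same chain, a tick() and a complete() started concurrently can never post overlapping Slack updates.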

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
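The #channel-name support can be sketched as a resolver that maps names to IDs from a cached channel list and refreshes once on a miss, in case the channel is new. The caching policy is an assumption; `ChannelNameResolver` and the injected `listChannels` (standing in for paginated conversations.list calls) are hypothetical.

```typescript
interface ChannelInfo {
  id: string;
  name: string;
}

class ChannelNameResolver {
  private cache: Map<string, string> | null = null;
  private listChannels: () => Promise<ChannelInfo[]>;

  constructor(listChannels: () => Promise<ChannelInfo[]>) {
    this.listChannels = listChannels;
  }

  private async refresh(): Promise<Map<string, string>> {
    const channels = await this.listChannels();
    const map = new Map(channels.map((c): [string, string] => [c.name, c.id]));
    this.cache = map;
    return map;
  }

  // "#name" -> channel ID; anything else is assumed to already be an ID.
  async resolve(target: string): Promise<string | undefined> {
    if (!target.startsWith("#")) return target;
    const name = target.slice(1);
    let map = this.cache;
    let fresh = false;
    if (map === null) {
      map = await this.refresh();
      fresh = true;
    }
    const hit = map.get(name);
    if (hit !== undefined || fresh) return hit;
    // Miss on a cached list: the channel may be new, so refresh once.
    return (await this.refresh()).get(name);
  }
}
```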
Add debounce-manager for throttling check executions and integrate
it into level-dispatch. Supports configurable throttle settings
per check via config types.
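A per-check throttle of this kind can be sketched as follows. The commit does not show the debounce-manager's settings, so `intervalMs`, `CheckThrottle`, and the leading-edge policy (run immediately, then drop calls until the interval has passed) are assumptions; the clock is injectable for testing.

```typescript
interface ThrottleConfig {
  intervalMs: number;
}

class CheckThrottle {
  private lastRun = new Map<string, number>();
  private configFor: (checkId: string) => ThrottleConfig | undefined;
  private now: () => number;

  constructor(configFor: (checkId: string) => ThrottleConfig | undefined, now: () => number = Date.now) {
    this.configFor = configFor;
    this.now = now;
  }

  // Leading-edge throttle: allow a run, then reject runs for intervalMs.
  shouldRun(checkId: string): boolean {
    const cfg = this.configFor(checkId);
    if (cfg === undefined) return true; // unthrottled check
    const t = this.now();
    const last = this.lastRun.get(checkId);
    if (last !== undefined && t - last < cfg.intervalMs) return false;
    this.lastRun.set(checkId, t);
    return true;
  }
}
```

A level-dispatch loop would call `shouldRun(checkId)` before scheduling each check and skip it when the answer is `false`.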

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a global uncaughtException handler that suppresses transient I/O
errors (EIO, EPIPE, ECONNRESET, ERR_STREAM_DESTROYED) from dying
child processes instead of crashing the entire visor process.

Three layers of defense:
- Global handler in child-process-error-handler.ts (imported early)
- Worktree manager skips process.exit(1) for transient I/O errors
- Stream-level error handlers on MCP transport stderr pipes
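The global layer can be sketched as a predicate over `err.code` plus a handler factory. Only the four error codes come from this commit; `isTransientIoError` and `makeUncaughtExceptionHandler` are hypothetical names.

```typescript
// Error codes from dying child processes that should not crash the parent.
const TRANSIENT_IO_CODES = new Set(["EIO", "EPIPE", "ECONNRESET", "ERR_STREAM_DESTROYED"]);

function isTransientIoError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const code = (err as { code?: unknown }).code;
  return typeof code === "string" && TRANSIENT_IO_CODES.has(code);
}

// Build a handler for process.on("uncaughtException", ...).
function makeUncaughtExceptionHandler(
  onFatal: (err: unknown) => void,
  onSuppressed: (err: unknown) => void = () => {},
): (err: unknown) => void {
  return (err) => {
    if (isTransientIoError(err)) {
      onSuppressed(err); // e.g. log at debug level and keep running
      return;
    }
    onFatal(err);
  };
}
```

Once any `uncaughtException` listener is attached, Node.js no longer exits on its own, so the `onFatal` callback must log and call `process.exit(1)` itself for non-transient errors.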

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update @probelabs/probe to v0.6.0-rc313 with enriched task telemetry
  (agent scope fields, full task state on events, task.items_json)
- Parse task.items_json from batch events for proper titles on batch
  created/updated/completed/deleted operations
- Collapse sub-agent scopes (engineer, code-explorer) that lack
  meaningful task titles into deduplicated single-line entries instead
  of showing repetitive generic "Engineer Task" items
- Preserve sub-agent task titles when they exist (from task tool snapshots)
- Group repeated sub-agent iterations under a single scope label
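The collapsing rule can be sketched as below. The `TaskItem` fields, the items_json format (a JSON array of such items), and the "generic title" heuristic are assumptions made for illustration; only the engineer and code-explorer scopes and the "Engineer Task" placeholder come from the commit message.

```typescript
// Hypothetical item shape; the real task.items_json fields may differ.
interface TaskItem {
  scope: string;
  title?: string;
}

function isTaskItem(v: unknown): v is TaskItem {
  return typeof v === "object" && v !== null && typeof (v as { scope?: unknown }).scope === "string";
}

// Assumes task.items_json carries a JSON array of items; bad input yields [].
function parseItemsJson(raw: unknown): TaskItem[] {
  if (typeof raw !== "string") return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? parsed.filter(isTaskItem) : [];
  } catch {
    return [];
  }
}

// Heuristic: "Engineer Task" for scope "engineer" carries no information.
function isGenericTitle(title: string, scope: string): boolean {
  return title.toLowerCase() === `${scope.replace(/-/g, " ")} task`;
}

function renderTaskLines(items: TaskItem[], subAgentScopes: ReadonlySet<string>): string[] {
  const lines: string[] = [];
  const collapsed = new Map<string, { line: number; count: number }>();
  for (const item of items) {
    const title = item.title?.trim() ?? "";
    const meaningful = title !== "" && !isGenericTitle(title, item.scope);
    if (meaningful || !subAgentScopes.has(item.scope)) {
      // Keep real titles (including sub-agent ones) as individual lines.
      lines.push(meaningful ? `${item.scope}: ${title}` : item.scope);
      continue;
    }
    // Untitled sub-agent iterations share one line per scope.
    const entry = collapsed.get(item.scope);
    if (entry === undefined) {
      collapsed.set(item.scope, { line: lines.length, count: 1 });
      lines.push(item.scope);
    } else {
      entry.count++;
      lines[entry.line] = `${item.scope} (x${entry.count})`;
    }
  }
  return lines;
}
```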

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Teaches the assistant to reuse inner ProbeAgent sessions via
continue_session when making follow-up calls to the same tool,
avoiding expensive cold-start re-execution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ce safety

- track-execution: replace naive single-pass response extraction with
  multi-pass approach (unwrapJsonText, isToolOutput, extractBestResponseText)
  that filters tool output, unwraps JSON-wrapped text, and falls back through
  generate-response entries before defaulting to "Execution completed"

- mcp-custom-sse-server: add fixRequiredFields to strip invalid entries from
  required arrays (Gemini rejects schemas referencing non-existent properties);
  apply normalizeInputSchema to external MCP tools (regularTools)

- workflow-tool-executor: change the argsOverrides filter from a truthiness check
  to a key-existence check (!(r in argsOverrides)) so falsy override values work

- workspace-manager: add safety guards to cleanupStale(): skip directories
  that are real git repos, validate worktree gitdir paths, and refuse to touch
  parent repos outside the workspace basePath

- sandboxjs: bump to probelabs/SandboxJS@d0d8c8a which fixes try-catch
  with async/await (catch variable extraction, ExecReturn leaking,
  try-finally without catch)
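The fixRequiredFields sanitizer can be sketched as a recursive walk that keeps only `required` entries naming a key in the sibling `properties`. This is a minimal, non-mutating version; dropping a `required` array that ends up empty is a design choice made here, not something the PR states.

```typescript
type JsonObject = { [key: string]: unknown };

function isJsonObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Recursively drop `required` entries that do not name a key in the sibling
// `properties`. Returns a new schema; the input is not mutated.
// (This also walks default/examples values; a stricter version would only
// follow schema keywords.)
function fixRequiredFields(schema: unknown): unknown {
  if (Array.isArray(schema)) return schema.map(fixRequiredFields);
  if (!isJsonObject(schema)) return schema;
  const out: JsonObject = {};
  for (const [key, value] of Object.entries(schema)) {
    out[key] = fixRequiredFields(value); // covers properties, items, anyOf, ...
  }
  const required = out["required"];
  if (Array.isArray(required)) {
    const propsValue = out["properties"];
    const props: JsonObject = isJsonObject(propsValue) ? propsValue : {};
    const kept = required.filter(
      (name): name is string => typeof name === "string" && Object.prototype.hasOwnProperty.call(props, name),
    );
    if (kept.length > 0) out["required"] = kept;
    else delete out["required"];
  }
  return out;
}
```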

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
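The cleanupStale() guards can be sketched with node:fs and node:path. This assumes the standard layout of a linked worktree, whose `.git` is a file containing `gitdir: <repo>/.git/worktrees/<name>`; the function names are hypothetical, and treating an entry with no readable `.git` file as unsafe is a conservative choice made here.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// A real repository has a .git directory; a linked worktree has a .git file.
function isRealGitRepo(dir: string): boolean {
  try {
    return fs.statSync(path.join(dir, ".git")).isDirectory();
  } catch {
    return false;
  }
}

// Parse "gitdir: <path>" and require <repo>/.git/worktrees/<name>, with
// <repo> inside basePath.
function worktreeParentInsideBase(gitFile: string, worktreeDir: string, basePath: string): boolean {
  const match = /^gitdir:\s*(.+)$/m.exec(gitFile);
  if (match === null) return false;
  const gitdir = path.resolve(worktreeDir, match[1].trim());
  const marker = `${path.sep}.git${path.sep}worktrees${path.sep}`;
  const idx = gitdir.indexOf(marker);
  if (idx === -1) return false; // not a worktree gitdir
  const parentRepo = gitdir.slice(0, idx);
  const rel = path.relative(path.resolve(basePath), parentRepo);
  return rel !== ".." && !rel.startsWith(`..${path.sep}`) && !path.isAbsolute(rel);
}

function isSafeToRemoveWorktree(dir: string, basePath: string): boolean {
  if (isRealGitRepo(dir)) return false; // never delete a user repository
  let gitFile: string;
  try {
    gitFile = fs.readFileSync(path.join(dir, ".git"), "utf8");
  } catch {
    return false; // no readable .git file: not a worktree we created
  }
  return worktreeParentInsideBase(gitFile, dir, basePath);
}
```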