feat: surface uploadable files to the agent#534
Conversation
There was a problem hiding this comment.
Pull request overview
This PR improves the agent’s ability to complete upload-gated web forms by (1) surfacing concrete allowlisted file paths in the agent prompt and (2) no longer advertising upload_file in prompts when uploads are disabled. It extends the existing upload allowlist security gate to also act as a “capability manifest” for the agent.
Changes:
- Add a Node-only utility
resolveAdvertisedUploadFiles(allowFileUpload)to filter allowlist entries down to existing regular files and resolve them to absolute paths. - Thread
hasFileUpload+advertisedUploadFilesthrough prompt builders soupload_fileexamples are shown/hidden appropriately and can list exact available file paths. - Update CLI and server runners to compute
advertisedUploadFilesand pass them intoWebAgentOptions; add/extend unit tests around the new behavior.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/server/test/sensitive-data-leak.test.ts | Extend pilo-core mock to include the new export used by the server runner. |
| packages/server/src/taskRunner.ts | Compute advertisedUploadFiles from the upload allowlist and pass into WebAgent options. |
| packages/server/src/taskRunner.test.ts | Extend pilo-core mock to include the new export used by runTask(). |
| packages/core/test/utils/uploadFiles.test.ts | Add unit tests for resolveAdvertisedUploadFiles() (files included; dirs/missing skipped). |
| packages/core/test/prompts.test.ts | Add tests asserting upload_file prompt gating and advertised file list rendering. |
| packages/core/src/webAgent.ts | Add advertisedUploadFiles option plumbed into prompt construction; derive hasFileUpload. |
| packages/core/src/utils/uploadFiles.ts | Introduce resolveAdvertisedUploadFiles() Node-only helper. |
| packages/core/src/prompts.ts | Gate upload_file tool example by hasFileUpload and include an advertised-files note when present. |
| packages/core/src/index.ts | Export the new Node-only helper from the main barrel export. |
| packages/cli/src/commands/run.ts | Compute advertisedUploadFiles in CLI and pass into WebAgent options. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ); | ||
| const allowFileUpload: false | FileUploadConfig = | ||
| uploadAllowedPaths.length > 0 ? { allowedPaths: uploadAllowedPaths } : false; | ||
| const advertisedUploadFiles = await resolveAdvertisedUploadFiles(allowFileUpload); |
There was a problem hiding this comment.
Verified, and the real issue is incomplete test mocks rather than the production call. pilo-core is a workspace:* dependency, so resolveAdvertisedUploadFiles is always present at runtime — there's no real version-mismatch path. The only place it's undefined is hand-written vi.mock("pilo-core", () => ({...})) factories that omit it; guarding the production call to accommodate those would degrade production code to satisfy test fakes.
Fixed by completing the mocks instead. You flagged routes/pilo.test.ts; the same gap also existed in routes/piloWs.test.ts (the WS route also calls runTask) — both now export resolveAdvertisedUploadFiles. (sensitive-data-leak.test.ts and taskRunner.test.ts already had it.) Note the routes/* suites were green before this because /pilo/run is an SSE stream: status 200 is sent when the stream opens, so the TypeError was thrown inside the stream callback and masked by tests that only assert the status, not the streamed events.
The CLI suite (run.test.ts) uses importOriginal() + spread, so it already had the real export — no change needed there.
There was a problem hiding this comment.
Correction to my earlier reply: I'd also touched routes/piloWs.test.ts, but that was unnecessary and I've reverted it. That suite mocks ../taskRunner.js outright (runTask → a local vi.fn()), so the real runTask — and thus resolveAdvertisedUploadFiles — never runs there; its pilo-core mock didn't need the export. The only suite that genuinely hit the masked TypeError is routes/pilo.test.ts (real runTask via the SSE route), and that's the one fixed here. Force-pushed; the diff now only adds the export to routes/pilo.test.ts.
a9919b7 to
daf06b9
Compare
When upload_allowed_paths names specific files, advertise those exact paths in the agent prompt so it can complete upload-gated forms instead of giving up. Also stop advertising upload_file when uploads are disabled. - resolveAdvertisedUploadFiles(): Node-only helper resolving allowlist entries that are existing regular files (directories/missing skipped); exported from index.ts only, never the browser-safe core.ts. - buildToolExamples gates the upload_file example on hasFileUpload and lists advertised file paths; threaded through buildActionLoopSystemPrompt and buildStepErrorFeedbackPrompt. - WebAgent stays browser-safe (no fs): CLI and server compute the file list and pass advertisedUploadFiles via WebAgentOptions. - Complete the pilo-core mocks in the server test suites (routes/pilo, routes/piloWs, sensitive-data-leak, taskRunner) with the new export so runTask's unconditional call resolves under the hand-written mocks. Builds on #494 (upload_file action); addresses the upload gap from #462. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
daf06b9 to
6885fac
Compare
…bort (#535) ## Summary Refines the `done()`/`abort()` guidance in the action-loop prompt. Today, a form submit that returns **no error** but shows **no explicit confirmation** is treated as "unverified," pushing the agent to `abort()`. That's miscalibrated for action tasks: the honest outcome is "submitted; no confirmation shown, but no error either." This change lets the agent report such cases with `done()` (caveated), while **keeping** `abort()` for unverified *data* and blocked core steps. Two scoped edits to the "Before calling done()" block in `prompts.ts`: - Form-submit check now distinguishes a validation error (did NOT submit → fix/retry) from "neither confirmation nor error" (normal on many sites → `done()` stating no confirmation was shown). - The abort-on-uncertain rule is narrowed to *information/data* you must return, or a blocked core step — explicitly **not** an action you performed that produced no error. ## Why separate from #534 This is a global agent-behavior change (affects every task), distinct from the upload-files feature. Reviewers should weigh it on its own. ## Validation - [x] **Target (action tasks):** browser-use ember-form upload task, firewall on — **5 of 6** runs now pass via honest, caveated `done()` (previously aborted reliably). Reasoning shows the agent stating it submitted with no confirmation but no error. - [x] **Guardrail intact (data honesty):** "report a datum that isn't on the page" probe — **3 of 3** correctly `abort()`, zero fabrication. - [x] Unit test asserts the new guidance is present; `pnpm --filter pilo-core` tests green. ## Note for reviewers Rigorous research-task regression measurement belongs in a main-vs-branch comparison eval (the local probes here are a sound sanity check, not a full sweep). Worth running before merge. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
Summary
Builds on the
upload_fileaction (#494): whenupload_allowed_pathsnames specific files, the agent is now told those files exist and their exact paths, so it can complete upload-gated forms instead of giving up. Also stops advertisingupload_filein the prompt when uploads are disabled (a pre-existing quirk).resolveAdvertisedUploadFiles(allowFileUpload)resolves the allowlist entries that are existing regular files (directories/missing paths skipped). Exported fromindex.tsonly — never the browser-safecore.ts.buildToolExamplesgates theupload_fileexample on a newhasFileUploadflag and, when files are advertised, lists their exact paths near the affordance. Threaded throughbuildActionLoopSystemPromptandbuildStepErrorFeedbackPrompt.WebAgentstays browser-safe (nofs): CLI and server compute the file list (they already buildallowFileUpload) and passadvertisedUploadFilesviaWebAgentOptions.Why
File upload was the dominant capability gap on the browser-use interaction benchmark (#462). With the allowlist as a pure security gate, the agent never knew a fixture existed; it now learns the path from its prompt, the way browser-use surfaces its available files.
Test Plan
resolveAdvertisedUploadFiles(files included, dirs/missing excluded);buildToolExamples/builders show/hideupload_file+ the file note by flag.pnpm --filter pilo-core --filter pilo-cli --filter pilo-server run testgreen;typecheck+format:checkclean.upload_filewith the prompt-advertised path and the form accepts it.🤖 Generated with Claude Code