Skip to content

feat: surface uploadable files to the agent#534

Merged
lmorchard merged 1 commit into
mainfrom
feat/advertise-upload-files
Jun 10, 2026
Merged

feat: surface uploadable files to the agent#534
lmorchard merged 1 commit into
mainfrom
feat/advertise-upload-files

Conversation

@lmorchard

Copy link
Copy Markdown
Collaborator

Summary

Builds on the upload_file action (#494): when upload_allowed_paths names specific files, the agent is now told those files exist and their exact paths, so it can complete upload-gated forms instead of giving up. Also stops advertising upload_file in the prompt when uploads are disabled (a pre-existing quirk).

  • New Node-only helper resolveAdvertisedUploadFiles(allowFileUpload) resolves the allowlist entries that are existing regular files (directories/missing paths skipped). Exported from index.ts only — never the browser-safe core.ts.
  • buildToolExamples gates the upload_file example on a new hasFileUpload flag and, when files are advertised, lists their exact paths near the affordance. Threaded through buildActionLoopSystemPrompt and buildStepErrorFeedbackPrompt.
  • WebAgent stays browser-safe (no fs): CLI and server compute the file list (they already build allowFileUpload) and pass advertisedUploadFiles via WebAgentOptions.

Why

File upload was the dominant capability gap on the browser-use interaction benchmark (#462). With the allowlist as a pure security gate, the agent never knew a fixture existed; it now learns the path from its prompt, the way browser-use surfaces its available files.

Test Plan

  • Unit: resolveAdvertisedUploadFiles (files included, dirs/missing excluded); buildToolExamples/builders show/hide upload_file + the file note by flag.
  • pnpm --filter pilo-core --filter pilo-cli --filter pilo-server run test green; typecheck + format:check clean.
  • Manual: with a task that does not name the path, the agent calls upload_file with the prompt-advertised path and the form accepts it.
  • In-harness (local browser-use benchmark, Chromium): agent uploads the advertised fixture end-to-end.

🤖 Generated with Claude Code

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves the agent’s ability to complete upload-gated web forms by (1) surfacing concrete allowlisted file paths in the agent prompt and (2) no longer advertising upload_file in prompts when uploads are disabled. It extends the existing upload allowlist security gate to also act as a “capability manifest” for the agent.

Changes:

  • Add a Node-only utility resolveAdvertisedUploadFiles(allowFileUpload) to filter allowlist entries down to existing regular files and resolve them to absolute paths.
  • Thread hasFileUpload + advertisedUploadFiles through prompt builders so upload_file examples are shown/hidden appropriately and can list exact available file paths.
  • Update CLI and server runners to compute advertisedUploadFiles and pass them into WebAgentOptions; add/extend unit tests around the new behavior.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
packages/server/test/sensitive-data-leak.test.ts Extend pilo-core mock to include the new export used by the server runner.
packages/server/src/taskRunner.ts Compute advertisedUploadFiles from the upload allowlist and pass into WebAgent options.
packages/server/src/taskRunner.test.ts Extend pilo-core mock to include the new export used by runTask().
packages/core/test/utils/uploadFiles.test.ts Add unit tests for resolveAdvertisedUploadFiles() (files included; dirs/missing skipped).
packages/core/test/prompts.test.ts Add tests asserting upload_file prompt gating and advertised file list rendering.
packages/core/src/webAgent.ts Add advertisedUploadFiles option plumbed into prompt construction; derive hasFileUpload.
packages/core/src/utils/uploadFiles.ts Introduce resolveAdvertisedUploadFiles() Node-only helper.
packages/core/src/prompts.ts Gate upload_file tool example by hasFileUpload and include an advertised-files note when present.
packages/core/src/index.ts Export the new Node-only helper from the main barrel export.
packages/cli/src/commands/run.ts Compute advertisedUploadFiles in CLI and pass into WebAgent options.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

);
const allowFileUpload: false | FileUploadConfig =
uploadAllowedPaths.length > 0 ? { allowedPaths: uploadAllowedPaths } : false;
const advertisedUploadFiles = await resolveAdvertisedUploadFiles(allowFileUpload);

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified, and the real issue is incomplete test mocks rather than the production call. pilo-core is a workspace:* dependency, so resolveAdvertisedUploadFiles is always present at runtime — there's no real version-mismatch path. The only place it's undefined is hand-written vi.mock("pilo-core", () => ({...})) factories that omit it; guarding the production call to accommodate those would degrade production code to satisfy test fakes.

Fixed by completing the mocks instead. You flagged routes/pilo.test.ts; the same gap also existed in routes/piloWs.test.ts (the WS route also calls runTask) — both now export resolveAdvertisedUploadFiles. (sensitive-data-leak.test.ts and taskRunner.test.ts already had it.) Note the routes/* suites were green before this because /pilo/run is an SSE stream: status 200 is sent when the stream opens, so the TypeError was thrown inside the stream callback and masked by tests that only assert the status, not the streamed events.

The CLI suite (run.test.ts) uses importOriginal() + spread, so it already had the real export — no change needed there.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correction to my earlier reply: I'd also touched routes/piloWs.test.ts, but that was unnecessary and I've reverted it. That suite mocks ../taskRunner.js outright (runTask → a local vi.fn()), so the real runTask — and thus resolveAdvertisedUploadFiles — never runs there; its pilo-core mock didn't need the export. The only suite that genuinely hit the masked TypeError is routes/pilo.test.ts (real runTask via the SSE route), and that's the one fixed here. Force-pushed; the diff now only adds the export to routes/pilo.test.ts.

Comment thread packages/cli/src/commands/run.ts
@lmorchard lmorchard force-pushed the feat/advertise-upload-files branch from a9919b7 to daf06b9 Compare June 10, 2026 21:20
When upload_allowed_paths names specific files, advertise those exact
paths in the agent prompt so it can complete upload-gated forms instead
of giving up. Also stop advertising upload_file when uploads are disabled.

- resolveAdvertisedUploadFiles(): Node-only helper resolving allowlist
  entries that are existing regular files (directories/missing skipped);
  exported from index.ts only, never the browser-safe core.ts.
- buildToolExamples gates the upload_file example on hasFileUpload and
  lists advertised file paths; threaded through buildActionLoopSystemPrompt
  and buildStepErrorFeedbackPrompt.
- WebAgent stays browser-safe (no fs): CLI and server compute the file
  list and pass advertisedUploadFiles via WebAgentOptions.
- Complete the pilo-core mocks in the server test suites (routes/pilo,
  routes/piloWs, sensitive-data-leak, taskRunner) with the new export so
  runTask's unconditional call resolves under the hand-written mocks.

Builds on #494 (upload_file action); addresses the upload gap from #462.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@lmorchard lmorchard force-pushed the feat/advertise-upload-files branch from daf06b9 to 6885fac Compare June 10, 2026 21:26
lmorchard added a commit that referenced this pull request Jun 10, 2026
…bort (#535)

## Summary
Refines the `done()`/`abort()` guidance in the action-loop prompt.
Today, a form submit that returns **no error** but shows **no explicit
confirmation** is treated as "unverified," pushing the agent to
`abort()`. That's miscalibrated for action tasks: the honest outcome is
"submitted; no confirmation shown, but no error either." This change
lets the agent report such cases with `done()` (caveated), while
**keeping** `abort()` for unverified *data* and blocked core steps.

Two scoped edits to the "Before calling done()" block in `prompts.ts`:
- Form-submit check now distinguishes a validation error (did NOT submit
→ fix/retry) from "neither confirmation nor error" (normal on many sites
→ `done()` stating no confirmation was shown).
- The abort-on-uncertain rule is narrowed to *information/data* you must
return, or a blocked core step — explicitly **not** an action you
performed that produced no error.

## Why separate from #534
This is a global agent-behavior change (affects every task), distinct
from the upload-files feature. Reviewers should weigh it on its own.

## Validation
- [x] **Target (action tasks):** browser-use ember-form upload task,
firewall on — **5 of 6** runs now pass via honest, caveated `done()`
(previously aborted reliably). Reasoning shows the agent stating it
submitted with no confirmation but no error.
- [x] **Guardrail intact (data honesty):** "report a datum that isn't on
the page" probe — **3 of 3** correctly `abort()`, zero fabrication.
- [x] Unit test asserts the new guidance is present; `pnpm --filter
pilo-core` tests green.

## Note for reviewers
Rigorous research-task regression measurement belongs in a
main-vs-branch comparison eval (the local probes here are a sound sanity
check, not a full sweep). Worth running before merge.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
@lmorchard lmorchard merged commit 33ac1a5 into main Jun 10, 2026
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants