feat: report a performed action with no confirmation as done(), not abort #535

Merged
lmorchard merged 1 commit into main from feat/done-abort-no-confirmation on Jun 10, 2026
@lmorchard

Collaborator

Summary

Refines the done()/abort() guidance in the action-loop prompt. Today, a form submit that returns no error but shows no explicit confirmation is treated as "unverified," pushing the agent to abort(). That's miscalibrated for action tasks: the honest outcome is "submitted; no confirmation shown, but no error either." This change lets the agent report such cases with done() (caveated), while keeping abort() for unverified data and blocked core steps.

Two scoped edits to the "Before calling done()" block in prompts.ts:

  • Form-submit check now distinguishes a validation error (did NOT submit → fix/retry) from "neither confirmation nor error" (normal on many sites → done() stating no confirmation was shown).
  • The abort-on-uncertain rule is narrowed to information/data you must return, or a blocked core step — explicitly not an action you performed that produced no error.
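
The two rules above can be read as a small decision table. The sketch below is only an illustration of that reading, not code from prompts.ts (the change itself is prompt text, not logic). The `Observation` and `Decision` types and the `decide` function are hypothetical names introduced here.

```typescript
// Hypothetical model of the guidance; none of these names exist in prompts.ts.
type Observation =
  | { kind: "form_submit"; sawConfirmation: boolean; sawValidationError: boolean }
  | { kind: "data_lookup"; valueFoundOnPage: boolean }
  | { kind: "core_step"; blocked: boolean };

type Decision =
  | { call: "done"; caveat?: string }
  | { call: "abort"; reason: string }
  | { call: "retry"; reason: string };

function decide(obs: Observation): Decision {
  if (obs.kind === "form_submit") {
    // A validation error means the form did NOT submit: fix and retry.
    if (obs.sawValidationError) {
      return { call: "retry", reason: "validation error: the form did not submit" };
    }
    // Explicit success message: plain done().
    if (obs.sawConfirmation) {
      return { call: "done" };
    }
    // Neither confirmation nor error: report as done(), with an explicit caveat.
    return { call: "done", caveat: "Submitted; no confirmation was shown, but no error either." };
  }
  if (obs.kind === "data_lookup") {
    // Data the agent must return stays strict: never report what is not on the page.
    return obs.valueFoundOnPage
      ? { call: "done" }
      : { call: "abort", reason: "requested data could not be verified on the page" };
  }
  // A blocked core step still aborts.
  return obs.blocked
    ? { call: "abort", reason: "a core step was blocked" }
    : { call: "done" };
}

// Example: a submit that shows neither a confirmation nor an error.
const outcome = decide({ kind: "form_submit", sawConfirmation: false, sawValidationError: false });
console.log(outcome.call);
```

In this reading, only the third form-submit branch changes behavior: it previously fell under the abort-on-uncertain rule.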

Why separate from #534

This is a global agent-behavior change (affects every task), distinct from the upload-files feature. Reviewers should weigh it on its own.

Validation

  • Target (action tasks): browser-use ember-form upload task, firewall on — 5 of 6 runs now pass via honest, caveated done() (previously aborted reliably). Reasoning shows the agent stating it submitted with no confirmation but no error.
  • Guardrail intact (data honesty): "report a datum that isn't on the page" probe — 3 of 3 correctly abort(), zero fabrication.
  • Unit test asserts the new guidance is present; pnpm --filter pilo-core tests green.

Note for reviewers

Rigorous research-task regression measurement belongs in a main-vs-branch comparison eval (the local probes here are a sound sanity check, not a full sweep). Worth running before merge.

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings June 10, 2026 00:47

Copilot AI left a comment

Contributor

Pull request overview

This PR adjusts the core action-loop system prompt guidance so that when an agent submits a form and observes neither a success confirmation nor a validation error (i.e., no explicit confirmation but also no error), it reports the outcome via done() with an explicit caveat rather than treating it as “unverified” and calling abort().

Changes:

  • Updates the “Before calling done()” checklist to distinguish validation errors (treat as not submitted; retry) from “no confirmation and no error” (treat as submitted; done() with caveat).
  • Narrows the “abort on uncertainty” rule to apply to unverified data or outright blocked core steps, not to performed actions lacking explicit confirmation.
  • Adds a unit test asserting the updated guidance text is present in the built system prompt.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| packages/core/src/prompts.ts | Refines the action-loop prompt guidance around form submission verification and when to use done() vs abort(). |
| packages/core/test/prompts.test.ts | Adds a regression test that asserts the updated guidance appears in the generated prompt. |

Comment thread packages/core/src/prompts.ts Outdated
- Does your answer match the requested format?
3. Verify actions actually completed by checking the most recent page state:
- If you submitted a form, did the next page confirm success?
- If you submitted a form, look for a success message OR a validation error. A validation error means it did NOT submit — fix and retry. Seeing NEITHER (no confirmation, but no error) is a normal outcome on many sites: treat the submission as completed and report it with done(), stating explicitly that no confirmation was shown.

Collaborator Author

Split into nested sub-bullets:

  • If you submitted a form, look for a success message or a validation error:
    • A validation error means it did NOT submit — fix and retry.
    • Neither a confirmation nor an error is a normal outcome on many sites: treat the submission as completed and report it with done(), stating explicitly that no confirmation was shown.

Meaning is unchanged, and the test-asserted phrases ("no confirmation was shown" and is NOT "unverified") are intact. This is a clarity-only reflow; I didn't re-run the (nondeterministic) eval validation for a copy change, but the prompts unit test passes.
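
For readers unfamiliar with phrase-pinning tests, here is a minimal sketch of the idea, assuming a plain substring check. It is not the test in packages/core/test/prompts.test.ts. `examplePrompt` is a hypothetical stand-in for the built system prompt (its last line is an invented paraphrase), and `missingPhrases` is a hypothetical helper. Only the two required phrases come from this thread.

```typescript
// Hypothetical stand-in for the built prompt; the real text lives in prompts.ts.
const examplePrompt = [
  "If you submitted a form, look for a success message or a validation error:",
  "A validation error means it did NOT submit. Fix and retry.",
  "Neither a confirmation nor an error is a normal outcome on many sites:",
  "report it with done(), stating explicitly that no confirmation was shown.",
  // Invented paraphrase, included only so the second phrase has somewhere to live.
  'An action you performed that produced no error is NOT "unverified".',
].join("\n");

// Phrases quoted in this thread as being asserted by the unit test.
const requiredPhrases = ["no confirmation was shown", 'is NOT "unverified"'] as const;

// Returns the required phrases that do not appear in the prompt.
function missingPhrases(prompt: string, required: readonly string[]): string[] {
  return required.filter((phrase) => !prompt.includes(phrase));
}

const missing = missingPhrases(examplePrompt, requiredPhrases);
if (missing.length > 0) {
  throw new Error(`prompt is missing guidance: ${missing.join(", ")}`);
}
```

Pinning short phrases rather than whole sentences is what lets a reflow like the one above pass without touching the test.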

The done/abort guidance treated a form submit that produced no error and no
explicit success message as 'unverified', pushing the agent to abort. Refine it
so a performed action with no error but no confirmation is reported via done()
with a caveat, while abort remains for unverified data and blocked core steps.
@lmorchard force-pushed the feat/done-abort-no-confirmation branch from 9ad9599 to a3a5a20 on June 10, 2026 21:25
@lmorchard merged commit d2d9fb1 into main on Jun 10, 2026
13 checks passed
2 participants