test(flow-08): tighten buy-side correctness assertions#466

Draft
bussyjd wants to merge 1 commit into main from fix/flow-08-buyer-invariants
@bussyjd
Collaborator

@bussyjd bussyjd commented May 11, 2026

Summary

A specialist audit of flows/flow-08-buy.sh against the named payment invariants in .claude/skills/obol-stack-dev/references/live-obol-qa.md and references/paid-commerce.md surfaced three correctness gaps. This PR addresses all three plus secondary precision issues from the same review.

1. Buyer-wallet invariant — L211–217 (was)

flow-08 previously funded whatever wallet obol agent wallet list obol-agent returned. If obol stack up generated a random agent wallet, flow-08 happily funded that with anvil_setStorageAt and the test passed — exactly the "do not fund a generated signer to make the test pass" anti-pattern named in live-obol-qa.md.

Now derives the canonical Bob address from .env REMOTE_SIGNER_PRIVATE_KEY using the keccak-of-abi-encode pattern that flow-11-dual-stack.sh already uses (line 794), and asserts AGENT_WALLET == BOB_WALLET before funding.

Heads up: if the default obol-agent is not currently pre-seeded with Bob during release-smoke (flow-04), this assertion will start failing release-smoke. That is the intended forcing function — it surfaces a real gap. The follow-up is to teach the agent provisioner / flow-04 to pre-seed --private-key-file <bob> during obol agent init.
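A minimal sketch of the wallet-equality assertion described above, assuming both addresses are already available as strings. The helper name `assert_same_wallet` is hypothetical and not taken from flow-08; the comparison is case-insensitive because a checksummed and a lowercase address refer to the same account. How `BOB_WALLET` is derived from `REMOTE_SIGNER_PRIVATE_KEY` is deliberately left out here.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not the actual flow-08 code): compare the agent
# wallet with the deterministic Bob address, ignoring hex letter case.
assert_same_wallet() {
  # $1 = AGENT_WALLET, $2 = BOB_WALLET
  agent=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  bob=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  if [ -z "$agent" ] || [ -z "$bob" ]; then
    echo "FAIL: empty wallet address"
    return 1
  fi
  if [ "$agent" != "$bob" ]; then
    echo "FAIL: Agent wallet $1 != deterministic Bob $2"
    return 1
  fi
  echo "PASS: Agent wallet matches deterministic Bob"
}

# Intended use, before any funding happens:
#   assert_same_wallet "$AGENT_WALLET" "$BOB_WALLET" || exit 1
```

Running the check before the funding step means a generated signer is never funded, which is the point of the invariant.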

2. Exact balance deltas — L300–330 (was)

The old assertion was "seller balance increased" (post > pre), with no buyer-side check and a swallowed-failure else-branch at L322–323 that emitted pass even when the seller balance decreased. Now both sides are checked strictly:

  • post_seller - pre_seller == PAID_AMOUNT
  • pre_buyer - post_buyer == PAID_AMOUNT

with no catch-all pass. Adds a PRE_BUYER_BAL capture next to the existing PRE_SELLER_BAL.
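The strict two-sided check can be sketched as follows. The helper `check_exact_delta` is a hypothetical name, and the sample balances are illustrative only (integer micro-USDC values); only the variable names `PRE_SELLER_BAL`, `PRE_BUYER_BAL` and `PAID_AMOUNT` come from this PR.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pass only when the observed delta equals the
# expected amount exactly. There is no catch-all branch that passes.
check_exact_delta() {
  # $1 = label, $2 = observed delta, $3 = expected amount
  if [ -z "$3" ]; then
    echo "FAIL: $1 expected amount is empty"
    return 1
  fi
  if [ "$2" -eq "$3" ]; then
    echo "PASS: $1 delta == $3"
  else
    echo "FAIL: $1 delta $2 != expected $3"
    return 1
  fi
}

# Illustrative sample values, not real balances.
PRE_SELLER_BAL=5000
POST_SELLER_BAL=6000
PRE_BUYER_BAL=1000000000
POST_BUYER_BAL=999999000
PAID_AMOUNT=1000

check_exact_delta seller "$((POST_SELLER_BAL - PRE_SELLER_BAL))" "$PAID_AMOUNT"
check_exact_delta buyer "$((PRE_BUYER_BAL - POST_BUYER_BAL))" "$PAID_AMOUNT"
```

A decrease on the seller side yields a negative delta, which can never equal `PAID_AMOUNT`, so the old "decrease reported as pass" path has no equivalent here.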

3. Decouple paid-inference correctness from verbatim model wording — L274–281 (was)

Old check required the model to return the literal string "USDC payment smoke test passed.". Payment correctness should not depend on the model's instruction-following (paid-commerce.md: "Do not rely on agent wording"). Replaced with a structural assertion (HTTP 200 + non-empty TEXT). The verbatim match is preserved as a separate informational pass line.
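A rough sketch of the structural assertion, assuming the HTTP status and the extracted response text are already held in shell variables. The function name `check_paid_response` is hypothetical; the verbatim sentence is only reported as an extra informational line and never decides the result.

```bash
#!/usr/bin/env bash
# Hypothetical helper: payment correctness depends on HTTP 200 plus a
# non-empty body text, not on what the model chose to say.
check_paid_response() {
  # $1 = HTTP status code, $2 = extracted response text
  if [ "$1" != "200" ]; then
    echo "FAIL: Paid inference returned HTTP $1"
    return 1
  fi
  if [ -z "$2" ]; then
    echo "FAIL: Paid inference returned empty text"
    return 1
  fi
  echo "PASS: Paid inference HTTP 200 with non-empty text"
  # Informational only: note when the model echoed the requested sentence.
  case "$2" in
    *"USDC payment smoke test passed."*)
      echo "PASS (info): model returned the verbatim sentence" ;;
  esac
  return 0
}
```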

Secondary fixes rolled in

  • Fail-fast on empty PAID_AMOUNT parse (was silent).
  • An empty LITELLM_MASTER_KEY now triggers emit_metrics; exit 1 instead of continuing with an empty bearer token.
  • x402-buyer auth-pool: exact remaining=$EXPECTED_AUTHS instead of loose remaining=[1-9].
  • New step asserts remaining decremented by exactly 1 after the paid call.
  • Anvil funding poll regex broadened from exact ^1000000000 to ^[1-9][0-9]{8,} so a re-run with pre-existing balance doesn't fail.
  • Unused BUY_AUTH_COUNT=5 removed; now derived as EXPECTED_AUTHS and actually asserted.
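Three of the tightenings above can be sketched as small helpers. All function names are hypothetical, and the assumption that budget and per-request price are integers in the same unit is mine, not something this PR states.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the auth-pool and funding-poll checks.

# Expected auth count = budget / per-request price (integer division).
expected_auths() {
  # $1 = budget, $2 = per-request price, same integer unit (assumption)
  if [ -z "$1" ] || [ -z "$2" ] || [ "$2" -le 0 ]; then
    return 1
  fi
  echo $(( $1 / $2 ))
}

# Spend proof: the pool must shrink by exactly one after the paid call.
assert_decremented_by_one() {
  # $1 = remaining before the call, $2 = remaining after the call
  if [ "$(( $1 - $2 ))" -eq 1 ]; then
    echo "PASS: remaining decremented by exactly 1"
  else
    echo "FAIL: remaining went from $1 to $2"
    return 1
  fi
}

# Broadened funding pattern: any balance of nine or more digits with a
# non-zero leading digit, so a re-run with extra balance still matches.
balance_looks_funded() {
  printf '%s\n' "$1" | grep -Eq '^[1-9][0-9]{8,}'
}
```

With a budget of 5000 and a price of 1000, `expected_auths` prints 5, which lines up with the five auths seen in the smoke logs.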

Test plan

  • bash -n flows/flow-08-buy.sh — syntax clean
  • Cluster smoke (flows/release-smoke.sh) on spark1 — currently in flight against main as tmux qa-release-20260511-193603. Will re-run against this branch and report.
  • On a cluster where obol-agent was created with --private-key-file <bob-derived>: confirm flow-08 passes end-to-end with exact deltas and the new sidecar-decrement step.

@bussyjd bussyjd marked this pull request as draft May 11, 2026 13:14
@bussyjd
Collaborator Author

bussyjd commented May 11, 2026

Smoke evidence — assertion fires exactly as designed

Ran flows/release-smoke.sh (no OBOL flags) against this branch on a Linux/arm64 host. Bob assertion fires verbatim at flow-08 step [6]:

```
STEP: [6] Agent wallet matches deterministic Bob
FAIL: [6] Agent wallet 0x8a4f0e83306d666E3B74121b38b24CA3AeC39Fd5
!= deterministic Bob 0x57b0eF875DeB5A37301F1640E469a2129Da9490E
(preseed missing; obol-agent must be created with
REMOTE_SIGNER_PRIVATE_KEY-derived Bob — see references/live-obol-qa.md)
```

The default obol-agent was generated with a random key (0x8a4f…9Fd5), not the deterministic Bob derived from .env REMOTE_SIGNER_PRIVATE_KEY (0x57b0eF…490E). That's exactly the silent gap on main — this PR makes it loud and named at the right step.

Comparison vs main (same .env, same host)

| | main (pre-PR) | this PR |
| --- | --- | --- |
| Where the wrong wallet is detected | nowhere; flow-08 funds it on Anvil and proceeds | step [6] Agent wallet matches deterministic Bob |
| Time-to-failure | ~5–10 min (gets all the way to paid call + receipt poll) | ~10 s (right after wallet read) |
| Symptom | FAIL: [12] Paid inference 503 + FAIL: [13] No Transfer log (misleading: looks like settlement broke) | FAIL: [6] preseed missing (names the actual cause + the fix path) |
| Operator action implied by the failure | dig into verifier / facilitator / sidecar logs | follow the message: pre-seed obol-agent with Bob (separate PR) |

Cascade after [6] (expected, derivative, not new findings)

Once [6] fails, the next four checks in flow-08 inevitably fail because they all assume a properly-funded buyer:

```
FAIL: [8] Agent wallet funded on local Anvil — pattern not found after 120s
FAIL: [9] obol buy inference failed
FAIL: [10] PurchaseRequest Ready — not found after 180s
FAIL: [11] x402-buyer has exactly 5 auths — pattern 'flow08-paid: remaining=5 ' not found after 180s
```

All four are noise around the same root cause. Optionally we could short-circuit after [6] to keep the artifact cleaner, but I'd leave them as-is — they're useful diagnostic context if [6] passes someday and one of them still fails.

Other smoke notes

  • All flows before flow-08 (flow-01..07 + flow-10) — PASS end-to-end on this branch.
  • flow-11 (dual-stack USDC, independent of this PR — touches no files this branch modifies) hit a separate Payment verification failed (503) at step [43] after Bob's PurchaseRequest reached Ready and the sidecar showed exactly 5 auths. Filing a separate investigation ticket for that one.
  • Full log archived locally: release-smoke-20260511-203607.log (1036 lines).

Suggested merge order

  1. This PR can land now — the assertion is doing its job; the four cascading FAILs after [6] are derivative.
  2. Follow-up PR needed: teach obol agent init (or flow-04) to pre-seed the default obol-agent with the keccak-derived Bob when .env REMOTE_SIGNER_PRIVATE_KEY is present. Without that, release-smoke will keep going red on this step until the wider invariant is honored.

@bussyjd
Collaborator Author

bussyjd commented May 11, 2026

Follow-up: same 503 symptom on flow-08 — root cause is Anvil staleness

Reproduced the flow-11 step [43] Payment verification failed (503) on flow-08 today after merging the Bob-preseed work into the local cluster:

  1. Cluster obol stack up → imported deterministic Bob into obol-agent (0x57b0eF…490E).
  2. flow-06: 33/33 PASS.
  3. flow-10: 14/14 PASS (Anvil + facilitator reused from earlier sessions, ~3h old).
  4. flow-08: step [6] PASS (Bob match), but step [12] Paid inference failed → 503 Payment verification failed. Cascading FAILs at [13]/[14]/[15]/[16] (same shape as the flow-11 note earlier in #466).

Root cause

Facilitator (x402-rs/x402-facilitator:1.4.7) logs on every /verify:

ERROR ... verify_eip3009_payment ... error=Onchain error: server returned an error response:
error code -32603: failed to get storage for 0x036CbD53842c5426634e7929541eC2318f3dCF7e at <slot>:
server returned an error response: error code -32603: state at block #41314522 is pruned

The facilitator's eth_call against Anvil resolves the USDC storage slot at the fork-base block. Anvil forwards to its --fork-url (https://base-sepolia-rpc.publicnode.com, non-archive), which has pruned that historical state. Anvil's --prune-history 1000000 doesn't help — the missing state lives upstream.
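One way a flow could surface this root cause directly, rather than the misleading 503, is to scan the facilitator log for the pruned-state message. This is only a sketch: the function name is hypothetical, and how the log text is collected is left open. The pattern is taken from the error text quoted above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads log text on stdin and succeeds when it
# contains the "state at block #N is pruned" error quoted above.
is_pruned_state_error() {
  grep -Eq 'state at block #[0-9]+ is pruned'
}

# Intended use (log collection is not shown and is an assumption):
#   if printf '%s\n' "$FACILITATOR_LOG" | is_pruned_state_error; then
#     echo "FAIL: Anvil fork base is pruned upstream; restart with a fresh fork"
#   fi
```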

Fix (verified)

Restarted Anvil + facilitator with a fresh fork (new base block 41372240) and re-ran flow-08 against the same cluster. Result:

  • step [12] PASS — HTTP 200 on paid/qwen3.5:9b
  • step [13] PASS — settlement tx 0x01739b7ff93154f894ed6d315dbd349775559bb1ae1f19fd0573713b5fd3dbab
  • step [14]/[15] PASS — seller +1000, buyer −1000 (exact micro-USDC delta)

The 3 remaining FAILs (steps 8/11/16) are stale-state artifacts from the first run's auths still being in the sidecar pool — not flow correctness issues.

Suggested follow-up (separate PR, not this one)

flow-10 currently reuses a running Anvil if port 8545 is bound. The reuse path is what put us in the pruned-state window. Options:

  1. Always start a fresh Anvil in flow-10 — cheapest and most reliable. Trade-off: slower release-smoke on warm runs.
  2. Use a multi-upstream archive RPC as --fork-url. Anvil itself has no native multi-RPC failover (--fork-url is single-valued, confirmed against anvil --help). Options: paid archive (Alchemy/QuickNode), a multi-provider gateway (dRPC/llamarpc), or pointing Anvil at the cluster's own eRPC once it's up.
  3. Refresh fork-block on reuse — detect drift and SIGTERM Anvil if its fork base is older than N hours.
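Option 3 boils down to an age comparison. A minimal sketch, assuming the flow can obtain the fork-base block's timestamp as Unix seconds; the function name, the threshold, and the `FORK_BASE_TS` variable are all hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the fork base is older than the
# allowed number of hours, i.e. when Anvil should be restarted.
fork_is_stale() {
  # $1 = fork-base timestamp (Unix seconds), $2 = now (Unix seconds),
  # $3 = maximum allowed age in hours
  age=$(( $2 - $1 ))
  [ "$age" -gt "$(( $3 * 3600 ))" ]
}

# Intended use (FORK_BASE_TS is assumed to be captured when Anvil starts):
#   if fork_is_stale "$FORK_BASE_TS" "$(date +%s)" 2; then
#     echo "Anvil fork is stale; restarting with a fresh fork"
#   fi
```

Passing the current time in as an argument keeps the comparison deterministic and easy to exercise without a running Anvil.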

Also worth a tickbox: anvil 1.0.0-stable (2025-02-13) is the version installed by obolup.sh. Latest Foundry stable is v1.7.1. Bumping Foundry is independent but overdue.

… review

Three correctness gaps surfaced by an audit against the named payment
invariants in references/live-obol-qa.md and references/paid-commerce.md:

1. Buyer-wallet invariant. flow-08 previously funded whatever wallet the
   default obol-agent happened to generate at stack init — the exact
   "do not fund a generated signer" anti-pattern named in the live-OBOL
   QA reference. Now derives the deterministic Bob address from
   .env REMOTE_SIGNER_PRIVATE_KEY (the canonical keccak-of-abi-encode
   pattern used by flow-11/13/14) and asserts AGENT_WALLET == BOB_WALLET
   before funding. The flow header documents the upstream pre-seed
   requirement.

2. Exact balance deltas. Replaces "seller balance increased" + missing
   buyer-side check with strict pre/post deltas on both sides:
   post_seller - pre_seller == PAID_AMOUNT AND
   pre_buyer - post_buyer == PAID_AMOUNT. Also removes a swallowed-
   failure else-branch that emitted `pass` when the seller balance had
   neither increased nor stayed equal (i.e. decrease was reported as
   pass).

3. Decouple paid-inference correctness from model wording. The pre-
   existing assertion required the model to return the verbatim string
   "USDC payment smoke test passed." Replaced with a structural check:
   HTTP 200 + non-empty TEXT. The verbatim match is kept as a separate
   informational `pass` line. Aligns with paid-commerce.md ("do not
   rely on agent wording").

Secondary correctness tightenings rolled in:

- Fail-fast on empty PAID_AMOUNT from the 402 body (previously silent;
  only surfaced much later at the settlement-receipt step).
- Master-key read failure now `emit_metrics; exit 1` instead of
  continuing with an empty bearer token.
- x402-buyer auth-pool assertion now requires the exact expected count
  (EXPECTED_AUTHS, derived from BUY_BUDGET_USDC / per-request price)
  instead of the loose `remaining=[1-9]` (single-digit) pattern.
- New post-call step asserts remaining decremented by exactly 1 — the
  spend-proof half of the sidecar contract.
- Anvil funding poll regex broadened from exact `^1000000000 ` to
  `^[1-9][0-9]{8,} ` so a re-run with pre-existing balance doesn't
  fail the poll.

The unused BUY_AUTH_COUNT=5 declaration is removed; the same value is
now derived and asserted via EXPECTED_AUTHS.
@OisinKyne OisinKyne force-pushed the fix/flow-08-buyer-invariants branch from 842473e to e35d872 Compare May 11, 2026 17:06