[Draft] Generic x402 payment proxy: pre-paid HTTP budgets + in-cluster sidecar route #414

@bussyjd

Status: draft / sketch. Open for refinement before scoping into work items. Not committing to the exact API surface below.

Background

Two halves of "agent-side x402 spending" already exist:

  • Stateless single-shot: buy.py pay <url> (skill buy-x402, merged in 6eb9a8b) signs one EIP-3009 / Permit2 auth, attaches X-PAYMENT, sends any HTTP method, and returns the response. Works against any type:http x402 endpoint. Max loss = 1 request.
  • Persistent inference budgets: buy.py buy <name> pre-signs N auths, stores them in a PurchaseRequest CR, and the x402-buyer sidecar in the litellm pod spends them transparently as the agent hits paid/<remote-model> through LiteLLM. Max loss = N × price; the runtime path holds zero signer access.
  • Seller side: obol sell http and obol sell demo {hello,blocks,oracle} let anyone gate any in-cluster HTTP service behind x402 ForwardAuth.

Gap

The persistent / sidecar-mediated path is inference-shaped: there is no equivalent for non-LLM HTTP services, and OpenAI Chat is the only LLM wire format it speaks. Specifically:

A. No pre-paid budgets for HTTP services

An agent that wants to call demo-blocks 200 times to monitor a wallet, or hit a paid OCR / RAG / indexer / market-data endpoint repeatedly, has to re-sign on every call via pay. That burns local-signer round-trips and adds latency to every request. The signing/retry/settlement state machine in internal/x402/buyer/ already handles the persistent case for inference — it just isn't exposed for arbitrary HTTP.

B. No in-cluster proxy endpoint for non-Python consumers

buy.py is a Python skill. Anything else in the cluster — Node.js MCP servers, Go indexers, Rust services in another pod, browser webapps behind the tunnel, other agent runtimes — cannot easily shell out to a Python script. They need an HTTP endpoint they can call to get paid responses back.

C. No transparent tool-call shape for agents

Even from a Python skill, pay is an explicit subprocess call. For an LLM tool call, "pay this URL" should look like a normal HTTP request the model issues, with the sidecar transparently handling 402 retries — same UX as paid/<model> for inference.

D. No Anthropic-native wire format

Claude Code, Cline, Aider, and most "Claude-class" agent tools speak the Anthropic Messages API (POST /v1/messages), not OpenAI Chat. They can't be pointed at the existing paid/* route without an adapter. Today the only path for those tools is to hold an ANTHROPIC_API_KEY and pay Anthropic directly — bypassing the cluster, the marketplace, and any seller's monetized models.

Proposal sketch

Reframe the sidecar as a single payment + retry + settle engine fronted by multiple wire-format adapters, not a single LLM-shaped proxy:

                          ┌──────────────────────────┐
    /v1/chat/completions  │ OpenAI Chat adapter      │ ← shipped (paid/* via LiteLLM)
    ────────────────────► │                          │
                          │                          │
    /v1/messages          │ Anthropic Messages       │ ← new — Claude Code / Cline / Aider
    ────────────────────► │ adapter                  │
                          │                          │
    /proxy/<host>/<path>  │ Generic HTTP adapter     │ ← new — arbitrary x402 services
    ────────────────────► │                          │
                          └────────────┬─────────────┘
                                       │
                          ┌────────────▼─────────────┐
                          │ Auth pool lookup         │
                          │ (host- or model-keyed)   │
                          │ X-PAYMENT attach         │
                          │ 402 retry                │
                          │ Settle on <400 success   │
                          └──────────────────────────┘

New endpoints on the existing x402-buyer sidecar:

POST /proxy/<host>/<path...>
  - per-host auth pool lookup
  - one auth attached as X-PAYMENT
  - upstream call forwarded verbatim
  - 402 retry with next auth
  - settle on <400 success
  - upstream response returned to caller
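The generic adapter's core is the same attach → 402-retry → settle loop the inference path already runs. A hedged sketch of that sequence, with the pool and transport as illustrative stand-ins (the real implementation lives in internal/x402/buyer/ and is Go, not this shape):

```python
def proxy_once(pool, send, request):
    """Attach pre-signed auths from a host-keyed pool until the upstream stops
    answering 402 or the pool runs dry. `send` performs the actual HTTP call."""
    while True:
        auth = pool.pop()                  # next unspent pre-signed auth
        if auth is None:
            return {"status": 402, "body": "budget exhausted"}
        request["headers"]["X-PAYMENT"] = auth
        resp = send(request)
        if resp["status"] == 402:
            continue                       # auth rejected: retry with the next one
        if resp["status"] < 400:
            pool.settle(auth)              # settle only on <400 success
        return resp                        # upstream response returned verbatim
```

Note the asymmetry: a 402 consumes another auth and retries, while a non-402 error (4xx/5xx) is returned without settling, so the caller is not charged for a failed response.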

POST /v1/messages
  - Anthropic Messages API surface
  - translate request → OpenAI Chat → upstream (or pass-through if upstream is anthropic-native)
  - reuse the same paid/<model> auth pool the sidecar already drives
  - translate response back to Anthropic Messages format
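The request-side translation is mostly field mapping. A rough sketch of the text-only subset (field names follow the public Anthropic Messages and OpenAI Chat schemas; the supported-field set here is an assumption about scope, and anything outside it is rejected up front rather than silently dropped):

```python
def anthropic_to_openai(req: dict) -> dict:
    """Translate a text-only Anthropic Messages request into OpenAI Chat shape.
    Unsupported fields raise so the caller can return 400 *before* paying."""
    supported = {"model", "max_tokens", "messages", "system", "temperature"}
    extra = set(req) - supported
    if extra:
        raise ValueError(f"unsupported fields: {sorted(extra)}")
    messages = []
    if "system" in req:
        # Anthropic carries the system prompt out-of-band; OpenAI puts it
        # in the messages array as a system-role message.
        messages.append({"role": "system", "content": req["system"]})
    for m in req["messages"]:
        content = m["content"]
        if isinstance(content, list):  # Anthropic content blocks
            if any(b.get("type") != "text" for b in content):
                raise ValueError("non-text content blocks unsupported")
            content = "".join(b["text"] for b in content)
        messages.append({"role": m["role"], "content": content})
    return {"model": req["model"], "max_tokens": req["max_tokens"],
            "messages": messages}
```

Failing closed on untranslatable fields is what keeps the risk-surface promise below: a 400 before any auth is spent, never a "successful" payment for a degraded response.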

POST /admin/budget
  - extension of the buy.py / PurchaseRequest path for type:http
  - pre-sign N auths against a (host, path-glob) tuple
  - store in a host-keyed pool alongside today's model-keyed pools
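Pool lookup for the HTTP case keys on the (host, path-glob) tuple rather than a model name. A sketch of the matching rule using fnmatch-style globs (the pool layout is illustrative; the real store sits next to today's model-keyed pools):

```python
from fnmatch import fnmatch

def find_pool(pools: dict, host: str, path: str):
    """Return the first auth pool whose (host, path-glob) key covers the
    request, keyed exactly like the proposed /admin/budget tuples."""
    for (pool_host, glob), auths in pools.items():
        if pool_host == host and fnmatch(path, glob):
            return auths
    return None
```

The glob lets one budget cover a family of endpoints (e.g. every demo-blocks path) without pre-signing per-path, while the exact-host match preserves the existing property that auths signed for one upstream cannot be spent on another.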

Plus controller / skill plumbing:

  • Relax the PurchaseRequest inference-shape assumption: a type:http purchase should NOT publish a paid/<model> LiteLLM route. It should write a buyer-config entry keyed by (host, path-glob) instead of (upstream, model).
  • Extend buy.py buy to accept --type http --path-glob <glob> and produce the right CR shape.
  • Pricing-aware Anthropic adapter: respect the Anthropic-style cache_control markers we now inject by default for Anthropic models, so prompt caching cost savings carry through the paid path.

What this unlocks (concrete examples)

Demo services already in tree

  • demo-blocks polling — 200-call wallet monitor: pre-buy budget once, hit /proxy/obol.stack:8080/services/demo-blocks/... from any pod. Today: 200 separate sign/send round-trips.
  • demo-oracle from a Node dashboard — frontend (or any non-Python service in the cluster) does fetch("http://x402-buyer:8402/proxy/...") and gets paid responses, zero crypto code on the dashboard.

Real monetizable services

  • Self-hosted OCR / transcription / embedding endpoints behind obol sell http, consumed as transparent agent tools.
  • Private RAG corpus charged per-query, used inside a research session with bounded total cost.
  • Custom indexers / Dune-style analytical queries monetized per call.
  • Validator / DV monitoring feeds subscribed to with a daily auth budget.
  • Image / video / voice generation endpoints — bulk batch with one budget instead of N signs.
  • Third-party market data / weather / search — same agent, same code, regardless of upstream.

Cross-runtime

  • MCP servers in the cluster expose tools whose backends are paid x402 endpoints. The MCP server just speaks HTTP to the sidecar — no key handling, no signing, no budget bookkeeping. Every MCP server in the ecosystem becomes spend-aware without code changes.

Adoption: Anthropic-native tools (new with the /v1/messages adapter)

The Anthropic Messages adapter turns every existing Claude-class tool into a potential obol buyer with one env var:

export ANTHROPIC_BASE_URL=http://x402-buyer.llm.svc.cluster.local:8402/anthropic
# or, from the host:
export ANTHROPIC_BASE_URL=http://127.0.0.1:18402/anthropic   # via kubectl port-forward
unset ANTHROPIC_API_KEY
claude   # or cline / aider / any anthropic-native client

From that point on:

  • Claude Code's autonomous loop, tool calls, and subagents work unchanged.
  • Every model call is paid per request from the agent's wallet against whatever upstream the cluster routes to (could be a marketplace seller's hosted Claude-compatible endpoint, a self-hosted GPU box, or a remote Anthropic-fork).
  • No ANTHROPIC_API_KEY required.
  • Live model switching reuses the existing LiteLLM model-routing layer.
  • Spending caps, auto-refill, and audit trail come for free from the existing budget machinery.

The same env-var pattern works for any tool that respects ANTHROPIC_BASE_URL (Claude Code, Cline, Aider, claude-dev, custom integrations). It also gives sellers a much larger addressable audience — anyone with claude installed becomes a potential customer of any model exposed via obol sell http or obol sell inference.

Risk surface

  • Auth pools are already per-upstream-keyed today — a malicious host can only spend auths the user pre-signed for it. Property carries over.
  • SSRF guard required: /proxy/<host> must refuse <host> resolving to internal cluster addresses unless explicitly allowlisted. Same discipline that keeps frontend and erpc HTTPRoutes hostname-restricted in traefik / obol-frontend / erpc namespaces today.
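The SSRF guard can be a resolve-then-check step before any auth is attached. A minimal sketch with the stdlib ipaddress module (the allowlist semantics here are an assumption, not a decided policy):

```python
import ipaddress
import socket

def is_allowed_target(host: str, allowlist: set[str]) -> bool:
    """Reject hosts resolving to private / loopback / link-local space unless
    explicitly allowlisted, so /proxy/<host> cannot reach cluster internals."""
    if host in allowlist:
        return True
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False  # unresolvable: refuse rather than guess
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            return False  # any internal answer poisons the whole host
    return bool(infos)
```

Checking every resolved address (not just the first) matters: a hostname with one public and one internal A record must still be refused, or DNS-rebinding-style tricks reopen the hole.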
  • The Anthropic adapter must not silently drop fields it can't translate (vision, parallel tool use, MCP-shaped messages). Errors should be 400 with a clear body, not a "successful" payment for a degraded response. Document the supported subset up front.
  • The sidecar is distroless — the new mux routes add no new dependencies.

Suggested split

  • Sidecar route: POST /proxy/<host>/<path...> with per-host pool lookup, retry, settle (internal/x402/buyer/proxy.go).
  • Sidecar route: POST /v1/messages Anthropic Messages adapter — translate request to OpenAI Chat, reuse the existing paid pool, translate response back. Document the supported subset (parity with what LiteLLM's anthropic-passthrough already covers, minus what's intentionally out of scope).
  • Sidecar reload: extend /admin/reload to pick up host-keyed pools alongside model-keyed pools.
  • SSRF allowlist in the sidecar config — reject internal cluster targets by default.
  • Controller: relax PurchaseRequest to handle type:http without publishing a paid/<model> LiteLLM route (internal/serviceoffercontroller/).
  • buy.py buy --type http --path-glob flag in internal/embed/skills/buy-x402/scripts/buy.py.
  • Hermes / OpenClaw skill that exposes the proxy as a tool-call-shaped HTTP target so an LLM tool call looks like a normal fetch.
  • Docs: "Use Claude Code with obol" walkthrough showing the ANTHROPIC_BASE_URL env-var setup, including the port-forward path for host-side use.
  • Integration test extending the existing flow-08-buy.sh / monetize integration test to cover an HTTP budget against demo-blocks.
  • Integration test: Anthropic Messages adapter end-to-end — claude (or a curl shaped like Anthropic Messages) → sidecar /v1/messages → upstream → settled response.

Out of scope for this issue

  • Changing the verifier path (x402-verifier stays verify-only on the cluster-routed paid flow).
  • Adding new chains / tokens — orthogonal to this proxy work.
  • A dedicated CLI surface (obol agent pay …) — can come later as ergonomics on top of the sidecar route.
  • Vision / image input on the Anthropic adapter — explicitly deferred to a follow-up; not all sellers' upstreams support it and the translation layer adds risk.
