[Draft] Generic x402 payment proxy: pre-paid HTTP budgets + in-cluster sidecar route #414

@bussyjd

Status: draft / sketch. Open for refinement before scoping into work items. Not committing to the exact API surface below.

Background

Two halves of "agent-side x402 spending" already exist:

  • Stateless single-shot: buy.py pay <url> (skill buy-x402, merged in 6eb9a8b) signs one EIP-3009 / Permit2 auth, attaches X-PAYMENT, sends any HTTP method, and returns the response. Works against any type:http x402 endpoint. Max loss = 1 request.
  • Persistent inference budgets: buy.py buy <name> pre-signs N auths, stores them in a PurchaseRequest CR, and the x402-buyer sidecar in the litellm pod spends them transparently as the agent hits paid/<remote-model> through LiteLLM. Max loss = N × price; the runtime path holds zero signer access.
  • Seller side: obol sell http and obol sell demo {hello,blocks,oracle} let anyone gate any in-cluster HTTP service behind x402 ForwardAuth.

Gap

The persistent / sidecar-mediated path is inference-shaped: there is no equivalent for non-LLM HTTP services, and OpenAI Chat is the only LLM wire format it speaks. Specifically:

A. No pre-paid budgets for HTTP services

An agent that wants to call demo-blocks 200 times to monitor a wallet, or hit a paid OCR / RAG / indexer / market-data endpoint repeatedly, has to re-sign on every call via pay. That burns local-signer round-trips and adds latency to every request. The signing/retry/settlement state machine in internal/x402/buyer/ already handles the persistent case for inference — it just isn't exposed for arbitrary HTTP.

B. No in-cluster proxy endpoint for non-Python consumers

buy.py is a Python skill. Anything else in the cluster — Node.js MCP servers, Go indexers, Rust services in another pod, browser webapps behind the tunnel, other agent runtimes — cannot easily shell out to a Python script. They need an HTTP endpoint they can call to get paid responses back.

C. No transparent tool-call shape for agents

Even from a Python skill, pay is an explicit subprocess call. For an LLM tool call, "pay this URL" should look like a normal HTTP request the model issues, with the sidecar transparently handling 402 retries — same UX as paid/<model> for inference.

D. No Anthropic-native wire format

Claude Code, Cline, Aider, and most "Claude-class" agent tools speak the Anthropic Messages API (POST /v1/messages), not OpenAI Chat. They can't be pointed at the existing paid/* route without an adapter. Today the only path for those tools is to hold an ANTHROPIC_API_KEY and pay Anthropic directly — bypassing the cluster, the marketplace, and any seller's monetized models.

Proposal sketch

Reframe the sidecar as a single payment + retry + settle engine fronted by multiple wire-format adapters, not a single LLM-shaped proxy:

                          ┌──────────────────────────┐
    /v1/chat/completions  │ OpenAI Chat adapter      │ ← shipped (paid/* via LiteLLM)
    ────────────────────► │                          │
                          │                          │
    /v1/messages          │ Anthropic Messages       │ ← new — Claude Code / Cline / Aider
    ────────────────────► │ adapter                  │
                          │                          │
    /proxy/<host>/<path>  │ Generic HTTP adapter     │ ← new — arbitrary x402 services
    ────────────────────► │                          │
                          └────────────┬─────────────┘
                                       │
                          ┌────────────▼─────────────┐
                          │ Auth pool lookup         │
                          │ (host- or model-keyed)   │
                          │ X-PAYMENT attach         │
                          │ 402 retry                │
                          │ Settle on <400 success   │
                          └──────────────────────────┘

New endpoints on the existing x402-buyer sidecar:

POST /proxy/<host>/<path...>
  - per-host auth pool lookup
  - one auth attached as X-PAYMENT
  - upstream call forwarded verbatim
  - 402 retry with next auth
  - settle on <400 success
  - upstream response returned to caller
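The generic adapter's core is the same attach → 402-retry → settle loop the inference path already runs. A hedged sketch of that sequence, with the pool and transport as illustrative stand-ins (the real implementation lives in internal/x402/buyer/ and is Go, not this shape):

```python
def proxy_once(pool, send, request):
    """Attach pre-signed auths from a host-keyed pool until the upstream stops
    answering 402 or the pool runs dry. `send` performs the actual HTTP call."""
    while True:
        auth = pool.pop()                  # next unspent pre-signed auth
        if auth is None:
            return {"status": 402, "body": "budget exhausted"}
        request["headers"]["X-PAYMENT"] = auth
        resp = send(request)
        if resp["status"] == 402:
            continue                       # auth rejected: retry with the next one
        if resp["status"] < 400:
            pool.settle(auth)              # settle only on <400 success
        return resp                        # upstream response returned verbatim
```

Note the asymmetry: a 402 consumes another auth and retries, while a non-402 error (4xx/5xx) is returned without settling, so the caller is not charged for a failed response.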

POST /v1/messages
  - Anthropic Messages API surface
  - translate request → OpenAI Chat → upstream (or pass-through if upstream is anthropic-native)
  - reuse the same paid/<model> auth pool the sidecar already drives
  - translate response back to Anthropic Messages format
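The request-side translation is mostly field mapping. A rough sketch of the text-only subset (field names follow the public Anthropic Messages and OpenAI Chat schemas; the supported-field set here is an assumption about scope, and anything outside it is rejected up front rather than silently dropped):

```python
def anthropic_to_openai(req: dict) -> dict:
    """Translate a text-only Anthropic Messages request into OpenAI Chat shape.
    Unsupported fields raise so the caller can return 400 *before* paying."""
    supported = {"model", "max_tokens", "messages", "system", "temperature"}
    extra = set(req) - supported
    if extra:
        raise ValueError(f"unsupported fields: {sorted(extra)}")
    messages = []
    if "system" in req:
        # Anthropic carries the system prompt out-of-band; OpenAI puts it
        # in the messages array as a system-role message.
        messages.append({"role": "system", "content": req["system"]})
    for m in req["messages"]:
        content = m["content"]
        if isinstance(content, list):  # Anthropic content blocks
            if any(b.get("type") != "text" for b in content):
                raise ValueError("non-text content blocks unsupported")
            content = "".join(b["text"] for b in content)
        messages.append({"role": m["role"], "content": content})
    return {"model": req["model"], "max_tokens": req["max_tokens"],
            "messages": messages}
```

Failing closed on untranslatable fields is what keeps the risk-surface promise below: a 400 before any auth is spent, never a "successful" payment for a degraded response.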

POST /admin/budget
  - extension of the buy.py / PurchaseRequest path for type:http
  - pre-sign N auths against a (host, path-glob) tuple
  - store in a host-keyed pool alongside today's model-keyed pools
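Pool lookup for the HTTP case keys on the (host, path-glob) tuple rather than a model name. A sketch of the matching rule using fnmatch-style globs (the pool layout is illustrative; the real store sits next to today's model-keyed pools):

```python
from fnmatch import fnmatch

def find_pool(pools: dict, host: str, path: str):
    """Return the first auth pool whose (host, path-glob) key covers the
    request, keyed exactly like the proposed /admin/budget tuples."""
    for (pool_host, glob), auths in pools.items():
        if pool_host == host and fnmatch(path, glob):
            return auths
    return None
```

The glob lets one budget cover a family of endpoints (e.g. every demo-blocks path) without pre-signing per-path, while the exact-host match preserves the existing property that auths signed for one upstream cannot be spent on another.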

Plus controller / skill plumbing:

  • Relax the PurchaseRequest inference-shape assumption: a type:http purchase should NOT publish a paid/<model> LiteLLM route. It should write a buyer-config entry keyed by (host, path-glob) instead of (upstream, model).
  • Extend buy.py buy to accept --type http --path-glob <glob> and produce the right CR shape.
  • Pricing-aware Anthropic adapter: respect the Anthropic-style cache_control markers we now inject by default for Anthropic models, so prompt caching cost savings carry through the paid path.

What this unlocks (concrete examples)

Demo services already in tree

  • demo-blocks polling — 200-call wallet monitor: pre-buy budget once, hit /proxy/obol.stack:8080/services/demo-blocks/... from any pod. Today: 200 separate sign/send round-trips.
  • demo-oracle from a Node dashboard — frontend (or any non-Python service in the cluster) does fetch("http://x402-buyer:8402/proxy/...") and gets paid responses, zero crypto code on the dashboard.

Real monetizable services

  • Self-hosted OCR / transcription / embedding endpoints behind obol sell http, consumed as transparent agent tools.
  • Private RAG corpus charged per-query, used inside a research session with bounded total cost.
  • Custom indexers / Dune-style analytical queries monetized per call.
  • Validator / DV monitoring feeds subscribed to with a daily auth budget.
  • Image / video / voice generation endpoints — bulk batch with one budget instead of N signs.
  • Third-party market data / weather / search — same agent, same code, regardless of upstream.

Cross-runtime

  • MCP servers in the cluster expose tools whose backends are paid x402 endpoints. The MCP server just speaks HTTP to the sidecar — no key handling, no signing, no budget bookkeeping. Every MCP server in the ecosystem becomes spend-aware without code changes.

Adoption: Anthropic-native tools (new with the /v1/messages adapter)

The Anthropic Messages adapter turns every existing Claude-class tool into a potential obol buyer with one env var:

export ANTHROPIC_BASE_URL=http://x402-buyer.llm.svc.cluster.local:8402/anthropic
# or, from the host:
export ANTHROPIC_BASE_URL=http://127.0.0.1:18402/anthropic   # via kubectl port-forward
unset ANTHROPIC_API_KEY
claude   # or cline / aider / any anthropic-native client

From that point on:

  • Claude Code's autonomous loop, tool calls, and subagents work unchanged.
  • Every model call is paid per request from the agent's wallet against whatever upstream the cluster routes to (could be a marketplace seller's hosted Claude-compatible endpoint, a self-hosted GPU box, or a remote Anthropic-fork).
  • No ANTHROPIC_API_KEY required.
  • Live model switching reuses the existing LiteLLM model-routing layer.
  • Spending caps, auto-refill, and audit trail come for free from the existing budget machinery.

The same env-var pattern works for any tool that respects ANTHROPIC_BASE_URL (Claude Code, Cline, Aider, claude-dev, custom integrations). It also gives sellers a much larger addressable audience — anyone with claude installed becomes a potential customer of any model exposed via obol sell http or obol sell inference.

Risk surface

  • Auth pools are already per-upstream-keyed today — a malicious host can only spend auths the user pre-signed for it. Property carries over.
  • SSRF guard required: /proxy/<host> must refuse <host> resolving to internal cluster addresses unless explicitly allowlisted. Same discipline that keeps frontend and erpc HTTPRoutes hostname-restricted in traefik / obol-frontend / erpc namespaces today.
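The SSRF guard can be a resolve-then-check step before any auth is attached. A minimal sketch with the stdlib ipaddress module (the allowlist semantics here are an assumption, not a decided policy):

```python
import ipaddress
import socket

def is_allowed_target(host: str, allowlist: set[str]) -> bool:
    """Reject hosts resolving to private / loopback / link-local space unless
    explicitly allowlisted, so /proxy/<host> cannot reach cluster internals."""
    if host in allowlist:
        return True
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False  # unresolvable: refuse rather than guess
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            return False  # any internal answer poisons the whole host
    return bool(infos)
```

Checking every resolved address (not just the first) matters: a hostname with one public and one internal A record must still be refused, or DNS-rebinding-style tricks reopen the hole.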
  • The Anthropic adapter must not silently drop fields it can't translate (vision, parallel tool use, MCP-shaped messages). Errors should be 400 with a clear body, not a "successful" payment for a degraded response. Document the supported subset up front.
  • The sidecar is distroless — the new mux routes add no new dependencies.

Suggested split

  • Sidecar route: POST /proxy/<host>/<path...> with per-host pool lookup, retry, settle (internal/x402/buyer/proxy.go).
  • Sidecar route: POST /v1/messages Anthropic Messages adapter — translate request to OpenAI Chat, reuse the existing paid pool, translate response back. Document the supported subset (parity with what LiteLLM's anthropic-passthrough already covers, minus what's intentionally out of scope).
  • Sidecar reload: extend /admin/reload to pick up host-keyed pools alongside model-keyed pools.
  • SSRF allowlist in the sidecar config — reject internal cluster targets by default.
  • Controller: relax PurchaseRequest to handle type:http without publishing a paid/<model> LiteLLM route (internal/serviceoffercontroller/).
  • buy.py buy --type http --path-glob flag in internal/embed/skills/buy-x402/scripts/buy.py.
  • Hermes / OpenClaw skill that exposes the proxy as a tool-call-shaped HTTP target so an LLM tool call looks like a normal fetch.
  • Docs: "Use Claude Code with obol" walkthrough showing the ANTHROPIC_BASE_URL env-var setup, including the port-forward path for host-side use.
  • Integration test extending the existing flow-08-buy.sh / monetize integration test to cover an HTTP budget against demo-blocks.
  • Integration test: Anthropic Messages adapter end-to-end — claude (or a curl shaped like Anthropic Messages) → sidecar /v1/messages → upstream → settled response.

Out of scope for this issue

  • Changing the verifier path (x402-verifier stays verify-only on the cluster-routed paid flow).
  • Adding new chains / tokens — orthogonal to this proxy work.
  • A dedicated CLI surface (obol agent pay …) — can come later as ergonomics on top of the sidecar route.
  • Vision / image input on the Anthropic adapter — explicitly deferred to a follow-up; not all sellers' upstreams support it and the translation layer adds risk.
