feat(si): sponsored_context_accountability storyboard (first pass)#5551

Open
kapoost wants to merge 4 commits into adcontextprotocol:main from kapoost:sponsored-context-accountability-storyboard
kapoost commented Jun 15, 2026

Contributor

First-pass SI brand-side conformance storyboard for the surfaces shipped in #5501, scoped per the discussion under #5541.

What's in the box

One yaml at static/compliance/source/protocols/sponsored-intelligence/sponsored-context-accountability.yaml with four phases:

| Phase | What it asserts |
| --- | --- |
| presentation_only_happy_path | Brand emits a full sponsored_context envelope (paying_principal nested, context_use, disclosure_obligation, declared_by.role=brand_agent), host returns an accepted receipt with matching accepted_context_use, second turn lands cleanly. |
| required_disclosure_commitment | Host's receipt carries disclosure_commitment.status=accepted; agent accepts the well-formed receipt regardless of whether its own declaration carried required=true. |
| rejected_receipt | Host returns host_receipt.status=rejected with a rejection_reason; agent accepts the rejection as a valid wire response. |
| silent_downgrade_rejected | Host returns an accepted receipt with a downgraded accepted_context_use; agent MUST reject. Regression anchor on error_code ∈ {VALIDATION_ERROR, INVALID_REQUEST}; recommended error message wording ("silent downgrade forbidden") is documented in expected: as a manual-review pointer, not a hard assertion. |
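
For readers who want the four phases in executable form, here is a minimal sketch of the receipt check a brand agent might run. It is not taken from bragent or the spec: `ReceiptError`, `check_receipt`, and the nesting of `accepted_context_use` and `rejection_reason` under `host_receipt` are assumptions made for illustration, as is treating `presentation_only` as a `context_use` value.

```python
class ReceiptError(Exception):
    """Hypothetical error type carrying an AdCP-style error code."""

    def __init__(self, error_code: str, message: str) -> None:
        super().__init__(message)
        self.error_code = error_code


def check_receipt(declared: dict, receipt: dict) -> str:
    """Return the receipt status, or raise on a silent downgrade.

    `declared` is the sponsored_context the agent emitted earlier;
    `receipt` is the host's sponsored_context_receipt. The exact
    nesting of fields is assumed here, not confirmed by the schema.
    """
    host_receipt = receipt["host_receipt"]
    status = host_receipt["status"]

    if status == "rejected":
        # A rejection that carries a reason is a valid wire response.
        if not host_receipt.get("rejection_reason"):
            raise ReceiptError("VALIDATION_ERROR", "rejected receipt lacks rejection_reason")
        return status

    if status == "accepted":
        # An accepted receipt must echo the declared context_use unchanged.
        if host_receipt.get("accepted_context_use") != declared["context_use"]:
            raise ReceiptError("VALIDATION_ERROR", "silent downgrade forbidden")
        return status

    raise ReceiptError("VALIDATION_ERROR", f"unknown receipt status: {status!r}")
```

With `declared = {"context_use": "presentation_only"}`, a receipt whose `accepted_context_use` matches returns `"accepted"`, a rejected receipt with a reason returns `"rejected"`, and an accepted receipt carrying any other `accepted_context_use` raises `ReceiptError` with one of the two codes the storyboard allows.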

Design choices, per the thread on #5541

  • Existing storyboard primitives only: response_schema, field_present, field_value, error_code. No new assertion language.
  • Single yaml, four phases — keeps the review surface small; the contract is visible together. Bill's preference from the thread.
  • LLM-generated content asserted as present/non-empty only — language and model provider stay implementation choices, so a brand agent answering in French via Mistral conforms identically to one answering in English via Claude.
  • Mismatch path decoupled from transport — assertion is on error_code (accepting VALIDATION_ERROR or INVALID_REQUEST); recommended error message wording is captured in expected: as a manual-review pointer.
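
The two assertion styles in the bullets above (present/non-empty for LLM text, a set of allowed codes for the mismatch path) can be sketched roughly as follows. These helpers are hypothetical re-implementations for illustration, not the storyboard runner's code, and the `error.code` response shape is an assumption.

```python
# Codes the silent_downgrade_rejected phase accepts, per the PR description.
ALLOWED_DOWNGRADE_CODES = {"VALIDATION_ERROR", "INVALID_REQUEST"}


def field_present(response: dict, path: str) -> bool:
    """True when the dotted `path` resolves to a non-empty value."""
    node = response
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    # Only presence and non-emptiness are checked, never the wording.
    return node not in (None, "", [], {})


def error_code_allowed(response: dict, allowed: set[str]) -> bool:
    """True when the response's error code is one of the allowed values."""
    error = response.get("error") or {}
    return error.get("code") in allowed
```

Because `field_present` never inspects the text itself, a `response.message` in French and one in English pass identically, and `error_code_allowed` passes for either canonical code without caring which transport produced it.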

Empirical surface

kapoost/bragent v0.2.0+ was the reference implementation from which the assertions were derived. The storyboard itself does not depend on a running bragent instance, so the suite stays self-contained.

Open question for review

Whether to also assert that the rejected receipt surfaces in an audit endpoint. bragent has the shape (admin-gated /sessions/<id>/audit returns the dual-trail with rejection_reason round-tripped) but it is bragent-specific rather than a spec invariant, so this first pass stays on the wire-level SI surfaces. Happy to add an audit phase as a follow-up commit if the spec wants to elevate audit to a normative SI surface.


cc @sangilish (matrix authoring per the #5541 thread) and @bokelley (PR #5501 author).

Refs adcontextprotocol#5541, adcontextprotocol#5486, adcontextprotocol#5501.

Adds compliance/source/protocols/sponsored-intelligence/sponsored-context-accountability.yaml — a brand-side conformance storyboard exercising the PR adcontextprotocol#5501 sponsored_context envelope and sponsored_context_receipt allOf invariants in four phases inside a single yaml (small review surface, whole contract visible together).

Phases:

- presentation_only_happy_path
- required_disclosure_commitment
- rejected_receipt
- silent_downgrade_rejected (regression anchor: error.message contains "silent downgrade forbidden"; error code is one of INVALID_PARAMS / VALIDATION_ERROR / INVALID_REQUEST so the invariant decouples from any single transport)

Uses existing storyboard matchers only (response_schema, field_present, field_value, field_value_or_absent, field_contains, error_code). LLM-generated response.message is asserted as present/non-empty only, so language and model provider remain implementation choices.

The reference implementation kapoost/bragent (v0.2.0+) served as the empirical surface from which the assertions were derived; the storyboard itself does not depend on a running bragent instance.

Open question for review: whether to also assert that rejected receipts surface in an audit endpoint. bragent has the shape (admin-gated /sessions/<id>/audit returns the dual-trail) but it is bragent-specific rather than a spec invariant, so this first pass stays on the wire-level SI surfaces.

@aao-release-bot aao-release-bot Bot left a comment

Contributor

⚠️ Argus review could not complete

The automated review encountered an issue (possibly reached max turns, timed out, or failed to post the final gh pr review). A human reviewer should take this PR.

This is an automated message from the Argus AI review workflow.

@aao-release-bot aao-release-bot Bot left a comment

Contributor

⚠️ Argus review could not complete

The automated review encountered an issue (possibly reached max turns, timed out, or failed to post the final gh pr review). A human reviewer should take this PR.

This is an automated message from the Argus AI review workflow.

Three corrections after first push surfaced CI failures:

- Inline literal sponsored_context in every receipt sample_request
  instead of `$context.sponsored_context`. The schema lint substitutes
  string placeholders only — object-shaped fields need a literal so
  the allOf chain in si-sponsored-context-receipt.json can validate.
- Allow-grow the deliberately schema-invalid mismatch sample
  (silent_downgrade_rejected/si_send_message_with_mismatched_receipt)
  into tests/storyboard-sample-request-schema-allowlist.json. The
  whole point of the step is to send an envelope the spec rejects.
- Drop the field_contains assertion on `error.message` containing
  "silent downgrade forbidden". The storyboard primitives don't
  currently expose an error-message matcher and Bill's call on adcontextprotocol#5541
  was to avoid adding a new assertion language in the first pass. The
  wording recommendation now lives in the step's `expected:` text as
  a manual review pointer; promoting it to a hard check is a
  follow-up if the WG wants an `error_message_contains` matcher.

Also constrained error_code to the canonical AdCP enum
(VALIDATION_ERROR, INVALID_REQUEST) per the error-code lint —
INVALID_PARAMS is JSON-RPC, not AdCP.

All lint + integration tests now pass locally (264 test files,
4162 tests).
kapoost force-pushed the sponsored-context-accountability-storyboard branch from e1edd46 to dec0547 on June 15, 2026 18:43

@aao-release-bot aao-release-bot Bot left a comment

Contributor

⚠️ Argus review could not complete

The automated review encountered an issue (possibly reached max turns, timed out, or failed to post the final gh pr review). A human reviewer should take this PR.

This is an automated message from the Argus AI review workflow.

@kapoost

kapoost commented Jun 15, 2026

Contributor Author

Force-pushed twice — heads-up so the review history reads cleanly:

  1. First push (e1edd466): aligned the storyboard with the schema lint suite. Inlined a literal sponsored_context in every receipt sample_request (the $context.sponsored_context placeholder fails the object-shape validator); added the deliberately schema-invalid mismatch sample to tests/storyboard-sample-request-schema-allowlist.json via --allow-grow; dropped the field_contains assertion on error.message, since the runner doesn't expose an error-message matcher today (the recommended wording stays in expected: as a manual-review pointer, per Bill's "first pass on existing primitives only" call on #5541); and pinned error_code.allowed_values to the canonical AdCP enum (VALIDATION_ERROR, INVALID_REQUEST), since INVALID_PARAMS is JSON-RPC, not AdCP.

  2. Second push (dec0547f6) — reverted an unintended package-lock.json drift (-403 lines) that my local npm install introduced when I ran the lint suite. The npm-ci install step was the only thing keeping CI red on the previous attempt; lockfile is now byte-identical to upstream/main.

Local npm run test passes (264 test files, 4162 tests). All 24 CI checks green on this PR. Argus AI review timed out three times — not a content judgement (the workflow itself didn't complete), so a human pair of eyes from the WG is the next step.

Ready for review.

@sangilish

Contributor

This shape looks right to me. Thanks for keeping it to one YAML, four inline phases, and existing storyboard matchers only.

+1 on leaving the audit-endpoint round trip out of this first pass. I’d keep #5551 at the wire-level SI conformance boundary: sponsored_context declaration, accepted receipt, rejected receipt, and mismatch rejection. Audit readback feels like a follow-up only if/when the WG decides there is a normative SI audit surface rather than a bragent-local admin API.

One small cleanup before merge: after the follow-up commit, the YAML no longer hard-asserts error.message / field_contains, and INVALID_PARAMS was removed from the accepted error codes. The changeset and/or PR description still seem to mention the first-push version in a couple places. I’d sync those to the final behavior so the release note does not overstate what the storyboard asserts.

@bokelley

Contributor

PR description updated to match the final YAML:

  • silent_downgrade_rejected row: replaced the error.message/field_contains/INVALID_PARAMS framing with the actual assertion (error_code ∈ {VALIDATION_ERROR, INVALID_REQUEST}; message wording is a manual-review pointer in expected:, not a hard check).
  • Design choices: removed field_value_or_absent and field_contains from the primitives list (neither appears in the final YAML); updated the mismatch-path bullet to reflect the error_code anchor.

One item I can't push to directly since it lives on @kapoost's branch: the changeset at .changeset/5541-sponsored-context-accountability-storyboard.md. Two lines need the same treatment:

  1. The silent_downgrade_rejected bullet currently says the regression anchor is on error.message and lists INVALID_PARAMS. Should match the PR description above.
  2. The matchers sentence lists field_value_or_absent and field_contains. Should be response_schema, field_present, field_value, error_code.

@kapoost, a small push to your branch with those two changeset edits will close the loop.


Generated by Claude Code

Two lines drifted from the YAML during the lint-alignment force-push.
Match the PR description bokelley updated:

- silent_downgrade_rejected bullet now reflects error_code ∈
  {VALIDATION_ERROR, INVALID_REQUEST}; the "silent downgrade
  forbidden" message wording is a manual-review pointer in
  `expected:`, not a hard check.
- Matchers sentence trimmed to the four primitives the final yaml
  actually uses: response_schema, field_present, field_value,
  error_code. field_value_or_absent and field_contains were
  removed during the lint-alignment pass.
@kapoost

kapoost commented Jun 16, 2026

Contributor Author

@bokelley — done in 60f2b220. Two changeset lines now match the PR description: silent_downgrade_rejected bullet rewritten around error_code ∈ {VALIDATION_ERROR, INVALID_REQUEST} with the "silent downgrade forbidden" wording as a manual-review pointer in expected:, and the matchers sentence trimmed to the four the final yaml actually uses (response_schema, field_present, field_value, error_code). Thanks for the catch.

@aao-release-bot aao-release-bot Bot left a comment

Contributor

⚠️ Argus review could not complete

The automated review encountered an issue (possibly reached max turns, timed out, or failed to post the final gh pr review). A human reviewer should take this PR.

This is an automated message from the Argus AI review workflow.

@bokelley

Contributor

I think this is the right shape overall, but I would like one cleanup before approval:

  1. Receipt substitution / prior declaration check: the YAML now says the runtime substitutes the literal sponsored_context receipt body with the one emitted by the prior step, but the storyboard runner only substitutes explicit placeholder strings like $context.sponsored_context. The current samples send fixed Acme literals, so this no longer actually tests “host echoes the brand’s prior sponsored_context”; it tests “agent accepts this static Acme receipt.” It could also false-fail a conforming brand agent that emits a different principal and validates receipts against its own prior declaration.

    Please either restore sponsored_context: "$context.sponsored_context" for those receipt samples if object-shaped context substitution now works for the runner/lint path, or remove the runtime-substitution claim and explicitly scope the storyboard to the Acme fixture.

  2. Normative wording: a few lines still read like baseline SI requires every response to carry sponsored_context (for example the summary/narrative and the validation description). Current SI docs frame this as MAY / brand agents SHOULD; this storyboard can be stricter as an opt-in sponsored_context_accountability target, but the prose should say that explicitly rather than implying baseline SI conformance.

  3. Changeset drift: the disclosure-commitment bullet still says the agent accepts the receipt “regardless of whether its own declaration carried required=true,” but the YAML now deliberately uses a required=true literal. That line should match the final phase behavior.

CI is green and I agree with leaving audit-endpoint readback out of this first pass. This just needs the fixture/prose alignment so the storyboard is testing the contract it describes.
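
To make point 1 concrete, here is a minimal sketch of string-only placeholder substitution. It is an illustration under stated assumptions, not the actual storyboard runner: `substitute`, the `PREFIX` constant, and the `paying_principal.name` field are hypothetical.

```python
from typing import Any

PREFIX = "$context."


def substitute(value: Any, context: dict) -> Any:
    """Replace "$context.<key>" strings with stored values; leave literals alone."""
    if isinstance(value, str) and value.startswith(PREFIX):
        key = value[len(PREFIX):]
        # Unknown keys are left as the original placeholder string.
        return context.get(key, value)
    if isinstance(value, dict):
        return {k: substitute(v, context) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute(v, context) for v in value]
    # Numbers, booleans, None, and non-placeholder strings pass through.
    return value
```

Given a context holding the agent's own prior `sponsored_context`, a sample request written as `{"sponsored_context": "$context.sponsored_context"}` would carry that prior emission, while a sample request with an inlined Acme object passes through unchanged. That second case is why the current samples exercise a static fixture rather than an echo.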

@bokelley

Contributor

All three fixes are clear. The PR branch is in a fork outside my push scope, so here are the exact replacements for @kapoost to land in one commit.


1. Remove the runtime-substitution claim

prerequisites.description — replace the first paragraph:

The harness acts as the host: it sends sample requests and echoes back synthesised receipts whose sponsored_context field is the one previously emitted by the agent (the runtime substitution replaces the literal sample_request envelope with the real prior emission).

with:

The harness acts as the host: it sends sample requests carrying static Acme outdoor receipt literals. The storyboard runner does not perform object-shaped context substitution, so these receipts are fixed Acme fixtures — not echoes of the agent's own prior emission. The assertions test receipt acceptance and rejection semantics, not principal-matching against the agent's own declaration.

Also in the si_send_message_presentation_accepted step narrative, remove the last sentence:

The runtime substitutes the literal sponsored_context envelope below with the one returned by the prior step.


2. Normative wording

summary: field:

-"Validates that a conformant brand-side SI agent emits sponsored_context on every response, accepts well-formed sponsored_context_receipt envelopes, and rejects receipts that silently downgrade declared context_use."
+"Validates that a brand-side SI agent claiming sponsored_context_accountability conformance emits sponsored_context on each response, accepts well-formed sponsored_context_receipt envelopes, and rejects receipts that silently downgrade declared context_use."

narrative: top-level, first sentence:

-AdCP 3.1.0-rc.13/rc.14 introduced the sponsored_context_accountability surface (PR #5501): every SI
-response carries a `sponsored_context` envelope
+AdCP 3.1.0-rc.13/rc.14 introduced the sponsored_context_accountability surface (PR #5501): in this
+conformance profile, a brand-side SI agent's every response carries a `sponsored_context` envelope

Validation in si_initiate_session_presentation step:

-description: "PR #5501 envelope: every SI response must carry sponsored_context"
+description: "sponsored_context_accountability conformance target: brand-side SI agent must emit sponsored_context on each response"

3. Changeset drift — required_disclosure_commitment bullet

In .changeset/5541-sponsored-context-accountability-storyboard.md:

-- `required_disclosure_commitment` — host's receipt carries `disclosure_commitment.status=accepted`; agent accepts the receipt without erroring regardless of whether its own declaration carried `required=true`.
+- `required_disclosure_commitment` — literal `sponsored_context` carries `disclosure_obligation.required=true`; host's receipt carries `disclosure_commitment.status=accepted`; agent accepts the well-formed receipt without error.

Triaged by Claude Code. Session: https://claude.ai/code/session_01M3SfYLeRAe9pDHn1QtkdAr


Generated by Claude Code

…cope

Three fixes per bokelley's adcontextprotocol#5551 review (2026-06-16):

1. Remove runtime-substitution claim. The storyboard runner does not
   perform object-shaped context substitution, so the YAML's claim that
   sample_request envelopes are replaced with the prior emission was
   inaccurate. The samples send static Acme outdoor receipt literals;
   the assertions test receipt acceptance/rejection semantics, not
   principal-matching against the agent's own declaration. Updates
   `prerequisites.description` and drops the misleading last sentence
   from `si_send_message_presentation_accepted` narrative.

2. Normative wording. Baseline SI frames `sponsored_context` emission as
   MAY/SHOULD; this storyboard can be stricter as an opt-in conformance
   profile, but the prose should say so explicitly. Updates `summary`,
   top-level `narrative`, and the `si_initiate_session_presentation`
   validation description to scope the MUST to agents claiming
   sponsored_context_accountability conformance.

3. Changeset drift. The `required_disclosure_commitment` bullet still
   read "regardless of whether its own declaration carried required=true"
   — but the YAML now deliberately uses a `required=true` literal.
   Rewrites the bullet to match the actual phase behaviour.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@aao-release-bot aao-release-bot Bot left a comment

Contributor

⚠️ Argus review could not complete

The automated review encountered an issue (possibly reached max turns, timed out, or failed to post the final gh pr review). A human reviewer should take this PR.

This is an automated message from the Argus AI review workflow.
