CI: contract + canary + live e2e suites (channels, voice)#21

Merged
dimavrem22 merged 2 commits into main from ci-live-test-stack
Jul 2, 2026

@dimavrem22

Contributor

Ports the proven plugin-CI stack (hermes-agent-plugin, openclaw-plugin) to this bridge. Same three tiers, adapted to a Python bridge whose host is the Claude Code CLI + claude-agent-sdk:

Tiers

  1. tests.yml — offline unit suite on every push/PR (Python 3.10/3.12 matrix; absorbs the old ci.yml), plus a contract-pr job that runs tests/contract against the latest published claude-agent-sdk and @anthropic-ai/claude-code so upstream interface drift fails the PR, not a live gateway.
  2. canary.yml — the same contract suite 2×/day (06:13 / 18:13 PT — deliberately an hour ahead of the rest of the fleet's canaries), paging Google Chat only on unattended (scheduled) failure.
  3. Live e2e — boots the real bridge (tunnel + webhooks + Claude Code sessions) in the runner and drives it from the remote driver identity:
    • live-channels.yml: email reachability, email intelligence (identity / sender / tools / contact CRUD), SMS, and cross-channel (email→SMS, SMS→email, email→call, SMS→call). Two serialized legs: mock (deterministic local Anthropic-API mock via ANTHROPIC_BASE_URL — free, proves the whole pipe with a nonce echo) and real (real Claude model — proves reasoning + tool use). Chained off a passing canary via workflow_run.
    • live-voice.yml: two real phone-call scenarios — inbound_inkbox (driver calls the agent; Inkbox STT/TTS) and outbound_realtime (driver texts "call me"; the agent dials back on the realtime voice path). Transcript-verified two-way speech + speech-mode assertions on the AUT's call records.
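The nonce-echo idea behind the mock leg can be sketched as follows. This is a minimal illustration, not the suite's actual code: the `nonce-<hex>` marker format and the names `make_nonce`, `mock_model_reply`, and `reply_echoes_nonce` are all hypothetical, and the real mock serves HTTP behind `ANTHROPIC_BASE_URL` rather than being a plain function call.

```python
import re
import secrets

# Hypothetical marker format; the real suite may use something else.
NONCE_RE = re.compile(r"nonce-[0-9a-f]{16}")


def make_nonce() -> str:
    """Return a fresh marker that is very unlikely to appear by chance."""
    return f"nonce-{secrets.token_hex(8)}"  # 8 random bytes -> 16 hex chars


def mock_model_reply(prompt: str) -> str:
    """Stand-in for the deterministic mock model: echo the first nonce seen."""
    match = NONCE_RE.search(prompt)
    return f"ECHO {match.group(0)}" if match else "ECHO <no nonce>"


def reply_echoes_nonce(reply: str, nonce: str) -> bool:
    """True when the reply that came back through the pipe carries our nonce."""
    return nonce in reply
```

Because the mock's answer depends only on the nonce in the prompt, a reply carrying that nonce suggests the message crossed the whole pipe (driver, channel, bridge, session, and back) without spending model tokens.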

External-events live tests are intentionally out of scope, matching the fleet decision in inkbox-ai/openclaw-plugin#24.

Gating & safety

  • Live suites run only on ready (non-draft) same-repo PRs, manual dispatch, or a passing canary; drafts and forks get the offline lanes only.
  • Repo-wide inkbox-live-aut-tunnel concurrency group (no cancel) — only one holder of the AUT identity's Inkbox tunnel at a time, across both live workflows.
  • Gateway/driver logs can carry live message content, so they're dumped and uploaded only on failure.
  • INKBOX_PERMISSION_TIMEOUT_S=30 in CI so a stray permission escalation fails fast instead of parking a session for 10 minutes.
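The gating rule in the first bullet can be read as a small predicate. The sketch below only mirrors that rule in Python; in the workflows themselves it is presumably expressed as job-level conditions, and the `Trigger` type, its field names, and `should_run_live` are hypothetical. The event names are the standard GitHub Actions ones.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Trigger:
    """Hypothetical summary of the event that started a workflow run."""
    event_name: str                            # e.g. "pull_request", "workflow_dispatch"
    is_draft: bool = False                     # only meaningful for pull_request
    head_repo: Optional[str] = None            # "owner/name" of the PR head
    base_repo: Optional[str] = None            # "owner/name" of the PR base
    upstream_conclusion: Optional[str] = None  # canary result for workflow_run


def should_run_live(t: Trigger) -> bool:
    """Live suites: ready same-repo PRs, manual dispatch, or a passing canary."""
    if t.event_name == "workflow_dispatch":
        return True
    if t.event_name == "workflow_run":
        return t.upstream_conclusion == "success"
    if t.event_name == "pull_request":
        same_repo = t.head_repo is not None and t.head_repo == t.base_repo
        return same_repo and not t.is_draft
    return False  # pushes, forks, and drafts get the offline lanes only
```

One reason to exclude forks is that fork PRs normally do not receive repository secrets, so a live leg started from one could not authenticate anyway.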

Secrets

Uses the existing CLAUDE_CODE_INKBOX_API_KEY / CLAUDE_CODE_INKBOX_SIGNING_KEY (AUT), REMOTE_INKBOX_API_KEY (driver), OPENAI_API_KEY (realtime voice), and GOOGLE_CHAT_WEBHOOK_URL. One new secret is required before the live real-model legs can run: ANTHROPIC_API_KEY — the bridged sessions are real Claude Code, so everything except the mock-channels leg needs it.
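A preflight check along these lines could report which secrets a given leg is missing before it boots the bridge. This is a sketch under assumptions: the leg names and the per-leg grouping are inferred from the paragraph above rather than taken from the workflow files, and `missing_secrets` is a hypothetical helper.

```python
from typing import Mapping

# AUT and driver credentials that every live leg is assumed to need.
BASE_LIVE = frozenset({
    "CLAUDE_CODE_INKBOX_API_KEY",
    "CLAUDE_CODE_INKBOX_SIGNING_KEY",
    "REMOTE_INKBOX_API_KEY",
})

# Hypothetical leg names; the grouping is an assumption, not the actual workflow config.
REQUIRED = {
    "channels-mock": BASE_LIVE,  # the mock leg needs no model key
    "channels-real": BASE_LIVE | {"ANTHROPIC_API_KEY"},
    "voice": BASE_LIVE | {"ANTHROPIC_API_KEY", "OPENAI_API_KEY"},
}


def missing_secrets(leg: str, env: Mapping[str, str]) -> list[str]:
    """Return the sorted names of required secrets that are unset or empty."""
    return sorted(name for name in REQUIRED[leg] if not env.get(name))
```

Calling `missing_secrets("channels-real", os.environ)` at the start of a job would turn a missing `ANTHROPIC_API_KEY` into an immediate, named failure rather than an authentication error partway through a live run.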

No plugin logic changes — test + workflow files only (.github/workflows/ci.yml is folded into tests.yml).

Three-tier stack: offline unit lane (Python 3.10/3.12 matrix), contract
tests against the latest published claude-agent-sdk + Claude Code CLI
(per-PR gate + 2x/day canary that pages Chat on unattended failure), and
live suites that boot the real bridge in CI and drive it from a remote
Inkbox identity — email/SMS/cross-channel with a mock-model and a
real-model leg, plus two real phone-call scenarios (inbound Inkbox
STT/TTS, outbound realtime). Live runs are gated to ready same-repo PRs,
dispatches, and passing canaries, and serialized on the AUT tunnel lock.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
dimavrem22 marked this pull request as ready for review July 2, 2026 02:50

The 13:13/01:13 slots collided with another plugin's canary; the fleet
runs hour-staggered waves and this one leads them.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
dimavrem22 merged commit 7890047 into main Jul 2, 2026
11 of 13 checks passed
dimavrem22 deleted the ci-live-test-stack branch July 2, 2026 03:55

1 participant