Feature #6–#8: Hermes CLI, session-key factory, long-turn filler, echo guard by nicolotognoni · Pull Request #162 · PatterAI/Patter

nicolotognoni · 2026-06-09T20:34:44Z

Summary

This PR ships four interconnected features for the Hermes phone-agent integration:

Hermes CLI (patter hermes doctor|setup|attach-number|numbers) — zero-config scaffolding, preflight checks, and Twilio wiring for a self-hosted Hermes voice shell.
Session-key factory — per-call derivation of memory-scope headers (e.g., X-Hermes-Session-Key) from caller identity, enabling durable per-caller memory without exposing raw phone numbers.
Long-turn filler — opt-in spoken filler message when an LLM turn (e.g., agent runtime running tools) exceeds a timeout, preventing dead silence.
Echo guard + back-to-back dedup — filters agent TTS bleeding into STT (no AEC) and drops near-duplicate transcripts, fixing false barge-ins and phantom turns on low-quality links.

Changes

Hermes CLI (`patter hermes ...`)

New module cli_hermes.py (1393 lines): doctor (preflight checks), setup (scaffold + Twilio attach), attach-number, numbers commands.
New module _hermes_scaffold.py (351 lines): Single source of truth for the starter project files (app.py, .env.example, docker-compose.yml, etc.).
New example examples/hermes-phone-agent/: Committed scaffold tree (README, app.py, scripts, docker-compose.yml, .env.example).
CLI integration in cli.py: Wires build_hermes_parser and dispatch_hermes.
Tests test_hermes_cli.py (482 lines): Scaffold sync, doctor checks, Twilio API mocking.

Session-key factory (Feature #7)

Python models.py: Added hash_caller(caller: str | None) -> str | None — stable, non-reversible 64-bit hash.
Python openai_compatible.py: New session_key_factory parameter (callable) that derives the session-key header value per call from SessionContext (call_id, caller, callee, caller_hash).
Python llm_loop.py: Refactored _stream_accepts_call_id → _stream_accepted_context_kwargs to thread caller/callee alongside call_id; detects which kwargs each provider accepts.
TypeScript openai-compatible.ts: Mirrored hashCaller, SessionContext type, sessionKeyFactory parameter.
TypeScript llm-loop.ts: Added caller / callee to LLMStreamOptions.
TypeScript types.ts: New SessionContext interface.
Hermes preset (hermes.py / hermes.ts): Convenience sessionKeyFrom: "caller_hash" option that auto-installs a factory.
Tests test_llm_session_key_factory.py (367 lines) + llm-session-key-factory.mocked.test.ts (286 lines): Factory resolution, context threading, header emission.
Namespace exports (Feature #6): import { hermes, openclaw, openaiCompatible } from "getpatter" now works in TypeScript; Python already had from getpatter.llm import hermes.
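
The session-key derivation listed above can be pictured with a minimal sketch. It assumes the 64-bit hash is a truncated SHA-256 hex digest and that header selection is a plain function; `build_headers` is a hypothetical helper, and the real `hash_caller` may normalise or salt its input differently.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Optional


def hash_caller(caller: Optional[str]) -> Optional[str]:
    """Stable, non-reversible 64-bit hash (assumption: first 16 hex chars of SHA-256)."""
    if not caller:
        return None
    return hashlib.sha256(caller.encode("utf-8")).hexdigest()[:16]


@dataclass(frozen=True)
class SessionContext:
    call_id: str
    caller: Optional[str]
    callee: Optional[str]
    caller_hash: Optional[str]


def build_headers(
    ctx: SessionContext,
    static_session_key: Optional[str] = None,
    session_key_factory: Optional[Callable[[SessionContext], Optional[str]]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: pick the session-key header value for one call."""
    key = static_session_key
    if session_key_factory is not None:
        # The factory takes precedence over the static key.
        key = session_key_factory(ctx)
    headers: Dict[str, str] = {}
    if key:  # a falsy value omits the header entirely
        headers["X-Hermes-Session-Key"] = key
    return headers
```

With a factory such as `lambda ctx: ctx.caller_hash`, only the hash reaches the wire, never the raw phone number.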

Long-turn filler (Feature #8)

Python client.py: New long_turn_message / long_turn_message_after_s parameters on agent().
Python stream_handler.py: PipelineStreamHandler._process_streaming_response schedules a filler task if no audio reaches the carrier after the timeout; filler is spoken via the same _synthesize_sentence primitive.
TypeScript stream-handler.ts: Mirrored `longTurnMessage

https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts

… long-turn filler

Three opt-in developer-experience improvements for the agent-LLM providers, full Python/TypeScript parity.

- TypeScript namespace exports: `import { hermes, openclaw, openaiCompatible }` -> `new hermes.LLM()`, mirroring Python's `from getpatter.llm import hermes`. Frozen objects.
- session_key_factory / sessionKeyFactory + session_key_from="caller_hash": derive the X-Hermes-Session-Key per call from a SHA-256 caller hash (new public SessionContext + hash_caller/hashCaller), so an agent runtime remembers a caller across calls WITHOUT the raw phone number ever reaching the wire or the logs. The factory takes precedence over the static session_key; a falsy return omits the header. The loop dispatch was generalised to thread caller/callee only to providers whose stream() declares them (or **kwargs); built-in and minimal custom providers are unchanged. An unknown session_key_from raises in both SDKs (parity).
- long_turn_message / longTurnMessage (+ _after_s, default 4 s): opt-in spoken filler when a turn is slow and no audio has reached the caller yet, distinct from llm_error_message (which fires on error). Fires once, gated on emitted audio; the TS timer is serialised via an async clear() that awaits an in-flight filler so it can never overlap the real sentence.

Adversarial review caught and fixed a TS filler double-speak race (the setTimeout callback could overlap the first real sentence; Python's asyncio path was immune).

Python 2206 / TypeScript 1758 tests pass; tsc + build clean.
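
The fire-once filler described in this commit can be sketched with asyncio. This is an illustration only: `LongTurnFiller`, `mark_audio_emitted`, and the `speak` callback are hypothetical names, not the SDK's internals.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class LongTurnFiller:
    """Speak a filler once if no audio has been emitted after `after_s` seconds."""

    def __init__(self, message: str, after_s: float,
                 speak: Callable[[str], Awaitable[None]]) -> None:
        self.message = message
        self.after_s = after_s
        self.speak = speak
        self.audio_emitted = False
        self.fired = False
        self._task: Optional[asyncio.Task] = None

    def start(self) -> None:
        self._task = asyncio.create_task(self._run())

    async def _run(self) -> None:
        await asyncio.sleep(self.after_s)
        # Gate on emitted audio so the filler never follows real speech.
        if not self.audio_emitted and not self.fired:
            self.fired = True
            await self.speak(self.message)

    def mark_audio_emitted(self) -> None:
        self.audio_emitted = True

    async def clear(self) -> None:
        """Cancel a pending timer, or wait for an in-flight filler to finish."""
        task, self._task = self._task, None
        if task is None:
            return
        if not self.fired:
            task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            pass
```

Calling `await filler.clear()` before synthesizing the first real sentence is what keeps the filler and the reply from overlapping in this sketch.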

… per-turn cancel-event reset

Live-call bug: in pipeline mode (Twilio + Deepgram STT + ElevenLabs TTS + an agent-LLM provider) the FIRST turn worked end-to-end but every subsequent turn went silent, leaving a ghost metrics turn of user_text='' agent_text='[interrupted]'. Two root causes in the pipeline turn-taking state machine, full Python/TypeScript parity:

1. Tail-grace misclassified the next turn as a barge-in. After the agent finishes TTS, _end_speaking_with_grace / endSpeakingWithGrace keeps _is_speaking=true for PATTER_TTS_TAIL_GRACE_MS (default 1500 ms) to swallow the fading echo tail. Humans reply in 200-700 ms, inside that window, so the user's next utterance was detected as a barge-in: it recorded an interrupted turn and the leading audio was withheld from STT (only a <=260 ms echo-contaminated ring), so no final transcript was produced and the agent never answered. New _tail_grace_active / tailGraceActive flag distinguishes "actively streaming TTS" from "post-TTS echo guard". A VAD speech_start OR a transcript during the tail grace now ends the grace and dispatches as a clean NEW turn via _end_tail_grace_for_new_turn / endTailGraceForNewTurn, recovering the leading audio from the ring instead of dropping it, with no spurious send_clear / record_turn_interrupted. Real barge-in during active TTS (tail_grace_active=false) is unchanged. The Python grace flip task is now tracked and cancelled (parity with TS clearGraceTimer) so at most one is in flight.
2. (Python) A barge-in's per-turn cancel event leaked into the next turn. _llm_cancel_event was recreated inside _process_streaming_response, AFTER LLMLoop.run had already captured the previous (still-set) event for the next turn, so the turn after any real barge-in bailed immediately. The reset moved to the top of _dispatch_turn, before dispatch; the event object is now stable through a turn (generator and consumption loop share it). TypeScript already allocates a fresh AbortController per turn in runPipelineLlm.

Tests: new test_pipeline_multiturn_tail_grace.py (6) + pipeline-multiturn-tail-grace.mocked.test.ts (4) reproduce the bug and assert the rescue, the flag lifecycle, the active-TTS barge-in regression guard, and the fresh cancel-event.

Python 2212 / TypeScript 1763 pass; tsc + build clean. Adversarial review: 0 critical / 0 high.
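
The tail-grace distinction in item 1 can be illustrated with a small state sketch. `TurnState` and its method names are hypothetical simplifications; the real handler also manages the audio ring, timers, and metrics.

```python
class TurnState:
    """Separates 'actively streaming TTS' from 'post-TTS echo guard'."""

    def __init__(self) -> None:
        self.is_speaking = False
        self.tail_grace_active = False

    def start_tts(self) -> None:
        self.is_speaking = True
        self.tail_grace_active = False

    def end_speaking_with_grace(self) -> None:
        # TTS is done, but is_speaking stays true while the echo tail fades.
        self.tail_grace_active = True

    def grace_elapsed(self) -> None:
        self.is_speaking = False
        self.tail_grace_active = False

    def on_speech_start(self) -> str:
        """Classify a VAD speech_start (or a transcript) against the current state."""
        if not self.is_speaking:
            return "new_turn"
        if self.tail_grace_active:
            # The caller replied inside the grace window: end the grace and
            # treat the utterance as a clean new turn, not an interruption.
            self.grace_elapsed()
            return "new_turn"
        return "barge_in"
```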

…t-token abort (Hermes/OpenClaw)

The caller could not interrupt the agent mid-response. The STT receive loop awaited the turn's LLM+TTS dispatch inline (`await self._dispatch_turn(...)` / `await this.runPipelineLlm(...)`), so during a long (30-90 s) Hermes/OpenClaw tool-running turn it stopped reading transcripts: a barge-in transcript ("ferma") was only processed AFTER the turn ended. On PSTN with echo-masked/unreliable VAD, the transcript path is the only barge-in fallback and it was structurally dead. Three coordinated changes, full Python/TypeScript parity:

1. Decoupled single-in-flight dispatch. The turn runs as one tracked background task (_dispatch_task / dispatchTask) so the receive loop keeps draining transcripts and runs handleBargeIn against the LIVE turn. The loop settles the previous dispatch before launching the next (single-in-flight), so conversation_history / metrics ordering is unchanged; the loop still awaits the final turn to settle before returning, so existing tests that inspect state right after the loop are unaffected.
2. Prompt pre-first-token abort (Python). Agent runtimes run tools for tens of seconds before the first token, during which the per-chunk cancel_event check never runs. The provider now races create()+first-byte against the cancel signal and spawns a watchdog that close()s the response the instant a barge-in fires (TS already aborts promptly via fetch + AbortController). The VAD legacy barge-in branch now also sets _llm_cancel_event (previously it only flipped _is_speaking, which Hermes never observed pre-first-token), and the OpenAI-compatible client uses an explicit httpx read/connect timeout so a dead gateway fails fast.
3. PATTER_FORWARD_STT_WHILE_SPEAKING (opt-in, default off). Forwards inbound audio to STT during TTS even with a VAD configured, so the transcript barge-in path can receive a transcript on echo-masked links where the VAD never fires. The leading-edge ring is still captured. Echo caveat (WARN on enable): without AEC the agent's own voice may be transcribed as a phantom interruption; pair with agent.barge_in_strategies.

Default behaviour (flag off, VAD present, normal LLM) is byte-identical; the just-landed tail-grace multi-turn fix is preserved.

Tests: new test_pipeline_bargein_backgrounded.py (4), test_provider_prefirsttoken_abort.py (3), pipeline-bargein-backgrounded.mocked.test.ts (2). Python 2219 / TypeScript 1765 pass; tsc + build clean.
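
A minimal asyncio sketch of the decoupled single-in-flight dispatch in item 1 follows. It assumes a queue of final transcripts with `None` as the end-of-call sentinel, and it treats every transcript that arrives during a live turn as a barge-in; `receive_loop` and `dispatch_turn` are hypothetical names, and the real handler applies more gating.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def receive_loop(
    transcripts: "asyncio.Queue[Optional[str]]",
    dispatch_turn: Callable[[str, asyncio.Event], Awaitable[None]],
) -> None:
    dispatch_task: Optional[asyncio.Task] = None
    cancel_event = asyncio.Event()
    while True:
        text = await transcripts.get()  # keeps draining while a turn runs
        if text is None:
            break
        if dispatch_task is not None:
            if not dispatch_task.done():
                cancel_event.set()  # barge-in against the LIVE turn
            await dispatch_task  # settle the previous turn first
        cancel_event = asyncio.Event()  # fresh cancel event per turn
        dispatch_task = asyncio.create_task(dispatch_turn(text, cancel_event))
    if dispatch_task is not None:
        await dispatch_task  # settle the final turn before returning
```

Because the previous task is always awaited before the next one is created, history and metrics updates stay in turn order even though the turn itself runs in the background.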

…-on-cancel, bounded teardown

Three defects found by adversarial review of the previous commit's decoupled-dispatch barge-in, all fixed with full Python/TS parity:

1. (HIGH, TS) Per-turn history was passed to the LLM by LIVE reference. With the turn dispatch backgrounded, a following transcript's user push (on the drain loop while the turn is in flight) could land in the in-flight turn's prompt before buildMessages read it, conflating two turns. Now a history SNAPSHOT is captured at launch and threaded through dispatchTurn → runPipelineLlm → llmLoop.run (and the onMessage/webhook paths), mirroring Python's list(self.conversation_history). Regression test added.
2. (MEDIUM, Python) On cleanup/hangup hard-cancel while the provider was parked pre-first-token, asyncio.wait did not cancel the in-flight create() POST, orphaning the Hermes/OpenClaw connection ("Task exception was never retrieved"). _open_stream_with_cancel now catches CancelledError and aborts the create task. Test added.
3. (MEDIUM, TS) handleStop/handleWsClose awaited the backgrounded dispatch with no timeout, so a hung user onMessage (no AbortSignal) could block call teardown indefinitely. Teardown now bounds the wait via settleDispatchForTeardown (DISPATCH_SETTLE_TIMEOUT_MS = 30s); Python hard-cancels the task.

Python 2220 / TypeScript 1766 pass; tsc + build clean.
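
The race between opening the stream and the cancel signal, including the cleanup from item 2, can be sketched as follows. This is not the SDK's `_open_stream_with_cancel`; the function name without the underscore, the `create` callable, and the `None` return on barge-in are assumptions made for illustration.

```python
import asyncio
import contextlib
from typing import Any, Awaitable, Callable, Optional


async def open_stream_with_cancel(
    create: Callable[[], Awaitable[Any]],
    cancel_event: asyncio.Event,
) -> Optional[Any]:
    """Return the opened stream, or None if the cancel event fired first."""
    create_task = asyncio.ensure_future(create())
    cancel_task = asyncio.ensure_future(cancel_event.wait())
    try:
        await asyncio.wait(
            {create_task, cancel_task}, return_when=asyncio.FIRST_COMPLETED
        )
    except asyncio.CancelledError:
        # Hard-cancel (hangup/cleanup): asyncio.wait does not cancel its
        # children, so abort the in-flight request before propagating.
        create_task.cancel()
        raise
    finally:
        cancel_task.cancel()
    if create_task.done():
        return create_task.result()
    # Barge-in fired before the first byte: abort the request.
    create_task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await create_task
    return None
```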

…ollow-up, mark interrupted turns

Residual Hermes/OpenClaw barge-in failure (live test, PATTER_FORWARD_STT_WHILE_SPEAKING=1, no AEC, no barge_in_strategies): barge-in fired on a PHANTOM transcript ("che tu l'hai", the agent's own Italian TTS echoing into Deepgram, not caught by the English-only hallucination filter), the real follow-up was dropped leaving an empty [interrupted] turn, and the post-barge-in context was poisoned. A workflow root-cause (code trace + web research: Coval/Pipecat/LiveKit/Azure) confirmed this is NOT an interruptibility problem: the abort already works (bargein_ms=1.0). It is a GATE + ECHO + CONTEXT-REWRITE problem. Fixes, full Python/TS parity:

1. Echo guard (language-agnostic). Track the agent's in-flight spoken text (_current_agent_spoken_text / currentAgentSpokenText). A new _looks_like_echo / looksLikeEcho (substring OR >=60% word overlap) drops any barge-in (_handle_barge_in) or commit (_commit_transcript) that is the agent's own TTS echoing back. Active ONLY while _forward_stt_while_speaking, so the default VAD path and real post-turn replies are unaffected.
2. Back-to-back dedup fix. The <500ms drop now applies only to a NEAR-DUPLICATE of the previous final (Deepgram speech_final+is_final for the same utterance), via _is_near_duplicate / isNearDuplicate. A genuinely different fast follow-up is no longer swallowed into an empty [interrupted] turn.
3. Interrupted-turn context rewrite. On a confirmed mid-turn barge-in the spoken prefix is appended to history with an "[interrupted by caller]" marker, so a stateful agent runtime (Hermes/OpenClaw, X-Hermes-Session-Id) sees next turn that it was cut off and what the caller actually heard.

Plus: fixed the stale _can_barge_in docstring (0.25 -> 0.5 s no-AEC gate).

Recommended caller config (unchanged SDK defaults): barge_in_strategies=(MinWordsStrategy(min_words=2),), echo_cancellation=True.

Tests: test_pipeline_echo_dedup.py (19) + pipeline-echo-dedup.mocked.test.ts (11); updated the back-to-back dedup tests to the corrected behaviour. Python 2236 / TypeScript 1777 pass; tsc + build clean.

…t replies, word-boundary dedup, clean interrupted metrics

Adversarial review of the echo-safe barge-in commit found three real HIGH false-positive risks; all fixed with full Python/TS parity:

1. (HIGH) Echo guard could silently drop a legitimate SHORT caller answer that repeats the agent's offered words (e.g. agent "lunedì o martedì?", caller "lunedì" → substring match → dropped, caller goes unheard). Real TTS echo is a long near-complete fragment, not a 1-3 word reply. The echo guard now requires >= _ECHO_MIN_CANDIDATE_WORDS (4) words before classifying a candidate as echo, so short answers are never dropped. (Short echo blips on a no-AEC link are left to AEC / barge_in_strategies.)
2. (HIGH) Back-to-back dedup used a character-level substring test, so a genuinely different short follow-up was dropped ("no" matched inside "nothing else"), and this ran on the DEFAULT path (not gated on the echo flag), affecting all pipeline users. _is_near_duplicate / isNearDuplicate is now word-boundary aware (equal, or a true word-prefix double-emit), so "nothing else" is no longer a duplicate of "no" while Deepgram's speech_final+is_final pair still de-duplicates.
3. (HIGH, TS) The interrupted-turn "[interrupted by caller]" marker leaked into metrics: runPipelineLlm returned the marked text and dispatchTurn fed it to recordTtsComplete/recordTurnComplete. runPipelineLlm now returns { text, interrupted }; dispatchTurn records metrics on the PLAIN text (gated on !interrupted) and applies the marker to the history/transcript only, mirroring Python, where metrics are recorded before the marker is appended.

Tests updated to the corrected behaviour (>=4-word echo examples + explicit short-answer-exemption + word-boundary dedup cases). Python 2237 / TypeScript 1779 pass; tsc + build clean.
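
The two text checks, as corrected in this commit, can be approximated by the sketch below. The 4-word minimum and the 60% overlap figure come from the commit messages; the tokenisation, the constant names without the leading underscore, and the exact prefix rule are assumptions rather than the SDK's code.

```python
import re
from typing import List

ECHO_MIN_CANDIDATE_WORDS = 4   # from the commit message
ECHO_OVERLAP_THRESHOLD = 0.6   # ">=60% word overlap"


def _words(text: str) -> List[str]:
    # Assumed tokenisation: lowercase, Unicode-aware word characters.
    return re.findall(r"\w+", text.lower())


def looks_like_echo(candidate: str, agent_spoken: str) -> bool:
    """True if the transcript looks like the agent's own TTS echoing back."""
    cand, spoken = _words(candidate), _words(agent_spoken)
    if len(cand) < ECHO_MIN_CANDIDATE_WORDS or not spoken:
        return False  # short answers are never classified as echo
    if f" {' '.join(cand)} " in f" {' '.join(spoken)} ":
        return True  # whole-word substring of what the agent said
    spoken_set = set(spoken)
    overlap = sum(1 for w in cand if w in spoken_set) / len(cand)
    return overlap >= ECHO_OVERLAP_THRESHOLD


def is_near_duplicate(current: str, previous: str) -> bool:
    """True if one transcript equals, or is a word-prefix of, the other."""
    cur, prev = _words(current), _words(previous)
    if not cur or not prev:
        return False
    shorter, longer = sorted((cur, prev), key=len)
    return longer[: len(shorter)] == shorter
```

In the described pipeline the duplicate check only matters inside the <500 ms back-to-back window, which the caller of `is_near_duplicate` would enforce.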

… VAD cancel to transcript

On a no-AEC link with PATTER_FORWARD_STT_WHILE_SPEAKING and no barge_in_strategies, a VAD speech_start during TTS cancelled the turn immediately. But that speech_start is very often the agent's own TTS echo (or pre-first-token line noise on a long tool-running Hermes/OpenClaw turn), so the agent self-interrupted almost every turn: a short normal reply "bene bene" produced agent_text='[interrupted]', and the next turn ran the LLM for seconds yet emitted tts_characters=0 (torn down before its first token). The echo guard only protected the transcript path; the raw VAD-energy cancel had none.

Defer the VAD-energy cancel to transcript confirmation whenever forward_stt_while_speaking && aec is None, exactly as it already worked when barge_in_strategies are configured. The speech_start now marks the barge-in PENDING (agent keeps talking); the cancel fires only on a real transcript that survives the echo guard, else the agent resumes after barge_in_confirm_ms (default 1500ms). Default VAD path and forward-STT WITH AEC keep the responsive immediate cancel, so there is no behaviour change for existing configs. Full Python/TS parity.

New tests drive the VAD path through on_audio_received / handleAudio: no-AEC+no-strategies defers to pending; AEC on still cancels immediately; a real transcript confirms, an echo transcript does not.
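
The deferral rule can be condensed into a small decision sketch. The function names, parameters, and string results are hypothetical; the real handler also runs the barge_in_confirm_ms timer that lets the agent resume.

```python
def on_vad_speech_start(
    is_speaking: bool,
    forward_stt_while_speaking: bool,
    aec_enabled: bool,
    has_barge_in_strategies: bool,
) -> str:
    """Decide what a VAD speech_start does while the agent may be talking."""
    if not is_speaking:
        return "new_turn"
    if has_barge_in_strategies:
        return "pending"  # strategies already deferred to the transcript
    if forward_stt_while_speaking and not aec_enabled:
        return "pending"  # likely echo or line noise: wait for a transcript
    return "cancel"  # default VAD path, or forward-STT with AEC


def on_transcript_while_pending(transcript_is_echo: bool) -> str:
    """A real transcript confirms the pending barge-in; an echo does not."""
    return "ignore" if transcript_is_echo else "cancel"
```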

…ch-number + example app

Make standing up the Hermes voice shell (Direction A) copy-paste simple, on par with wiring a hosted custom-LLM voice agent but keeping Hermes on loopback.

New `patter hermes` CLI group (Python):
- `doctor`: preflight across the Hermes gateway (/v1/models reachability + model presence), the Patter providers (HermesLLM constructible, Deepgram / ElevenLabs keys, ElevenLabs transport, Silero VAD), and the Twilio carrier (creds valid, number webhook). Each problem prints a suggested fix. `--no-network` skips live probes, `--json` for machine-readable output.
- `setup`: scaffold a ready-to-run hermes-phone-agent project, run the checks, optionally attach a Twilio number (`--number`/`--url`). Non-interactive with `--yes`.
- `attach-number` / `numbers`: point a Twilio number's voice webhook at your Patter URL / list account numbers.

Scaffold (`getpatter/_hermes_scaffold.py`) is the single source of truth for the committed `examples/hermes-phone-agent/` project (app.py, .env.example, README, docker-compose, doctor/text-turn/outbound-call scripts); a test keeps them in sync. The example defaults to REST ElevenLabs TTS and caller-hash memory.

TS CLI gains a `hermes` stub pointing to the Python wizard (mirrors the `eval` stub); the HermesLLM provider stays available in both SDKs. Docs updated with a zero-config setup section.

https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts

…mes detection, key-gen, --enable-hermes

Address the review gaps on the Hermes wizard: it now reads and (opt-in) writes real config instead of only consulting os.environ.

doctor:
- Autoloads dotenv files before checking: ~/.hermes/.env then the project/cwd .env (non-overriding), with --env-file/--no-env-file to control it. Loaded paths are reported; secrets are never echoed.
- Reads ~/.hermes/.env + config.yaml directly: reports API_SERVER_ENABLED, surfaces the configured key/port/model, and runs `hermes gateway status` when the CLI is present.
- Sharper severity: CLI missing AND gateway unreachable is now a failure, not a soft warning; gateway-down fix adapts to whether the CLI is available.

setup:
- --enable-hermes writes API_SERVER_ENABLED=true (and generates an API_SERVER_KEY if absent) into ~/.hermes/.env, backing up to .env.bak first, then reminds the operator to restart the gateway.
- --generate-key writes a strong key into the project .env; when used with --enable-hermes the SAME key is mirrored so Patter and Hermes agree (a mismatch is a 401 at call time).
- Autoloads env for the preflight so checks reflect the project's .env.

New helpers (_parse_env_file / _upsert_env_file / _load_env_files / _read_hermes_config / _enable_hermes_gateway / _generate_key), no new deps. +11 unit tests; docs updated.

https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts
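
The enable-and-back-up flow for `--enable-hermes` can be sketched with standard-library calls only. The functions below are hypothetical stand-ins for the private helpers named in the commit, and the quoting rules and key length are assumptions.

```python
import secrets
from pathlib import Path
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and comments (simplified)."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def upsert_env_file(path: Path, updates: Dict[str, str]) -> None:
    """Update or append keys, writing a .bak copy of the old file first."""
    original = path.read_text() if path.exists() else ""
    if path.exists():
        path.with_name(path.name + ".bak").write_text(original)
    remaining = dict(updates)
    out = []
    for raw in original.splitlines():
        key = raw.partition("=")[0].strip()
        is_assignment = "=" in raw and not raw.lstrip().startswith("#")
        if is_assignment and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")  # replace in place
        else:
            out.append(raw)  # keep comments and unrelated keys untouched
    out.extend(f"{k}={v}" for k, v in remaining.items())
    path.write_text("\n".join(out) + "\n")


def generate_key() -> str:
    """A strong random key (assumption: 32 bytes, URL-safe encoding)."""
    return secrets.token_urlsafe(32)
```

A caller would parse the existing file, add `API_SERVER_KEY` to the updates only when it is absent, and then call `upsert_env_file` once so the backup reflects the state before any change.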

… trace/diagnose

Close the acceptance + debugging gaps so a green run means a real call works.

setup:
- --start-gateway runs `hermes gateway start` then polls /v1/models until the gateway answers, completing the enable → start → verify cycle.

New `patter hermes test` (acceptance, not just preflight): GET /v1/models, send a real /v1/chat/completions turn with the X-Hermes-Session-Id header and report the latency + reply snippet, confirm HermesLLM is constructible, and check the STT/TTS keys. Exit non-zero on any blocker.

New `patter hermes trace [call]` / `diagnose [call]`: read the on-disk per-call log (PATTER_LOG_DIR; services/call_log.py) and classify the pipeline stage by stage (carrier → STT → Hermes → TTS), with a latency breakdown. `diagnose` applies a decision tree and names the first broken stage with a fix, e.g. "Hermes replied but no audio — TTS stage. Check ELEVENLABS_API_KEY / REST transport." Defaults to the latest call; accepts a call_id or a directory.

Note: item #3 (auto-attach the tunnel URL to the carrier) is already handled by the SDK, since serve() auto-configures the Twilio/Plivo webhook once the tunnel is up (server.py), so the scaffold app does it on `python app.py`; documented.

Scaffold now sets PATTER_LOG_DIR and documents test/trace/diagnose; example dir regenerated. TS CLI stub lists the new subcommands. +15 unit tests; docs updated.

https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts
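
The stage-by-stage decision tree behind `diagnose` might look roughly like the sketch below. The `CallTrace` fields are invented for illustration and do not reflect the actual on-disk log schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallTrace:
    """Hypothetical per-call summary; field names are assumptions."""
    carrier_connected: bool
    stt_final_transcripts: int
    llm_replies: int
    tts_characters: int


def first_broken_stage(trace: CallTrace) -> Optional[str]:
    """Walk carrier -> STT -> Hermes -> TTS and name the first failing stage."""
    if not trace.carrier_connected:
        return "carrier"
    if trace.stt_final_transcripts == 0:
        return "STT"
    if trace.llm_replies == 0:
        return "Hermes"
    if trace.tts_characters == 0:
        return "TTS"  # Hermes replied but no audio was synthesized
    return None  # every stage produced output
```

Checking stages in pipeline order means a downstream symptom (no audio) is never blamed on TTS when an upstream stage already failed.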

Bring in the anonymous opt-out telemetry work (schema v5: stack/cost/install-id, deploy-shape, feature-adoption, upgrade funnel, CLI usage, call funnel, and the `getpatter telemetry status|disable|enable` command) that landed on main after this branch was cut.

Conflicts resolved:
- cli.py / cli.ts: keep both the `hermes` wizard and the `telemetry` command.
- CHANGELOG.md: telemetry "Added" entries placed under Added, above Fixed.
- README.md: replaced the long "Anonymous Telemetry" section with the short opt-out note requested earlier, and removed the duplicate section.

https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts

mintlify · 2026-06-09T20:38:57Z

Preview deployment for your docs. Learn more about Mintlify Previews.

Project	Status	Preview	Updated (UTC)
patter-06b046ce	🟢 Ready	View Preview	Jun 9, 2026, 8:39 PM

nicolotognoni · 2026-06-09T21:07:53Z

Closing in favor of #161, which is a strict superset of this branch.

This PR (feat/hermes-dx-multiturn) and #161 (claude/nice-planck-bv96no) share the same commit history — #161 is exactly this branch plus one additional commit (the zero-config patter openclaw CLI). Everything here (Hermes CLI, session-key factory, long-turn filler, echo-safe barge-in) is already contained in #161.

The base layer (Hermes DX: TS namespace exports, caller-hash session key, long-turn filler) already landed on main via #159. Merging both this PR and #161 would duplicate those changes, so we are consolidating into #161 alone.

The branch is left intact and this PR can be reopened if needed.

nicolotognoni and others added 12 commits June 5, 2026 19:28

docs(readme): condense telemetry note to a short opt-out callout

1b575ee

https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts

mintlify Bot deployed to staging - docs June 9, 2026 20:39 View deployment

nicolotognoni closed this Jun 9, 2026

nicolotognoni wants to merge 12 commits into main from feat/hermes-dx-multiturn
