Skip to content

Feature #6–#8: Hermes CLI, session-key factory, long-turn filler, echo guard#162

Closed
nicolotognoni wants to merge 12 commits into
mainfrom
feat/hermes-dx-multiturn
Closed

Feature #6–#8: Hermes CLI, session-key factory, long-turn filler, echo guard#162
nicolotognoni wants to merge 12 commits into
mainfrom
feat/hermes-dx-multiturn

Conversation

@nicolotognoni

Copy link
Copy Markdown
Collaborator

Summary

This PR ships four interconnected features for the Hermes phone-agent integration:

  1. Hermes CLI (patter hermes doctor|setup|attach-number|numbers) — zero-config scaffolding, preflight checks, and Twilio wiring for a self-hosted Hermes voice shell.
  2. Session-key factory — per-call derivation of memory-scope headers (e.g., X-Hermes-Session-Key) from caller identity, enabling durable per-caller memory without exposing raw phone numbers.
  3. Long-turn filler — opt-in spoken filler message when an LLM turn (e.g., agent runtime running tools) exceeds a timeout, preventing dead silence.
  4. Echo guard + back-to-back dedup — filters agent TTS bleeding into STT (no AEC) and drops near-duplicate transcripts, fixing false barge-ins and phantom turns on low-quality links.

Changes

Hermes CLI (patter hermes ...)

  • New module cli_hermes.py (1393 lines): doctor (preflight checks), setup (scaffold + Twilio attach), attach-number, numbers commands.
  • New module _hermes_scaffold.py (351 lines): Single source of truth for the starter project files (app.py, .env.example, docker-compose.yml, etc.).
  • New example examples/hermes-phone-agent/: Committed scaffold tree (README, app.py, scripts, docker-compose.yml, .env.example).
  • CLI integration in cli.py: Wires build_hermes_parser and dispatch_hermes.
  • Tests test_hermes_cli.py (482 lines): Scaffold sync, doctor checks, Twilio API mocking.

Session-key factory (Feature #7)

  • Python models.py: Added hash_caller(caller: str | None) -> str | None — stable, non-reversible 64-bit hash.
  • Python openai_compatible.py: New session_key_factory parameter (callable) that derives the session-key header value per call from SessionContext (call_id, caller, callee, caller_hash).
  • Python llm_loop.py: Refactored _stream_accepts_call_id_stream_accepted_context_kwargs to thread caller/callee alongside call_id; detects which kwargs each provider accepts.
  • TypeScript openai-compatible.ts: Mirrored hashCaller, SessionContext type, sessionKeyFactory parameter.
  • TypeScript llm-loop.ts: Added caller / callee to LLMStreamOptions.
  • TypeScript types.ts: New SessionContext interface.
  • Hermes preset (hermes.py / hermes.ts): Convenience sessionKeyFrom: "caller_hash" option that auto-installs a factory.
  • Tests test_llm_session_key_factory.py (367 lines) + llm-session-key-factory.mocked.test.ts (286 lines): Factory resolution, context threading, header emission.
  • Namespace exports (Feature build(deps-dev): bump vitest from 2.1.9 to 4.1.4 in /sdk-ts #6): import { hermes, openclaw, openaiCompatible } from "getpatter" now works in TypeScript; Python already had from getpatter.llm import hermes.

Long-turn filler (Feature #8)

  • Python client.py: New long_turn_message / long_turn_message_after_s parameters on agent().
  • Python stream_handler.py: PipelineStreamHandler._process_streaming_response schedules a filler task if no audio reaches the carrier after the timeout; filler is spoken via the same _synthesize_sentence primitive.
  • TypeScript stream-handler.ts: Mirrored `longTurnMessage

https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts

nicolotognoni and others added 12 commits June 5, 2026 19:28
… long-turn filler

Three opt-in developer-experience improvements for the agent-LLM providers, full
Python/TypeScript parity.

- TypeScript namespace exports: `import { hermes, openclaw, openaiCompatible }` ->
  `new hermes.LLM()`, mirroring Python's `from getpatter.llm import hermes`. Frozen objects.
- session_key_factory / sessionKeyFactory + session_key_from="caller_hash": derive the
  X-Hermes-Session-Key per call from a SHA-256 caller hash (new public SessionContext +
  hash_caller/hashCaller), so an agent runtime remembers a caller across calls WITHOUT the
  raw phone number ever reaching the wire or the logs. The factory takes precedence over
  the static session_key; a falsy return omits the header. The loop dispatch was
  generalised to thread caller/callee only to providers whose stream() declares them (or
  **kwargs) — built-in and minimal custom providers unchanged. An unknown session_key_from
  raises in both SDKs (parity).
- long_turn_message / longTurnMessage (+ _after_s, default 4 s): opt-in spoken filler when
  a turn is slow and no audio has reached the caller yet — distinct from llm_error_message
  (which fires on error). Fires once, gated on emitted audio; the TS timer is serialised
  via an async clear() that awaits an in-flight filler so it can never overlap the real
  sentence.

Adversarial review caught and fixed a TS filler double-speak race (the setTimeout callback
could overlap the first real sentence; Python's asyncio path was immune).

Python 2206 / TypeScript 1758 tests pass; tsc + build clean.
… per-turn cancel-event reset

Live-call bug: in pipeline mode (Twilio + Deepgram STT + ElevenLabs TTS + an
agent-LLM provider) the FIRST turn worked end-to-end but every subsequent turn
went silent, leaving a ghost metrics turn of user_text='' agent_text='[interrupted]'.

Two root causes in the pipeline turn-taking state machine, full Python/TypeScript parity:

1. Tail-grace misclassified the next turn as a barge-in.
   After the agent finishes TTS, _end_speaking_with_grace / endSpeakingWithGrace
   keeps _is_speaking=true for PATTER_TTS_TAIL_GRACE_MS (default 1500 ms) to
   swallow the fading echo tail. Humans reply in 200-700 ms — inside that window
   — so the user's next utterance was detected as a barge-in: it recorded an
   interrupted turn and the leading audio was withheld from STT (only a <=260 ms
   echo-contaminated ring), so no final transcript was produced and the agent
   never answered. New _tail_grace_active / tailGraceActive flag distinguishes
   "actively streaming TTS" from "post-TTS echo guard". A VAD speech_start OR a
   transcript during the tail grace now ends the grace and dispatches as a clean
   NEW turn via _end_tail_grace_for_new_turn / endTailGraceForNewTurn —
   recovering the leading audio from the ring instead of dropping it, with no
   spurious send_clear / record_turn_interrupted. Real barge-in during active
   TTS (tail_grace_active=false) is unchanged. The Python grace flip task is now
   tracked and cancelled (parity with TS clearGraceTimer) so at most one is in
   flight.

2. (Python) A barge-in's per-turn cancel event leaked into the next turn.
   _llm_cancel_event was recreated inside _process_streaming_response — AFTER
   LLMLoop.run had already captured the previous (still-set) event for the next
   turn — so the turn after any real barge-in bailed immediately. The reset
   moved to the top of _dispatch_turn, before dispatch; the event object is now
   stable through a turn (generator and consumption loop share it). TypeScript
   already allocates a fresh AbortController per turn in runPipelineLlm.

Tests: new test_pipeline_multiturn_tail_grace.py (6) +
pipeline-multiturn-tail-grace.mocked.test.ts (4) reproduce the bug and assert
the rescue, the flag lifecycle, the active-TTS barge-in regression guard, and
the fresh cancel-event. Python 2212 / TypeScript 1763 pass; tsc + build clean.
Adversarial review: 0 critical / 0 high.
…t-token abort (Hermes/OpenClaw)

The caller could not interrupt the agent mid-response. The STT receive loop
awaited the turn's LLM+TTS dispatch inline (`await self._dispatch_turn(...)` /
`await this.runPipelineLlm(...)`), so during a long (30-90 s) Hermes/OpenClaw
tool-running turn it stopped reading transcripts — a barge-in transcript ("ferma")
was only processed AFTER the turn ended. On PSTN with echo-masked/unreliable VAD,
the transcript path is the only barge-in fallback and it was structurally dead.

Three coordinated changes, full Python/TypeScript parity:

1. Decoupled single-in-flight dispatch. The turn runs as one tracked background
   task (_dispatch_task / dispatchTask) so the receive loop keeps draining
   transcripts and runs handleBargeIn against the LIVE turn. The loop settles
   the previous dispatch before launching the next (single-in-flight), so
   conversation_history / metrics ordering is unchanged; the loop still awaits
   the final turn to settle before returning, so existing tests that inspect
   state right after the loop are unaffected.

2. Prompt pre-first-token abort (Python). Agent runtimes run tools for tens of
   seconds before the first token, during which the per-chunk cancel_event
   check never runs. The provider now races create()+first-byte against the
   cancel signal and spawns a watchdog that close()s the response the instant a
   barge-in fires (TS already aborts promptly via fetch + AbortController). The
   VAD legacy barge-in branch now also sets _llm_cancel_event (previously it
   only flipped _is_speaking, which Hermes never observed pre-first-token), and
   the OpenAI-compatible client uses an explicit httpx read/connect timeout so a
   dead gateway fails fast.

3. PATTER_FORWARD_STT_WHILE_SPEAKING (opt-in, default off). Forwards inbound
   audio to STT during TTS even with a VAD configured, so the transcript
   barge-in path can receive a transcript on echo-masked links where the VAD
   never fires. The leading-edge ring is still captured. Echo caveat (WARN on
   enable): without AEC the agent's own voice may be transcribed as a phantom
   interruption — pair with agent.barge_in_strategies.

Default behaviour (flag off, VAD present, normal LLM) is byte-identical; the
just-landed tail-grace multi-turn fix is preserved.

Tests: new test_pipeline_bargein_backgrounded.py (4), test_provider_prefirsttoken_abort.py (3),
pipeline-bargein-backgrounded.mocked.test.ts (2). Python 2219 / TypeScript 1765 pass;
tsc + build clean.
…-on-cancel, bounded teardown

Three defects found by adversarial review of the previous commit's
decoupled-dispatch barge-in, all fixed with full Python/TS parity:

1. (HIGH, TS) Per-turn history was passed to the LLM by LIVE reference. With the
   turn dispatch backgrounded, a following transcript's user push (on the drain
   loop while the turn is in flight) could land in the in-flight turn's prompt
   before buildMessages read it — conflating two turns. Now a history SNAPSHOT
   is captured at launch and threaded through dispatchTurn → runPipelineLlm →
   llmLoop.run (and the onMessage/webhook paths), mirroring Python's
   list(self.conversation_history). Regression test added.

2. (MEDIUM, Python) On cleanup/hangup hard-cancel while the provider was parked
   pre-first-token, asyncio.wait did not cancel the in-flight create() POST,
   orphaning the Hermes/OpenClaw connection ("Task exception was never
   retrieved"). _open_stream_with_cancel now catches CancelledError and aborts
   the create task. Test added.

3. (MEDIUM, TS) handleStop/handleWsClose awaited the backgrounded dispatch with
   no timeout — a hung user onMessage (no AbortSignal) could block call teardown
   indefinitely. Teardown now bounds the wait via settleDispatchForTeardown
   (DISPATCH_SETTLE_TIMEOUT_MS = 30s); Python hard-cancels the task.

Python 2220 / TypeScript 1766 pass; tsc + build clean.
…ollow-up, mark interrupted turns

Residual Hermes/OpenClaw barge-in failure (live test, PATTER_FORWARD_STT_WHILE_SPEAKING=1,
no AEC, no barge_in_strategies): barge-in fired on a PHANTOM transcript ("che tu
l'hai" — the agent's own Italian TTS echoing into Deepgram, not caught by the
English-only hallucination filter), the real follow-up was dropped leaving an
empty [interrupted] turn, and the post-barge-in context was poisoned.

A workflow root-cause (code trace + web research: Coval/Pipecat/LiveKit/Azure)
confirmed this is NOT an interruptibility problem — the abort already works
(bargein_ms=1.0). It is a GATE + ECHO + CONTEXT-REWRITE problem. Fixes, full
Python/TS parity:

1. Echo guard (language-agnostic). Track the agent's in-flight spoken text
   (_current_agent_spoken_text / currentAgentSpokenText). A new _looks_like_echo
   / looksLikeEcho (substring OR >=60% word overlap) drops any barge-in
   (_handle_barge_in) or commit (_commit_transcript) that is the agent's own
   TTS echoing back. Active ONLY while _forward_stt_while_speaking, so the
   default VAD path and real post-turn replies are unaffected.

2. Back-to-back dedup fix. The <500ms drop now applies only to a NEAR-DUPLICATE
   of the previous final (Deepgram speech_final+is_final for the same
   utterance), via _is_near_duplicate / isNearDuplicate. A genuinely different
   fast follow-up is no longer swallowed into an empty [interrupted] turn.

3. Interrupted-turn context rewrite. On a confirmed mid-turn barge-in the spoken
   prefix is appended to history with an "[interrupted by caller]" marker, so a
   stateful agent runtime (Hermes/OpenClaw, X-Hermes-Session-Id) sees next turn
   that it was cut off and what the caller actually heard.

Plus: fixed the stale _can_barge_in docstring (0.25 -> 0.5 s no-AEC gate).
Recommended caller config (unchanged SDK defaults): barge_in_strategies=
(MinWordsStrategy(min_words=2),), echo_cancellation=True.

Tests: test_pipeline_echo_dedup.py (19) + pipeline-echo-dedup.mocked.test.ts (11);
updated the back-to-back dedup tests to the corrected behaviour. Python 2236 /
TypeScript 1777 pass; tsc + build clean.
…t replies, word-boundary dedup, clean interrupted metrics

Adversarial review of the echo-safe barge-in commit found three real HIGH
false-positive risks; all fixed with full Python/TS parity:

1. (HIGH) Echo guard could silently drop a legitimate SHORT caller answer that
   repeats the agent's offered words (e.g. agent "lunedì o martedì?", caller
   "lunedì" → substring match → dropped, caller goes unheard). Real TTS echo is
   a long near-complete fragment, not a 1-3 word reply. The echo guard now
   requires >= _ECHO_MIN_CANDIDATE_WORDS (4) words before classifying a
   candidate as echo, so short answers are never dropped. (Short echo blips on a
   no-AEC link are left to AEC / barge_in_strategies.)

2. (HIGH) Back-to-back dedup used a character-level substring test, so a
   genuinely different short follow-up was dropped ("no" matched inside
   "nothing else") — and this ran on the DEFAULT path (not gated on the echo
   flag), affecting all pipeline users. _is_near_duplicate / isNearDuplicate is
   now word-boundary aware (equal, or a true word-prefix double-emit), so
   "nothing else" is no longer a duplicate of "no" while Deepgram's
   speech_final+is_final pair still de-duplicates.

3. (HIGH, TS) The interrupted-turn "[interrupted by caller]" marker leaked into
   metrics: runPipelineLlm returned the marked text and dispatchTurn fed it to
   recordTtsComplete/recordTurnComplete. runPipelineLlm now returns
   { text, interrupted }; dispatchTurn records metrics on the PLAIN text (gated
   on !interrupted) and applies the marker to the history/transcript only —
   mirroring Python, where metrics are recorded before the marker is appended.

Tests updated to the corrected behaviour (>=4-word echo examples + explicit
short-answer-exemption + word-boundary dedup cases). Python 2237 / TypeScript
1779 pass; tsc + build clean.
… VAD cancel to transcript

On a no-AEC link with PATTER_FORWARD_STT_WHILE_SPEAKING and no
barge_in_strategies, a VAD speech_start during TTS cancelled the turn
immediately. But that speech_start is very often the agent's own TTS
echo (or pre-first-token line noise on a long tool-running Hermes/OpenClaw
turn), so the agent self-interrupted almost every turn: a short normal
reply "bene bene" produced agent_text='[interrupted]', and the next turn
ran the LLM for seconds yet emitted tts_characters=0 (torn down before
its first token).

The echo guard only protected the transcript path; the raw VAD-energy
cancel had none. Defer the VAD-energy cancel to transcript confirmation
whenever forward_stt_while_speaking && aec is None — exactly as it already
worked when barge_in_strategies are configured. The speech_start now marks
the barge-in PENDING (agent keeps talking); the cancel fires only on a real
transcript that survives the echo guard, else the agent resumes after
barge_in_confirm_ms (default 1500ms). Default VAD path and forward-STT WITH
AEC keep the responsive immediate cancel — no behaviour change for existing
configs.

Full Python/TS parity. New tests drive the VAD path through on_audio_received
/ handleAudio: no-AEC+no-strategies defers to pending; AEC on still cancels
immediately; a real transcript confirms, an echo transcript does not.
…ch-number + example app

Make standing up the Hermes voice shell (Direction A) copy-paste simple, on par
with wiring a hosted custom-LLM voice agent but keeping Hermes on loopback.

New `patter hermes` CLI group (Python):
- `doctor`  — preflight across the Hermes gateway (/v1/models reachability +
  model presence), the Patter providers (HermesLLM constructible, Deepgram /
  ElevenLabs keys, ElevenLabs transport, Silero VAD), and the Twilio carrier
  (creds valid, number webhook). Each problem prints a suggested fix.
  `--no-network` skips live probes, `--json` for machine-readable output.
- `setup`   — scaffold a ready-to-run hermes-phone-agent project, run the
  checks, optionally attach a Twilio number (`--number`/`--url`). Non-interactive
  with `--yes`.
- `attach-number` / `numbers` — point a Twilio number's voice webhook at your
  Patter URL / list account numbers.

Scaffold (`getpatter/_hermes_scaffold.py`) is the single source of truth for the
committed `examples/hermes-phone-agent/` project (app.py, .env.example, README,
docker-compose, doctor/text-turn/outbound-call scripts); a test keeps them in
sync. The example defaults to REST ElevenLabs TTS and caller-hash memory.

TS CLI gains a `hermes` stub pointing to the Python wizard (mirrors the `eval`
stub); the HermesLLM provider stays available in both SDKs. Docs updated with a
zero-config setup section.

https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts
…mes detection, key-gen, --enable-hermes

Address the review gaps on the Hermes wizard: it now reads and (opt-in) writes
real config instead of only consulting os.environ.

doctor:
- Autoloads dotenv files before checking — ~/.hermes/.env then the project/cwd
  .env (non-overriding), with --env-file/--no-env-file to control it. Loaded
  paths are reported; secrets are never echoed.
- Reads ~/.hermes/.env + config.yaml directly: reports API_SERVER_ENABLED,
  surfaces the configured key/port/model, and runs `hermes gateway status` when
  the CLI is present.
- Sharper severity: CLI missing AND gateway unreachable is now a failure, not a
  soft warning; gateway-down fix adapts to whether the CLI is available.

setup:
- --enable-hermes writes API_SERVER_ENABLED=true (and generates an
  API_SERVER_KEY if absent) into ~/.hermes/.env, backing up to .env.bak first,
  then reminds the operator to restart the gateway.
- --generate-key writes a strong key into the project .env; when used with
  --enable-hermes the SAME key is mirrored so Patter and Hermes agree (a
  mismatch is a 401 at call time).
- Autoloads env for the preflight so checks reflect the project's .env.

New helpers (_parse_env_file / _upsert_env_file / _load_env_files /
_read_hermes_config / _enable_hermes_gateway / _generate_key), no new deps.
+11 unit tests; docs updated.

https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts
… trace/diagnose

Close the acceptance + debugging gaps so a green run means a real call works.

setup:
- --start-gateway runs `hermes gateway start` then polls /v1/models until the
  gateway answers, completing the enable → start → verify cycle.

New `patter hermes test` — acceptance, not just preflight: GET /v1/models, send a
real /v1/chat/completions turn with the X-Hermes-Session-Id header and report the
latency + reply snippet, confirm HermesLLM is constructible, and check the STT/TTS
keys. Exit non-zero on any blocker.

New `patter hermes trace [call]` / `diagnose [call]` — read the on-disk per-call
log (PATTER_LOG_DIR; services/call_log.py) and classify the pipeline stage by
stage (carrier → STT → Hermes → TTS), with a latency breakdown. `diagnose`
applies a decision tree and names the first broken stage with a fix, e.g.
"Hermes replied but no audio — TTS stage. Check ELEVENLABS_API_KEY / REST
transport." Defaults to the latest call; accepts a call_id or a directory.

Note: item #3 (auto-attach the tunnel URL to the carrier) is already handled by
the SDK — serve() auto-configures the Twilio/Plivo webhook once the tunnel is up
(server.py) — so the scaffold app does it on `python app.py`; documented.

Scaffold now sets PATTER_LOG_DIR and documents test/trace/diagnose; example dir
regenerated. TS CLI stub lists the new subcommands. +15 unit tests; docs updated.

https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts
Bring in the anonymous opt-out telemetry work (schema v5: stack/cost/install-id,
deploy-shape, feature-adoption, upgrade funnel, CLI usage, call funnel, and the
`getpatter telemetry status|disable|enable` command) that landed on main after
this branch was cut.

Conflicts resolved:
- cli.py / cli.ts: keep both the `hermes` wizard and the `telemetry` command.
- CHANGELOG.md: telemetry "Added" entries placed under Added, above Fixed.
- README.md: replaced the long "Anonymous Telemetry" section with the short
  opt-out note requested earlier, and removed the duplicate section.

https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts
@mintlify

mintlify Bot commented Jun 9, 2026

Copy link
Copy Markdown

Preview deployment for your docs. Learn more about Mintlify Previews.

Project Status Preview Updated (UTC)
patter-06b046ce 🟢 Ready View Preview Jun 9, 2026, 8:39 PM

💡 Tip: Enable Workflows to automatically generate PRs for you.

@nicolotognoni

Copy link
Copy Markdown
Collaborator Author

Closing in favor of #161, which is a strict superset of this branch.

This PR (feat/hermes-dx-multiturn) and #161 (claude/nice-planck-bv96no) share the same commit history — #161 is exactly this branch plus one additional commit (the zero-config patter openclaw CLI). Everything here (Hermes CLI, session-key factory, long-turn filler, echo-safe barge-in) is already contained in #161.

The base layer (Hermes DX: TS namespace exports, caller-hash session key, long-turn filler) already landed on main via #159. Merging both this PR and #161 would duplicate those changes, so we are consolidating into #161 alone.

The branch is left intact and this PR can be reopened if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants