Feature #6–#8: Hermes CLI, session-key factory, long-turn filler, echo guard#162
Feature #6–#8: Hermes CLI, session-key factory, long-turn filler, echo guard#162nicolotognoni wants to merge 12 commits into
Conversation
… long-turn filler
Three opt-in developer-experience improvements for the agent-LLM providers, full
Python/TypeScript parity.
- TypeScript namespace exports: `import { hermes, openclaw, openaiCompatible }` ->
`new hermes.LLM()`, mirroring Python's `from getpatter.llm import hermes`. Frozen objects.
- session_key_factory / sessionKeyFactory + session_key_from="caller_hash": derive the
X-Hermes-Session-Key per call from a SHA-256 caller hash (new public SessionContext +
hash_caller/hashCaller), so an agent runtime remembers a caller across calls WITHOUT the
raw phone number ever reaching the wire or the logs. The factory takes precedence over
the static session_key; a falsy return omits the header. The loop dispatch was
generalised to thread caller/callee only to providers whose stream() declares them (or
**kwargs) — built-in and minimal custom providers unchanged. An unknown session_key_from
raises in both SDKs (parity).
- long_turn_message / longTurnMessage (+ _after_s, default 4 s): opt-in spoken filler when
a turn is slow and no audio has reached the caller yet — distinct from llm_error_message
(which fires on error). Fires once, gated on emitted audio; the TS timer is serialised
via an async clear() that awaits an in-flight filler so it can never overlap the real
sentence.
Adversarial review caught and fixed a TS filler double-speak race (the setTimeout callback
could overlap the first real sentence; Python's asyncio path was immune).
Python 2206 / TypeScript 1758 tests pass; tsc + build clean.
… per-turn cancel-event reset Live-call bug: in pipeline mode (Twilio + Deepgram STT + ElevenLabs TTS + an agent-LLM provider) the FIRST turn worked end-to-end but every subsequent turn went silent, leaving a ghost metrics turn of user_text='' agent_text='[interrupted]'. Two root causes in the pipeline turn-taking state machine, full Python/TypeScript parity: 1. Tail-grace misclassified the next turn as a barge-in. After the agent finishes TTS, _end_speaking_with_grace / endSpeakingWithGrace keeps _is_speaking=true for PATTER_TTS_TAIL_GRACE_MS (default 1500 ms) to swallow the fading echo tail. Humans reply in 200-700 ms — inside that window — so the user's next utterance was detected as a barge-in: it recorded an interrupted turn and the leading audio was withheld from STT (only a <=260 ms echo-contaminated ring), so no final transcript was produced and the agent never answered. New _tail_grace_active / tailGraceActive flag distinguishes "actively streaming TTS" from "post-TTS echo guard". A VAD speech_start OR a transcript during the tail grace now ends the grace and dispatches as a clean NEW turn via _end_tail_grace_for_new_turn / endTailGraceForNewTurn — recovering the leading audio from the ring instead of dropping it, with no spurious send_clear / record_turn_interrupted. Real barge-in during active TTS (tail_grace_active=false) is unchanged. The Python grace flip task is now tracked and cancelled (parity with TS clearGraceTimer) so at most one is in flight. 2. (Python) A barge-in's per-turn cancel event leaked into the next turn. _llm_cancel_event was recreated inside _process_streaming_response — AFTER LLMLoop.run had already captured the previous (still-set) event for the next turn — so the turn after any real barge-in bailed immediately. The reset moved to the top of _dispatch_turn, before dispatch; the event object is now stable through a turn (generator and consumption loop share it). TypeScript already allocates a fresh AbortController per turn in runPipelineLlm. Tests: new test_pipeline_multiturn_tail_grace.py (6) + pipeline-multiturn-tail-grace.mocked.test.ts (4) reproduce the bug and assert the rescue, the flag lifecycle, the active-TTS barge-in regression guard, and the fresh cancel-event. Python 2212 / TypeScript 1763 pass; tsc + build clean. Adversarial review: 0 critical / 0 high.
…t-token abort (Hermes/OpenClaw)
The caller could not interrupt the agent mid-response. The STT receive loop
awaited the turn's LLM+TTS dispatch inline (`await self._dispatch_turn(...)` /
`await this.runPipelineLlm(...)`), so during a long (30-90 s) Hermes/OpenClaw
tool-running turn it stopped reading transcripts — a barge-in transcript ("ferma")
was only processed AFTER the turn ended. On PSTN with echo-masked/unreliable VAD,
the transcript path is the only barge-in fallback and it was structurally dead.
Three coordinated changes, full Python/TypeScript parity:
1. Decoupled single-in-flight dispatch. The turn runs as one tracked background
task (_dispatch_task / dispatchTask) so the receive loop keeps draining
transcripts and runs handleBargeIn against the LIVE turn. The loop settles
the previous dispatch before launching the next (single-in-flight), so
conversation_history / metrics ordering is unchanged; the loop still awaits
the final turn to settle before returning, so existing tests that inspect
state right after the loop are unaffected.
2. Prompt pre-first-token abort (Python). Agent runtimes run tools for tens of
seconds before the first token, during which the per-chunk cancel_event
check never runs. The provider now races create()+first-byte against the
cancel signal and spawns a watchdog that close()s the response the instant a
barge-in fires (TS already aborts promptly via fetch + AbortController). The
VAD legacy barge-in branch now also sets _llm_cancel_event (previously it
only flipped _is_speaking, which Hermes never observed pre-first-token), and
the OpenAI-compatible client uses an explicit httpx read/connect timeout so a
dead gateway fails fast.
3. PATTER_FORWARD_STT_WHILE_SPEAKING (opt-in, default off). Forwards inbound
audio to STT during TTS even with a VAD configured, so the transcript
barge-in path can receive a transcript on echo-masked links where the VAD
never fires. The leading-edge ring is still captured. Echo caveat (WARN on
enable): without AEC the agent's own voice may be transcribed as a phantom
interruption — pair with agent.barge_in_strategies.
Default behaviour (flag off, VAD present, normal LLM) is byte-identical; the
just-landed tail-grace multi-turn fix is preserved.
Tests: new test_pipeline_bargein_backgrounded.py (4), test_provider_prefirsttoken_abort.py (3),
pipeline-bargein-backgrounded.mocked.test.ts (2). Python 2219 / TypeScript 1765 pass;
tsc + build clean.
…-on-cancel, bounded teardown
Three defects found by adversarial review of the previous commit's
decoupled-dispatch barge-in, all fixed with full Python/TS parity:
1. (HIGH, TS) Per-turn history was passed to the LLM by LIVE reference. With the
turn dispatch backgrounded, a following transcript's user push (on the drain
loop while the turn is in flight) could land in the in-flight turn's prompt
before buildMessages read it — conflating two turns. Now a history SNAPSHOT
is captured at launch and threaded through dispatchTurn → runPipelineLlm →
llmLoop.run (and the onMessage/webhook paths), mirroring Python's
list(self.conversation_history). Regression test added.
2. (MEDIUM, Python) On cleanup/hangup hard-cancel while the provider was parked
pre-first-token, asyncio.wait did not cancel the in-flight create() POST,
orphaning the Hermes/OpenClaw connection ("Task exception was never
retrieved"). _open_stream_with_cancel now catches CancelledError and aborts
the create task. Test added.
3. (MEDIUM, TS) handleStop/handleWsClose awaited the backgrounded dispatch with
no timeout — a hung user onMessage (no AbortSignal) could block call teardown
indefinitely. Teardown now bounds the wait via settleDispatchForTeardown
(DISPATCH_SETTLE_TIMEOUT_MS = 30s); Python hard-cancels the task.
Python 2220 / TypeScript 1766 pass; tsc + build clean.
…ollow-up, mark interrupted turns
Residual Hermes/OpenClaw barge-in failure (live test, PATTER_FORWARD_STT_WHILE_SPEAKING=1,
no AEC, no barge_in_strategies): barge-in fired on a PHANTOM transcript ("che tu
l'hai" — the agent's own Italian TTS echoing into Deepgram, not caught by the
English-only hallucination filter), the real follow-up was dropped leaving an
empty [interrupted] turn, and the post-barge-in context was poisoned.
A workflow root-cause (code trace + web research: Coval/Pipecat/LiveKit/Azure)
confirmed this is NOT an interruptibility problem — the abort already works
(bargein_ms=1.0). It is a GATE + ECHO + CONTEXT-REWRITE problem. Fixes, full
Python/TS parity:
1. Echo guard (language-agnostic). Track the agent's in-flight spoken text
(_current_agent_spoken_text / currentAgentSpokenText). A new _looks_like_echo
/ looksLikeEcho (substring OR >=60% word overlap) drops any barge-in
(_handle_barge_in) or commit (_commit_transcript) that is the agent's own
TTS echoing back. Active ONLY while _forward_stt_while_speaking, so the
default VAD path and real post-turn replies are unaffected.
2. Back-to-back dedup fix. The <500ms drop now applies only to a NEAR-DUPLICATE
of the previous final (Deepgram speech_final+is_final for the same
utterance), via _is_near_duplicate / isNearDuplicate. A genuinely different
fast follow-up is no longer swallowed into an empty [interrupted] turn.
3. Interrupted-turn context rewrite. On a confirmed mid-turn barge-in the spoken
prefix is appended to history with an "[interrupted by caller]" marker, so a
stateful agent runtime (Hermes/OpenClaw, X-Hermes-Session-Id) sees next turn
that it was cut off and what the caller actually heard.
Plus: fixed the stale _can_barge_in docstring (0.25 -> 0.5 s no-AEC gate).
Recommended caller config (unchanged SDK defaults): barge_in_strategies=
(MinWordsStrategy(min_words=2),), echo_cancellation=True.
Tests: test_pipeline_echo_dedup.py (19) + pipeline-echo-dedup.mocked.test.ts (11);
updated the back-to-back dedup tests to the corrected behaviour. Python 2236 /
TypeScript 1777 pass; tsc + build clean.
…t replies, word-boundary dedup, clean interrupted metrics
Adversarial review of the echo-safe barge-in commit found three real HIGH
false-positive risks; all fixed with full Python/TS parity:
1. (HIGH) Echo guard could silently drop a legitimate SHORT caller answer that
repeats the agent's offered words (e.g. agent "lunedì o martedì?", caller
"lunedì" → substring match → dropped, caller goes unheard). Real TTS echo is
a long near-complete fragment, not a 1-3 word reply. The echo guard now
requires >= _ECHO_MIN_CANDIDATE_WORDS (4) words before classifying a
candidate as echo, so short answers are never dropped. (Short echo blips on a
no-AEC link are left to AEC / barge_in_strategies.)
2. (HIGH) Back-to-back dedup used a character-level substring test, so a
genuinely different short follow-up was dropped ("no" matched inside
"nothing else") — and this ran on the DEFAULT path (not gated on the echo
flag), affecting all pipeline users. _is_near_duplicate / isNearDuplicate is
now word-boundary aware (equal, or a true word-prefix double-emit), so
"nothing else" is no longer a duplicate of "no" while Deepgram's
speech_final+is_final pair still de-duplicates.
3. (HIGH, TS) The interrupted-turn "[interrupted by caller]" marker leaked into
metrics: runPipelineLlm returned the marked text and dispatchTurn fed it to
recordTtsComplete/recordTurnComplete. runPipelineLlm now returns
{ text, interrupted }; dispatchTurn records metrics on the PLAIN text (gated
on !interrupted) and applies the marker to the history/transcript only —
mirroring Python, where metrics are recorded before the marker is appended.
Tests updated to the corrected behaviour (>=4-word echo examples + explicit
short-answer-exemption + word-boundary dedup cases). Python 2237 / TypeScript
1779 pass; tsc + build clean.
… VAD cancel to transcript On a no-AEC link with PATTER_FORWARD_STT_WHILE_SPEAKING and no barge_in_strategies, a VAD speech_start during TTS cancelled the turn immediately. But that speech_start is very often the agent's own TTS echo (or pre-first-token line noise on a long tool-running Hermes/OpenClaw turn), so the agent self-interrupted almost every turn: a short normal reply "bene bene" produced agent_text='[interrupted]', and the next turn ran the LLM for seconds yet emitted tts_characters=0 (torn down before its first token). The echo guard only protected the transcript path; the raw VAD-energy cancel had none. Defer the VAD-energy cancel to transcript confirmation whenever forward_stt_while_speaking && aec is None — exactly as it already worked when barge_in_strategies are configured. The speech_start now marks the barge-in PENDING (agent keeps talking); the cancel fires only on a real transcript that survives the echo guard, else the agent resumes after barge_in_confirm_ms (default 1500ms). Default VAD path and forward-STT WITH AEC keep the responsive immediate cancel — no behaviour change for existing configs. Full Python/TS parity. New tests drive the VAD path through on_audio_received / handleAudio: no-AEC+no-strategies defers to pending; AEC on still cancels immediately; a real transcript confirms, an echo transcript does not.
…ch-number + example app Make standing up the Hermes voice shell (Direction A) copy-paste simple, on par with wiring a hosted custom-LLM voice agent but keeping Hermes on loopback. New `patter hermes` CLI group (Python): - `doctor` — preflight across the Hermes gateway (/v1/models reachability + model presence), the Patter providers (HermesLLM constructible, Deepgram / ElevenLabs keys, ElevenLabs transport, Silero VAD), and the Twilio carrier (creds valid, number webhook). Each problem prints a suggested fix. `--no-network` skips live probes, `--json` for machine-readable output. - `setup` — scaffold a ready-to-run hermes-phone-agent project, run the checks, optionally attach a Twilio number (`--number`/`--url`). Non-interactive with `--yes`. - `attach-number` / `numbers` — point a Twilio number's voice webhook at your Patter URL / list account numbers. Scaffold (`getpatter/_hermes_scaffold.py`) is the single source of truth for the committed `examples/hermes-phone-agent/` project (app.py, .env.example, README, docker-compose, doctor/text-turn/outbound-call scripts); a test keeps them in sync. The example defaults to REST ElevenLabs TTS and caller-hash memory. TS CLI gains a `hermes` stub pointing to the Python wizard (mirrors the `eval` stub); the HermesLLM provider stays available in both SDKs. Docs updated with a zero-config setup section. https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts
…mes detection, key-gen, --enable-hermes Address the review gaps on the Hermes wizard: it now reads and (opt-in) writes real config instead of only consulting os.environ. doctor: - Autoloads dotenv files before checking — ~/.hermes/.env then the project/cwd .env (non-overriding), with --env-file/--no-env-file to control it. Loaded paths are reported; secrets are never echoed. - Reads ~/.hermes/.env + config.yaml directly: reports API_SERVER_ENABLED, surfaces the configured key/port/model, and runs `hermes gateway status` when the CLI is present. - Sharper severity: CLI missing AND gateway unreachable is now a failure, not a soft warning; gateway-down fix adapts to whether the CLI is available. setup: - --enable-hermes writes API_SERVER_ENABLED=true (and generates an API_SERVER_KEY if absent) into ~/.hermes/.env, backing up to .env.bak first, then reminds the operator to restart the gateway. - --generate-key writes a strong key into the project .env; when used with --enable-hermes the SAME key is mirrored so Patter and Hermes agree (a mismatch is a 401 at call time). - Autoloads env for the preflight so checks reflect the project's .env. New helpers (_parse_env_file / _upsert_env_file / _load_env_files / _read_hermes_config / _enable_hermes_gateway / _generate_key), no new deps. +11 unit tests; docs updated. https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts
… trace/diagnose Close the acceptance + debugging gaps so a green run means a real call works. setup: - --start-gateway runs `hermes gateway start` then polls /v1/models until the gateway answers, completing the enable → start → verify cycle. New `patter hermes test` — acceptance, not just preflight: GET /v1/models, send a real /v1/chat/completions turn with the X-Hermes-Session-Id header and report the latency + reply snippet, confirm HermesLLM is constructible, and check the STT/TTS keys. Exit non-zero on any blocker. New `patter hermes trace [call]` / `diagnose [call]` — read the on-disk per-call log (PATTER_LOG_DIR; services/call_log.py) and classify the pipeline stage by stage (carrier → STT → Hermes → TTS), with a latency breakdown. `diagnose` applies a decision tree and names the first broken stage with a fix, e.g. "Hermes replied but no audio — TTS stage. Check ELEVENLABS_API_KEY / REST transport." Defaults to the latest call; accepts a call_id or a directory. Note: item #3 (auto-attach the tunnel URL to the carrier) is already handled by the SDK — serve() auto-configures the Twilio/Plivo webhook once the tunnel is up (server.py) — so the scaffold app does it on `python app.py`; documented. Scaffold now sets PATTER_LOG_DIR and documents test/trace/diagnose; example dir regenerated. TS CLI stub lists the new subcommands. +15 unit tests; docs updated. https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts
Bring in the anonymous opt-out telemetry work (schema v5: stack/cost/install-id, deploy-shape, feature-adoption, upgrade funnel, CLI usage, call funnel, and the `getpatter telemetry status|disable|enable` command) that landed on main after this branch was cut. Conflicts resolved: - cli.py / cli.ts: keep both the `hermes` wizard and the `telemetry` command. - CHANGELOG.md: telemetry "Added" entries placed under Added, above Fixed. - README.md: replaced the long "Anonymous Telemetry" section with the short opt-out note requested earlier, and removed the duplicate section. https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts
|
Preview deployment for your docs. Learn more about Mintlify Previews.
💡 Tip: Enable Workflows to automatically generate PRs for you. |
|
Closing in favor of #161, which is a strict superset of this branch. This PR ( The base layer (Hermes DX: TS namespace exports, caller-hash session key, long-turn filler) already landed on main via #159. Merging both this PR and #161 would duplicate those changes, so we are consolidating into #161 alone. The branch is left intact and this PR can be reopened if needed. |
Summary
This PR ships four interconnected features for the Hermes phone-agent integration:
patter hermes doctor|setup|attach-number|numbers) — zero-config scaffolding, preflight checks, and Twilio wiring for a self-hosted Hermes voice shell.X-Hermes-Session-Key) from caller identity, enabling durable per-caller memory without exposing raw phone numbers.Changes
Hermes CLI (
patter hermes ...)cli_hermes.py(1393 lines):doctor(preflight checks),setup(scaffold + Twilio attach),attach-number,numberscommands._hermes_scaffold.py(351 lines): Single source of truth for the starter project files (app.py, .env.example, docker-compose.yml, etc.).examples/hermes-phone-agent/: Committed scaffold tree (README, app.py, scripts, docker-compose.yml, .env.example).cli.py: Wiresbuild_hermes_parseranddispatch_hermes.test_hermes_cli.py(482 lines): Scaffold sync, doctor checks, Twilio API mocking.Session-key factory (Feature #7)
models.py: Addedhash_caller(caller: str | None) -> str | None— stable, non-reversible 64-bit hash.openai_compatible.py: Newsession_key_factoryparameter (callable) that derives the session-key header value per call fromSessionContext(call_id, caller, callee, caller_hash).llm_loop.py: Refactored_stream_accepts_call_id→_stream_accepted_context_kwargsto thread caller/callee alongside call_id; detects which kwargs each provider accepts.openai-compatible.ts: MirroredhashCaller,SessionContexttype,sessionKeyFactoryparameter.llm-loop.ts: Addedcaller/calleetoLLMStreamOptions.types.ts: NewSessionContextinterface.hermes.py/hermes.ts): ConveniencesessionKeyFrom: "caller_hash"option that auto-installs a factory.test_llm_session_key_factory.py(367 lines) +llm-session-key-factory.mocked.test.ts(286 lines): Factory resolution, context threading, header emission.import { hermes, openclaw, openaiCompatible } from "getpatter"now works in TypeScript; Python already hadfrom getpatter.llm import hermes.Long-turn filler (Feature #8)
client.py: Newlong_turn_message/long_turn_message_after_sparameters onagent().stream_handler.py:PipelineStreamHandler._process_streaming_responseschedules a filler task if no audio reaches the carrier after the timeout; filler is spoken via the same_synthesize_sentenceprimitive.stream-handler.ts: Mirrored `longTurnMessagehttps://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts