TTS fallback #797
- Make fallback provider-agnostic (remove soniox hardcode)
- Log EndFrame errors instead of silently swallowing them
- Move FallbackSettings dataclass and _FALLBACK_DEFAULTS to services/fallback
- BB_FALLBACK_CONFIG returns typed FallbackSettings from services/fallback
- BB_FALLBACK_RAW_CONFIG in dynamic.py returns raw dict via json.loads pattern
- Remove no_delay from DeepgramConfig constructor (field not supported by pipecat)
- Deduplicate mid-call STT alert with _mid_call_alert_sent guard
- Fix reset alert timing: poll every 60s via notify_on_expiry() instead of deleting active key early; Redis TTL is sole authority on fallback expiry

- app/services/fallback/__init__.py: enable tts in _FALLBACK_DEFAULTS (fallback_provider=cartesia), add check_and_reset_tts_fallback() poller, update initialize_fallback_tasks() to register both STT and TTS reset tasks independently
- app/ai/voice/agents/breeze_buddy/tts/__init__.py: add TTSServiceResult and get_tts_service_with_fallback(): proactive routing when circuit is open, init-time fallback with record_failure() on primary error
- app/ai/voice/agents/breeze_buddy/agent/pipeline.py: use get_tts_service_with_fallback() in create_services(), return TTSServiceResult in third position
- app/ai/voice/agents/breeze_buddy/agent/__init__.py: store tts_provider from TTSServiceResult, add _tts_failure_recorded / _mid_call_tts_alert_sent state, detect TTS processor errors in on_pipeline_error, add _send_mid_call_tts_alert() Slack alert helper
Walkthrough

This PR introduces a Redis-backed circuit-breaker fallback system for STT and TTS services in the Breeze Buddy voice agent. When a service exceeds its configured failure threshold, calls are automatically routed to a fallback provider. The agent detects mid-call failures, records them to the circuit, and sends Slack alerts without blocking call termination.

Changes

- Service Fallback Circuit Breaker and Configuration
- STT Service Fallback Integration
- TTS Service Fallback Integration
- Agent-Level Failure Detection and Mid-Call Alerting
- Pipeline Service Creation Updates
- Configuration, Utilities, and Monitoring
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR introduces a Redis-backed circuit-breaker style fallback mechanism for STT/TTS, with Slack alerting and background reset tasks, and wires it into Breeze Buddy’s service creation and error handling.
Changes:
- Add a generic Redis-backed `ServiceFallback` with failure counting, activation TTL, and Slack alerts; register background tasks to notify on fallback expiry.
- Add STT/TTS “build with fallback” routing so calls can proactively use fallback providers or fall back on init failures.
- Extend Slack alert tagging and live-config type conversion utilities to support new configuration and alerting needs.
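The failure counting, activation TTL, and proactive routing summarized in the list above can be sketched in memory. This is only an illustration under stated assumptions: the PR keeps this state in Redis, while `InMemoryCircuit`, `failure_threshold`, `fallback_ttl_seconds`, and `choose_provider` are hypothetical names that do not come from the PR.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class InMemoryCircuit:
    """Hypothetical stand-in for the Redis-backed circuit (illustration only)."""

    failure_threshold: int = 3
    fallback_ttl_seconds: float = 300.0
    clock: Callable[[], float] = time.monotonic
    _failures: int = 0
    _open_until: float = 0.0

    def is_open(self) -> bool:
        # The circuit stays open (fallback active) until the TTL elapses.
        return self.clock() < self._open_until

    def record_failure(self) -> bool:
        """Count one failure; return True if this call opened the circuit."""
        if self.is_open():
            return False
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._open_until = self.clock() + self.fallback_ttl_seconds
            self._failures = 0
            return True
        return False

    def choose_provider(self, primary: str, fallback: str) -> str:
        # Proactive routing: skip the primary while the circuit is open.
        return fallback if self.is_open() else primary
```

Injecting `clock` keeps the sketch testable without sleeping; in the Redis-backed design the key TTL plays the role of `_open_until`.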
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| app/services/slack/alert.py | Adds per-call override for Slack user tagging. |
| app/services/live_config/utils.py | Adds dict parsing support to dynamic type conversion. |
| app/services/fallback/__init__.py | New Redis-backed fallback state machine, Slack alerts, and background tasks. |
| app/main.py | Registers fallback reset background tasks at app startup. |
| app/core/config/dynamic.py | Adds dynamic config accessors for STT provider and fallback config JSON. |
| app/ai/voice/stt/soniox/config.py | Adds Soniox reconnect-on-error configuration passthrough. |
| app/ai/voice/agents/breeze_buddy/utils/common.py | Adds fire_and_forget helper to keep async tasks from being GC’d. |
| app/ai/voice/agents/breeze_buddy/tts/__init__.py | Adds TTS init-time fallback routing and result wrapper. |
| app/ai/voice/agents/breeze_buddy/stt/__init__.py | Adds STT init-time fallback routing and result wrapper; updates legacy provider selection. |
| app/ai/voice/agents/breeze_buddy/agent/pipeline.py | Updates pipeline service creation to use fallback-enabled STT/TTS builders. |
| app/ai/voice/agents/breeze_buddy/agent/__init__.py | Records mid-call STT/TTS failures into fallback circuit and ends call with alerts. |

    async def _send_failure_alert(self, count: int, error_msg: str) -> None:
        provider = self.config.primary_provider_name.capitalize()
        threshold = self.config.failure_threshold
        try:
            await slack_alert.send(
                title=f"⚠️ STT Failure on {provider} ({count}/{threshold})",
                fields=[{"name": "Fail Count", "value": f"{count}/{threshold}"}],
                sections=(
                    [{"title": "Error", "text": f"```{error_msg}```"}]

    async def record_failure(
        self,
        error_msg: str = "",
        call_sid: str = "",
        context: str = "unknown",
    ) -> bool:

        if active:
            # Still within the fallback window — record that we've seen it.
            await redis.set(self._key_seen_active, "1")
            return

    def fire_and_forget(coro) -> None:
        """Schedule a coroutine as a fire-and-forget task, preventing GC."""
        task = asyncio.create_task(coro)
        _background_tasks.add(task)
        task.add_done_callback(_background_tasks.discard)


    elif target_type == dict:
        if isinstance(value, dict):
            return value
        try:
            return json.loads(value)
        except (ValueError, TypeError):
            return None

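The branch above returns whatever `json.loads` produces, which may be a list or a scalar rather than a dict. A minimal sketch of a guarded conversion follows; `to_dict` is a hypothetical standalone helper, not the function in `app/services/live_config/utils.py`.

```python
import json
from typing import Any, Optional


def to_dict(value: Any) -> Optional[dict]:
    """Return value as a dict, or None if it is not (or does not parse to) one."""
    if isinstance(value, dict):
        return value
    try:
        parsed = json.loads(value)
    except (ValueError, TypeError):
        # Invalid JSON raises ValueError; non-string input raises TypeError.
        return None
    # json.loads also accepts scalars and arrays, so check the parsed type.
    return parsed if isinstance(parsed, dict) else None
```

With this guard, inputs such as `"[1, 2]"` or `"42"` yield `None` instead of leaking a non-dict value to callers that expect a mapping.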

    # Slack tag used for all fallback alerts
    _FALLBACK_TAG = "@breeze-sentinals"
    _ALERT_TAG = f"{_FALLBACK_TAG},{SLACK_TAG_USERS}" if SLACK_TAG_USERS else _FALLBACK_TAG

    async def _send_mid_call_stt_alert(self) -> None:
        """Send Slack alert when STT fails mid-call and call must end."""
        from app.core.config.static import SLACK_TAG_USERS

        _fallback_tag = "@breeze-sentinals"
        tag = f"{_fallback_tag},{SLACK_TAG_USERS}" if SLACK_TAG_USERS else _fallback_tag

Actionable comments posted: 8
🧹 Nitpick comments (5)
app/ai/voice/agents/breeze_buddy/utils/common.py (1)

23-27: 💤 Low value

Annotate the `coro` parameter.

The GC-retention pattern is correct, but the `coro` parameter is unannotated. Add a type hint to satisfy the signature-typing requirement.

♻️ Proposed type hint

    -from typing import Any, Dict, List, Optional, Tuple, cast
    +from typing import Any, Coroutine, Dict, List, Optional, Tuple, cast
    @@
    -def fire_and_forget(coro) -> None:
    +def fire_and_forget(coro: Coroutine[Any, Any, Any]) -> None:
         """Schedule a coroutine as a fire-and-forget task, preventing GC."""

As per coding guidelines: "Add type hints on all function signatures".

app/main.py (1)
184-186: 💤 Low value

Comment is inaccurate.

The comment says "STT fallback reset tasks" but `initialize_fallback_tasks` registers both STT and TTS fallback tasks.

📝 Proposed fix

    -    # Initialize STT fallback reset tasks
    +    # Initialize STT/TTS fallback reset tasks
         await initialize_fallback_tasks(_background_scheduler)

app/ai/voice/agents/breeze_buddy/tts/__init__.py (1)
214-219: ⚡ Quick win

Use `@dataclass` for consistency with `STTServiceResult`.

`STTServiceResult` in `stt/__init__.py` uses `@dataclass`, but `TTSServiceResult` is a plain class. Using a dataclass reduces boilerplate and ensures consistent behavior (automatic `__repr__`, `__eq__`, etc.).

Proposed fix

Add the import at the top of the file:

    from dataclasses import dataclass

Then replace the class:

    +@dataclass
     class TTSServiceResult:
         """Wraps a TTS service instance with the resolved provider name."""
    -
    -    def __init__(self, provider: str, service: object):
    -        self.provider = provider
    -        self.service = service
    +    provider: str
    +    service: object

app/ai/voice/agents/breeze_buddy/agent/pipeline.py (1)
108-122: ⚡ Quick win

Update docstring and return type to reflect result objects.

The function now returns `STTServiceResult` and `TTSServiceResult` wrapper objects instead of raw services. The docstring and return type annotation should be updated for clarity:

Proposed fix

    +from app.ai.voice.agents.breeze_buddy.stt import STTServiceResult
    +
     async def create_services(
         configurations: Optional[ConfigurationModel],
         include_llm: bool = True,
    -) -> tuple[Optional[Any], Optional[Any], Optional[Any]]:
    +) -> tuple[Optional[STTServiceResult], Optional[Any], Optional[TTSServiceResult]]:
         """Create STT, LLM, and TTS services.

         Args:
             configurations: Template configuration model
             include_llm: When False, skip LLM creation (stream mode). LLM will be None.

         Returns:
    -        Tuple of (stt_service, llm_service_or_None, tts_service). For realtime
    -        / speech-to-speech LLMs, both ``stt_service`` and ``tts_service`` are
    +        Tuple of (stt_result, llm_service_or_None, tts_result). For realtime
    +        / speech-to-speech LLMs, both ``stt_result`` and ``tts_result`` are
             ``None`` because the realtime LLM handles audio in/out natively.
         """

app/ai/voice/agents/breeze_buddy/agent/__init__.py (1)
728-745: 💤 Low value

Consider extracting keyword tuples to module-level constants.

The hardcoded keyword tuples for detecting STT vs TTS errors are inline within the handler. Extracting these to module-level constants would improve maintainability when new providers are added and make the handler body more readable.

♻️ Example refactor

At module level (after imports):

    _STT_PROCESSOR_KEYWORDS = ("stt", "soniox", "deepgram", "transcri", "google", "sarvam")
    _TTS_PROCESSOR_KEYWORDS = ("tts", "elevenlabs", "cartesia", "gemini")

Then in the handler:

    -        stt_keywords = (
    -            "stt",
    -            "soniox",
    -            "deepgram",
    -            "transcri",
    -            "google",
    -            "sarvam",
    -        )
    -        tts_keywords = (
    -            "tts",
    -            "elevenlabs",
    -            "cartesia",
    -            "gemini",
    -        )
    -        is_stt_error = any(kw in processor_str for kw in stt_keywords)
    -        is_tts_error = any(kw in processor_str for kw in tts_keywords)
    +        is_stt_error = any(kw in processor_str for kw in _STT_PROCESSOR_KEYWORDS)
    +        is_tts_error = any(kw in processor_str for kw in _TTS_PROCESSOR_KEYWORDS)

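The keyword matching discussed in this nitpick can be exercised in isolation. The sketch below reuses the two tuples from the suggested refactor; `classify_processor` is a hypothetical wrapper, and it assumes the handler lowercases the processor name before matching, which the excerpt does not show.

```python
from typing import Tuple

_STT_PROCESSOR_KEYWORDS = ("stt", "soniox", "deepgram", "transcri", "google", "sarvam")
_TTS_PROCESSOR_KEYWORDS = ("tts", "elevenlabs", "cartesia", "gemini")


def classify_processor(processor_name: str) -> Tuple[bool, bool]:
    """Return (is_stt_error, is_tts_error) using substring matching."""
    # Assumption: matching is case-insensitive, so normalize first.
    processor_str = processor_name.lower()
    is_stt_error = any(kw in processor_str for kw in _STT_PROCESSOR_KEYWORDS)
    is_tts_error = any(kw in processor_str for kw in _TTS_PROCESSOR_KEYWORDS)
    return is_stt_error, is_tts_error
```

Substring matching can overlap: an illustrative name such as `SonioxSTTService` matches both tuples, because `sttservice` contains `tts`. If real processor names follow that pattern, the handler may need to check the STT match first or match on more specific tokens.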
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/ai/voice/agents/breeze_buddy/agent/__init__.py`:
- Around line 319-343: The fallback Slack tag in _send_mid_call_tts_alert
contains a typo ("`@breeze-sentinals`"); update the _fallback_tag value to
"`@breeze-sentinels`" so the tag correctly reads "sentinels" (ensure
SLACK_TAG_USERS concatenation logic with tag remains unchanged), then run tests
or a quick manual verify to confirm alerts now target the correct user group.
- Around line 293-318: Fix the typo in the fallback Slack tag inside async
method _send_mid_call_stt_alert: change the string value of _fallback_tag from
"`@breeze-sentinals`" to the correct "`@breeze-sentinels`" so the tag variable used
in slack_alert.send() correctly targets the intended Slack user group; update
the _fallback_tag declaration near the top of _send_mid_call_stt_alert to the
corrected spelling.
In `@app/ai/voice/agents/breeze_buddy/tts/__init__.py`:
- Line 42: The import line in app.ai.voice.agents.breeze_buddy.tts.__init__.py
is over 88 chars and breaks Black formatting; split the import from
app.services.fallback across multiple lines (either using parentheses or
one-per-line) so the names BB_FALLBACK_CONFIG, ServiceFallback, and
ServiceFallbackConfig are each wrapped to respect line-length (follow the same
multiline pattern used in stt.__init__.py).
In `@app/services/fallback/__init__.py`:
- Line 24: The import line importing BB_FALLBACK_RAW_CONFIG, BB_STT_SERVICE, and
BB_TTS_SERVICE from app.core.config.dynamic needs to be reformatted to satisfy
Black (split across multiple lines); update the import in __init__.py so each
imported symbol (or a logical grouping) is on its own line or use parentheses
with line breaks around the names to conform to Black's multi-line import
formatting for the module import statement.
- Around line 148-167: The alert text in ServiceFallback._send_failure_alert
incorrectly hardcodes "STT"; change it to use the service name from the config
(e.g., service = self.config.service_name.upper(), matching the proposed fix, so
the acronym renders as "STT" or "TTS" rather than "Stt" or "Tts") and substitute
that variable into both the title and fallback_text (and any other hardcoded
"STT" occurrences) so alerts reflect the actual service (TTS or STT); keep
provider usage (self.config.primary_provider_name) and existing count/threshold
formatting as-is and preserve the error section and exception handling.
- Around line 275-276: The fallback branch that does `count = await
redis.incr(self._key_failure_count)` when `run_script` returns None must also
set an expiry so the counter doesn't persist forever; after incrementing the key
call the Redis expire command (e.g., `await
redis.expire(self._key_failure_count, ...)`) using the same TTL used elsewhere
for failure counters (for example `self._failure_window_seconds` or the existing
TTL constant) so the key gets a time-to-live consistent with the normal path.
- Around line 402-416: The reset path can race because multiple pods can see
seen_active=True and each delete the sentinel then call _clear_provider_health
and _send_reset_alert; modify the block that reads/deletes self._key_seen_active
to perform an atomic check-and-delete (use redis.getdel(self._key_seen_active)
if available, or execute a small Lua script that returns and deletes the key in
one step) and only proceed with _clear_provider_health and _send_reset_alert
when the atomic operation indicates this caller actually removed the sentinel;
alternatively implement an NX-based deduplication key (similar to _key_notified)
that you set with SET NX and TTL and only the owner proceeds to call
_send_reset_alert. Ensure you reference and update uses of
self._key_seen_active, _clear_provider_health, and _send_reset_alert
accordingly.
In `@app/services/live_config/utils.py`:
- Around line 39-45: The branch handling target_type == dict must guard the
json.loads result against non-dict JSON; change the block in the function that
checks target_type == dict so you first check target_type is dict (to satisfy
ruff) then if value is a dict return it, otherwise parse with parsed =
json.loads(value) and return parsed only if isinstance(parsed, dict) else return
None; this ensures json scalars/arrays don't slip through and preserves the
original behavior when value is already a dict.
---
Nitpick comments:
In `@app/ai/voice/agents/breeze_buddy/agent/__init__.py`:
- Around line 728-745: Extract the inline keyword tuples into module-level
constants and reference them from the handler: create constants (e.g.
_STT_PROCESSOR_KEYWORDS = ("stt", "soniox", "deepgram", "transcri", "google",
"sarvam") and _TTS_PROCESSOR_KEYWORDS = ("tts", "elevenlabs", "cartesia",
"gemini")) near the top of the module (after imports), then replace the inline
tuples used to compute processor_str, is_stt_error and is_tts_error with these
constants so the handler reads is_stt_error = any(kw in processor_str for kw in
_STT_PROCESSOR_KEYWORDS) and is_tts_error = any(kw in processor_str for kw in
_TTS_PROCESSOR_KEYWORDS).
In `@app/ai/voice/agents/breeze_buddy/agent/pipeline.py`:
- Around line 108-122: The create_services signature and docstring must reflect
that STT and TTS are returned as wrapper objects: change the return type
annotation to tuple[Optional[STTServiceResult], Optional[Any],
Optional[TTSServiceResult]] (or the concrete LLM result type if available) and
update the docstring to state that the first element is an STTServiceResult (or
None), the second is the LLM service/result (or None when include_llm is False
or when a realtime LLM handles audio), and the third is a TTSServiceResult (or
None); update mentions of stt_service and tts_service in the docstring to
indicate they are wrapper/result objects and keep ConfigurationModel and
include_llm description intact.
In `@app/ai/voice/agents/breeze_buddy/tts/__init__.py`:
- Around line 214-219: TTSServiceResult is implemented as a plain class while
STTServiceResult uses `@dataclass`; convert TTSServiceResult to a dataclass to
reduce boilerplate and ensure consistent behavior by importing dataclass from
dataclasses and decorating the TTSServiceResult class with `@dataclass` and
defining provider: str and service: object as annotated fields (leave class name
TTSServiceResult intact so usages remain valid).
In `@app/ai/voice/agents/breeze_buddy/utils/common.py`:
- Around line 23-27: Annotate the untyped coroutine parameter in
fire_and_forget: change the signature to accept a typed coroutine (e.g., def
fire_and_forget(coro: Coroutine[Any, Any, Any]) -> None) and add the required
imports (from typing import Any, Coroutine) at the top; keep the existing
GC-retention logic using _background_tasks and task.add_done_callback unchanged.
In `@app/main.py`:
- Around line 184-186: The comment above the call to initialize_fallback_tasks
is misleading—update it to accurately state that the function registers both STT
and TTS fallback reset tasks; locate the call to initialize_fallback_tasks (and
the preceding comment string) and change the comment text from "STT fallback
reset tasks" to something like "STT and TTS fallback reset tasks" (or similar
concise wording) so it reflects both responsibilities.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 61f0fbaa-099b-4887-96d5-af5c153ce9a4
📒 Files selected for processing (11)
- app/ai/voice/agents/breeze_buddy/agent/__init__.py
- app/ai/voice/agents/breeze_buddy/agent/pipeline.py
- app/ai/voice/agents/breeze_buddy/stt/__init__.py
- app/ai/voice/agents/breeze_buddy/tts/__init__.py
- app/ai/voice/agents/breeze_buddy/utils/common.py
- app/ai/voice/stt/soniox/config.py
- app/core/config/dynamic.py
- app/main.py
- app/services/fallback/__init__.py
- app/services/live_config/utils.py
- app/services/slack/alert.py

    async def _send_mid_call_stt_alert(self) -> None:
        """Send Slack alert when STT fails mid-call and call must end."""
        from app.core.config.static import SLACK_TAG_USERS

        _fallback_tag = "@breeze-sentinals"
        tag = f"{_fallback_tag},{SLACK_TAG_USERS}" if SLACK_TAG_USERS else _fallback_tag
        provider = (self.stt_provider or "unknown").capitalize()
        try:
            await slack_alert.send(
                title="🚨 STT Failed — Call Ended (Breeze Buddy)",
                fields=[
                    {"name": "Provider", "value": provider},
                    {"name": "Call SID", "value": self.call_sid or "unknown"},
                ],
                sections=[
                    {
                        "title": "What Happened",
                        "text": "STT failed mid-call. Call could not continue.",
                    }
                ],
                fallback_text=f"STT failed, call ended — {self.call_sid or 'unknown'}",
                tag_users=tag,
            )
        except Exception as e:
            logger.warning(f"Failed to send mid-call STT alert: {e}")

Typo in Slack tag: "sentinals" should be "sentinels".
The tag @breeze-sentinals appears to be misspelled. This will result in incorrect Slack user group tagging.
✏️ Proposed fix

     async def _send_mid_call_stt_alert(self) -> None:
         """Send Slack alert when STT fails mid-call and call must end."""
         from app.core.config.static import SLACK_TAG_USERS
    -    _fallback_tag = "@breeze-sentinals"
    +    _fallback_tag = "@breeze-sentinels"
         tag = f"{_fallback_tag},{SLACK_TAG_USERS}" if SLACK_TAG_USERS else _fallback_tag

🧰 Tools

🪛 Ruff (0.15.14)

[warning] 316-316: Do not catch blind exception: Exception (BLE001)


    async def _send_mid_call_tts_alert(self) -> None:
        """Send Slack alert when TTS fails mid-call and call must end."""
        from app.core.config.static import SLACK_TAG_USERS

        _fallback_tag = "@breeze-sentinals"
        tag = f"{_fallback_tag},{SLACK_TAG_USERS}" if SLACK_TAG_USERS else _fallback_tag
        provider = (self.tts_provider or "unknown").capitalize()
        try:
            await slack_alert.send(
                title="🚨 TTS Failed — Call Ended (Breeze Buddy)",
                fields=[
                    {"name": "Provider", "value": provider},
                    {"name": "Call SID", "value": self.call_sid or "unknown"},
                ],
                sections=[
                    {
                        "title": "What Happened",
                        "text": "TTS failed mid-call. Call could not continue.",
                    }
                ],
                fallback_text=f"TTS failed, call ended — {self.call_sid or 'unknown'}",
                tag_users=tag,
            )
        except Exception as e:
            logger.warning(f"Failed to send mid-call TTS alert: {e}")

Same typo in TTS alert method.
Same "sentinals" → "sentinels" fix needed here.
✏️ Proposed fix

     async def _send_mid_call_tts_alert(self) -> None:
         """Send Slack alert when TTS fails mid-call and call must end."""
         from app.core.config.static import SLACK_TAG_USERS
    -    _fallback_tag = "@breeze-sentinals"
    +    _fallback_tag = "@breeze-sentinels"
         tag = f"{_fallback_tag},{SLACK_TAG_USERS}" if SLACK_TAG_USERS else _fallback_tag

🧰 Tools

🪛 Ruff (0.15.14)

[warning] 342-342: Do not catch blind exception: Exception (BLE001)


        SARVAM_API_KEY,
    )
    from app.core.logger import logger
    from app.services.fallback import BB_FALLBACK_CONFIG, ServiceFallback, ServiceFallbackConfig

Fix Black formatting violation.
This import line exceeds the 88-character limit. Split it across multiple lines to match the pattern in stt/__init__.py:
Proposed fix

    -from app.services.fallback import BB_FALLBACK_CONFIG, ServiceFallback, ServiceFallbackConfig
    +from app.services.fallback import (
    +    BB_FALLBACK_CONFIG,
    +    ServiceFallback,
    +    ServiceFallbackConfig,
    +)

As per coding guidelines: Use Black for code formatting with line-length=88.


    from dataclasses import dataclass

    from app.core.background_tasks import BackgroundTaskScheduler
    from app.core.config.dynamic import BB_FALLBACK_RAW_CONFIG, BB_STT_SERVICE, BB_TTS_SERVICE

Fix Black formatting to resolve pipeline failure.
The pipeline indicates Black would reformat this import: the single-line form exceeds the 88-character limit. Split it across multiple lines per Black's formatting rules.

🔧 Proposed fix

    -from app.core.config.dynamic import BB_FALLBACK_RAW_CONFIG, BB_STT_SERVICE, BB_TTS_SERVICE
    +from app.core.config.dynamic import (
    +    BB_FALLBACK_RAW_CONFIG,
    +    BB_STT_SERVICE,
    +    BB_TTS_SERVICE,
    +)


    async def _send_failure_alert(self, count: int, error_msg: str) -> None:
        provider = self.config.primary_provider_name.capitalize()
        threshold = self.config.failure_threshold
        try:
            await slack_alert.send(
                title=f"⚠️ STT Failure on {provider} ({count}/{threshold})",
                fields=[{"name": "Fail Count", "value": f"{count}/{threshold}"}],
                sections=(
                    [{"title": "Error", "text": f"```{error_msg}```"}]
                    if error_msg
                    else []
                ),
                fallback_text=f"STT failure on {provider} ({count}/{threshold})",
                tag_users=_ALERT_TAG,
            )
        except Exception as e:
            logger.warning(
                f"Service fallback ({self.config.service_name}) "
                f"failure alert failed: {e}"
            )

Hardcoded "STT" in failure alert breaks TTS fallback notifications.
This generic ServiceFallback class is used for both STT and TTS services, but _send_failure_alert hardcodes "STT" in the title and fallback text. When a TTS failure occurs, operators will receive a misleading alert titled "STT Failure on <provider>" even though the failing provider is the primary TTS provider.
🐛 Proposed fix
async def _send_failure_alert(self, count: int, error_msg: str) -> None:
provider = self.config.primary_provider_name.capitalize()
threshold = self.config.failure_threshold
+ service = self.config.service_name.upper()
try:
await slack_alert.send(
- title=f"⚠️ STT Failure on {provider} ({count}/{threshold})",
+ title=f"⚠️ {service} Failure on {provider} ({count}/{threshold})",
fields=[{"name": "Fail Count", "value": f"{count}/{threshold}"}],
sections=(
[{"title": "Error", "text": f"```{error_msg}```"}]
if error_msg
else []
),
- fallback_text=f"STT failure on {provider} ({count}/{threshold})",
+ fallback_text=f"{service} failure on {provider} ({count}/{threshold})",
tag_users=_ALERT_TAG,
         )
🧰 Tools
🪛 Ruff (0.15.14)
[warning] 163-163: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@app/services/fallback/__init__.py` around lines 148 - 167, The alert text in
ServiceFallback._send_failure_alert incorrectly hardcodes "STT"; change it to
use the service name from the config (e.g., service =
self.config.service_name.upper()) and substitute that variable into both
the title and fallback_text (and any other hardcoded "STT" occurrences) so
alerts reflect the actual service (TTS or STT); keep provider usage
(self.config.primary_provider_name) and existing count/threshold formatting
as-is and preserve the error section and exception handling.
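The casing choice matters for the acronym: `str.capitalize()` lowercases everything after the first character, while `str.upper()` keeps the label in the form operators expect. A minimal sketch of the difference, assuming the configured service name is stored in lowercase (`alert_title` is a hypothetical helper used only for illustration, not a function from this PR):

```python
def alert_title(service_name: str, provider_name: str, count: int, threshold: int) -> str:
    """Hypothetical helper mirroring the title f-string in the proposed fix."""
    service = service_name.upper()          # "tts" -> "TTS"
    provider = provider_name.capitalize()   # "cartesia" -> "Cartesia"
    return f"{service} Failure on {provider} ({count}/{threshold})"


print(alert_title("tts", "cartesia", 2, 3))  # TTS Failure on Cartesia (2/3)
print("tts".capitalize())                    # Tts
```

With `.capitalize()` the title would read "Tts Failure on Cartesia", which is why the diff above uses `.upper()` for the service label.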
if count is None:
    count = await redis.incr(self._key_failure_count)
Fallback path omits TTL, risking permanent counter.
If run_script returns None (e.g., scripting disabled or error), the fallback incr increments the counter but never sets the TTL. This counter could persist indefinitely, causing incorrect threshold calculations or permanent fallback activation.
🐛 Proposed fix: Set TTL in fallback path
 if count is None:
-    count = await redis.incr(self._key_failure_count)
+    # Lua script unavailable - fallback to separate INCR + EXPIRE
+    count = await redis.incr(self._key_failure_count)
+    if count == 1:
+        await redis.expire(
+            self._key_failure_count, self.config.failure_window_secs
+        )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@app/services/fallback/__init__.py` around lines 275 - 276, The fallback
branch that does `count = await redis.incr(self._key_failure_count)` when
`run_script` returns None must also set an expiry so the counter doesn't persist
forever; after incrementing the key call the Redis expire command (e.g., `await
redis.expire(self._key_failure_count, ...)`) using the same TTL used elsewhere
for failure counters (for example `self.config.failure_window_secs` or the existing
TTL constant) so the key gets a time-to-live consistent with the normal path.
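The fallback path can be exercised without a Redis server. The sketch below is an illustration under assumptions, not the PR's code: `FakeRedis` is a hypothetical in-memory stand-in exposing only `incr`, `expire`, and `ttl` (it records expiries but does not evict keys), and the key name and `incr_with_ttl` helper are made up for the example.

```python
import asyncio
import time


class FakeRedis:
    """Hypothetical in-memory stand-in; records expiries but never evicts keys."""

    def __init__(self) -> None:
        self._values: dict[str, int] = {}
        self._expires_at: dict[str, float] = {}

    async def incr(self, key: str) -> int:
        self._values[key] = self._values.get(key, 0) + 1
        return self._values[key]

    async def expire(self, key: str, seconds: int) -> bool:
        if key not in self._values:
            return False
        self._expires_at[key] = time.monotonic() + seconds
        return True

    async def ttl(self, key: str) -> int:
        if key not in self._values:
            return -2  # key does not exist
        if key not in self._expires_at:
            return -1  # key exists but has no expiry
        return max(0, round(self._expires_at[key] - time.monotonic()))


async def incr_with_ttl(redis, key: str, window_secs: int) -> int:
    """Non-atomic fallback: INCR, then EXPIRE only when this call created the key."""
    count = await redis.incr(key)
    if count == 1:
        await redis.expire(key, window_secs)
    return count


async def demo() -> tuple[int, int]:
    redis = FakeRedis()
    key = "fallback:tts:failures"  # hypothetical key name
    await incr_with_ttl(redis, key, 60)
    count = await incr_with_ttl(redis, key, 60)
    return count, await redis.ttl(key)


count, ttl = asyncio.run(demo())
print(count, ttl > 0)
```

Because INCR and EXPIRE are separate commands here, a crash between them can still leave a key without a TTL; the Lua script remains the safer primary path, and this only narrows the gap when scripting is unavailable.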
# Not active — check if it *was* active (sentinel present).
seen = bool(await redis.exists(self._key_seen_active))
if not seen:
    return  # Never activated during this server lifetime.

# TTL just expired — clear sentinel and fire the reset alert.
await redis.delete(self._key_seen_active)
await self._clear_provider_health(
    redis, self.config.primary_provider_name.lower()
)
logger.info(
    f"Service fallback ({self.config.service_name}) TTL expired — "
    "sending reset alert"
)
await self._send_reset_alert()
Race condition may cause duplicate reset alerts.
Multiple pods polling simultaneously can all observe active=False and seen_active=True, then each delete the sentinel and send the reset alert. Unlike activation (which uses NX), the reset path has no atomic guard.
🐛 Proposed fix: Use atomic getdel or NX deduplication
-# Not active — check if it *was* active (sentinel present).
-seen = bool(await redis.exists(self._key_seen_active))
-if not seen:
-    return  # Never activated during this server lifetime.
-
-# TTL just expired — clear sentinel and fire the reset alert.
-await redis.delete(self._key_seen_active)
+# Not active — atomically check-and-clear sentinel.
+# GETDEL returns the value if it existed and deletes it atomically.
+seen = await redis.getdel(self._key_seen_active)
+if not seen:
+    return  # Never activated or another pod already handled reset.
+
+# TTL just expired — we won the race; fire the reset alert.
 await self._clear_provider_health(
     redis, self.config.primary_provider_name.lower()
 )
If getdel is unavailable, use a Lua script or NX-based dedup key similar to _key_notified.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@app/services/fallback/__init__.py` around lines 402 - 416, The reset path can
race because multiple pods can see seen_active=True and each delete the sentinel
then call _clear_provider_health and _send_reset_alert; modify the block that
reads/deletes self._key_seen_active to perform an atomic check-and-delete (use
redis.getdel(self._key_seen_active) if available, or execute a small Lua script
that returns and deletes the key in one step) and only proceed with
_clear_provider_health and _send_reset_alert when the atomic operation indicates
this caller actually removed the sentinel; alternatively implement an NX-based
deduplication key (similar to _key_notified) that you set with SET NX and TTL
and only the owner proceeds to call _send_reset_alert. Ensure you reference and
update uses of self._key_seen_active, _clear_provider_health, and
_send_reset_alert accordingly.
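To illustrate why an atomic read-and-delete leaves exactly one winner, here is a small simulation under assumptions: `FakeRedis` is a hypothetical in-memory stand-in whose `getdel` pops the key in a single step, and `poll_reset`, the pod names, and the key name are invented for the example. It does not model a real multi-pod deployment, only the interleaving of concurrent pollers.

```python
import asyncio


class FakeRedis:
    """Hypothetical in-memory stand-in with a GETDEL-like method."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    async def set(self, key: str, value: str) -> None:
        self._data[key] = value

    async def getdel(self, key: str):
        await asyncio.sleep(0)  # yield so the pollers interleave
        # pop() reads and removes in one step, standing in for GETDEL
        return self._data.pop(key, None)


async def poll_reset(redis, key: str, alerts: list, pod: str) -> None:
    seen = await redis.getdel(key)
    if not seen:
        return  # never activated, or another poller already claimed the reset
    alerts.append(pod)  # stands in for clearing health state and alerting


async def demo() -> list:
    redis = FakeRedis()
    key = "fallback:tts:seen_active"  # hypothetical key name
    await redis.set(key, "1")
    alerts: list = []
    await asyncio.gather(
        *(poll_reset(redis, key, alerts, f"pod-{i}") for i in range(5))
    )
    return alerts


alerts = asyncio.run(demo())
print(len(alerts))  # 1
```

Replacing `getdel` with a separate existence check followed by a delete would let every poller pass the check before any of them deletes, which is the duplicate-alert scenario described above.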
elif target_type == dict:
    if isinstance(value, dict):
        return value
    try:
        return json.loads(value)
    except (ValueError, TypeError):
        return None
json.loads can return a non-dict for target_type == dict.
A JSON scalar/array (e.g. "5" → int, "[1,2]" → list) parses successfully but isn't a dict, so the function returns a value that violates the requested target type. Guard the parsed result with isinstance.
🛡️ Proposed guard
 elif target_type == dict:
     if isinstance(value, dict):
         return value
     try:
-        return json.loads(value)
+        parsed = json.loads(value)
+        return parsed if isinstance(parsed, dict) else None
     except (ValueError, TypeError):
         return None
Note: Ruff also flags target_type == dict (E721) on Line 39; is/isinstance is preferred, though the rest of this function uses == for consistency. Align as you see fit.
🧰 Tools
🪛 Ruff (0.15.14)
[error] 39-39: Use is and is not for type comparisons, or isinstance() for isinstance checks
(E721)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@app/services/live_config/utils.py` around lines 39 - 45, The branch handling
target_type == dict must guard the json.loads result against non-dict JSON;
change the block in the function that checks target_type == dict so you first
check target_type is dict (to satisfy ruff) then if value is a dict return it,
otherwise parse with parsed = json.loads(value) and return parsed only if
isinstance(parsed, dict) else return None; this ensures json scalars/arrays
don't slip through and preserves the original behavior when value is already a
dict.
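The behavior is easy to confirm in isolation. A minimal standalone sketch of the guarded branch, assuming it can be lifted out of the larger conversion function (`parse_dict` is a hypothetical name, not the function in `utils.py`):

```python
import json
from typing import Any, Optional


def parse_dict(value: Any) -> Optional[dict]:
    """Return a dict, or None when value is not (or does not decode to) a JSON object."""
    if isinstance(value, dict):
        return value
    try:
        parsed = json.loads(value)
    except (ValueError, TypeError):
        # Invalid JSON raises ValueError; non-string input such as None raises TypeError.
        return None
    return parsed if isinstance(parsed, dict) else None


print(parse_dict('{"a": 1}'))  # {'a': 1}
print(parse_dict("5"))         # None
print(parse_dict("[1, 2]"))    # None
print(parse_dict("not json"))  # None
print(parse_dict(None))        # None
```

Without the `isinstance` check, the second and third calls would return `5` and `[1, 2]`, so callers expecting a mapping would fail later and further from the source of the bad config value.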
feat: add TTS fallback circuit breaker
- app/services/fallback/__init__.py: enable tts in _FALLBACK_DEFAULTS (fallback_provider="cartesia"), add check_and_reset_tts_fallback() poller, update initialize_fallback_tasks() to register both STT and TTS reset tasks independently
- app/ai/voice/agents/breeze_buddy/tts/__init__.py: add TTSServiceResult and get_tts_service_with_fallback(): proactive routing when circuit is open, init-time fallback with record_failure() on primary error
- app/ai/voice/agents/breeze_buddy/agent/pipeline.py: use get_tts_service_with_fallback() in create_services(), return TTSServiceResult in third position
- app/ai/voice/agents/breeze_buddy/agent/__init__.py: store tts_provider from TTSServiceResult, add _tts_failure_recorded / _mid_call_tts_alert_sent state, detect TTS processor errors in on_pipeline_error, add _send_mid_call_tts_alert() Slack alert helper