7 changes: 7 additions & 0 deletions api-reference/server/services/stt/whisper.mdx
@@ -54,6 +54,11 @@ uv add "pipecat-ai[whisper]"
uv add "pipecat-ai[mlx-whisper]"
```

<Note>
MLX Whisper requires macOS on Apple Silicon (arm64). It will not work on other
platforms, including Intel Macs.
</Note>
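
The platform requirement in the note above can be checked at runtime before choosing a service. The following is a minimal sketch using only the Python standard library; the helper names `is_apple_silicon_mac` and `pick_whisper_backend` are hypothetical and not part of Pipecat, and the function returns class names as strings rather than importing anything.

```python
import platform
from typing import Optional


def is_apple_silicon_mac(
    system: Optional[str] = None, machine: Optional[str] = None
) -> bool:
    """Return True when the host looks like macOS on arm64.

    The arguments exist only so the check can be exercised with
    fixed values; by default the live platform is inspected.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Darwin" and machine == "arm64"


def pick_whisper_backend() -> str:
    """Hypothetical helper: name the service class suited to this host."""
    if is_apple_silicon_mac():
        return "WhisperSTTServiceMLX"
    return "WhisperSTTService"


if __name__ == "__main__":
    print(pick_whisper_backend())
```

One caveat, stated as an assumption: a Python interpreter running under Rosetta on Apple Silicon typically reports `x86_64`, so this check would select `WhisperSTTService` in that case.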

## Prerequisites

### Local Model Setup
@@ -246,5 +251,7 @@ stt = WhisperSTTServiceMLX(
- **First run downloads**: If the selected model hasn't been downloaded previously, the first run will download it from the Hugging Face model hub. This may take significant time depending on model size.
- **Segmented transcription**: Both `WhisperSTTService` and `WhisperSTTServiceMLX` extend `SegmentedSTTService`, meaning they process complete audio segments after VAD detects the user has stopped speaking.
- **No-speech filtering**: The `no_speech_prob` threshold helps filter out hallucinations. Increase it to be more permissive; decrease it to filter more aggressively.
- **MLX platform requirement**: `WhisperSTTServiceMLX` requires macOS on Apple Silicon (arm64). On other platforms (including Intel Macs), use `WhisperSTTService` instead.
- **MLX quantization**: The `LARGE_V3_TURBO_Q4` model provides reduced memory usage with minimal quality loss on Apple Silicon.
- **Model enums**: `Model` and `MLXModel` are `StrEnum` types, meaning enum members can be compared directly to strings (e.g., `Model.TINY == "tiny"`). Both enum members and plain strings work when setting the model.
- **Language support**: Whisper supports 99+ languages. Use the `Language` enum for type-safe language selection. The default is `Language.EN` (English). Set `language=None` in settings to enable automatic language detection, which will transcribe whatever language the user speaks.
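
The string-comparison behaviour described in the "Model enums" note can be illustrated with a stand-in enum. This is a sketch, not Pipecat code: `DemoModel` and `normalize_model` are hypothetical names used so the example runs without the library installed, and the fallback class only approximates `StrEnum` on Python versions older than 3.11.

```python
try:
    from enum import StrEnum  # available in Python 3.11+
except ImportError:
    from enum import Enum

    class StrEnum(str, Enum):
        """Approximate fallback: members are also str instances."""


class DemoModel(StrEnum):
    """Stand-in for an enum such as `Model`; not the real class."""

    TINY = "tiny"
    BASE = "base"


def normalize_model(model) -> str:
    """Accept an enum member or a plain string and return the string value.

    Raises ValueError if the string does not match any member.
    """
    return DemoModel(model).value


if __name__ == "__main__":
    # A member compares equal to its string value.
    print(DemoModel.TINY == "tiny")
    # Both forms normalize to the same plain string.
    print(normalize_model("base"), normalize_model(DemoModel.TINY))
```

Because members are also strings, code that accepts a model setting does not need separate branches for enum members and plain strings.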