diff --git a/api-reference/server/services/stt/moonshine.mdx b/api-reference/server/services/stt/moonshine.mdx
new file mode 100644
index 00000000..879b752f
--- /dev/null
+++ b/api-reference/server/services/stt/moonshine.mdx
@@ -0,0 +1,138 @@
+---
+title: "Moonshine"
+description: "Speech-to-text service implementation using locally downloaded Moonshine ONNX models"
+---
+
+## Overview
+
+`MoonshineSTTService` provides offline speech recognition using Moonshine's small, fast ASR models, which run locally on the CPU via ONNX Runtime. No GPU or API key is required. Models download once on first use and are cached locally, so transcription stays on your machine.
+
+
+
+ Pipecat's API methods for Moonshine STT integration
+
+
+ Complete example with Moonshine STT
+
+
+ Moonshine ASR model details and research
+
+
+ Python package for Moonshine models
+
+
+
+## Installation
+
+```bash
+uv add "pipecat-ai[moonshine]"
+```
+
+## Prerequisites
+
+### Local Model Setup
+
+Before using the Moonshine STT service, check the following:
+
+1. **Model Selection**: Choose a Moonshine model size (tiny, base, tiny-streaming, base-streaming, small-streaming, medium-streaming)
+2. **Storage Space**: Ensure there is enough disk space for the model download (models are cached after first use)
+3. **CPU Resources**: Moonshine runs on the CPU via ONNX Runtime
+
+### Configuration Options
+
+- **Model Size**: Balance accuracy against performance based on your needs
+- **Language Support**: Moonshine supports English, Spanish, and other languages
+- **No API Key**: Runs entirely locally for complete privacy
+
+
+ No API keys or GPU required. Moonshine runs on the CPU, entirely on your machine.
+
+
+## Configuration
+
+
+ Runtime-configurable settings for the STT service. See [MoonshineSTTService Settings](#moonshinesttsettings) below.
+
+
+## MoonshineSTTSettings
+
+Runtime-configurable settings passed via the `settings` constructor argument using `MoonshineSTTService.Settings(...)`.
+These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
+
+| Parameter  | Type              | Default                 | Description |
+| ---------- | ----------------- | ----------------------- | ----------- |
+| `model`    | `str \| Model`    | `Model.SMALL_STREAMING` | Moonshine model architecture. Available models: `TINY`, `BASE`, `TINY_STREAMING`, `BASE_STREAMING`, `SMALL_STREAMING` (default), `MEDIUM_STREAMING`. |
+| `language` | `Language \| str` | `Language.EN`           | Language for transcription. Moonshine supports English, Spanish, and other languages. The base language code is used (e.g., "en" from "en-US"). _(Inherited from base STT settings.)_ |
+
+## Usage
+
+### Basic Setup
+
+```python
+from pipecat.services.moonshine.stt import MoonshineSTTService
+
+# Uses the defaults: Model.SMALL_STREAMING and Language.EN
+stt = MoonshineSTTService()
+```
+
+### With Custom Model
+
+```python
+from pipecat.services.moonshine.stt import MoonshineSTTService, Model
+
+# Pick a specific model with the Model enum
+stt = MoonshineSTTService(
+    settings=MoonshineSTTService.Settings(
+        model=Model.MEDIUM_STREAMING,
+    ),
+)
+```
+
+### With Custom Language
+
+```python
+from pipecat.services.moonshine.stt import MoonshineSTTService, Model
+from pipecat.transcriptions.language import Language
+
+# Transcribe Spanish with the default model
+stt = MoonshineSTTService(
+    settings=MoonshineSTTService.Settings(
+        model=Model.SMALL_STREAMING,
+        language=Language.ES,
+    ),
+)
+```
+
+### With Model as String
+
+```python
+from pipecat.services.moonshine.stt import MoonshineSTTService
+
+# The model can also be given by its string name
+stt = MoonshineSTTService(
+    settings=MoonshineSTTService.Settings(
+        model="base",
+    ),
+)
+```
+
+## Notes
+
+- **First run downloads**: The selected model downloads from the Moonshine model hub on first use and is cached locally. Later runs load it from the cache.
+- **Segmented transcription**: `MoonshineSTTService` extends `SegmentedSTTService`, meaning it processes complete audio segments after VAD detects the user has stopped speaking.
+- **CPU-only**: Moonshine runs on the CPU via ONNX Runtime, so no GPU is required. This makes it a good fit for resource-constrained environments.
+- **Audio format**: Expects 16-bit mono PCM audio at a 16 kHz sample rate.
+- **Model variants**: The streaming-capable models (`TINY_STREAMING`, `BASE_STREAMING`, `SMALL_STREAMING`, `MEDIUM_STREAMING`) can be run in batch mode just like the non-streaming variants. The larger models (`SMALL_STREAMING`, `MEDIUM_STREAMING`) are only available in streaming form.
+- **Language support**: Moonshine supports multiple languages (English, Spanish, and others). The service uses the base language code (e.g., "en" from "en-US").
+- **No external services**: Unlike API-based STT services, Moonshine needs no API key, and no network connectivity after the initial model download.
diff --git a/api-reference/server/services/supported-services.mdx b/api-reference/server/services/supported-services.mdx
index a5db9f10..3b7041aa 100644
--- a/api-reference/server/services/supported-services.mdx
+++ b/api-reference/server/services/supported-services.mdx
@@ -51,6 +51,7 @@ Speech-to-Text services receive and audio input and output transcriptions.
 | [Gradium](/api-reference/server/services/stt/gradium) | `uv add "pipecat-ai[gradium]"` |
 | [Groq (Whisper)](/api-reference/server/services/stt/groq) | `uv add "pipecat-ai[groq]"` |
 | [Mistral](/api-reference/server/services/stt/mistral) | `uv add "pipecat-ai[mistral]"` |
+| [Moonshine](/api-reference/server/services/stt/moonshine) | `uv add "pipecat-ai[moonshine]"` |
 | [NVIDIA](/api-reference/server/services/stt/nvidia) | `uv add "pipecat-ai[nvidia]"` |
 | [OpenAI](/api-reference/server/services/stt/openai) | `uv add "pipecat-ai[openai]"` |
 | [Sarvam](/api-reference/server/services/stt/sarvam) | `uv add "pipecat-ai[sarvam]"` |
diff --git a/docs.json b/docs.json
index 79af2cf1..4530159d 100644
--- a/docs.json
+++ b/docs.json
@@ -348,6 +348,7 @@
  "api-reference/server/services/stt/gradium",
  "api-reference/server/services/stt/groq",
  "api-reference/server/services/stt/mistral",
+ "api-reference/server/services/stt/moonshine",
  "api-reference/server/services/stt/nvidia",
  "api-reference/server/services/stt/openai",
  "api-reference/server/services/stt/sarvam",
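
The new `moonshine.mdx` page states that the service uses the base language code (e.g., "en" from "en-US"). As a rough illustration of that reduction, here is a minimal sketch that assumes it amounts to taking the subtag before the first hyphen. `base_language_code` is a hypothetical helper written for this note, not a Pipecat or Moonshine API, and accepting an underscore separator is likewise an assumption.

```python
def base_language_code(tag: str) -> str:
    """Return the primary subtag of a language tag, lowercased.

    Hypothetical helper for illustration only; the real service may
    derive the base code differently.
    """
    # Assumption: treat "en_US" the same as "en-US".
    primary = tag.replace("_", "-").split("-", 1)[0]
    return primary.strip().lower()


if __name__ == "__main__":
    print(base_language_code("en-US"))  # -> en
    print(base_language_code("es"))     # -> es
```

A tag with no region, such as "es", passes through unchanged apart from lowercasing, so callers can hand over either form.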