10 changes: 10 additions & 0 deletions api-reference/server/services/s2s/openai.mdx
@@ -126,6 +126,15 @@ _Deprecated in v0.0.105. Use `settings=OpenAIRealtimeLLMService.Settings(session
`"high"` provides more detail.
</ParamField>

<ParamField path="user_audio_preroll_secs" type="float | None" default="None">
Seconds of recent user audio to replay after an interruption clears the input audio buffer,
so the speech onset isn't lost. Applies only in manual turn-detection mode
(`turn_detection=False`, locally driven turns). When `None` (the default), the value is
auto-sized to the upstream VAD's `start_secs` plus a small margin, falling back to `0.5`
seconds when no VAD is present. Auto-sizing assumes VAD drives turn starts (the default
`VADUserTurnStartStrategy`); set this explicitly if you use a non-VAD turn-start strategy.
Has no effect when server-side turn detection is enabled.
</ParamField>

<ParamField path="**kwargs" type="Any">
Additional arguments passed to parent LLMService.
</ParamField>
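
The auto-sizing rule described for `user_audio_preroll_secs` can be pictured with a small standalone sketch. This is not the service's actual code: `resolve_preroll_secs` and `ASSUMED_MARGIN_SECS` are hypothetical names, and the `0.1` second margin is an assumption, since the parameter description only says "a small margin". Only the `0.5` second fallback comes from the description above.

```python
from typing import Optional

# From the parameter description: used when no VAD is present.
FALLBACK_PREROLL_SECS = 0.5
# Assumption for illustration only; the real margin is not documented here.
ASSUMED_MARGIN_SECS = 0.1


def resolve_preroll_secs(
    user_audio_preroll_secs: Optional[float],
    vad_start_secs: Optional[float],
    margin_secs: float = ASSUMED_MARGIN_SECS,
) -> float:
    """Pick the pre-roll duration in seconds (hypothetical helper)."""
    # An explicit value always wins over auto-sizing.
    if user_audio_preroll_secs is not None:
        return user_audio_preroll_secs
    # Auto-size from the upstream VAD's start_secs plus a margin.
    if vad_start_secs is not None:
        return vad_start_secs + margin_secs
    # No VAD available: fall back to the documented default.
    return FALLBACK_PREROLL_SECS
```

With a non-VAD turn-start strategy there may be no meaningful `start_secs` to size from, which is presumably why the description recommends passing an explicit value in that case.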
@@ -350,6 +359,7 @@ await task.queue_frame(
- **Model is connection-level**: The `model` parameter is set via the WebSocket URL at connection time and cannot be changed during a session.
- **Output modalities are single-mode**: The API supports either `["text"]` or `["audio"]` output, not both simultaneously.
- **Turn detection options**: Use `TurnDetection` for traditional VAD, `SemanticTurnDetection` for AI-based turn detection, or `False` to disable server-side detection and manage turns manually.
- **Manual turn detection pre-roll**: When server-side turn detection is disabled (`turn_detection=False`), the service maintains a rolling audio buffer that is replayed after interruptions to preserve speech onsets. Configure the buffer duration with `user_audio_preroll_secs` or let it auto-size from the upstream VAD's `start_secs`.
- **Audio output format**: The service outputs 24kHz PCM audio by default.
- **Video support**: Video frames can be sent to the model for multimodal input. Control the detail level with `video_frame_detail` and pause/resume with `set_video_input_paused()`.
- **Transcription frames**: User speech transcription frames are always emitted upstream when input audio transcription is configured.
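
To illustrate the rolling pre-roll buffer mentioned in the notes above, here is a minimal standalone sketch, not the service's implementation. `PrerollBuffer` is a hypothetical class, and the 16-bit mono PCM format and the default sample rate are assumptions made for the example.

```python
from collections import deque


class PrerollBuffer:
    """Keeps roughly the last `preroll_secs` of raw PCM audio (hypothetical)."""

    def __init__(self, preroll_secs: float, sample_rate: int = 24000,
                 bytes_per_sample: int = 2):
        # Assumes mono PCM; the capacity is expressed in bytes.
        self._max_bytes = int(preroll_secs * sample_rate) * bytes_per_sample
        self._chunks = deque()
        self._size = 0

    def append(self, chunk: bytes) -> None:
        """Add a new audio chunk and evict chunks that are no longer needed."""
        self._chunks.append(chunk)
        self._size += len(chunk)
        # Drop the oldest chunk only if the rest still covers the full window.
        while self._chunks and self._size - len(self._chunks[0]) >= self._max_bytes:
            self._size -= len(self._chunks.popleft())

    def snapshot(self) -> bytes:
        """Return the most recent audio, trimmed to the pre-roll window."""
        if self._max_bytes == 0:
            return b""
        return b"".join(self._chunks)[-self._max_bytes:]


# Usage: feed 100 ms chunks at 16 kHz, then read back the last 0.5 s,
# as one might after an interruption has cleared the input audio buffer.
buf = PrerollBuffer(preroll_secs=0.5, sample_rate=16000)
for i in range(10):
    buf.append(bytes([i]) * 3200)
replay = buf.snapshot()
```

Evicting whole chunks keeps `append` cheap, and the final slice in `snapshot` trims the result to the exact window size.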