diff --git a/api-reference/server/services/s2s/openai.mdx b/api-reference/server/services/s2s/openai.mdx
index 4c074cbd..32c4d638 100644
--- a/api-reference/server/services/s2s/openai.mdx
+++ b/api-reference/server/services/s2s/openai.mdx
@@ -126,6 +126,15 @@ _Deprecated in v0.0.105. Use `settings=OpenAIRealtimeLLMService.Settings(session
 `"high"` provides more detail.
+
+In manual turn-detection mode (`turn_detection=False`, locally driven turns), how much recent
+audio to replay after an interruption clears the input audio buffer, so the speech onset isn't lost.
+Defaults to `None`: auto-sized to the upstream VAD's `start_secs` plus a small margin, falling
+back to `0.5` seconds when no VAD is present. Auto-sizing assumes VAD drives turn starts (the
+default `VADUserTurnStartStrategy`); set this explicitly if you use a non-VAD turn-start strategy.
+No effect when server-side turn detection is enabled.
+
+
 Additional arguments passed to parent LLMService.
@@ -350,6 +359,7 @@ await task.queue_frame(
 - **Model is connection-level**: The `model` parameter is set via the WebSocket URL at connection time and cannot be changed during a session.
 - **Output modalities are single-mode**: The API supports either `["text"]` or `["audio"]` output, not both simultaneously.
 - **Turn detection options**: Use `TurnDetection` for traditional VAD, `SemanticTurnDetection` for AI-based turn detection, or `False` to disable server-side detection and manage turns manually.
+- **Manual turn detection pre-roll**: When server-side turn detection is disabled (`turn_detection=False`), the service maintains a rolling audio buffer that is replayed after interruptions to preserve speech onsets. Configure the buffer duration with `user_audio_preroll_secs` or let it auto-size from the upstream VAD's `start_secs`.
 - **Audio output format**: The service outputs 24kHz PCM audio by default.
 - **Video support**: Video frames can be sent to the model for multimodal input. Control the detail level with `video_frame_detail` and pause/resume with `set_video_input_paused()`.
 - **Transcription frames**: User speech transcription frames are always emitted upstream when input audio transcription is configured.
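The pre-roll behavior described in this change can be sketched roughly as follows. This is a minimal, hypothetical Python illustration, not Pipecat's implementation. `resolve_preroll_secs`, `PrerollBuffer`, `handle_interruption`, the `PREROLL_MARGIN_SECS` value, and the 24 kHz 16-bit mono input format are all assumptions. It shows the documented resolution order (explicit `user_audio_preroll_secs`, then VAD `start_secs` plus a margin, then the `0.5` second fallback) and a rolling buffer that is replayed immediately after the input audio buffer is cleared on an interruption.

```python
from collections import deque
from typing import Callable, Deque, Optional

# Fallback documented for the case where no upstream VAD is present.
DEFAULT_PREROLL_SECS = 0.5
# Assumption: the docs only say "a small margin"; this value is illustrative.
PREROLL_MARGIN_SECS = 0.2


def resolve_preroll_secs(explicit: Optional[float], vad_start_secs: Optional[float]) -> float:
    """Hypothetical helper: explicit value wins, else VAD start_secs + margin, else fallback."""
    if explicit is not None:
        return explicit
    if vad_start_secs is not None:
        return vad_start_secs + PREROLL_MARGIN_SECS
    return DEFAULT_PREROLL_SECS


class PrerollBuffer:
    """Hypothetical rolling buffer holding roughly the last `preroll_secs` of PCM audio."""

    def __init__(self, preroll_secs: float, sample_rate: int = 24000, sample_width: int = 2):
        # Assumption: 16-bit mono PCM at 24 kHz; max_bytes is a whole number of samples.
        self.max_bytes = int(preroll_secs * sample_rate) * sample_width
        self._chunks: Deque[bytes] = deque()
        self._size = 0

    def append(self, chunk: bytes) -> None:
        self._chunks.append(chunk)
        self._size += len(chunk)
        # Evict the oldest chunk only while the remaining chunks still cover the window.
        while self._chunks and self._size - len(self._chunks[0]) >= self.max_bytes:
            self._size -= len(self._chunks.popleft())

    def snapshot(self) -> bytes:
        """Return at most max_bytes of the most recent audio."""
        if self.max_bytes <= 0:
            return b""
        return b"".join(self._chunks)[-self.max_bytes:]


def handle_interruption(
    buffer: PrerollBuffer,
    clear_input_buffer: Callable[[], None],
    append_audio: Callable[[bytes], None],
) -> None:
    """Clear the model's input audio buffer, then replay the pre-roll so the onset survives."""
    clear_input_buffer()
    preroll = buffer.snapshot()
    if preroll:
        append_audio(preroll)
```

Keeping whole chunks in a deque bounds memory to about one pre-roll window regardless of session length. Because `max_bytes` is a multiple of the sample width, the trim in `snapshot()` starts on a sample boundary, provided each appended chunk contains whole samples.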