
feat(whisper): expose language + prompt config (multilingual opt-in)#8063

Open
YonganZhang wants to merge 1 commit into AstrBotDevs:master from YonganZhang:feat/whisper-multilang

Conversation


@YonganZhang YonganZhang commented May 7, 2026

Problem

ProviderOpenAIWhisperAPI does not forward language / prompt parameters to client.audio.transcriptions.create(). Whisper auto-detect can mis-classify non-English speech (Chinese / Japanese / Korean / etc), hurting transcription accuracy. Users have no way to hint the language or provide a prompt through the existing provider config.

Solution

Expose two optional config fields, both defaulting to "" (preserves current auto-detect behavior — fully backwards compatible):

  • language: e.g. "zh" / "ja" / "ko" — Whisper language hint
  • prompt: free-text guidance, e.g. domain vocabulary or phrasing (see https://platform.openai.com/docs/guides/speech-to-text/prompting)

Plus temperature=0 for deterministic transcription output.

Backwards compatibility

Default "" → NOT_GIVEN → Whisper auto-detect → identical behavior to current code. Existing users see no change. New parameters are pure opt-in.
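The opt-in mapping described above can be sketched as follows. The `_NotGiven` class below is a stand-in for the openai SDK's `NOT_GIVEN` sentinel so the snippet is self-contained; the PR itself uses the real sentinel:

```python
# Stand-in for openai's NOT_GIVEN sentinel (assumption: the real code
# imports NOT_GIVEN from the openai SDK instead of defining it).
class _NotGiven:
    def __repr__(self) -> str:
        return "NOT_GIVEN"

NOT_GIVEN = _NotGiven()

def to_whisper_param(value: str):
    """Map an empty config string to NOT_GIVEN so the kwarg is
    effectively omitted and Whisper keeps its auto-detect behavior."""
    return value or NOT_GIVEN
```

With this, `to_whisper_param("")` yields the sentinel (auto-detect, same as today), while `to_whisper_param("zh")` forwards the hint unchanged.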

Test

Tested locally on Chinese voice samples (PolyU lab). Setting language="zh" plus a short Chinese context prompt produced a visibly lower WER than the previous auto-detect path.

Diff size

8 effective lines: 5 in init + 3 in the transcription call.

Summary by Sourcery

Expose configurable language and prompt options for the Whisper API provider to improve non-English transcription accuracy while preserving existing auto-detection behavior.

New Features:

  • Add optional language and prompt configuration fields to the Whisper API provider for guiding transcriptions.

Enhancements:

  • Forward configured language and prompt to the Whisper transcription call and set temperature to 0 for more deterministic output.

@auto-assign auto-assign Bot requested review from Fridemn and Raven95676 May 7, 2026 15:44
@dosubot dosubot Bot added size:XS This PR changes 0-9 lines, ignoring generated files. area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. labels May 7, 2026

@sourcery-ai sourcery-ai Bot left a comment


Hey - I've left some high level feedback:

  • Hard-coding temperature=0 in the transcription call removes flexibility; consider making this configurable via provider_config with a default of 0 to keep current behavior while allowing overrides.
  • Since language and prompt are now provider-level fields, it may be useful to allow per-call overrides in get_text (e.g., optional parameters) so callers can vary hints without redefining the provider.
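The per-call override idea can be sketched with a tiny resolver (the name `resolve_hint` is illustrative, not the PR's API): a call-level hint wins, then the provider-level config, then the omit sentinel:

```python
def resolve_hint(call_value, provider_value, omit=None):
    """Per-call override wins; fall back to provider config; an empty
    result maps to `omit` (standing in for openai's NOT_GIVEN)."""
    return call_value or provider_value or omit

# resolve_hint(None, "zh") -> "zh"   (provider config applies)
# resolve_hint("ja", "zh") -> "ja"   (caller overrides)
# resolve_hint(None, "")   -> None   (omitted -> auto-detect)
```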


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces support for optional language and prompt parameters in the Whisper API source to improve transcription accuracy. Feedback includes adding unit tests for the new functionality, sanitizing configuration inputs with strip(), and making the hardcoded temperature parameter configurable. Additionally, a potential resource leak was identified where the audio file handle is not explicitly closed, which could cause issues on certain operating systems.

Comment on lines +39 to +43:

```python
# Optional language hint + prompt to guide Whisper transcription.
# Default empty = let Whisper auto-detect (preserves existing behavior).
# Users can configure these for higher accuracy on non-English speech.
self.language = provider_config.get("language", "")
self.prompt = provider_config.get("prompt", "")
```

Severity: medium

According to the general rules, new functionality should be accompanied by corresponding unit tests. Please add tests to verify that the language and prompt parameters are correctly extracted from the configuration and passed to the transcription API.

References
  1. New functionality, such as handling attachments, should be accompanied by corresponding unit tests.
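A minimal sketch of such a test, using a stub client and a stand-in for the provider's init-and-forward logic (a real test would construct `ProviderOpenAIWhisperAPI` and mock `client.audio.transcriptions.create`; the helper names here are assumptions):

```python
class StubTranscriptions:
    """Records the kwargs of the last create() call."""
    def __init__(self):
        self.last_kwargs = None

    def create(self, **kwargs):
        self.last_kwargs = kwargs
        return type("Result", (), {"text": "ok"})()

def make_transcribe(provider_config, client):
    # Mirrors the PR's logic: read optional hints from config and
    # forward them, mapping "" to None (standing in for NOT_GIVEN).
    language = provider_config.get("language", "")
    prompt = provider_config.get("prompt", "")

    def transcribe(file):
        return client.create(
            file=file,
            language=language or None,
            prompt=prompt or None,
            temperature=0,
        )
    return transcribe

client = StubTranscriptions()
transcribe = make_transcribe({"language": "zh", "prompt": "lab terms"}, client)
transcribe(b"fake-wav")
assert client.last_kwargs["language"] == "zh"
assert client.last_kwargs["prompt"] == "lab terms"
assert client.last_kwargs["temperature"] == 0
```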

Comment on lines +42 to +43:

```python
self.language = provider_config.get("language", "")
self.prompt = provider_config.get("prompt", "")
```

Severity: medium

It is recommended to strip() the language and prompt values to prevent issues caused by accidental leading or trailing whitespace in the configuration. Additionally, using or "" ensures that the code handles cases where the configuration value might be explicitly set to null.

Suggested change:

```diff
-self.language = provider_config.get("language", "")
-self.prompt = provider_config.get("prompt", "")
+self.language = str(provider_config.get("language") or "").strip()
+self.prompt = str(provider_config.get("prompt") or "").strip()
```

Comment on lines +127 to +129:

```python
language=self.language or NOT_GIVEN,
prompt=self.prompt or NOT_GIVEN,
temperature=0,
```

Severity: medium

While adding these parameters, note that the open(audio_url, "rb") call on line 126 (context) creates a file handle that is not explicitly closed. This can lead to resource leaks and may cause the os.remove(audio_url) call on line 135 to fail on Windows. Consider refactoring this block to use a context manager for the file handle.

```python
file=("audio.wav", open(audio_url, "rb")),
language=self.language or NOT_GIVEN,
prompt=self.prompt or NOT_GIVEN,
temperature=0,
```

Severity: medium

The temperature=0 parameter is currently hardcoded. While this is the default for the OpenAI Whisper API and ensures deterministic output, consider making it a configurable option in the provider settings to allow users to adjust it if needed (e.g., to reduce hallucinations in difficult audio).
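A hedged sketch of that suggestion (the config key name `temperature` is an assumption, not part of the PR): read the value from provider_config, default to 0, and clamp to Whisper's accepted range:

```python
def resolve_temperature(provider_config) -> float:
    """Default 0 keeps current behavior; invalid values fall back to 0."""
    raw = provider_config.get("temperature", 0)
    try:
        t = float(raw)
    except (TypeError, ValueError):
        t = 0.0
    return min(max(t, 0.0), 1.0)  # Whisper accepts temperatures in [0, 1]
```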
