feat(whisper): expose language + prompt config (multilingual opt-in) #8063

YonganZhang wants to merge 1 commit into AstrBotDevs:master
Conversation
## Problem

`ProviderOpenAIWhisperAPI` does not pass `language` / `prompt` to `client.audio.transcriptions.create()`. Whisper's auto-detect can misclassify Chinese / Japanese / Korean / etc., hurting transcription accuracy. Users have no way to provide a language hint or prompt through the existing provider config.

## Solution

Expose two optional config fields, both defaulting to `""`, which preserves the current auto-detect behavior and is fully backwards compatible:

- `language`: Whisper language hint, e.g. `"zh"` / `"ja"` / `"ko"`
- `prompt`: free-text guidance, e.g. domain vocabulary or phrasing (see https://platform.openai.com/docs/guides/speech-to-text/prompting)

Also sets `temperature=0` for deterministic output.

## Backwards compatibility

Default `""` → `NOT_GIVEN` → Whisper auto-detect, identical to current behavior. Existing users see no change; the new parameters are pure opt-in.

## Test

Local run on Chinese voice samples (PolyU lab): WER measurably better with `language="zh"` plus a short Chinese prompt vs. auto-detect.

## Diff size

8 effective lines: 5 in init, 3 in the transcription call.
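The opt-in fallback described above can be sketched as follows. `NOT_GIVEN` here is a local stand-in for the openai SDK's omission sentinel (`openai.NOT_GIVEN`), and `build_transcription_kwargs` is a hypothetical helper used only to illustrate the config handling:

```python
# Sketch of the "" -> NOT_GIVEN -> auto-detect fallback, assuming a plain
# provider_config dict as in the PR. NOT_GIVEN stands in for openai.NOT_GIVEN.
NOT_GIVEN = object()

def build_transcription_kwargs(provider_config: dict) -> dict:
    """Mirror the PR's config handling: "" means keep Whisper auto-detect."""
    language = provider_config.get("language", "")
    prompt = provider_config.get("prompt", "")
    return {
        "model": "whisper-1",
        "language": language or NOT_GIVEN,  # "" -> parameter omitted
        "prompt": prompt or NOT_GIVEN,
        "temperature": 0,  # deterministic output
    }
```

With `{"language": "zh"}` the hint is forwarded; with an empty config both parameters resolve to the sentinel and the call behaves exactly as before.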
Hey, I've left some high-level feedback:

- Hard-coding `temperature=0` in the transcription call removes flexibility; consider making this configurable via `provider_config` with a default of 0 to keep current behavior while allowing overrides.
- Since `language` and `prompt` are now provider-level fields, it may be useful to allow per-call overrides in `get_text` (e.g., optional parameters) so callers can vary hints without redefining the provider.
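A minimal sketch of both suggestions, a configurable temperature plus per-call overrides. The class name and method signature are assumptions for illustration, not the provider's real API:

```python
# Hypothetical provider sketch: temperature comes from provider_config
# (default 0), and per-call hints override the provider-level fields.
NOT_GIVEN = object()  # stand-in for openai.NOT_GIVEN

class WhisperProviderSketch:
    def __init__(self, provider_config: dict):
        self.language = provider_config.get("language", "")
        self.prompt = provider_config.get("prompt", "")
        # Configurable, defaulting to 0 to keep the PR's behavior.
        self.temperature = provider_config.get("temperature", 0)

    def transcription_kwargs(self, language=None, prompt=None) -> dict:
        # Per-call hints win; None falls back to provider-level config.
        lang = self.language if language is None else language
        prm = self.prompt if prompt is None else prompt
        return {
            "language": lang or NOT_GIVEN,
            "prompt": prm or NOT_GIVEN,
            "temperature": self.temperature,
        }
```

A caller configured with `{"language": "zh"}` can still transcribe a Japanese clip via `transcription_kwargs(language="ja")` without redefining the provider.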
Code Review
This pull request introduces support for optional language and prompt parameters in the Whisper API source to improve transcription accuracy. Feedback includes adding unit tests for the new functionality, sanitizing configuration inputs with strip(), and making the hardcoded temperature parameter configurable. Additionally, a potential resource leak was identified where the audio file handle is not explicitly closed, which could cause issues on certain operating systems.
```python
# Optional language hint + prompt to guide Whisper transcription.
# Default empty = let Whisper auto-detect (preserves existing behavior).
# Users can configure these for higher accuracy on non-English speech.
self.language = provider_config.get("language", "")
self.prompt = provider_config.get("prompt", "")
```
According to the general rules, new functionality should be accompanied by corresponding unit tests. Please add tests to verify that the `language` and `prompt` parameters are correctly extracted from the configuration and passed to the transcription API.
References
- New functionality, such as handling attachments, should be accompanied by corresponding unit tests.
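Such a test could look like the sketch below. It uses `unittest.mock`; since the real `ProviderOpenAIWhisperAPI` import path is not shown in this review, a minimal stand-in class mirrors the relevant behavior:

```python
# Sketch of a unit test for the new config fields. WhisperProviderStub is a
# hypothetical stand-in for ProviderOpenAIWhisperAPI's config handling.
from unittest.mock import MagicMock

class WhisperProviderStub:
    def __init__(self, provider_config: dict, client):
        self.language = provider_config.get("language", "")
        self.prompt = provider_config.get("prompt", "")
        self.client = client

    def get_text(self, audio_bytes: bytes):
        return self.client.audio.transcriptions.create(
            model="whisper-1",
            file=("audio.wav", audio_bytes),
            language=self.language,
            prompt=self.prompt,
            temperature=0,
        )

def test_language_and_prompt_forwarded():
    client = MagicMock()
    provider = WhisperProviderStub({"language": "zh", "prompt": "术语"}, client)
    provider.get_text(b"\x00")
    kwargs = client.audio.transcriptions.create.call_args.kwargs
    assert kwargs["language"] == "zh"
    assert kwargs["prompt"] == "术语"
    assert kwargs["temperature"] == 0
```

Mocking the client keeps the test offline while still asserting that the configured values reach `transcriptions.create`.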
```python
self.language = provider_config.get("language", "")
self.prompt = provider_config.get("prompt", "")
```
It is recommended to `strip()` the `language` and `prompt` values to prevent issues caused by accidental leading or trailing whitespace in the configuration. Additionally, using `or ""` ensures that the code handles cases where the configuration value might be explicitly set to `null`.
```diff
-self.language = provider_config.get("language", "")
-self.prompt = provider_config.get("prompt", "")
+self.language = str(provider_config.get("language") or "").strip()
+self.prompt = str(provider_config.get("prompt") or "").strip()
```
```python
language=self.language or NOT_GIVEN,
prompt=self.prompt or NOT_GIVEN,
temperature=0,
```
While adding these parameters, note that the `open(audio_url, "rb")` call on line 126 (context) creates a file handle that is not explicitly closed. This can lead to resource leaks and may cause the `os.remove(audio_url)` call on line 135 to fail on Windows. Consider refactoring this block to use a context manager for the file handle.
```python
file=("audio.wav", open(audio_url, "rb")),
language=self.language or NOT_GIVEN,
prompt=self.prompt or NOT_GIVEN,
temperature=0,
```
Summary by Sourcery
Expose configurable language and prompt options for the Whisper API provider to improve non-English transcription accuracy while preserving existing auto-detection behavior.
New Features:

- Add optional `language` and `prompt` fields to the Whisper provider config, defaulting to `""` to keep auto-detect.

Enhancements:

- Pass `temperature=0` in the transcription call for deterministic output.