fix(tool): allow custom default vision model in OpenAIMultiModalTool #1701
AgentScopeJavaBot
left a comment
🤖 AI Review
This PR cleanly addresses a real usability issue (#1694) where the hardcoded "gpt-4o" fallback in openaiImageToText causes opaque 503 errors on non-OpenAI-compatible backends. The fix introduces a defaultModelName field with a new 3-arg constructor while keeping existing 1-arg and 2-arg constructors fully backward compatible. Validation is consistent across both public and protected constructors, and the @ToolParam description is appropriately de-coupled from OpenAI-specific model names. Tests cover the custom default model path and constructor validation for null/blank inputs. The change is minimal, well-scoped, and correctly solves the reported issue.
Note: The same hardcoded-default pattern exists in sibling methods (openaiTextToImage → "dall-e-3", openaiTextToAudio → "tts-1", openaiAudioToText → "whisper-1"). These are out of scope for this PR but would benefit from a similar refactoring in a follow-up.
AgentScope-Java Version
2.0.0-RC2
Description
When `openaiImageToText` is called without a `model` parameter, the fallback was hardcoded to `"gpt-4o"`. Non-OpenAI-compatible backends (e.g. MiniMax proxies) don't recognize that model name and return HTTP 503, which looks like a transient outage and is hard to diagnose.
The `@ToolParam` description also mentioned `gpt-4o` and `gpt-4-vision-preview` explicitly, steering the LLM toward OpenAI-specific names even when a different backend was configured.
Fixes #1694
Changes:
- Add a `defaultModelName` field and a new 3-argument constructor `OpenAIMultiModalTool(String apiKey, String baseUrl, String defaultModelName)`
- Existing constructors keep `"gpt-4o"` as the default (fully backward compatible)
- Replace the hardcoded `"gpt-4o"` fallback in `openaiImageToText` with `this.defaultModelName`
- Update the `@ToolParam` description and Javadoc
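The changes listed above can be illustrated with a minimal sketch. It is not the actual `OpenAIMultiModalTool` source: the class name `VisionToolSketch`, the helper `resolveModel`, the `null` base URL in the 1-argument constructor, and the choice of `IllegalArgumentException` for null or blank defaults are all assumptions made for illustration. Only the constructor arities, the `"gpt-4o"` legacy default, and the fall-back-to-`defaultModelName` behavior come from the description.

```java
// Hypothetical stand-in for illustration only; NOT the real OpenAIMultiModalTool.
final class VisionToolSketch {
    // Legacy default that the existing constructors keep, per the PR description.
    static final String LEGACY_DEFAULT_MODEL = "gpt-4o";

    private final String apiKey;
    private final String baseUrl;
    private final String defaultModelName;

    // 1-arg constructor: behavior unchanged, still defaults to "gpt-4o".
    // Passing a null base URL here is an assumption of this sketch.
    VisionToolSketch(String apiKey) {
        this(apiKey, null, LEGACY_DEFAULT_MODEL);
    }

    // 2-arg constructor: behavior unchanged, still defaults to "gpt-4o".
    VisionToolSketch(String apiKey, String baseUrl) {
        this(apiKey, baseUrl, LEGACY_DEFAULT_MODEL);
    }

    // New 3-arg constructor: the caller chooses the default vision model.
    VisionToolSketch(String apiKey, String baseUrl, String defaultModelName) {
        // Assumed validation: reject null or blank defaults up front so a bad
        // configuration fails at construction time, not as an HTTP error later.
        if (defaultModelName == null || defaultModelName.isBlank()) {
            throw new IllegalArgumentException("defaultModelName must not be null or blank");
        }
        this.apiKey = apiKey;
        this.baseUrl = baseUrl;
        this.defaultModelName = defaultModelName;
    }

    // Hypothetical helper mirroring the fallback in openaiImageToText:
    // use the requested model if one was given, otherwise the configured default.
    String resolveModel(String requestedModel) {
        if (requestedModel == null || requestedModel.isBlank()) {
            return this.defaultModelName;
        }
        return requestedModel;
    }

    public static void main(String[] args) {
        VisionToolSketch legacy = new VisionToolSketch("test-key");
        VisionToolSketch custom =
                new VisionToolSketch("test-key", "https://example.invalid/v1", "my-vision-model");

        System.out.println(legacy.resolveModel(null));        // prints "gpt-4o"
        System.out.println(custom.resolveModel(null));        // prints "my-vision-model"
        System.out.println(custom.resolveModel("other-model")); // prints "other-model"
    }
}
```

With this shape, callers on a non-OpenAI backend pass their own model name once at construction, and an explicit `model` argument on a call still takes precedence over the configured default.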
Checklist
- Code formatted (`mvn spotless:apply`)
- Tests passing (`mvn test`)