Detail Bug Report
https://app.detail.dev/org_befd6425-a158-4e24-9d4d-1e5c08769515/bugs/bug_aae35640-d0c9-48d7-9122-80a681af34fe
Introduced in #2 by @WilliamAGH on Sep 6, 2025
Summary
- Context: RAG retrieval uses ModelConfiguration.isTokenConstrained(modelHint) to decide whether to reduce context (3 docs, 600 tokens) or use default retrieval.
- Bug: ChatController passes the hardcoded ModelConfiguration.DEFAULT_MODEL ("gpt-5.2") as the model hint, ignoring the user's configured model.
- Actual vs. expected (e.g., with OPENAI_MODEL=gpt-4o configured):
  Expected: Full retrieval (gpt-4o has 128K context, no tier constraints)
  Actual: Constrained retrieval (model hint is hardcoded "gpt-5.2", triggering reduced RAG)
- Impact: Users who configure non-GPT-5 models receive reduced RAG retrieval when they should receive full retrieval. The PromptTruncator cannot compensate because it only removes documents, never adds them.
Code with Bug
// ChatController.java (lines 124-126)
// Pass model hint to optimize RAG for token-constrained models
ChatService.StructuredPromptOutcome promptOutcome =
    chatService.buildStructuredPromptWithContextOutcome(
        history, userQuery, ModelConfiguration.DEFAULT_MODEL); // <-- BUG 🔴 Ignores configured model

// ModelConfiguration.java (line 14)
public static final String DEFAULT_MODEL = "gpt-5.2";
Explanation
ChatService.buildStructuredPromptWithContextOutcome(...) makes a retrieval decision based on isTokenConstrained(modelHint), but ChatController always supplies ModelConfiguration.DEFAULT_MODEL instead of the actual model selected via configuration (e.g., LLM_PRIMARY_PROVIDER + OPENAI_MODEL / GITHUB_MODELS_CHAT_MODEL).
As a result, even when a user configures a large-context model (e.g., OPENAI_MODEL=gpt-4o), the RAG layer behaves as if the request is for a token-constrained GPT-5 model and limits retrieval (e.g., fewer documents). This cannot be corrected downstream: PromptTruncator is truncate-only and cannot fetch additional documents that RAG failed to retrieve.
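The effect described above can be illustrated with a minimal, self-contained sketch. It is not the project's code: the prefix check standing in for isTokenConstrained, the default limit of 10 documents, and the truncate helper are all assumptions made for illustration. Only the constrained limit of 3 documents and the "gpt-5.2" default come from this report.

```java
import java.util.List;
import java.util.Locale;

public class RetrievalBudgetSketch {
    // From the report: constrained retrieval fetches 3 documents.
    static final int CONSTRAINED_MAX_DOCS = 3;
    // Hypothetical value: the report does not state the default limit.
    static final int DEFAULT_MAX_DOCS = 10;
    // From the report: ModelConfiguration.DEFAULT_MODEL.
    static final String DEFAULT_MODEL = "gpt-5.2";

    // Assumption: a simple prefix check stands in for
    // ModelConfiguration.isTokenConstrained(modelHint).
    static boolean isTokenConstrained(String modelHint) {
        return modelHint != null
            && modelHint.toLowerCase(Locale.ROOT).startsWith("gpt-5");
    }

    // The retrieval budget depends only on the hint, not on the model
    // that will actually serve the request.
    static int maxDocsFor(String modelHint) {
        return isTokenConstrained(modelHint) ? CONSTRAINED_MAX_DOCS : DEFAULT_MAX_DOCS;
    }

    // Truncate-only stand-in for PromptTruncator: it can drop documents
    // but has no way to fetch ones that were never retrieved.
    static List<String> truncate(List<String> docs, int limit) {
        return docs.subList(0, Math.min(docs.size(), limit));
    }

    public static void main(String[] args) {
        String configuredModel = "gpt-4o"; // e.g. OPENAI_MODEL=gpt-4o

        int withHardcodedHint = maxDocsFor(DEFAULT_MODEL);
        int withConfiguredHint = maxDocsFor(configuredModel);
        System.out.println("hardcoded hint  -> " + withHardcodedHint + " docs");
        System.out.println("configured hint -> " + withConfiguredHint + " docs");

        // Even with room for 10 documents downstream, only the 3 retrieved survive.
        List<String> retrieved = List.of("doc1", "doc2", "doc3");
        System.out.println("after truncation -> "
            + truncate(retrieved, withConfiguredHint).size() + " docs");
    }
}
```

Under these assumptions the hardcoded hint always selects the 3-document budget, and the truncation step leaves the retrieved set at 3 documents regardless of how large the configured model's context is.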
Codebase Inconsistency
The inline comment in ChatController says “Pass model hint to optimize RAG for token-constrained models”, but passing a hardcoded constant prevents optimization for the actual configured model.
Recommended Fix
Pass the configured model from OpenAiRequestFactory or inject it directly:
// Option A: Add getter to OpenAiRequestFactory
public String getConfiguredModelForPrimaryProvider() {
    boolean useGitHubModels = configuredPrimaryProvider() == GITHUB_MODELS;
    return normalizedModelId(useGitHubModels);
}

// ChatController.java
String modelHint = openAiRequestFactory.getConfiguredModelForPrimaryProvider();
ChatService.StructuredPromptOutcome promptOutcome =
    chatService.buildStructuredPromptWithContextOutcome(history, userQuery, modelHint);
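For the "inject it directly" alternative, one possible shape is a small resolver that derives the hint from configuration. The sketch below is hypothetical: the class and method names, the provider values "openai" and "github_models", and the fallback behavior are assumptions. Only the variable names LLM_PRIMARY_PROVIDER, OPENAI_MODEL, and GITHUB_MODELS_CHAT_MODEL come from this report.

```java
import java.util.Map;

public class ModelHintResolverSketch {
    // Resolves the model hint from configuration values.
    // Assumption: provider is "github_models" or anything else (treated as OpenAI).
    static String resolveModelHint(Map<String, String> env, String fallback) {
        String provider = env.getOrDefault("LLM_PRIMARY_PROVIDER", "openai").trim();
        String key = "github_models".equalsIgnoreCase(provider)
            ? "GITHUB_MODELS_CHAT_MODEL"
            : "OPENAI_MODEL";
        String model = env.get(key);
        // Fall back to the default only when nothing is configured.
        return (model == null || model.isBlank()) ? fallback : model.trim();
    }

    public static void main(String[] args) {
        // In a real application the map could be System.getenv().
        Map<String, String> env = Map.of(
            "LLM_PRIMARY_PROVIDER", "openai",
            "OPENAI_MODEL", "gpt-4o");
        System.out.println(resolveModelHint(env, "gpt-5.2"));
    }
}
```

With this approach the default model is used as a hint only when no model is configured, so the RAG layer sees the same model the request will actually use.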
History
This bug was introduced in commit 9b91f47. The commit migrated ChatController to use OpenAI SDK streaming and added the model hint parameter with hardcoded "gpt-5" for RAG optimization. Commit 3770369 later refactored the hardcoded string to ModelConfiguration.DEFAULT_MODEL but preserved the incorrect behavior.