
[Detail Bug] RAG retrieval is incorrectly reduced when users configure non-default chat models #64

Description

@detail-app

Detail Bug Report

https://app.detail.dev/org_befd6425-a158-4e24-9d4d-1e5c08769515/bugs/bug_aae35640-d0c9-48d7-9122-80a681af34fe

Introduced in #2 by @WilliamAGH on Sep 6, 2025

Summary

  • Context: RAG retrieval uses ModelConfiguration.isTokenConstrained(modelHint) to decide whether to reduce context (3 docs, 600 tokens vs default retrieval).
  • Bug: ChatController passes hardcoded ModelConfiguration.DEFAULT_MODEL ("gpt-5.2") as the model hint, ignoring the user's configured model.
  • Actual vs. expected (with OPENAI_MODEL=gpt-4o configured):
    Expected: full retrieval (gpt-4o has a 128K context window and no tier constraints).
    Actual: constrained retrieval (the model hint is hardcoded to "gpt-5.2", which triggers reduced RAG).
  • Impact: Users who configure non-GPT-5 models receive reduced RAG retrieval when they should receive full retrieval. The PromptTruncator cannot compensate because it only removes documents, never adds them.

Code with Bug

// ChatController.java (lines 124-126)
// Pass model hint to optimize RAG for token-constrained models
ChatService.StructuredPromptOutcome promptOutcome =
        chatService.buildStructuredPromptWithContextOutcome(
                history, userQuery, ModelConfiguration.DEFAULT_MODEL);  // <-- BUG 🔴 Ignores configured model

// ModelConfiguration.java (line 14)
public static final String DEFAULT_MODEL = "gpt-5.2";

Explanation

ChatService.buildStructuredPromptWithContextOutcome(...) makes a retrieval decision based on isTokenConstrained(modelHint), but ChatController always supplies ModelConfiguration.DEFAULT_MODEL instead of the actual model selected via configuration (e.g., LLM_PRIMARY_PROVIDER + OPENAI_MODEL / GITHUB_MODELS_CHAT_MODEL).

As a result, even when a user configures a large-context model (e.g., OPENAI_MODEL=gpt-4o), the RAG layer behaves as if the request is for a token-constrained GPT-5 model and limits retrieval (e.g., fewer documents). This cannot be corrected downstream: PromptTruncator is truncate-only and cannot fetch additional documents that RAG failed to retrieve.
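The interaction described above can be illustrated with a minimal, self-contained sketch. It is not the project's code: the class name, the prefix check inside isTokenConstrained, and the FULL_DOC_LIMIT value are assumptions made for illustration. Only the "gpt-5.2" default and the 3-document reduced limit come from this report.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the retrieval decision; names and logic are illustrative.
final class RagHintSketch {
    static final String DEFAULT_MODEL = "gpt-5.2"; // value quoted in this report
    static final int REDUCED_DOC_LIMIT = 3;        // reduced limit quoted in this report
    static final int FULL_DOC_LIMIT = 10;          // placeholder; the real default is not stated

    // Assumption: GPT-5 family ids are the ones treated as token-constrained.
    static boolean isTokenConstrained(String modelHint) {
        return modelHint != null
                && modelHint.toLowerCase(Locale.ROOT).startsWith("gpt-5");
    }

    // Retrieval picks its document limit from the hint it is given.
    static List<String> retrieve(List<String> corpus, String modelHint) {
        int limit = isTokenConstrained(modelHint) ? REDUCED_DOC_LIMIT : FULL_DOC_LIMIT;
        return corpus.subList(0, Math.min(limit, corpus.size()));
    }

    // Truncate-only step: it can drop documents to fit a budget but never adds any.
    static List<String> truncate(List<String> docs, int maxDocs) {
        return docs.subList(0, Math.min(maxDocs, docs.size()));
    }
}
```

In this sketch, calling retrieve with DEFAULT_MODEL returns at most three documents even when the deployment actually runs gpt-4o, and a later truncate call with a larger budget still returns those same three, because the documents that were never retrieved are no longer available to it.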

Codebase Inconsistency

The inline comment in ChatController says “Pass model hint to optimize RAG for token-constrained models”, but passing a hardcoded constant prevents optimization for the actual configured model.

Recommended Fix

Pass the configured model from OpenAiRequestFactory or inject it directly:

// Option A: Add getter to OpenAiRequestFactory
public String getConfiguredModelForPrimaryProvider() {
    boolean useGitHubModels = configuredPrimaryProvider() == GITHUB_MODELS;
    return normalizedModelId(useGitHubModels);
}

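// ChatController.java
// Resolve the hint from the configured model instead of ModelConfiguration.DEFAULT_MODEL
String modelHint = openAiRequestFactory.getConfiguredModelForPrimaryProvider();
ChatService.StructuredPromptOutcome promptOutcome =
        chatService.buildStructuredPromptWithContextOutcome(history, userQuery, modelHint);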
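
A regression check for this fix could assert that the hint reaching the prompt builder is the configured model rather than the default. The sketch below shows one possible shape using hypothetical stand-ins: RecordingPromptBuilder and handleChat are invented for illustration and do not mirror the real ChatService or ChatController signatures. A Supplier<String> plays the role of the proposed getter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical wiring sketch; none of these types exist in the project as written.
final class ModelHintWiringSketch {

    // Stand-in for ChatService that records every model hint it receives.
    static final class RecordingPromptBuilder {
        final List<String> hints = new ArrayList<>();

        void buildStructuredPromptWithContextOutcome(
                List<String> history, String userQuery, String modelHint) {
            hints.add(modelHint);
        }
    }

    // Stand-in for the controller path after the fix: the hint is read from
    // configuration on each request instead of from a hardcoded constant.
    static void handleChat(
            Supplier<String> configuredModel,
            RecordingPromptBuilder builder,
            String userQuery) {
        String modelHint = configuredModel.get();
        builder.buildStructuredPromptWithContextOutcome(List.of(), userQuery, modelHint);
    }
}
```

Calling handleChat with a supplier that returns "gpt-4o" should leave exactly that value in the recorded hints. The same check against the current code would record "gpt-5.2" regardless of configuration.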

History

This bug was introduced in commit 9b91f47. The commit migrated ChatController to use OpenAI SDK streaming and added the model hint parameter with hardcoded "gpt-5" for RAG optimization. Commit 3770369 later refactored the hardcoded string to ModelConfiguration.DEFAULT_MODEL but preserved the incorrect behavior.
