[Detail Bug] RAG retrieval is incorrectly reduced when users configure non-default chat models

# Detail Bug Report

https://app.detail.dev/org_befd6425-a158-4e24-9d4d-1e5c08769515/bugs/bug_aae35640-d0c9-48d7-9122-80a681af34fe

Introduced in [#2](https://github.com/WilliamAGH/java-chat/pull/2) by @WilliamAGH on Sep 6, 2025

# Summary
- **Context**: RAG retrieval uses `ModelConfiguration.isTokenConstrained(modelHint)` to decide whether to reduce context (3 documents and 600 tokens instead of the default retrieval limits).
- **Bug**: `ChatController` passes hardcoded `ModelConfiguration.DEFAULT_MODEL` ("gpt-5.2") as the model hint, ignoring the user's configured model.
- **Actual vs. expected**: With `OPENAI_MODEL=gpt-4o` configured, full retrieval is expected (gpt-4o has a 128K context window and no tier constraints). Instead, retrieval is constrained, because the hardcoded `"gpt-5.2"` model hint triggers the reduced RAG path.
- **Impact**: Users who configure non-GPT-5 models receive reduced RAG retrieval when they should receive full retrieval. `PromptTruncator` cannot compensate because it only removes documents and never adds them.

# Code with Bug
```java
// ChatController.java (lines 124-126)
// Pass model hint to optimize RAG for token-constrained models
ChatService.StructuredPromptOutcome promptOutcome =
        chatService.buildStructuredPromptWithContextOutcome(
                history, userQuery, ModelConfiguration.DEFAULT_MODEL);  // <-- BUG 🔴 Ignores configured model
```

```java
// ModelConfiguration.java (line 14)
public static final String DEFAULT_MODEL = "gpt-5.2";
```

# Explanation
`ChatService.buildStructuredPromptWithContextOutcome(...)` makes a retrieval decision based on `isTokenConstrained(modelHint)`, but `ChatController` always supplies `ModelConfiguration.DEFAULT_MODEL` instead of the actual model selected via configuration (e.g., `LLM_PRIMARY_PROVIDER` + `OPENAI_MODEL` / `GITHUB_MODELS_CHAT_MODEL`).
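
To make the decision path concrete, here is a minimal, self-contained sketch. It is not the project's code: `ModelHintSketch`, the prefix-based matching rule inside `isTokenConstrained`, and the default limit of 12 documents are all assumptions for illustration. Only the `"gpt-5.2"` default and the 3-document constrained limit come from this report.

```java
import java.util.Locale;

// Hypothetical stand-in for ModelConfiguration plus the retrieval decision.
final class ModelHintSketch {
    static final String DEFAULT_MODEL = "gpt-5.2"; // value quoted in this report
    static final int CONSTRAINED_DOCS = 3;         // constrained limit quoted in this report
    static final int DEFAULT_DOCS = 12;            // ASSUMPTION: the real default is not stated here

    // ASSUMPTION: the real matching rule may differ; this treats "gpt-5*" ids as constrained.
    static boolean isTokenConstrained(String modelHint) {
        return modelHint != null
                && modelHint.toLowerCase(Locale.ROOT).startsWith("gpt-5");
    }

    // Chooses how many documents to retrieve for the given model hint.
    static int documentLimitFor(String modelHint) {
        return isTokenConstrained(modelHint) ? CONSTRAINED_DOCS : DEFAULT_DOCS;
    }

    public static void main(String[] args) {
        String configuredModel = "gpt-4o"; // what the user actually configured

        // Current behavior: the controller always passes the constant.
        System.out.println("hardcoded hint limit:  " + documentLimitFor(DEFAULT_MODEL));

        // Intended behavior: the decision follows the configured model.
        System.out.println("configured hint limit: " + documentLimitFor(configuredModel));
    }
}
```

Under these assumptions, the hardcoded hint always selects the constrained limit, regardless of which model is configured.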

As a result, even when a user configures a large-context model (e.g., `OPENAI_MODEL=gpt-4o`), the RAG layer behaves as if the request is for a token-constrained GPT-5 model and limits retrieval (e.g., fewer documents). This cannot be corrected downstream: `PromptTruncator` is truncate-only and cannot fetch additional documents that RAG failed to retrieve.
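
The truncate-only limitation can be illustrated with a small sketch. `TruncateOnlySketch`, its `truncate` method, and the 4-characters-per-token estimate are hypothetical and are not `PromptTruncator`'s actual implementation. The point is only that a component which drops documents to fit a budget can never return more documents than it was given.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical truncate-only component, standing in for PromptTruncator.
final class TruncateOnlySketch {

    // Keeps documents in order until the token budget would be exceeded.
    static List<String> truncate(List<String> docs, int tokenBudget) {
        List<String> kept = new ArrayList<>();
        int used = 0;
        for (String doc : docs) {
            int cost = estimateTokens(doc);
            if (used + cost > tokenBudget) {
                break; // over budget: drop this document and everything after it
            }
            kept.add(doc);
            used += cost;
        }
        return kept;
    }

    // ASSUMPTION: rough heuristic of about 4 characters per token, minimum 1.
    static int estimateTokens(String text) {
        return Math.max(1, text.length() / 4);
    }

    public static void main(String[] args) {
        // Constrained retrieval already limited the input to 3 documents.
        List<String> retrieved = List.of("doc A", "doc B", "doc C");

        // Even with a very large budget, the output cannot exceed the input.
        List<String> result = truncate(retrieved, 128_000);
        System.out.println(result.size() + " of " + retrieved.size() + " documents kept");
    }
}
```

However large the budget, the result is bounded by what retrieval returned, so the fix has to happen at the retrieval step.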

## Codebase Inconsistency
The inline comment in `ChatController` says “Pass model hint to optimize RAG for token-constrained models”, but passing a hardcoded constant prevents optimization for the actual configured model.

# Recommended Fix
Pass the configured model as the hint, either by exposing it from `OpenAiRequestFactory` (Option A) or by injecting the configured model ID into `ChatController` directly:

```java
// Option A: Add getter to OpenAiRequestFactory
public String getConfiguredModelForPrimaryProvider() {
    boolean useGitHubModels = configuredPrimaryProvider() == GITHUB_MODELS;
    return normalizedModelId(useGitHubModels);
}

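// ChatController.java: pass the configured model instead of DEFAULT_MODEL
String modelHint = openAiRequestFactory.getConfiguredModelForPrimaryProvider();
ChatService.StructuredPromptOutcome promptOutcome =
        chatService.buildStructuredPromptWithContextOutcome(history, userQuery, modelHint);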
```
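
For the "inject it directly" alternative, one possible shape is a small resolver that picks the model ID for the active provider. This is a sketch under stated assumptions: `ConfiguredModelSketch`, `resolveModelHint`, the provider values `"openai"` and `"github_models"`, and the fallback to the default model are hypothetical. Only the setting names `LLM_PRIMARY_PROVIDER`, `OPENAI_MODEL`, and `GITHUB_MODELS_CHAT_MODEL` appear in this report.

```java
import java.util.Map;

// Hypothetical resolver; the real project may read these settings differently.
final class ConfiguredModelSketch {
    static final String DEFAULT_MODEL = "gpt-5.2"; // value quoted in this report

    // Returns the model id configured for the active provider,
    // falling back to DEFAULT_MODEL when nothing is set.
    static String resolveModelHint(Map<String, String> settings) {
        // ASSUMPTION: provider values and the "openai" default are illustrative.
        String provider = settings.getOrDefault("LLM_PRIMARY_PROVIDER", "openai");
        String key = "github_models".equalsIgnoreCase(provider.trim())
                ? "GITHUB_MODELS_CHAT_MODEL"
                : "OPENAI_MODEL";
        String model = settings.get(key);
        return (model == null || model.isBlank()) ? DEFAULT_MODEL : model.trim();
    }

    public static void main(String[] args) {
        Map<String, String> settings = Map.of("OPENAI_MODEL", "gpt-4o");
        // The hint now follows the configuration instead of a constant.
        System.out.println("model hint: " + resolveModelHint(settings));
    }
}
```

Either approach works as long as the value passed to `buildStructuredPromptWithContextOutcome(...)` is derived from the same configuration that selects the model for the actual request.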

# History
This bug was introduced in commit 9b91f47, which migrated `ChatController` to OpenAI SDK streaming and added the model hint parameter with a hardcoded `"gpt-5"` value for RAG optimization. Commit 3770369 later replaced the hardcoded string with `ModelConfiguration.DEFAULT_MODEL` but preserved the incorrect behavior.
