feat: Context window management & token budget trimming in useChat #280

@d-oit

Problem

useChat.ts sends the full, unbounded message history to the LLM on every turn. With free-tier models (e.g. google/gemini-2.0-flash-lite-preview-02-05:free), long sessions can silently exceed the context limit and inflate token usage unexpectedly. There is currently no mechanism to cap or trim the conversation history before it is sent to the provider.

Impact

  • Silent failures when context window is exceeded (API returns truncated/error responses)
  • Unbounded token usage growth per session
  • No budget control for free-tier or rate-limited models

Proposed Implementation

1. Token estimator utility

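// src/lib/llm/token-utils.ts
// `Message` is the app's existing chat message type ({ role, content }).

// Rough estimate: ~4 characters per token (GPT-style, English text).
// This is not a real tokenizer; treat the result as approximate.
export function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Adds a flat 4-token allowance per message for role/formatting overhead.
export function estimateMessagesTokens(messages: Message[]): number {
  return messages.reduce((sum, m) => sum + estimateTokens(m.content) + 4, 0);
}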
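
To make the arithmetic concrete, here is a small worked example of the estimator. It repeats the two functions so it runs on its own; the `Message` shape and the sample conversation are assumptions for illustration only.

```typescript
// Minimal stand-in for the app's message type (assumption).
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Same heuristic as proposed above: ~4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Per-message estimate plus a flat 4-token allowance per message.
function estimateMessagesTokens(messages: Message[]): number {
  return messages.reduce((sum, m) => sum + estimateTokens(m.content) + 4, 0);
}

const sampleConversation: Message[] = [
  { role: 'system', content: 'You are a helpful assistant.' }, // 28 chars -> 7 tokens
  { role: 'user', content: 'Hello world' },                    // 11 chars -> 3 tokens
];

// (7 + 4) + (3 + 4) = 18
const sampleTotal = estimateMessagesTokens(sampleConversation);
console.log(sampleTotal); // 18
```

Because the estimate is character-based, it will drift for code, non-English text, or other tokenizers, so the budget should leave some margin.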

2. Sliding-window history trimmer

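// src/lib/llm/context-manager.ts
import { estimateMessagesTokens } from './token-utils';
// `Message` is the app's existing chat message type ({ role, content }).

export interface ContextManagerOptions {
  maxHistoryMessages?: number;   // default: 20
  maxContextTokens?: number;     // default: 8000
  keepSystemMessage?: boolean;   // default: true
  summarizeOnEviction?: boolean; // default: false (declared only; not used by the trimmer yet)
}

export function trimMessageHistory(
  messages: Message[],
  opts: ContextManagerOptions = {}
): Message[] {
  const { maxHistoryMessages = 20, maxContextTokens = 8000, keepSystemMessage = true } = opts;

  // Separate the first system message from the conversational turns.
  // With keepSystemMessage = false, system messages are dropped entirely.
  const systemMsg = keepSystemMessage ? messages.find(m => m.role === 'system') : undefined;
  const nonSystem = messages.filter(m => m.role !== 'system');

  // Apply message count limit.
  // Guard against slice(-0), which would return the whole array.
  const windowed = maxHistoryMessages > 0 ? nonSystem.slice(-maxHistoryMessages) : [];

  // Apply token budget, walking from newest to oldest so recent turns win.
  // estimateMessagesTokens() includes the per-message overhead, so the budget
  // matches what it would report for the returned array.
  let tokenCount = systemMsg ? estimateMessagesTokens([systemMsg]) : 0;
  const budgeted: Message[] = [];
  for (let i = windowed.length - 1; i >= 0; i--) {
    const t = estimateMessagesTokens([windowed[i]]);
    if (tokenCount + t > maxContextTokens) break;
    tokenCount += t;
    budgeted.unshift(windowed[i]);
  }

  return systemMsg ? [systemMsg, ...budgeted] : budgeted;
}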
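
To make the sliding-window behaviour concrete, the following self-contained sketch applies the same newest-first budget walk and also reports which messages were evicted. That is the information a future `summarizeOnEviction` implementation would need. The helper name `splitByBudget`, the `Message` shape, and the sample data are hypothetical and not part of the proposal above.

```typescript
// Assumed message shape; the real type lives elsewhere in the app.
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Same ~4 chars/token heuristic plus 4 tokens of per-message overhead.
function estimateMessageTokens(m: Message): number {
  return Math.ceil(m.content.length / 4) + 4;
}

// Hypothetical helper: keeps the newest messages that fit the budget and
// returns the older ones separately so a caller could summarize them.
function splitByBudget(
  history: Message[],
  maxContextTokens: number
): { kept: Message[]; evicted: Message[] } {
  let used = 0;
  let firstKept = history.length; // index of the oldest message that still fits
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateMessageTokens(history[i]);
    if (used + cost > maxContextTokens) break;
    used += cost;
    firstKept = i;
  }
  return { kept: history.slice(firstKept), evicted: history.slice(0, firstKept) };
}

const demoHistory: Message[] = [
  { role: 'user', content: 'a'.repeat(40) },      // 10 + 4 = 14 tokens
  { role: 'assistant', content: 'b'.repeat(40) }, // 14 tokens
  { role: 'user', content: 'c'.repeat(40) },      // 14 tokens
];

// A budget of 30 fits the two newest messages (28) but not all three (42).
const demoSplit = splitByBudget(demoHistory, 30);
console.log(demoSplit.kept.length, demoSplit.evicted.length); // 2 1
```

Note that if the newest message alone exceeds the budget, nothing is kept. The real trimmer has the same edge case and may need an explicit policy for it, such as truncating that message.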

3. Integration in useChat.ts

// Before sending to provider:
const trimmedMessages = trimMessageHistory(messages, {
  maxHistoryMessages: chatOptions.maxHistoryMessages ?? 20,
  maxContextTokens: modelConfig.contextWindow ?? 8000,
});
await provider.chat(trimmedMessages, ...);
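
One design point to consider: for many providers the context window covers the prompt and the completion together, so passing the full `contextWindow` as the prompt budget can leave no room for the reply. A minimal sketch of reserving headroom follows. The helper name `promptTokenBudget` and the default reserve of 1024 tokens are assumptions, not existing code.

```typescript
// Hypothetical helper: derive the prompt budget from the model's context
// window, leaving room for the completion.
function promptTokenBudget(contextWindow: number, reservedForReply = 1024): number {
  // Never return a negative budget if the reserve exceeds the window.
  return Math.max(0, contextWindow - reservedForReply);
}

console.log(promptTokenBudget(8000));      // 6976
console.log(promptTokenBudget(4096, 512)); // 3584
console.log(promptTokenBudget(512, 1024)); // 0
```

The result could then be passed as `maxContextTokens` in place of the raw context window.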

4. UI: Token usage indicator in AIHarness.tsx

Add a small token counter badge showing estimated tokens in current context vs. model limit.
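
The badge logic can be kept as a pure function so it is easy to unit test independently of the component. A possible sketch follows. The name `describeTokenUsage`, the label format, and the 75% / 90% thresholds are illustrative assumptions.

```typescript
type UsageLevel = 'ok' | 'warning' | 'critical';

interface TokenUsage {
  label: string;   // text shown in the badge
  percent: number; // 0-100, clamped
  level: UsageLevel;
}

// Hypothetical helper; thresholds are illustrative, not part of the proposal.
function describeTokenUsage(usedTokens: number, limitTokens: number): TokenUsage {
  // Treat a missing or zero limit as "full" rather than dividing by zero.
  const ratio = limitTokens > 0 ? usedTokens / limitTokens : 1;
  const percent = Math.min(100, Math.max(0, Math.round(ratio * 100)));
  const level: UsageLevel = percent >= 90 ? 'critical' : percent >= 75 ? 'warning' : 'ok';
  // The "~" signals that the count is an estimate, not a tokenizer result.
  return { label: `~${usedTokens} / ${limitTokens} tokens (${percent}%)`, percent, level };
}

console.log(describeTokenUsage(1500, 8000).label); // ~1500 / 8000 tokens (19%)
```

AIHarness.tsx could call this with the estimated token count of the trimmed messages and the model's limit, then map `level` to a badge colour.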

Acceptance Criteria

  • trimMessageHistory() utility implemented and unit tested
  • useChat.ts uses trimmer before every provider call
  • maxHistoryMessages and maxContextTokens configurable via chat options
  • Token usage displayed in AIHarness UI
  • No regression in existing chat tests

Labels

enhancement ai performance
