Problem
useChat.ts sends the full, unbounded message history to the LLM on every turn. For free-tier models (e.g. google/gemini-2.0-flash-lite-preview-02-05:free), long sessions can silently exceed the context limit, and on metered models the token cost grows with every turn. There is no mechanism to cap or trim the conversation history before it is sent to the provider.
Impact
- Silent failures when context window is exceeded (API returns truncated/error responses)
- Unbounded token usage growth per session
- No budget control for free-tier or rate-limited models
Proposed Implementation
1. Token estimator utility
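// src/lib/llm/token-utils.ts

// Rough heuristic: ~4 characters per token for English text (GPT-style tokenizers).
// This is an estimate, not an exact tokenizer count.
export function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Sum of per-message estimates; the +4 is a flat allowance for per-message overhead.
export function estimateMessagesTokens(messages: Message[]): number {
  return messages.reduce((sum, m) => sum + estimateTokens(m.content) + 4, 0);
}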
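To make the heuristic concrete, here is a minimal worked example. The `ChatMessage` interface is a stand-in for the project's `Message` type (its exact shape is an assumption), and the two functions are copied from the proposal so the snippet runs on its own.

```typescript
// Stand-in for the project's Message type (assumed shape).
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// ~4 characters per token, rounded up.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Per-message estimate plus a flat 4-token overhead per message.
function estimateMessagesTokens(messages: ChatMessage[]): number {
  return messages.reduce((sum, m) => sum + estimateTokens(m.content) + 4, 0);
}

const sample: ChatMessage[] = [
  { role: 'system', content: 'You are helpful.' }, // 16 chars -> 4 tokens
  { role: 'user', content: 'hello world' },        // 11 chars -> 3 tokens
];

console.log(estimateTokens('hello world'));  // 3
console.log(estimateMessagesTokens(sample)); // 15 = (4 + 4) + (3 + 4)
```

The estimate is only as good as the 4-characters-per-token assumption; code and non-English text often tokenize less efficiently, so the real count may be higher.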
2. Sliding-window history trimmer
// src/lib/llm/context-manager.ts
import { estimateMessagesTokens } from './token-utils';

export interface ContextManagerOptions {
  maxHistoryMessages?: number;   // default: 20
  maxContextTokens?: number;     // default: 8000
  keepSystemMessage?: boolean;   // default: true
  summarizeOnEviction?: boolean; // default: false (declared here, not yet used by the trimmer)
}

export function trimMessageHistory(
  messages: Message[],
  opts: ContextManagerOptions = {}
): Message[] {
  const { maxHistoryMessages = 20, maxContextTokens = 8000, keepSystemMessage = true } = opts;
  const systemMsg = keepSystemMessage ? messages.find(m => m.role === 'system') : undefined;
  const nonSystem = messages.filter(m => m.role !== 'system');

  // Apply message count limit. slice(-0) returns the whole array, so handle 0 explicitly.
  const windowed = maxHistoryMessages > 0 ? nonSystem.slice(-maxHistoryMessages) : [];

  // Apply token budget, walking from newest to oldest so the most recent turns survive.
  // Uses the same per-message estimate (content + overhead) as estimateMessagesTokens.
  let tokenCount = systemMsg ? estimateMessagesTokens([systemMsg]) : 0;
  const budgeted: Message[] = [];
  for (let i = windowed.length - 1; i >= 0; i--) {
    const t = estimateMessagesTokens([windowed[i]]);
    if (tokenCount + t > maxContextTokens) break;
    tokenCount += t;
    budgeted.unshift(windowed[i]);
  }
  return systemMsg ? [systemMsg, ...budgeted] : budgeted;
}
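The two limits interact, so a small worked example may help when writing the unit tests. This is a condensed sketch of the trimmer, not the final implementation: `ChatMessage`, `estimate`, and `trimHistory` are illustrative names, the message shape is assumed, and the per-message estimate includes the +4 overhead from the token estimator in step 1.

```typescript
// Stand-in for the project's Message type (assumed shape).
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// ~4 chars per token plus a flat 4-token overhead per message.
const estimate = (m: ChatMessage): number => Math.ceil(m.content.length / 4) + 4;

function trimHistory(
  messages: ChatMessage[],
  maxHistoryMessages = 20,
  maxContextTokens = 8000
): ChatMessage[] {
  const systemMsg = messages.find(m => m.role === 'system');
  const nonSystem = messages.filter(m => m.role !== 'system');
  // slice(-0) would return everything, so a limit of 0 is handled explicitly.
  const windowed = maxHistoryMessages > 0 ? nonSystem.slice(-maxHistoryMessages) : [];

  let tokenCount = systemMsg ? estimate(systemMsg) : 0;
  const kept: ChatMessage[] = [];
  // Walk newest to oldest; stop at the first message that would exceed the budget.
  for (const m of [...windowed].reverse()) {
    const t = estimate(m);
    if (tokenCount + t > maxContextTokens) break;
    tokenCount += t;
    kept.unshift(m);
  }
  return systemMsg ? [systemMsg, ...kept] : kept;
}

// System prompt (8 estimated tokens) plus five 40-char turns (14 each).
const conversation: ChatMessage[] = [
  { role: 'system', content: 'You are helpful.' },
  { role: 'user', content: 'a'.repeat(40) },
  { role: 'assistant', content: 'b'.repeat(40) },
  { role: 'user', content: 'c'.repeat(40) },
  { role: 'assistant', content: 'd'.repeat(40) },
  { role: 'user', content: 'e'.repeat(40) },
];

const byCount = trimHistory(conversation, 3);      // system + last 3 turns
const byBudget = trimHistory(conversation, 20, 40); // 8 + 14 + 14 = 36 fits; a third turn would not
console.log(byCount.length);  // 4
console.log(byBudget.length); // 3
```

In both cases the system message is kept and the oldest turns are evicted first, which is the behavior the acceptance tests would need to pin down.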
3. Integration in useChat.ts
// Before sending to provider:
const trimmedMessages = trimMessageHistory(messages, {
  maxHistoryMessages: chatOptions.maxHistoryMessages ?? 20,
  // Chat option wins; otherwise fall back to the model's context window.
  maxContextTokens: chatOptions.maxContextTokens ?? modelConfig.contextWindow ?? 8000,
});
await provider.chat(trimmedMessages, ...);
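One design point worth considering here: a model's context window covers the prompt and the generated reply together, so passing the full window as the history budget leaves no room for the response. A minimal sketch of deriving an input budget follows; `inputTokenBudget`, `reservedForReply`, and the 1024-token default are hypothetical and not part of the existing code.

```typescript
// Derive the history budget from the model's context window, holding back
// some tokens for the reply. All names and defaults here are illustrative.
function inputTokenBudget(
  contextWindow: number | undefined,
  reservedForReply = 1024,
  fallbackWindow = 8000
): number {
  const windowSize = contextWindow ?? fallbackWindow;
  // Never return a negative budget for very small windows.
  return Math.max(0, windowSize - reservedForReply);
}

console.log(inputTokenBudget(32768));     // 31744
console.log(inputTokenBudget(undefined)); // 6976 (falls back to 8000)
console.log(inputTokenBudget(512));       // 0
```

The result could be passed as `maxContextTokens` in place of the raw context window.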
4. UI: Token usage indicator in AIHarness.tsx
Add a small token counter badge showing estimated tokens in current context vs. model limit.
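The badge logic can be kept out of the component as a pure function, which makes it easy to test. The sketch below is one possible shape; `describeTokenUsage`, the `UsageLevel` names, and the 80% warning threshold are assumptions, not existing code in AIHarness.tsx.

```typescript
type UsageLevel = 'ok' | 'warn' | 'over';

interface TokenUsage {
  label: string;     // text shown in the badge
  ratio: number;     // used / limit
  level: UsageLevel; // drives badge color
}

// used: estimated tokens in the current context; limit: the model's budget.
function describeTokenUsage(used: number, limit: number, warnAt = 0.8): TokenUsage {
  // A missing or zero limit is treated as over budget rather than dividing by zero.
  const ratio = limit > 0 ? used / limit : Infinity;
  const level: UsageLevel = ratio > 1 ? 'over' : ratio >= warnAt ? 'warn' : 'ok';
  // The "~" signals that the count is an estimate, not an exact tokenizer result.
  return { label: `~${used} / ${limit} tokens`, ratio, level };
}

console.log(describeTokenUsage(1200, 8000).label); // ~1200 / 8000 tokens
console.log(describeTokenUsage(6800, 8000).level); // warn
console.log(describeTokenUsage(9000, 8000).level); // over
```

The component would then only need to render `label` and map `level` to a style.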
Acceptance Criteria
- trimMessageHistory() utility implemented and unit tested
- useChat.ts uses trimmer before every provider call
- maxHistoryMessages and maxContextTokens configurable via chat options
Labels
enhancement ai performance