# Model Compatibility Guide

This document describes model-specific handling in the OpenAI-compatible provider. When adding new models or providers, review this guide to ensure proper compatibility.

## Table of Contents

- [Overview](#overview)
- [Model-Specific Handling](#model-specific-handling)
  - [Kimi Models (is_error Exclusion)](#kimi-models-is_error-exclusion)
  - [Reasoning Models (Tuning Parameter Stripping)](#reasoning-models-tuning-parameter-stripping)
  - [GPT-5 (max_completion_tokens)](#gpt-5-max_completion_tokens)
  - [Qwen Models (DashScope Routing)](#qwen-models-dashscope-routing)
- [Implementation Details](#implementation-details)
- [Adding New Models](#adding-new-models)
- [Testing](#testing)
## Overview

The `openai_compat.rs` provider translates Claude Code's internal message format into OpenAI-compatible chat completion requests. Different models have varying requirements for:

- Tool result message fields (`is_error`)
- Sampling parameters (`temperature`, `top_p`, etc.)
- Token limit fields (`max_tokens` vs. `max_completion_tokens`)
- Base URL routing

## Model-Specific Handling

### Kimi Models (is_error Exclusion)
**Affected models:** `kimi-k2.5`, `kimi-k1.5`, `kimi-moonshot`, and any model whose canonical name starts with `kimi-` (case-insensitive, after stripping provider prefixes)

**Behavior:** The `is_error` field is **excluded** from tool result messages.

**Rationale:** Kimi models (via Moonshot AI and DashScope) reject the `is_error` field with a 400 Bad Request error:
```json
{
  "error": {
    "type": "invalid_request_error",
    "message": "Unknown field: is_error"
  }
}
```

**Detection:**
```rust
fn model_rejects_is_error_field(model: &str) -> bool {
    let lowered = model.to_ascii_lowercase();
    let canonical = lowered.rsplit('/').next().unwrap_or(lowered.as_str());
    canonical.starts_with("kimi-")
}
```

**Testing:** See `model_rejects_is_error_field_detects_kimi_models` and related tests in `openai_compat.rs`.
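
As a minimal sketch of how the exclusion could be applied during message translation (the `ToolResultMessage` struct and `tool_result_message` constructor below are illustrative stand-ins, not the crate's actual types), the optional field can simply be left unset for Kimi-family models:

```rust
// Illustrative sketch only: the real provider serializes its own message
// types; here an Option models a field that is omitted from the wire
// payload when None.
struct ToolResultMessage {
    role: &'static str,
    content: String,
    is_error: Option<bool>, // left out of the serialized payload when None
}

fn model_rejects_is_error_field(model: &str) -> bool {
    let lowered = model.to_ascii_lowercase();
    let canonical = lowered.rsplit('/').next().unwrap_or(lowered.as_str());
    canonical.starts_with("kimi-")
}

fn tool_result_message(model: &str, content: &str, is_error: bool) -> ToolResultMessage {
    ToolResultMessage {
        role: "tool",
        content: content.to_string(),
        // Kimi-family models reject the field outright, so drop it entirely.
        is_error: if model_rejects_is_error_field(model) {
            None
        } else {
            Some(is_error)
        },
    }
}
```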

---

### Reasoning Models (Tuning Parameter Stripping)

**Affected models:**
- OpenAI: `o1`, `o1-*`, `o3`, `o3-*`, `o4`, `o4-*`
- xAI: `grok-3-mini`
- Alibaba DashScope: `qwen-qwq-*`, `qwq-*`, `qwen3-*-thinking`

**Behavior:** The following tuning parameters are **stripped** from requests:
- `temperature`
- `top_p`
- `frequency_penalty`
- `presence_penalty`

**Rationale:** Reasoning/chain-of-thought models use fixed sampling strategies and reject these parameters with 400 errors.

**Exception:** `reasoning_effort` is included for compatible models when explicitly set.

**Detection:**
```rust
fn is_reasoning_model(model: &str) -> bool {
    // Bind the lowercased String first so the borrow in `canonical`
    // outlives this statement (borrowing the temporary directly
    // would not compile).
    let lowered = model.to_ascii_lowercase();
    let canonical = lowered.rsplit('/').next().unwrap_or(lowered.as_str());
    canonical.starts_with("o1")
        || canonical.starts_with("o3")
        || canonical.starts_with("o4")
        || canonical == "grok-3-mini"
        || canonical.starts_with("qwen-qwq")
        || canonical.starts_with("qwq")
        || (canonical.starts_with("qwen3") && canonical.contains("-thinking"))
}
```

**Testing:** See `reasoning_model_strips_tuning_params`, `grok_3_mini_is_reasoning_model`, and `qwen_reasoning_variants_are_detected` tests.
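
The stripping step itself can be sketched as follows (the mutable parameter map is an illustrative stand-in for the real request builder's state):

```rust
use std::collections::BTreeMap;

fn is_reasoning_model(model: &str) -> bool {
    let lowered = model.to_ascii_lowercase();
    let canonical = lowered.rsplit('/').next().unwrap_or(lowered.as_str());
    canonical.starts_with("o1")
        || canonical.starts_with("o3")
        || canonical.starts_with("o4")
        || canonical == "grok-3-mini"
        || canonical.starts_with("qwen-qwq")
        || canonical.starts_with("qwq")
        || (canonical.starts_with("qwen3") && canonical.contains("-thinking"))
}

// Remove the sampling knobs a reasoning model would reject with a 400.
fn strip_tuning_params(model: &str, params: &mut BTreeMap<&'static str, f64>) {
    if is_reasoning_model(model) {
        for key in ["temperature", "top_p", "frequency_penalty", "presence_penalty"] {
            params.remove(key);
        }
    }
}
```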

---

### GPT-5 (max_completion_tokens)

**Affected models:** All models starting with `gpt-5`

**Behavior:** Uses `max_completion_tokens` instead of `max_tokens` in the request payload.

**Rationale:** GPT-5 models require the `max_completion_tokens` field; the legacy `max_tokens` field causes request validation failures:
```json
{
  "error": {
    "message": "Unknown field: max_tokens"
  }
}
```

**Implementation:**
```rust
let max_tokens_key = if wire_model.starts_with("gpt-5") {
    "max_completion_tokens"
} else {
    "max_tokens"
};
```

**Testing:** See `gpt5_uses_max_completion_tokens_not_max_tokens` and `non_gpt5_uses_max_tokens` tests.
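
Applied to the payload, the same branch could look like this sketch (the string-keyed map is a stand-in for the real JSON request body, and `set_token_limit` is an illustrative helper name):

```rust
use std::collections::BTreeMap;

// Sketch: insert the token limit under the model-appropriate key.
fn set_token_limit(wire_model: &str, limit: u64, payload: &mut BTreeMap<&'static str, u64>) {
    let max_tokens_key = if wire_model.starts_with("gpt-5") {
        "max_completion_tokens"
    } else {
        "max_tokens"
    };
    payload.insert(max_tokens_key, limit);
}
```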

---

### Qwen Models (DashScope Routing)

**Affected models:** All models with the `qwen` prefix

**Behavior:** Routed to DashScope (`https://dashscope.aliyuncs.com/compatible-mode/v1`) rather than the default providers.

**Rationale:** Qwen models are hosted by Alibaba Cloud's DashScope service, not OpenAI or Anthropic.

**Configuration:**
```rust
pub const DEFAULT_DASHSCOPE_BASE_URL: &str = "https://dashscope.aliyuncs.com/compatible-mode/v1";
```

**Authentication:** Uses the `DASHSCOPE_API_KEY` environment variable.

**Note:** Some Qwen models are also reasoning models (see [Reasoning Models](#reasoning-models-tuning-parameter-stripping) above) and receive both treatments.
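
Base-URL selection can be sketched like this (the `base_url_for` helper and the non-DashScope fallback URL are illustrative, not the provider's actual defaults):

```rust
const DEFAULT_DASHSCOPE_BASE_URL: &str = "https://dashscope.aliyuncs.com/compatible-mode/v1";

// Sketch: route qwen-prefixed models to DashScope; everything else keeps
// the provider's regular base URL (placeholder value here, not from the source).
fn base_url_for(model: &str) -> &'static str {
    let lowered = model.to_ascii_lowercase();
    let canonical = lowered.rsplit('/').next().unwrap_or(lowered.as_str());
    if canonical.starts_with("qwen") {
        DEFAULT_DASHSCOPE_BASE_URL
    } else {
        "https://api.openai.com/v1" // illustrative default
    }
}
```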

## Implementation Details

### File Location

All model-specific logic is in:
```
rust/crates/api/src/providers/openai_compat.rs
```

### Key Functions

| Function | Purpose |
|----------|---------|
| `model_rejects_is_error_field()` | Detects models that don't support `is_error` in tool results |
| `is_reasoning_model()` | Detects reasoning models that need tuning-parameter stripping |
| `translate_message()` | Converts internal messages to OpenAI format (applies the `is_error` logic) |
| `build_chat_completion_request()` | Constructs the full request payload (applies all model-specific logic) |

### Provider Prefix Handling

All model detection functions strip provider prefixes (e.g., `dashscope/kimi-k2.5` → `kimi-k2.5`) before matching:

```rust
// Bind the lowercased String first; borrowing the temporary directly
// would not compile.
let lowered = model.to_ascii_lowercase();
let canonical = lowered.rsplit('/').next().unwrap_or(lowered.as_str());
```

This ensures consistent detection regardless of whether models are referenced with or without provider prefixes.
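
Wrapped into a helper (the `canonical_model_name` function name is illustrative, not from the source), the canonicalisation step is easy to unit-test on its own:

```rust
// Sketch: lowercase, then keep only the segment after the last '/'.
fn canonical_model_name(model: &str) -> String {
    let lowered = model.to_ascii_lowercase();
    lowered.rsplit('/').next().unwrap_or(lowered.as_str()).to_string()
}
```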

## Adding New Models

When adding support for a new model:

1. **Check if the model is a reasoning model**
   - Does it reject `temperature`/`top_p` parameters?
   - Add it to the `is_reasoning_model()` detection

2. **Check tool result compatibility**
   - Does it reject the `is_error` field?
   - Add it to the `model_rejects_is_error_field()` detection

3. **Check the token limit field**
   - Does it require `max_completion_tokens` instead of `max_tokens`?
   - Update the `max_tokens_key` logic

4. **Add tests**
   - A unit test for the detection function
   - An integration test in `build_chat_completion_request`

5. **Update this documentation**
   - Add the model to the affected lists
   - Document any special behavior

## Testing

### Running Model-Specific Tests

```bash
# All OpenAI compatibility tests
cargo test --package api providers::openai_compat

# Specific test categories
cargo test --package api model_rejects_is_error_field
cargo test --package api reasoning_model
cargo test --package api gpt5
cargo test --package api qwen
```

### Test Files

- Unit tests: `rust/crates/api/src/providers/openai_compat.rs` (in `mod tests`)
- Integration tests: `rust/crates/api/tests/openai_compat_integration.rs`

### Verifying Model Detection

To verify that a model is detected correctly without making API calls:

```rust
#[test]
fn my_new_model_is_detected() {
    // is_error handling
    assert!(model_rejects_is_error_field("my-model"));

    // Reasoning model detection
    assert!(is_reasoning_model("my-model"));

    // Provider prefix handling
    assert!(model_rejects_is_error_field("provider/my-model"));
}
```

---

*Last updated: 2026-04-16*

For questions or updates, see the implementation in `rust/crates/api/src/providers/openai_compat.rs`.