From 5c7d66db076e24bcc44e9e47f2a52de9c902a0e7 Mon Sep 17 00:00:00 2001
From: chris-colinsky
Date: Tue, 9 Jun 2026 16:54:50 -0700
Subject: [PATCH 1/2] Implement wire-byte stability (proposal 0047)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add intra-impl wire-byte stability to the OpenAI provider so equivalent OA inputs produce byte-identical wire output regardless of dict insertion order.

A new ``_canonicalize_dict_keys`` helper recursively sorts dict keys at every nesting level while preserving caller-supplied array ordering (the spec's split: object keys are sorted, array order is caller-controlled). The helper applies at four user-supplied-dict boundaries: tool definitions (the ``function`` record top-level plus the parameters JSON Schema), ``response_format.json_schema.schema``, RuntimeConfig extras, and the JSON encoding of ``tool_call.arguments``. A top-level belt-and-suspenders pass over the assembled body catches anything the per-field passes miss.

Closes proposal 0047 end-to-end: pieces 1 and 2 (Response.usage cache fields sourced from prompt_tokens_details + OTel observer emits the cache attributes) landed in v0.12.0; this is piece 3.

Prompt-management §13 cross-variable substring stability is satisfied by the existing Jinja2 strict-undefined render path on both TextPrompt and ChatPrompt; pinned by new tests.

A new ``docs/concepts/prompts.md`` section explains APC, what OA handles for users (wire-byte canonicalization, deterministic rendering), what users own (the spec's five informative authoring patterns), and a vLLM debugging callout for the cache-attribute-not-appearing case (server-side ``--enable-prefix-caching`` plus ``--enable-prompt-tokens-details``).

Scope is the Chat Completions endpoint only. The OpenAI Responses API endpoint and the Anthropic / Gemini wire-format mappings are deferred (no python consumer today).
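
The key-sorting canonicalization described above can be sketched as follows. This is a minimal illustration assuming plain JSON-compatible values (string keys, lists, scalars); the function name ``canonicalize_dict_keys`` and the tool payload are hypothetical and are not the adapter's actual code. It shows that two tool definitions differing only in dict insertion order serialize to identical bytes, that list order is left alone, and what ``sort_keys=True`` does to ``tool_call.arguments``.

```python
import json
from typing import Any


def canonicalize_dict_keys(value: Any) -> Any:
    """Hypothetical sketch: rebuild dicts with sorted keys at every level.

    Lists are walked so nested dicts inside them get sorted too, but the
    list's own element order is preserved exactly as the caller supplied it.
    """
    if isinstance(value, dict):
        return {key: canonicalize_dict_keys(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        return [canonicalize_dict_keys(item) for item in value]
    return value


# Two equivalent tool definitions that differ only in dict insertion order.
tool_a = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}, "unit": {"type": "string"}},
            "required": ["city", "unit"],
        },
    },
}
tool_b = {
    "function": {
        "parameters": {
            "required": ["city", "unit"],
            "properties": {"unit": {"type": "string"}, "city": {"type": "string"}},
            "type": "object",
        },
        "name": "get_weather",
    },
    "type": "function",
}

wire_a = json.dumps(canonicalize_dict_keys(tool_a)).encode("utf-8")
wire_b = json.dumps(canonicalize_dict_keys(tool_b)).encode("utf-8")
print(wire_a == wire_b)  # True

# Array order is caller-controlled and is NOT normalized.
reordered = canonicalize_dict_keys({"required": ["unit", "city"]})
print(reordered["required"])  # ['unit', 'city']

# tool_call.arguments: sort_keys=True yields the same string for
# equivalent dicts, which still parses back to the same dict.
args_a = json.dumps({"city": "Paris", "unit": "C"}, sort_keys=True)
args_b = json.dumps({"unit": "C", "city": "Paris"}, sort_keys=True)
print(args_a)  # {"city": "Paris", "unit": "C"}
```

Sorting at the value level means any later serializer sees the same key order; ``json.dumps(..., sort_keys=True)`` alone would also sort nested keys, but only if every code path that produces wire bytes remembers to pass it.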

Behavior change worth flagging: ``tool_call.arguments`` JSON encoding now uses ``sort_keys=True``. Functionally equivalent (parses to the same dict) but byte-different from the previous insertion-order encoding.
---
 CHANGELOG.md                             |   1 +
 conformance.toml                         |  31 +-
 docs/concepts/prompts.md                 |  67 ++++
 src/openarmature/llm/providers/openai.py |  77 +++-
 tests/unit/test_llm_provider.py          | 426 ++++++++++++++++++++++-
 tests/unit/test_prompts.py               |  95 +++++
 6 files changed, 681 insertions(+), 16 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 0aac2b4..43bebee 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,6 +8,7 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The
 
 ### Added
 
+- **Implicit prefix-cache wire-byte stability** (proposal 0047, spec v0.39.0). The OpenAI Chat Completions wire body is now byte-stable across equivalent OA inputs — equivalent calls produce byte-identical request bodies regardless of dict insertion order at every user-supplied-dict boundary (tool definitions including the top-level `function` record + the `parameters` JSON Schema, `response_format.json_schema.schema`, `RuntimeConfig` extras, `tool_call.arguments` JSON encoding). A new `_canonicalize_json_schema` helper recursively sorts dict keys at every nesting level while preserving caller-supplied array ordering (the spec's split between "object keys MUST be sorted" and "array order MUST be preserved per caller-supplied order"). A top-level belt-and-suspenders canonicalization pass over the assembled body catches anything the per-field passes miss. Combined with the existing `Response.usage.cached_tokens` / `cache_creation_tokens` fields sourced from `prompt_tokens_details` (v0.12.0) and the OTel observer's `openarmature.llm.cache_read.input_tokens` + `openarmature.llm.cache_creation.input_tokens` attributes (also v0.12.0), this closes proposal 0047 end-to-end. Prompt-management §13 *Cross-variable substring stability* is satisfied by the existing Jinja2 `StrictUndefined` render path; pinned by a new test. Scope is the Chat Completions endpoint only — the OpenAI Responses API endpoint and the Anthropic / Gemini wire-format mappings are deferred (the providers aren't implemented in python today).
 - **`LlmFailedEvent` typed event variant** (proposal 0058, spec v0.53.0). Carves LLM provider failures into a spec-normatively-typed event variant alongside `LlmCompletionEvent`. 17 mirrored identity / scoping / request-side fields + 3 failure-specific fields (`error_category` always-present from the llm-provider §7 normative category enumeration; optional `error_type` for vendor-specific detail or upstream exception class name; always-present `error_message`). `OpenAIProvider.complete()` emits the typed event alongside the §7 exception on both raise paths — adapter-caught provider exceptions AND pre-send validation raises. Caller-side exception flow unchanged; the exception still raises out of `complete()`. Mutually exclusive with `LlmCompletionEvent` on the same call. Both bundled observers (OTel + Langfuse) consume `LlmFailedEvent` directly: same `openarmature.llm.complete` span / Generation shape as the success path with ERROR status / level + `openarmature.error.category` attribute (OTel) / `error_category` as statusMessage (Langfuse), `start_time` back-dated by `latency_ms` so the failure duration reflects the time-to-raise.
 
 ### Changed
diff --git a/conformance.toml b/conformance.toml
index da4501d..c6b1914 100644
--- a/conformance.toml
+++ b/conformance.toml
@@ -266,11 +266,34 @@ status = "implemented"
 since = "0.11.0"
 
 # Spec v0.39.0 (proposal 0047). Implicit prefix-cache wire-byte
-# stability. Cross-provider invariant requiring intra-impl byte
-# equality across calls with equivalent inputs. Queued for v0.13.0
-# alongside 0049 (LLM provider hardening + typed event batch).
+# stability. Cross-capability proposal landed in v0.13.0 across
+# three pieces: (1) ``Response.usage`` cache-stat fields
+# (``cached_tokens`` / ``cache_creation_tokens``) sourced from the
+# OpenAI ``prompt_tokens_details`` payload, with conditional emission
+# preserved (absent-vs-zero distinction stays observable) — landed
+# in the v0.12.0 cycle as the proposal's payload-side prerequisite;
+# (2) OTel observer emits ``openarmature.llm.cache_read.input_tokens``
+# (and optional ``openarmature.llm.cache_creation.input_tokens``)
+# when the corresponding usage field is populated — also v0.12.0;
+# (3) §8.1 intra-impl wire-byte canonicalization in the OpenAI
+# adapter — landed here. The canonicalizer recursively sorts dict
+# keys at every nesting level while preserving caller-supplied
+# array order, applied at the four user-input boundaries
+# (``tool.parameters`` / ``tool.function`` record top-level per
+# spec Q5, ``response_format.json_schema.schema``, ``RuntimeConfig``
+# extras, ``tool_call.arguments`` JSON encoding) plus a top-level
+# belt-and-suspenders pass over the assembled request body. Scope
+# is the Chat Completions endpoint only; the OpenAI Responses API
+# endpoint is deferred to a future cycle (no python consumer
+# today). Prompt-management §13 cross-variable substring stability
+# is satisfied by the existing Jinja2 ``StrictUndefined`` render
+# path; pinned by ``tests/unit/test_prompts.py::
+# test_cross_variable_substring_stability``. Anthropic / Gemini
+# wire-byte conformance fixtures stay deferred — neither provider
+# is implemented in python today.
 [proposals."0047"]
-status = "not-yet"
+status = "implemented"
+since = "0.13.0"
 
 # Spec v0.40.0 (proposal 0048). Read-symmetric invocation metadata.
 # Adds ``get_invocation_metadata()`` symmetric to the existing
diff --git a/docs/concepts/prompts.md b/docs/concepts/prompts.md
index bb99691..08f87b0 100644
--- a/docs/concepts/prompts.md
+++ b/docs/concepts/prompts.md
@@ -365,6 +365,73 @@ The filesystem backend layout is `/