Emit OA-namespace cache attributes on the LLM span#140
Merged
Conversation
Wire the §5.5.3.1 cache-attribute emission from proposal 0047 into the existing OTel observer's sentinel NodeEvent handler: - Extend LlmEventPayload with cached_tokens and cache_creation_tokens, defaulting to None to preserve the absent-vs-reported-zero distinction the spec mandates. - Have the OpenAI provider's _make_llm_event populate both fields from Response.usage (the fields PR 136 of 0047 added). The cache_creation_tokens field stays None for the OpenAI-compat mapping per spec §8.1.2 but the wiring is symmetric for future providers that source it. - In the OTel observer's _handle_llm_event completed phase, emit openarmature.llm.cache_read.input_tokens when cached_tokens is not None and openarmature.llm.cache_creation.input_tokens when cache_creation_tokens is not None. The conditional emission honors the §5.5.3 convention: absent (None) means the provider did not report; 0 means the provider reported zero hits. Cover the new behavior at three levels: - Four OTel observer unit tests drive the cache-attribute emission through synthetic sentinel started/completed NodeEvent pairs: cache hit, reported zero, absent, both fields populated. - Two provider-side unit tests verify _make_llm_event populates the payload's cache fields from Response.usage at the provider-payload boundary, so a regression here surfaces independently of the observer rendering layer. - Three conformance fixtures (040, 041, 042) activated via a new _run_llm_cache_fixture handler. The handler builds a single-LLM- call graph, captures the response, and asserts on response_usage + llm_span_attributes + llm_span_attributes_absent expectations. The dispatcher uses a set-membership check matching the precedent set by _run_llm_payload_fixture. The conformance.toml 0047 status stays at not-yet — wire-byte canonicalization and prompt-management substring stability are still ahead in the v0.13.0 cycle.
There was a problem hiding this comment.
Pull request overview
This PR implements proposal 0047 §5.5.3.1 by propagating provider cache-usage stats through the LLM sentinel event payload and conditionally emitting OA-namespace cache attributes on the completed LLM span.
Changes:
- Extend
LlmEventPayloadwith optionalcached_tokens/cache_creation_tokensfields and populate them in the OpenAI provider’s_make_llm_eventfromResponse.usage. - Emit
openarmature.llm.cache_read.input_tokensandopenarmature.llm.cache_creation.input_tokenson the completed LLM span only when the corresponding payload fields are non-None(preserving absent-vs-reported-zero). - Add unit tests and conformance fixture runner coverage for cache attribute emission, omission, and reported-zero behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/test_observability_otel.py | Adds unit tests asserting conditional cache attribute emission on LLM spans. |
| tests/unit/test_llm_provider.py | Adds unit tests verifying provider→payload propagation of cache stats. |
| tests/conformance/test_observability.py | Activates fixtures 040–042 and adds a cache-fixture runner asserting Response.usage + span attributes. |
| src/openarmature/observability/otel/observer.py | Emits OA cache attributes on completed LLM spans when payload fields are present. |
| src/openarmature/observability/llm_event.py | Extends the sentinel LLM event payload with cache-stat fields. |
| src/openarmature/llm/providers/openai.py | Populates cache-stat fields into LlmEventPayload via _make_llm_event. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Address PR review feedback: _run_llm_cache_fixture_case constructed an OpenAIProvider but never awaited aclose(), leaking the underlying httpx.AsyncClient connection pool. Fixture 005 and 038 runners elsewhere in this file close the provider in a finally block; the new cache-fixture runner now follows the same convention. Extending the fix to the typed-event runners introduced in PR #139 (_build_simple_llm_graph, _run_typed_event_fanout_case, _run_typed_event_branches_case) — same bug class, same file, CoPilot missed them in the original review. _build_simple_llm_graph now returns (graph, state_cls, provider) so the caller can close the provider in a finally; the fan-out and parallel-branches runners close their inline-constructed providers symmetrically. No behavior change for fixture pass/fail outcomes; the leak was warning-level rather than failure-inducing. Full suite stays at 1191 pass.
chris-colinsky
added a commit
that referenced
this pull request
Jun 10, 2026
* Correct v0.13.0 release narrative per spec review Three blocking + three should-fix items spec flagged on the pre-tag review. All narrative; no code behavior change. - 0047 CHANGELOG entry mis-attributed pieces 1+2 (Response.usage cache fields + OTel cache attributes) to v0.12.0. Verified via git: those landed in PRs #136 + #140 post-v0.12.0-tag, so all three pieces of 0047 ship in v0.13.0. Reframed. - conformance.toml [proposals."0047"] leading-comment block had the same v0.12.0 mis-attribution. Same correction; added PR references for traceability. - Unreleased section had two ### Added headings with the 0057 entry orphaned below ### Changed. Consolidated. - Spec pin advance text undercounted the cycle journey (said v0.51.0 → v0.53.0; actual is v0.46.0 → v0.53.0 across three hops). Reframed and listed absorbed proposals inline. - tool_call.arguments JSON encoding now uses sort_keys=True (functionally equivalent but byte-different for downstream snapshot consumers). Surfaced as its own ### Changed entry instead of buried in the 0047 ### Added. - conformance.toml [proposals."0049"] leading-comment block grew the fixture-deferral surface (057-068 + 069-073 parser- deferred pending harness directive schema catch-up; behavior pinned by unit tests) per spec OQ2. * Migrate LLM-event docs to typed-event-first Three docs still pushed the legacy sentinel-namespace pattern as the primary path for custom observers consuming LLM events and custom providers emitting them. After v0.13.0 the bundled provider emits typed LlmCompletionEvent / LlmFailedEvent variants directly; the bundled OTel + Langfuse observers consume via isinstance discrimination. Rewrites: - docs/concepts/observability.md: "Publishing LLM events for custom observers" → "Consuming LLM events in custom observers". Typed-event consumption shown as primary (isinstance branch on LlmCompletionEvent + LlmFailedEvent with the mutual-exclusion + field-set notes). Sentinel pattern demoted to a "Legacy sentinel-namespace pattern (compatibility surface)" subsection for downstream code interoperating with custom providers that haven't migrated. - docs/model-providers/authoring.md: custom-provider emission sketch rewritten — dispatch LlmCompletionEvent on success, LlmFailedEvent alongside the §7 exception on failure. Shows the current-attempt-index / current-fan-out-index / etc. scoping fields the typed events carry. Calls out the mutual-exclusion + exception-flow-preservation contracts. Legacy sentinel pattern retained as a compatibility-surface callout for older providers. - docs/agent/non-obvious-shapes.md: "filter openarmature.*- namespaced events" tip drops the openarmature.llm.complete example (v0.13.0 retired the sentinel pattern for LLM events); checkpoint sentinels stay since the tip is still applicable for those. Adjusted the follow-on paragraph mentioning LLM events. mkdocs strict build clean. * Regenerate AGENTS.md for typed-event doc migration The non-obvious-shapes doc migration changed a generator source without regenerating the committed AGENTS.md. Bring it back in sync so the drift guard passes. * Extend production-observability example for v0.13.0 Add an LlmFailureTracker observer that consumes the typed LlmFailedEvent and rolls up per-invocation error-category counts, and extend LlmUsageAccumulator to track cached_tokens and report a cache-hit ratio. The persist node now reports both rollups and the OTel formatter surfaces the cache-read attribute. Also drop spec/proposal references and em dashes from the example's comments and walk-through, which carry no meaning for end users reading the code. * Drop spec and proposal references from examples Example comments and docstrings quoted proposal numbers and spec section refs that have no meaning to end users reading the code. Reword them to describe only the implementation behavior. * Add tests for production-observability observers The examples smoke test only proves the demo loads and its build_graph() compiles. Cover the two queryable observers the production-observability example ships: cache-token accumulation and the derived cache-hit ratio, failure-category counting, mutual exclusion between the success and failure events, the per-invocation bucket cleanup, and the OTel cache-read attribute. The persist-output check drives the real persist node offline. * Guard legacy LLM observer snippet with NodeEvent check The legacy sentinel-namespace observer example accessed event.namespace / event.pre_state without narrowing to NodeEvent. A real observer receives the full ObserverEvent union, where variants like InvocationCompletedEvent have no namespace, so the snippet would raise AttributeError. Add an isinstance(event, NodeEvent) guard so the copy-paste example is correct.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
NodeEventhandler.LlmEventPayloadgrowscached_tokensandcache_creation_tokens(optional, defaultNone); the OpenAI provider's_make_llm_eventpopulates both fromResponse.usage(which PR Add cache-stat fields to Response.usage #136 added the fields to); the OTel observer's_handle_llm_eventcompleted phase emitsopenarmature.llm.cache_read.input_tokensandopenarmature.llm.cache_creation.input_tokensconditionally.Nonemeans "provider did not report" (attribute omitted),0means "provider reported zero hits" (attribute emitted with value 0)._make_llm_eventpopulation boundary, and 3 conformance fixtures (040, 041, 042) activated via a new_run_llm_cache_fixturehandler using a set-membership dispatcher pattern matching_run_llm_payload_fixture's precedent.Response.usagecache fields).Test plan
uv run pytest tests/— 1191 pass (was 1182 pre-PR; +7: 4 OTel + 2 provider + 3 fixture cases)uv run pytest tests/conformance/test_observability.py -k cache-attribute— all 3 fixtures passuv run pytest tests/unit/test_observability_otel.py -k cache— all 4 unit tests passuv run pytest tests/unit/test_llm_provider.py -k "cached_tokens or cache_creation"— provider-side tests passuv run pyright src tests— cleanuv run ruff check src tests— cleansrc/openarmature/observability/otel/observer.py(around line 1202)Notes
conformance.toml0047 status stays atnot-yet. With this PR and PR Add cache-stat fields to Response.usage #136 in, only wire-byte canonicalization (next) and prompt-management substring stability still need to land before the status flip.cache_creation_tokensabsent per spec §8.1.2 — the field exists for symmetry with future providers (Anthropic'scache_creation_input_tokens, etc.) that may source it. Conformance fixtures pin this expectation viallm_span_attributes_absent.LlmEventPayloadrather than the typedLlmCompletionEvent.usage, per the sequencing decision indiscuss-llm-completion-event-request-side-fields(the observer's typed-event migration is queued behind proposal 0057's request-side fields extension).