Emit OA-namespace cache attributes on the LLM span by chris-colinsky · Pull Request #140 · LunarCommand/openarmature-python

chris-colinsky · 2026-06-08T01:08:48Z

Summary

Wires proposal 0047's §5.5.3.1 cache-attribute emission into the existing OTel observer's sentinel NodeEvent handler. LlmEventPayload grows cached_tokens and cache_creation_tokens (optional, default None); the OpenAI provider's _make_llm_event populates both from Response.usage (which PR Add cache-stat fields to Response.usage #136 added the fields to); the OTel observer's _handle_llm_event completed phase emits openarmature.llm.cache_read.input_tokens and openarmature.llm.cache_creation.input_tokens conditionally.
The conditional emission preserves the spec's absent-vs-reported-zero distinction — None means "provider did not report" (attribute omitted), 0 means "provider reported zero hits" (attribute emitted with value 0).
Coverage at three levels: 4 new OTel observer unit tests, 2 new provider-side unit tests for the _make_llm_event population boundary, and 3 conformance fixtures (040, 041, 042) activated via a new _run_llm_cache_fixture handler using a set-membership dispatcher pattern matching _run_llm_payload_fixture's precedent.
Fifth PR of the v0.13.0 LLM hardening cycle (proposal 0047). Second of 0047's PRs after PR Add cache-stat fields to Response.usage #136 (Response.usage cache fields).

Test plan

uv run pytest tests/ — 1191 pass (was 1182 pre-PR; +7: 4 OTel + 2 provider + 3 fixture cases)
uv run pytest tests/conformance/test_observability.py -k cache-attribute — all 3 fixtures pass
uv run pytest tests/unit/test_observability_otel.py -k cache — all 4 unit tests pass
uv run pytest tests/unit/test_llm_provider.py -k "cached_tokens or cache_creation" — provider-side tests pass
uv run pyright src tests — clean
uv run ruff check src tests — clean
Eyeball the emission block in src/openarmature/observability/otel/observer.py (around line 1202)

Notes

The conformance.toml 0047 status stays at not-yet. With this PR and PR Add cache-stat fields to Response.usage #136 in, only wire-byte canonicalization (next) and prompt-management substring stability still need to land before the status flip.
The OpenAI-compat mapping leaves cache_creation_tokens absent per spec §8.1.2 — the field exists for symmetry with future providers (Anthropic's cache_creation_input_tokens, etc.) that may source it. Conformance fixtures pin this expectation via llm_span_attributes_absent.
This PR keeps the OTel observer reading cache stats off the sentinel LlmEventPayload rather than the typed LlmCompletionEvent.usage, per the sequencing decision in discuss-llm-completion-event-request-side-fields (the observer's typed-event migration is queued behind proposal 0057's request-side fields extension).

Wire the §5.5.3.1 cache-attribute emission from proposal 0047 into the existing OTel observer's sentinel NodeEvent handler: - Extend LlmEventPayload with cached_tokens and cache_creation_tokens, defaulting to None to preserve the absent-vs-reported-zero distinction the spec mandates. - Have the OpenAI provider's _make_llm_event populate both fields from Response.usage (the fields PR 136 of 0047 added). The cache_creation_tokens field stays None for the OpenAI-compat mapping per spec §8.1.2 but the wiring is symmetric for future providers that source it. - In the OTel observer's _handle_llm_event completed phase, emit openarmature.llm.cache_read.input_tokens when cached_tokens is not None and openarmature.llm.cache_creation.input_tokens when cache_creation_tokens is not None. The conditional emission honors the §5.5.3 convention: absent (None) means the provider did not report; 0 means the provider reported zero hits. Cover the new behavior at three levels: - Four OTel observer unit tests drive the cache-attribute emission through synthetic sentinel started/completed NodeEvent pairs: cache hit, reported zero, absent, both fields populated. - Two provider-side unit tests verify _make_llm_event populates the payload's cache fields from Response.usage at the provider-payload boundary, so a regression here surfaces independently of the observer rendering layer. - Three conformance fixtures (040, 041, 042) activated via a new _run_llm_cache_fixture handler. The handler builds a single-LLM- call graph, captures the response, and asserts on response_usage + llm_span_attributes + llm_span_attributes_absent expectations. The dispatcher uses a set-membership check matching the precedent set by _run_llm_payload_fixture. The conformance.toml 0047 status stays at not-yet — wire-byte canonicalization and prompt-management substring stability are still ahead in the v0.13.0 cycle.

Copilot

Pull request overview

This PR implements proposal 0047 §5.5.3.1 by propagating provider cache-usage stats through the LLM sentinel event payload and conditionally emitting OA-namespace cache attributes on the completed LLM span.

Changes:

Extend LlmEventPayload with optional cached_tokens / cache_creation_tokens fields and populate them in the OpenAI provider’s _make_llm_event from Response.usage.
Emit openarmature.llm.cache_read.input_tokens and openarmature.llm.cache_creation.input_tokens on the completed LLM span only when the corresponding payload fields are non-None (preserving absent-vs-reported-zero).
Add unit tests and conformance fixture runner coverage for cache attribute emission, omission, and reported-zero behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
tests/unit/test_observability_otel.py	Adds unit tests asserting conditional cache attribute emission on LLM spans.
tests/unit/test_llm_provider.py	Adds unit tests verifying provider→payload propagation of cache stats.
tests/conformance/test_observability.py	Activates fixtures 040–042 and adds a cache-fixture runner asserting `Response.usage` + span attributes.
src/openarmature/observability/otel/observer.py	Emits OA cache attributes on completed LLM spans when payload fields are present.
src/openarmature/observability/llm_event.py	Extends the sentinel LLM event payload with cache-stat fields.
src/openarmature/llm/providers/openai.py	Populates cache-stat fields into `LlmEventPayload` via `_make_llm_event`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Address PR review feedback: _run_llm_cache_fixture_case constructed an OpenAIProvider but never awaited aclose(), leaking the underlying httpx.AsyncClient connection pool. Fixture 005 and 038 runners elsewhere in this file close the provider in a finally block; the new cache-fixture runner now follows the same convention. Extending the fix to the typed-event runners introduced in PR #139 (_build_simple_llm_graph, _run_typed_event_fanout_case, _run_typed_event_branches_case) — same bug class, same file, CoPilot missed them in the original review. _build_simple_llm_graph now returns (graph, state_cls, provider) so the caller can close the provider in a finally; the fan-out and parallel-branches runners close their inline-constructed providers symmetrically. No behavior change for fixture pass/fail outcomes; the leak was warning-level rather than failure-inducing. Full suite stays at 1191 pass.

* Correct v0.13.0 release narrative per spec review Three blocking + three should-fix items spec flagged on the pre-tag review. All narrative; no code behavior change. - 0047 CHANGELOG entry mis-attributed pieces 1+2 (Response.usage cache fields + OTel cache attributes) to v0.12.0. Verified via git: those landed in PRs #136 + #140 post-v0.12.0-tag, so all three pieces of 0047 ship in v0.13.0. Reframed. - conformance.toml [proposals."0047"] leading-comment block had the same v0.12.0 mis-attribution. Same correction; added PR references for traceability. - Unreleased section had two ### Added headings with the 0057 entry orphaned below ### Changed. Consolidated. - Spec pin advance text undercounted the cycle journey (said v0.51.0 → v0.53.0; actual is v0.46.0 → v0.53.0 across three hops). Reframed and listed absorbed proposals inline. - tool_call.arguments JSON encoding now uses sort_keys=True (functionally equivalent but byte-different for downstream snapshot consumers). Surfaced as its own ### Changed entry instead of buried in the 0047 ### Added. - conformance.toml [proposals."0049"] leading-comment block grew the fixture-deferral surface (057-068 + 069-073 parser- deferred pending harness directive schema catch-up; behavior pinned by unit tests) per spec OQ2. * Migrate LLM-event docs to typed-event-first Three docs still pushed the legacy sentinel-namespace pattern as the primary path for custom observers consuming LLM events and custom providers emitting them. After v0.13.0 the bundled provider emits typed LlmCompletionEvent / LlmFailedEvent variants directly; the bundled OTel + Langfuse observers consume via isinstance discrimination. Rewrites: - docs/concepts/observability.md: "Publishing LLM events for custom observers" → "Consuming LLM events in custom observers". Typed-event consumption shown as primary (isinstance branch on LlmCompletionEvent + LlmFailedEvent with the mutual-exclusion + field-set notes). Sentinel pattern demoted to a "Legacy sentinel-namespace pattern (compatibility surface)" subsection for downstream code interoperating with custom providers that haven't migrated. - docs/model-providers/authoring.md: custom-provider emission sketch rewritten — dispatch LlmCompletionEvent on success, LlmFailedEvent alongside the §7 exception on failure. Shows the current-attempt-index / current-fan-out-index / etc. scoping fields the typed events carry. Calls out the mutual-exclusion + exception-flow-preservation contracts. Legacy sentinel pattern retained as a compatibility-surface callout for older providers. - docs/agent/non-obvious-shapes.md: "filter openarmature.*- namespaced events" tip drops the openarmature.llm.complete example (v0.13.0 retired the sentinel pattern for LLM events); checkpoint sentinels stay since the tip is still applicable for those. Adjusted the follow-on paragraph mentioning LLM events. mkdocs strict build clean. * Regenerate AGENTS.md for typed-event doc migration The non-obvious-shapes doc migration changed a generator source without regenerating the committed AGENTS.md. Bring it back in sync so the drift guard passes. * Extend production-observability example for v0.13.0 Add an LlmFailureTracker observer that consumes the typed LlmFailedEvent and rolls up per-invocation error-category counts, and extend LlmUsageAccumulator to track cached_tokens and report a cache-hit ratio. The persist node now reports both rollups and the OTel formatter surfaces the cache-read attribute. Also drop spec/proposal references and em dashes from the example's comments and walk-through, which carry no meaning for end users reading the code. * Drop spec and proposal references from examples Example comments and docstrings quoted proposal numbers and spec section refs that have no meaning to end users reading the code. Reword them to describe only the implementation behavior. * Add tests for production-observability observers The examples smoke test only proves the demo loads and its build_graph() compiles. Cover the two queryable observers the production-observability example ships: cache-token accumulation and the derived cache-hit ratio, failure-category counting, mutual exclusion between the success and failure events, the per-invocation bucket cleanup, and the OTel cache-read attribute. The persist-output check drives the real persist node offline. * Guard legacy LLM observer snippet with NodeEvent check The legacy sentinel-namespace observer example accessed event.namespace / event.pre_state without narrowing to NodeEvent. A real observer receives the full ObserverEvent union, where variants like InvocationCompletedEvent have no namespace, so the snippet would raise AttributeError. Add an isinstance(event, NodeEvent) guard so the copy-paste example is correct.

Copilot AI review requested due to automatic review settings June 8, 2026 01:08

Copilot started reviewing on behalf of chris-colinsky June 8, 2026 01:08 View session

Copilot AI reviewed Jun 8, 2026

View reviewed changes

Comment thread tests/conformance/test_observability.py

chris-colinsky merged commit 644af66 into main Jun 8, 2026
6 checks passed

chris-colinsky deleted the feature/0047-otel-cache-attribute-emission branch June 8, 2026 01:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Emit OA-namespace cache attributes on the LLM span#140

Emit OA-namespace cache attributes on the LLM span#140
chris-colinsky merged 2 commits into
mainfrom
feature/0047-otel-cache-attribute-emission

chris-colinsky commented Jun 8, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

chris-colinsky commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Notes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

chris-colinsky commented Jun 8, 2026 •

edited

Loading