Skip to content

Emit OA-namespace cache attributes on the LLM span#140

Merged
chris-colinsky merged 2 commits into
mainfrom
feature/0047-otel-cache-attribute-emission
Jun 8, 2026
Merged

Emit OA-namespace cache attributes on the LLM span#140
chris-colinsky merged 2 commits into
mainfrom
feature/0047-otel-cache-attribute-emission

Conversation

@chris-colinsky

@chris-colinsky chris-colinsky commented Jun 8, 2026

Copy link
Copy Markdown
Member

Summary

  • Wires proposal 0047's §5.5.3.1 cache-attribute emission into the existing OTel observer's sentinel NodeEvent handler. LlmEventPayload grows cached_tokens and cache_creation_tokens (optional, default None); the OpenAI provider's _make_llm_event populates both from Response.usage (which PR Add cache-stat fields to Response.usage #136 added the fields to); the OTel observer's _handle_llm_event completed phase emits openarmature.llm.cache_read.input_tokens and openarmature.llm.cache_creation.input_tokens conditionally.
  • The conditional emission preserves the spec's absent-vs-reported-zero distinction — None means "provider did not report" (attribute omitted), 0 means "provider reported zero hits" (attribute emitted with value 0).
  • Coverage at three levels: 4 new OTel observer unit tests, 2 new provider-side unit tests for the _make_llm_event population boundary, and 3 conformance fixtures (040, 041, 042) activated via a new _run_llm_cache_fixture handler using a set-membership dispatcher pattern matching _run_llm_payload_fixture's precedent.
  • Fifth PR of the v0.13.0 LLM hardening cycle (proposal 0047). Second of 0047's PRs after PR Add cache-stat fields to Response.usage #136 (Response.usage cache fields).

Test plan

  • uv run pytest tests/ — 1191 pass (was 1182 pre-PR; +7: 4 OTel + 2 provider + 3 fixture cases)
  • uv run pytest tests/conformance/test_observability.py -k cache-attribute — all 3 fixtures pass
  • uv run pytest tests/unit/test_observability_otel.py -k cache — all 4 unit tests pass
  • uv run pytest tests/unit/test_llm_provider.py -k "cached_tokens or cache_creation" — provider-side tests pass
  • uv run pyright src tests — clean
  • uv run ruff check src tests — clean
  • Eyeball the emission block in src/openarmature/observability/otel/observer.py (around line 1202)

Notes

  • The conformance.toml 0047 status stays at not-yet. With this PR and PR Add cache-stat fields to Response.usage #136 in, only wire-byte canonicalization (next) and prompt-management substring stability still need to land before the status flip.
  • The OpenAI-compat mapping leaves cache_creation_tokens absent per spec §8.1.2 — the field exists for symmetry with future providers (Anthropic's cache_creation_input_tokens, etc.) that may source it. Conformance fixtures pin this expectation via llm_span_attributes_absent.
  • This PR keeps the OTel observer reading cache stats off the sentinel LlmEventPayload rather than the typed LlmCompletionEvent.usage, per the sequencing decision in discuss-llm-completion-event-request-side-fields (the observer's typed-event migration is queued behind proposal 0057's request-side fields extension).

Wire the §5.5.3.1 cache-attribute emission from proposal 0047 into
the existing OTel observer's sentinel NodeEvent handler:

- Extend LlmEventPayload with cached_tokens and cache_creation_tokens,
  defaulting to None to preserve the absent-vs-reported-zero
  distinction the spec mandates.
- Have the OpenAI provider's _make_llm_event populate both fields
  from Response.usage (the fields PR 136 of 0047 added). The
  cache_creation_tokens field stays None for the OpenAI-compat
  mapping per spec §8.1.2 but the wiring is symmetric for future
  providers that source it.
- In the OTel observer's _handle_llm_event completed phase, emit
  openarmature.llm.cache_read.input_tokens when cached_tokens is
  not None and openarmature.llm.cache_creation.input_tokens when
  cache_creation_tokens is not None. The conditional emission
  honors the §5.5.3 convention: absent (None) means the provider
  did not report; 0 means the provider reported zero hits.

Cover the new behavior at three levels:

- Four OTel observer unit tests drive the cache-attribute emission
  through synthetic sentinel started/completed NodeEvent pairs:
  cache hit, reported zero, absent, both fields populated.
- Two provider-side unit tests verify _make_llm_event populates
  the payload's cache fields from Response.usage at the
  provider-payload boundary, so a regression here surfaces
  independently of the observer rendering layer.
- Three conformance fixtures (040, 041, 042) activated via a new
  _run_llm_cache_fixture handler. The handler builds a single-LLM-
  call graph, captures the response, and asserts on response_usage
  + llm_span_attributes + llm_span_attributes_absent expectations.
  The dispatcher uses a set-membership check matching the
  precedent set by _run_llm_payload_fixture.

The conformance.toml 0047 status stays at not-yet — wire-byte
canonicalization and prompt-management substring stability are
still ahead in the v0.13.0 cycle.
Copilot AI review requested due to automatic review settings June 8, 2026 01:08

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements proposal 0047 §5.5.3.1 by propagating provider cache-usage stats through the LLM sentinel event payload and conditionally emitting OA-namespace cache attributes on the completed LLM span.

Changes:

  • Extend LlmEventPayload with optional cached_tokens / cache_creation_tokens fields and populate them in the OpenAI provider’s _make_llm_event from Response.usage.
  • Emit openarmature.llm.cache_read.input_tokens and openarmature.llm.cache_creation.input_tokens on the completed LLM span only when the corresponding payload fields are non-None (preserving absent-vs-reported-zero).
  • Add unit tests and conformance fixture runner coverage for cache attribute emission, omission, and reported-zero behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/unit/test_observability_otel.py Adds unit tests asserting conditional cache attribute emission on LLM spans.
tests/unit/test_llm_provider.py Adds unit tests verifying provider→payload propagation of cache stats.
tests/conformance/test_observability.py Activates fixtures 040–042 and adds a cache-fixture runner asserting Response.usage + span attributes.
src/openarmature/observability/otel/observer.py Emits OA cache attributes on completed LLM spans when payload fields are present.
src/openarmature/observability/llm_event.py Extends the sentinel LLM event payload with cache-stat fields.
src/openarmature/llm/providers/openai.py Populates cache-stat fields into LlmEventPayload via _make_llm_event.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/conformance/test_observability.py
Address PR review feedback: _run_llm_cache_fixture_case constructed
an OpenAIProvider but never awaited aclose(), leaking the underlying
httpx.AsyncClient connection pool. Fixture 005 and 038 runners
elsewhere in this file close the provider in a finally block; the
new cache-fixture runner now follows the same convention.

Extending the fix to the typed-event runners introduced in PR #139
(_build_simple_llm_graph, _run_typed_event_fanout_case,
_run_typed_event_branches_case) — same bug class, same file,
CoPilot missed them in the original review. _build_simple_llm_graph
now returns (graph, state_cls, provider) so the caller can close
the provider in a finally; the fan-out and parallel-branches
runners close their inline-constructed providers symmetrically.

No behavior change for fixture pass/fail outcomes; the leak was
warning-level rather than failure-inducing. Full suite stays at
1191 pass.
@chris-colinsky chris-colinsky merged commit 644af66 into main Jun 8, 2026
6 checks passed
@chris-colinsky chris-colinsky deleted the feature/0047-otel-cache-attribute-emission branch June 8, 2026 01:26
chris-colinsky added a commit that referenced this pull request Jun 10, 2026
* Correct v0.13.0 release narrative per spec review

Three blocking + three should-fix items spec flagged on the
pre-tag review. All narrative; no code behavior change.

- 0047 CHANGELOG entry mis-attributed pieces 1+2 (Response.usage
  cache fields + OTel cache attributes) to v0.12.0. Verified via
  git: those landed in PRs #136 + #140 post-v0.12.0-tag, so all
  three pieces of 0047 ship in v0.13.0. Reframed.
- conformance.toml [proposals."0047"] leading-comment block had
  the same v0.12.0 mis-attribution. Same correction; added PR
  references for traceability.
- Unreleased section had two ### Added headings with the 0057
  entry orphaned below ### Changed. Consolidated.
- Spec pin advance text undercounted the cycle journey (said
  v0.51.0 → v0.53.0; actual is v0.46.0 → v0.53.0 across three
  hops). Reframed and listed absorbed proposals inline.
- tool_call.arguments JSON encoding now uses sort_keys=True
  (functionally equivalent but byte-different for downstream
  snapshot consumers). Surfaced as its own ### Changed entry
  instead of buried in the 0047 ### Added.
- conformance.toml [proposals."0049"] leading-comment block
  grew the fixture-deferral surface (057-068 + 069-073 parser-
  deferred pending harness directive schema catch-up; behavior
  pinned by unit tests) per spec OQ2.

* Migrate LLM-event docs to typed-event-first

Three docs still pushed the legacy sentinel-namespace pattern as
the primary path for custom observers consuming LLM events and
custom providers emitting them. After v0.13.0 the bundled provider
emits typed LlmCompletionEvent / LlmFailedEvent variants directly;
the bundled OTel + Langfuse observers consume via isinstance
discrimination. Rewrites:

- docs/concepts/observability.md: "Publishing LLM events for
  custom observers" → "Consuming LLM events in custom observers".
  Typed-event consumption shown as primary (isinstance branch on
  LlmCompletionEvent + LlmFailedEvent with the mutual-exclusion +
  field-set notes). Sentinel pattern demoted to a "Legacy
  sentinel-namespace pattern (compatibility surface)" subsection
  for downstream code interoperating with custom providers that
  haven't migrated.
- docs/model-providers/authoring.md: custom-provider emission
  sketch rewritten — dispatch LlmCompletionEvent on success,
  LlmFailedEvent alongside the §7 exception on failure. Shows the
  current-attempt-index / current-fan-out-index / etc. scoping
  fields the typed events carry. Calls out the mutual-exclusion
  + exception-flow-preservation contracts. Legacy sentinel pattern
  retained as a compatibility-surface callout for older providers.
- docs/agent/non-obvious-shapes.md: "filter openarmature.*-
  namespaced events" tip drops the openarmature.llm.complete
  example (v0.13.0 retired the sentinel pattern for LLM events);
  checkpoint sentinels stay since the tip is still applicable for
  those. Adjusted the follow-on paragraph mentioning LLM events.

mkdocs strict build clean.

* Regenerate AGENTS.md for typed-event doc migration

The non-obvious-shapes doc migration changed a generator source
without regenerating the committed AGENTS.md. Bring it back in
sync so the drift guard passes.

* Extend production-observability example for v0.13.0

Add an LlmFailureTracker observer that consumes the typed
LlmFailedEvent and rolls up per-invocation error-category counts,
and extend LlmUsageAccumulator to track cached_tokens and report a
cache-hit ratio. The persist node now reports both rollups and the
OTel formatter surfaces the cache-read attribute.

Also drop spec/proposal references and em dashes from the
example's comments and walk-through, which carry no meaning for
end users reading the code.

* Drop spec and proposal references from examples

Example comments and docstrings quoted proposal numbers and spec
section refs that have no meaning to end users reading the code.
Reword them to describe only the implementation behavior.

* Add tests for production-observability observers

The examples smoke test only proves the demo loads and its
build_graph() compiles. Cover the two queryable observers the
production-observability example ships: cache-token accumulation
and the derived cache-hit ratio, failure-category counting,
mutual exclusion between the success and failure events, the
per-invocation bucket cleanup, and the OTel cache-read attribute.
The persist-output check drives the real persist node offline.

* Guard legacy LLM observer snippet with NodeEvent check

The legacy sentinel-namespace observer example accessed
event.namespace / event.pre_state without narrowing to NodeEvent.
A real observer receives the full ObserverEvent union, where
variants like InvocationCompletedEvent have no namespace, so the
snippet would raise AttributeError. Add an isinstance(event,
NodeEvent) guard so the copy-paste example is correct.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants