Merged
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -15,6 +15,7 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The

- **`RetryMiddleware` now takes a `RetryConfig` record** instead of individual constructor kwargs (proposal 0050 prep). The four retry settings (`max_attempts` / `classifier` / `backoff` / `on_retry`, each optional) move onto a frozen `RetryConfig`; construct as `RetryMiddleware(RetryConfig(max_attempts=...))`, while bare `RetryMiddleware()` still applies the defaults. This is a breaking change to the `RetryMiddleware` constructor. The record is the same shape the upcoming call-level `complete(retry=...)` parameter will accept, so one retry config serves both the per-node and per-call layers. `None` fields resolve to the canonical defaults (`default_classifier` / `exponential_jitter_backoff`) at use, preserving the prior behavior.
- **Failure-isolation events report the originating cause's category at non-node placements** (proposal 0065, pipeline-utilities §6.3). When `FailureIsolationMiddleware` runs as instance middleware (§9.7), branch middleware (§11.7), or parent-node middleware on a fan-out / parallel-branches node, the graph engine has already wrapped the originating error as a `node_exception` carrier before the middleware catches it. `FailureIsolatedEvent.caught_exception.category` now resolves through that carrier (and any nested carriers) to the nearest categorized originating cause and reports its category instead of the masking `node_exception`, so the reported category agrees with what the §6.1 retry classifier acted on. For example, an instance whose retries exhaust on `provider_unavailable` now surfaces `provider_unavailable` rather than `node_exception`. The `message` tracks the resolved cause for category/message coherence. Node-level placement was already faithful and is unchanged, and catch/degrade behavior is unchanged at every site (only the event's reported cause changes). The wrapped-instance/branch lineage SHOULD (`fan_out_index` / `branch_name`) is deferred to a follow-up, since it needs the engine to surface per-instance identity to the wrapping-site middleware.
- **Observer privacy flag `disable_llm_payload` renamed to `disable_provider_payload`** (proposal 0059, observability §5.5.4, spec v0.54.0). The observer-level flag on both bundled observers (`OTelObserver` and `LangfuseObserver`) is renamed, and its scope broadens from LLM-completion payload to any provider-call payload (LLM completion today; embedding and rerank when those land). This is a breaking change to both observer constructors: config passing `disable_llm_payload=True` (or `False`) updates to `disable_provider_payload=...` with no other change. The default stays `True` (payload suppressed), and the gating behavior for `LlmCompletionEvent` / `LlmFailedEvent` rendering is unchanged at every existing site. The rename is the only part of proposal 0059 adopted this cycle: the retrieval-provider capability itself (the `EmbeddingProvider` protocol, the `EmbeddingEvent` / `EmbeddingFailedEvent` typed variants, and the embedding span / observation mapping) is not yet implemented and rides as `not-yet` in `conformance.toml`. The §5.5.4 rename touches existing LLM-payload gating, so it lands with the pin. Pinned spec advances v0.53.0 → v0.54.0.
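
As a rough illustration of the `RetryConfig` entry above, the sketch below shows a frozen record whose `None` fields resolve to defaults only at use. `RetryConfigSketch` and `RetryMiddlewareSketch` are hypothetical stand-ins, not the library's classes; the bodies of `default_classifier` and `exponential_jitter_backoff` are placeholders, and the default of 3 attempts is an assumption, since this diff does not state the real value.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


# Placeholders for the canonical defaults the changelog names. Their real
# signatures are not shown in this diff, so these shapes are assumptions.
def default_classifier(exc: BaseException) -> bool:
    return True  # placeholder: treat every error as retryable


def exponential_jitter_backoff(attempt: int) -> float:
    return 0.0  # placeholder: no delay


ASSUMED_DEFAULT_MAX_ATTEMPTS = 3  # assumption; the real default is not stated


@dataclass(frozen=True)
class RetryConfigSketch:
    """Hypothetical stand-in: four optional settings on a frozen record."""

    max_attempts: Optional[int] = None
    classifier: Optional[Callable[[BaseException], bool]] = None
    backoff: Optional[Callable[[int], float]] = None
    on_retry: Optional[Callable[[int, BaseException], None]] = None


class RetryMiddlewareSketch:
    """Hypothetical stand-in: takes one config record instead of kwargs."""

    def __init__(self, config: Optional[RetryConfigSketch] = None) -> None:
        # A bare constructor call still works and applies the defaults.
        self.config = config if config is not None else RetryConfigSketch()

    def resolved(self) -> tuple:
        # None fields are resolved here, at use, not at construction.
        cfg = self.config
        max_attempts = (
            cfg.max_attempts
            if cfg.max_attempts is not None
            else ASSUMED_DEFAULT_MAX_ATTEMPTS
        )
        classifier = cfg.classifier if cfg.classifier is not None else default_classifier
        backoff = cfg.backoff if cfg.backoff is not None else exponential_jitter_backoff
        return (max_attempts, classifier, backoff)
```

Because the record is frozen, the same instance could plausibly be shared between a per-node middleware and a per-call parameter without either side mutating it, which matches the stated motivation for the change.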

## [0.13.0] — 2026-06-09

18 changes: 15 additions & 3 deletions conformance.toml
@@ -32,7 +32,7 @@

[manifest]
implementation = "openarmature-python"
spec_pin = "v0.53.0"
spec_pin = "v0.54.0"

# Status values:
# implemented — shipped behavior matches the proposal's contract
@@ -454,9 +454,9 @@ since = "0.12.0"
# not:
# ``tests/unit/test_observability_otel.py::
# test_invocation_span_carries_implementation_attribution_attributes``;
# - OTel always-emit invariant under ``disable_llm_payload``,
# - OTel always-emit invariant under ``disable_provider_payload``,
# ``disable_genai_semconv``, ``disable_llm_spans``:
# ``::test_invocation_span_attribution_emits_under_disable_llm_payload``;
# ``::test_invocation_span_attribution_emits_under_disable_provider_payload``;
# - OTel attributes emit on every invocation span across a
# reused observer (3 sequential invocations):
# ``::test_invocation_span_attribution_emits_on_every_invocation``;
@@ -593,3 +593,15 @@ since = "0.13.0"
[proposals."0058"]
status = "implemented"
since = "0.13.0"

# Spec v0.54.0 (proposal 0059). Retrieval-provider capability —
# the ``EmbeddingProvider`` protocol + ``EmbeddingEvent`` /
# ``EmbeddingFailedEvent`` typed variants + OTel/Langfuse embedding
# mapping. Python has not yet shipped the embedding surface, so the
# capability is not-yet. The one piece adopted at this pin is the
# proposal's cross-spec consequence: the observer-level privacy flag
# ``disable_llm_payload`` is renamed ``disable_provider_payload`` (the
# §5.5.4 rename touches existing LLM-payload gating, so it lands with
# the pin even though the embedding capability does not).
[proposals."0059"]
status = "not-yet"
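
For callers, the rename described in the comment above is mechanical. The sketch below shows one way a project might migrate stored observer keyword arguments. `migrate_observer_kwargs` is a hypothetical helper, not part of openarmature; only the two flag names come from this diff.

```python
from typing import Any, Mapping

OLD_FLAG = "disable_llm_payload"       # name before spec v0.54.0
NEW_FLAG = "disable_provider_payload"  # name from spec v0.54.0 onward


def migrate_observer_kwargs(kwargs: Mapping[str, Any]) -> dict[str, Any]:
    """Return a copy of kwargs with the old flag renamed; the value is kept."""
    migrated = dict(kwargs)
    if OLD_FLAG in migrated:
        if NEW_FLAG in migrated:
            # Both names present is ambiguous, so refuse rather than guess.
            raise ValueError(f"pass only {NEW_FLAG!r}, not both flag names")
        migrated[NEW_FLAG] = migrated.pop(OLD_FLAG)
    return migrated
```

The value passes through untouched because the changelog states that the default and the gating behavior are unchanged; only the keyword name differs.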
6 changes: 3 additions & 3 deletions docs/agent/non-obvious-shapes.md
@@ -48,9 +48,9 @@ else:

The discriminator is one branch; missing it gives you empty data on tool-call responses and silently wrong behavior on truncations.

### `disable_llm_payload` defaults to `True`: flip it for LLM-aware observability backends
### `disable_provider_payload` defaults to `True`: flip it for LLM-aware observability backends

The `OTelObserver` (and any spec-conformant observer reading LLM events) defaults `disable_llm_payload: bool = True` per spec §5.5's "default-off by privacy" framing. Without flipping the flag, LLM spans carry GenAI semconv attributes (token counts, model name, finish reason) but NOT the message payload (input messages, response content, request extras).
The `OTelObserver` (and any spec-conformant observer reading LLM events) defaults `disable_provider_payload: bool = True` per spec §5.5's "default-off by privacy" framing. Without flipping the flag, LLM spans carry GenAI semconv attributes (token counts, model name, finish reason) but NOT the message payload (input messages, response content, request extras).

That's the right default for general OpenArmature use: payloads may contain PII the user hasn't audited, and storage cost grows with prompt size. But it's the WRONG default if you're wiring up an LLM-aware observability backend (Langfuse, Phoenix, Honeycomb's LLM lens) that renders the message stream as part of its generation view. Backends will show "empty" generations and you'll wonder why.

@@ -61,7 +61,7 @@ from openarmature.observability import OTelObserver

observer = OTelObserver(
span_processor=your_exporter,
disable_llm_payload=False, # opt in to message-payload attributes
disable_provider_payload=False, # opt in to message-payload attributes
)
graph.attach_observer(observer)
```
14 changes: 7 additions & 7 deletions docs/concepts/observability.md
@@ -702,12 +702,12 @@ source on your stack.
### LLM payload attributes

By default, LLM spans do **not** carry the messages sent or the
response content. Opt in with `disable_llm_payload=False`:
response content. Opt in with `disable_provider_payload=False`:

```python
observer = OTelObserver(
span_processor=SimpleSpanProcessor(exporter),
disable_llm_payload=False,
disable_provider_payload=False,
)
```

@@ -764,7 +764,7 @@ level (per llm-provider §3.1.2); only `source` is replaced. URL-form
images pass through unchanged: the URL is a short string and is
informative for trace readers.

Redaction is **not** gated by `disable_llm_payload` and is **not**
Redaction is **not** gated by `disable_provider_payload` and is **not**
configurable. Inline image bytes never leave the provider in event
form, so custom observers consuming
[`LlmCompletionEvent` / `LlmFailedEvent`](#consuming-llm-events-in-custom-observers)
@@ -967,7 +967,7 @@ langfuse_client = Langfuse(
)
observer = LangfuseObserver(
client=LangfuseSDKAdapter(langfuse_client),
disable_llm_payload=False,
disable_provider_payload=False,
)
```

@@ -1014,15 +1014,15 @@ for a runnable demo.

### Payload + truncation

`disable_llm_payload` mirrors the OTel observer's flag and defaults
`disable_provider_payload` mirrors the OTel observer's flag and defaults
to `True` for the same privacy reason. Flip to `False` to populate
`generation.input` / `output` / `metadata.request_extras` from the
LLM event payload.

```python
observer = LangfuseObserver(
client=client,
disable_llm_payload=False,
disable_provider_payload=False,
payload_byte_cap=65536,
)
```
@@ -1057,5 +1057,5 @@ graph.attach_observer(otel_observer)
graph.attach_observer(langfuse_observer)
```

Each observer's `disable_llm_spans` / `disable_llm_payload` flag is
Each observer's `disable_llm_spans` / `disable_provider_payload` flag is
independent; one MAY emit while the other suppresses.
6 changes: 3 additions & 3 deletions docs/examples/langfuse-observability.md
@@ -39,7 +39,7 @@ manual wiring at the call site.
surfaces it on every Generation that renders from that prompt.
Filesystem / in-memory backends without that reference work too,
they just produce metadata-only linkage.
- `disable_llm_payload=False` opt-in for capturing input messages +
- `disable_provider_payload=False` opt-in for capturing input messages +
output content on Generation observations. Default-off is the
privacy posture; the demo deliberately flips it.
- `correlation_id` cross-cutting metadata on the Trace and every
@@ -140,7 +140,7 @@ langfuse_client = Langfuse(
)
observer = LangfuseObserver(
client=LangfuseSDKAdapter(langfuse_client),
disable_llm_payload=False,
disable_provider_payload=False,
)
```

@@ -174,7 +174,7 @@ graph.attach_observer(OTelObserver(span_processor=batch))
graph.attach_observer(LangfuseObserver(client=langfuse_client))
```

Their `disable_llm_spans` / `disable_llm_payload` flags are
Their `disable_llm_spans` / `disable_provider_payload` flags are
independent. The `correlation_id` cross-cutting attribute is the join
key: find a slow Generation in Langfuse, search for the
`correlation_id` in OTel logs to see the surrounding infrastructure
4 changes: 2 additions & 2 deletions docs/examples/production-observability.md
@@ -175,7 +175,7 @@ answer: The primary objective of Apollo 11 was ...
model: gpt-4o-mini-2024-07-18

--- captured OTel spans ---
[openarmature.invocation] 1240.0ms openarmature.graph.entry_node='respond', openarmature.graph.spec_version='0.53.0', openarmature.implementation.name='openarmature-python', openarmature.implementation.version='0.13.0'
[openarmature.invocation] 1240.0ms openarmature.graph.entry_node='respond', openarmature.graph.spec_version='0.54.0', openarmature.implementation.name='openarmature-python', openarmature.implementation.version='0.13.0'
[respond] 1235.0ms openarmature.node.name='respond', openarmature.user.tenantId='demo-acme', ...
[openarmature.llm.complete] 1200.0ms openarmature.user.tenantId='demo-acme', gen_ai.system='openai', gen_ai.usage.input_tokens=42, ...
[persist] 2.0ms openarmature.node.name='persist', openarmature.user.tenantId='demo-acme', ...
@@ -289,7 +289,7 @@ langfuse_observer = LangfuseObserver(
),
trace_input_from_state=_trace_input,
trace_output_from_state=_trace_output,
disable_llm_payload=False,
disable_provider_payload=False,
)
```

4 changes: 2 additions & 2 deletions examples/langfuse-observability/main.py
@@ -262,13 +262,13 @@ async def main() -> None:
# ``LangfuseClient`` Protocol; the observer code doesn't change.
client = InMemoryLangfuseClient()

# disable_llm_payload=False opts in to capturing the input messages
# disable_provider_payload=False opts in to capturing the input messages
# and output content on Generation observations. Default is True
# for the same privacy reason the OTel observer's flag exists:
# payloads may contain PII the operator hasn't audited. Flip it
# deliberately here because the demo's whole point is showing what
# the model saw and returned.
observer = LangfuseObserver(client=client, disable_llm_payload=False)
observer = LangfuseObserver(client=client, disable_provider_payload=False)

graph = build_graph()
graph.attach_observer(observer)
8 changes: 4 additions & 4 deletions examples/production-observability/main.py
@@ -582,14 +582,14 @@ def build_graph() -> CompiledGraph[BriefingState]:
# any OTLP-compatible backend.
#
# Caller hooks attach to LangfuseObserver via constructor kwargs.
# ``disable_llm_payload=False`` opts in to capturing the input
# ``disable_provider_payload=False`` opts in to capturing the input
# messages + output content on Generation observations so the demo
# output is meaningful; the default-True is the privacy-preserving
# setting.


def _build_otel_observer(exporter: InMemorySpanExporter) -> OTelObserver:
# ``disable_llm_payload=False`` opts in to capturing input messages
# ``disable_provider_payload=False`` opts in to capturing input messages
# + output content on the LLM-call span (same flag the Langfuse
# observer below flips for the same reason). The example's whole
# point is showing both backends seeing the same logical events;
@@ -600,14 +600,14 @@ def _build_otel_observer(exporter: InMemorySpanExporter) -> OTelObserver:
return OTelObserver(
span_processor=SimpleSpanProcessor(exporter),
resource=Resource.create({"service.name": "openarmature-production-observability"}),
disable_llm_payload=False,
disable_provider_payload=False,
)


def _build_langfuse_observer(client: InMemoryLangfuseClient) -> LangfuseObserver:
return LangfuseObserver(
client=client,
disable_llm_payload=False,
disable_provider_payload=False,
trace_input_from_state=_trace_input,
trace_output_from_state=_trace_output,
)
2 changes: 1 addition & 1 deletion openarmature-spec
Submodule openarmature-spec updated 56 files
+22 −0 CHANGELOG.md
+3 −2 README.md
+1 −0 docs/capabilities/retrieval-provider.md
+9 −0 docs/compatibility.md
+68 −0 docs/open-questions.md
+2 −1 docs/proposals.md
+1 −0 docs/proposals/0059-retrieval-provider-embedding.md
+1 −0 mkdocs.yml
+627 −0 proposals/0059-retrieval-provider-embedding.md
+96 −4 spec/graph-engine/spec.md
+4 −4 spec/observability/conformance/012-otel-llm-payload-default-off.md
+2 −2 spec/observability/conformance/012-otel-llm-payload-default-off.yaml
+3 −3 spec/observability/conformance/013-otel-llm-payload-enabled.md
+3 −3 spec/observability/conformance/013-otel-llm-payload-enabled.yaml
+1 −1 spec/observability/conformance/014-otel-llm-payload-truncation.md
+2 −2 spec/observability/conformance/014-otel-llm-payload-truncation.yaml
+1 −1 spec/observability/conformance/015-otel-llm-payload-image-redaction.md
+2 −2 spec/observability/conformance/015-otel-llm-payload-image-redaction.yaml
+4 −4 spec/observability/conformance/018-otel-llm-request-extras.md
+3 −3 spec/observability/conformance/018-otel-llm-request-extras.yaml
+1 −1 spec/observability/conformance/022-langfuse-basic-trace.yaml
+2 −2 spec/observability/conformance/023-langfuse-generation-rendering.md
+4 −4 spec/observability/conformance/023-langfuse-generation-rendering.yaml
+1 −1 spec/observability/conformance/037-langfuse-trace-input-output.md
+32 −0 spec/observability/conformance/074-embedding-event-dispatch.md
+76 −0 spec/observability/conformance/074-embedding-event-dispatch.yaml
+33 −0 spec/observability/conformance/075-embedding-failure-event-dispatch-on-provider-unavailable.md
+63 −0 spec/observability/conformance/075-embedding-failure-event-dispatch-on-provider-unavailable.yaml
+25 −0 spec/observability/conformance/076-embedding-event-mutual-exclusion.md
+85 −0 spec/observability/conformance/076-embedding-event-mutual-exclusion.yaml
+24 −0 spec/observability/conformance/077-embedding-event-call-id-distinct.md
+59 −0 spec/observability/conformance/077-embedding-event-call-id-distinct.yaml
+26 −0 spec/observability/conformance/078-embedding-event-input-strings-populated.md
+46 −0 spec/observability/conformance/078-embedding-event-input-strings-populated.yaml
+28 −0 spec/observability/conformance/079-embedding-event-request-params-populated.md
+87 −0 spec/observability/conformance/079-embedding-event-request-params-populated.yaml
+26 −0 spec/observability/conformance/080-embedding-event-input-count-and-dimensions-populated.md
+49 −0 spec/observability/conformance/080-embedding-event-input-count-and-dimensions-populated.yaml
+27 −0 spec/observability/conformance/081-embedding-event-active-prompt-populated.md
+76 −0 spec/observability/conformance/081-embedding-event-active-prompt-populated.yaml
+35 −0 spec/observability/conformance/082-otel-embedding-span-attributes.md
+65 −0 spec/observability/conformance/082-otel-embedding-span-attributes.yaml
+39 −0 spec/observability/conformance/083-langfuse-embedding-observation.md
+123 −0 spec/observability/conformance/083-langfuse-embedding-observation.yaml
+146 −16 spec/observability/spec.md
+34 −0 spec/retrieval-provider/conformance/001-embed-positive-control.md
+80 −0 spec/retrieval-provider/conformance/001-embed-positive-control.yaml
+32 −0 spec/retrieval-provider/conformance/002-embed-model-binding-error.md
+42 −0 spec/retrieval-provider/conformance/002-embed-model-binding-error.yaml
+27 −0 spec/retrieval-provider/conformance/003-embed-malformed-response-mismatched-vector-count.md
+43 −0 spec/retrieval-provider/conformance/003-embed-malformed-response-mismatched-vector-count.yaml
+26 −0 spec/retrieval-provider/conformance/004-embed-malformed-response-inconsistent-dimensions.md
+45 −0 spec/retrieval-provider/conformance/004-embed-malformed-response-inconsistent-dimensions.yaml
+25 −0 spec/retrieval-provider/conformance/005-embed-input-order-preserved.md
+52 −0 spec/retrieval-provider/conformance/005-embed-input-order-preserved.yaml
+220 −0 spec/retrieval-provider/spec.md
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -63,7 +63,7 @@ Specification = "https://github.com/LunarCommand/openarmature-spec"
openarmature = "openarmature.cli:main"

[tool.openarmature]
spec_version = "0.53.0"
spec_version = "0.54.0"

[dependency-groups]
dev = [
10 changes: 5 additions & 5 deletions src/openarmature/AGENTS.md
@@ -1,6 +1,6 @@
# OpenArmature — Agent documentation

*This is the agent guide bundled with the openarmature Python package, version 0.13.0 (spec v0.53.0). For the full docs site see [openarmature.ai](https://openarmature.ai). For the canonical spec text see [openarmature.org/capabilities](https://openarmature.org/capabilities/). For project-specific conventions for the code you're editing, see the host project's `AGENTS.md` or `CLAUDE.md`.*
*This is the agent guide bundled with the openarmature Python package, version 0.13.0 (spec v0.54.0). For the full docs site see [openarmature.ai](https://openarmature.ai). For the canonical spec text see [openarmature.org/capabilities](https://openarmature.org/capabilities/). For project-specific conventions for the code you're editing, see the host project's `AGENTS.md` or `CLAUDE.md`.*

## TL;DR

@@ -10,7 +10,7 @@ OpenArmature is a workflow framework for LLM pipelines and tool-calling agents:

## Capability contracts

_Sourced from openarmature-spec v0.53.0. Each entry below reproduces §1 (Purpose) and §2 (Concepts) of the capability's `spec.md` verbatim — including additions from accepted proposals that this Python implementation may not yet ship. For per-proposal implementation status (implemented / partial / textual-only / not-yet), see the `conformance.toml` manifest at the repo root. For the full spec text (execution model, error semantics, determinism, observer hooks, etc.) see the linked docs site._
_Sourced from openarmature-spec v0.54.0. Each entry below reproduces §1 (Purpose) and §2 (Concepts) of the capability's `spec.md` verbatim — including additions from accepted proposals that this Python implementation may not yet ship. For per-proposal implementation status (implemented / partial / textual-only / not-yet), see the `conformance.toml` manifest at the repo root. For the full spec text (execution model, error semantics, determinism, observer hooks, etc.) see the linked docs site._

### Capability: `graph-engine`

@@ -1377,9 +1377,9 @@ else:

The discriminator is one branch; missing it gives you empty data on tool-call responses and silently wrong behavior on truncations.

### `disable_llm_payload` defaults to `True`: flip it for LLM-aware observability backends
### `disable_provider_payload` defaults to `True`: flip it for LLM-aware observability backends

The `OTelObserver` (and any spec-conformant observer reading LLM events) defaults `disable_llm_payload: bool = True` per spec §5.5's "default-off by privacy" framing. Without flipping the flag, LLM spans carry GenAI semconv attributes (token counts, model name, finish reason) but NOT the message payload (input messages, response content, request extras).
The `OTelObserver` (and any spec-conformant observer reading LLM events) defaults `disable_provider_payload: bool = True` per spec §5.5's "default-off by privacy" framing. Without flipping the flag, LLM spans carry GenAI semconv attributes (token counts, model name, finish reason) but NOT the message payload (input messages, response content, request extras).

That's the right default for general OpenArmature use: payloads may contain PII the user hasn't audited, and storage cost grows with prompt size. But it's the WRONG default if you're wiring up an LLM-aware observability backend (Langfuse, Phoenix, Honeycomb's LLM lens) that renders the message stream as part of its generation view. Backends will show "empty" generations and you'll wonder why.

@@ -1390,7 +1390,7 @@ from openarmature.observability import OTelObserver

observer = OTelObserver(
span_processor=your_exporter,
disable_llm_payload=False, # opt in to message-payload attributes
disable_provider_payload=False, # opt in to message-payload attributes
)
graph.attach_observer(observer)
```
2 changes: 1 addition & 1 deletion src/openarmature/__init__.py
@@ -25,7 +25,7 @@
"""

__version__ = "0.13.0"
__spec_version__ = "0.53.0"
__spec_version__ = "0.54.0"
# Proposal 0052 (spec observability §5.1 / §8.4.1): canonical
# package-registry name for this implementation. Surfaces on every
# OTel invocation span as ``openarmature.implementation.name`` and on
4 changes: 2 additions & 2 deletions src/openarmature/graph/events.py
@@ -461,7 +461,7 @@ class InvocationCompletedEvent:
# which already enforces the redaction. The three payload-bearing
# fields (input_messages, output_content, request_extras) are
# populated unconditionally on the typed event per §5.5.7; observer-
# side privacy gates (OTel disable_llm_payload, Langfuse equivalents)
# side privacy gates (OTel disable_provider_payload, Langfuse equivalents)
# apply at rendering, symmetric with the §5.5.1 span attribute path.
# Custom queryable observers (per observability §9) own their own
# redaction posture — gating belongs at rendering with the consumer's
@@ -597,7 +597,7 @@ class LlmCompletionEvent:
#
# Privacy posture identical to LlmCompletionEvent: input_messages /
# request_params / request_extras are populated unconditionally per
# §5.5.7; observer-side privacy gates (OTel disable_llm_payload,
# §5.5.7; observer-side privacy gates (OTel disable_provider_payload,
# Langfuse equivalents) apply at rendering. Inline image bytes are
# redacted per observability §5.5.5 before population. Custom
# queryable observers own their own redaction posture.
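
The comments in these hunks describe a split: the typed event always carries its payload fields, and the privacy gate applies only when an observer renders them. The sketch below illustrates that split under stated assumptions. `LlmCompletionEventSketch`, `render_span_attributes`, and the `sketch.*` attribute keys are hypothetical; the real event has more fields and the real observers use their own attribute names.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class LlmCompletionEventSketch:
    """Hypothetical stand-in: payload fields are always populated."""

    model: str
    input_messages: list
    output_content: str
    request_extras: dict = field(default_factory=dict)


def render_span_attributes(
    event: LlmCompletionEventSketch,
    *,
    disable_provider_payload: bool = True,  # default suppresses payload
) -> dict[str, Any]:
    """Render attributes; the gate decides what is emitted, not what exists."""
    attrs: dict[str, Any] = {"sketch.model": event.model}  # always emitted
    if not disable_provider_payload:
        attrs["sketch.input_messages"] = event.input_messages
        attrs["sketch.output_content"] = event.output_content
        attrs["sketch.request_extras"] = event.request_extras
    return attrs
```

A custom observer reading the same event would see the payload regardless of this flag, which is why the comments say such observers own their own redaction posture.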