From 42e11353abf60e68ad59eb26f0e10d10b0bd0aee Mon Sep 17 00:00:00 2001
From: chris-colinsky
Date: Sun, 14 Jun 2026 21:51:51 -0700
Subject: [PATCH] Reconcile v0.14.0 changelog and example docs

Pre-release sweep for the upcoming v0.14.0 cycle (no version bump; that
lands with the release commit).

Changelog: record the FailureIsolatedEvent.attempt_index behavior in a
failure-isolation-wraps-retry composition (it reports the final
exhausting attempt), which was the one user-visible change since
v0.13.0 not yet captured. Reword the two proposal-0050 "pin unchanged"
notes to be 0050-scoped so they no longer read as contradicting the
cycle's consolidated pin-advance entry.

Example docs: the illustrative observability output pinned the
spec_version and implementation.version; replace both with placeholders
so the snapshots stop going stale on every pin bump and release. Those
values are release-varying, unlike the run-varying numbers the block
already shows as placeholders.
---
 CHANGELOG.md                              | 4 ++--
 docs/examples/langfuse-observability.md   | 2 +-
 docs/examples/production-observability.md | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 9b4ac6a..b8d4e6f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,8 +8,8 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The
 
 ### Added
 
-- **`FailureIsolationMiddleware`** (proposal 0050, pipeline-utilities §6.3). A third bundled middleware primitive alongside `RetryMiddleware` and `TimingMiddleware`. It catches exceptions escaping the wrapped node's inner chain and returns a configured degraded partial update, so a non-critical node can fail without aborting the whole invocation. Configuration: `degraded_update` (a static mapping or a `state -> partial_update` callable, resolved at catch time), `event_name` (required, no default, since a generic name makes downstream telemetry strictly worse), an optional `predicate` (`Exception -> bool`; only matching exceptions are caught, others propagate), and an optional async `on_caught` hook. It catches `Exception`; `BaseException` (cancellation) propagates, matching `RetryMiddleware`. On a catch it dispatches a new framework-emitted `FailureIsolatedEvent` (a distinct observer-event variant carrying `event_name`, the wrapped node's lineage identity, `pre_state` / `post_state`, and a `CaughtException` record of category plus message) onto the observer delivery queue; the bundled OTel and Langfuse observers render it as a marker span / observation. Compose it OUTER of `RetryMiddleware` for the "retry transients, degrade gracefully on exhaustion" pattern. Additive: existing pipelines see no behavior change, and the spec pin is unchanged (0050 is already within the v0.53.0 pin).
-- **Call-level retry on `Provider.complete()`** (proposal 0050, llm-provider §7). The provider's `complete()` gains an optional `retry: RetryConfig | None` parameter. When supplied, the wire call is retried in-call on transient provider errors per the config (classifier, backoff, `on_retry`, `max_attempts`), so a node issuing several LLM calls in a loop does not re-run the already-successful calls when a later call hits a transient failure. The request is built and validated once (pre-send validation errors are never retried), and the call stays terminal-only on the observability surface: exactly one `LlmCompletionEvent` (eventual success) or `LlmFailedEvent` (retry exhaustion or a non-transient error) fires per `complete()` call, with a single `call_id` shared across attempts. The per-attempt span surface (N per-attempt spans and the `openarmature.llm.attempt_index` attribute) is deferred to a future cycle; `conformance.toml` marks proposal 0050 `partial` accordingly. No spec-pin change.
+- **`FailureIsolationMiddleware`** (proposal 0050, pipeline-utilities §6.3). A third bundled middleware primitive alongside `RetryMiddleware` and `TimingMiddleware`. It catches exceptions escaping the wrapped node's inner chain and returns a configured degraded partial update, so a non-critical node can fail without aborting the whole invocation. Configuration: `degraded_update` (a static mapping or a `state -> partial_update` callable, resolved at catch time), `event_name` (required, no default, since a generic name makes downstream telemetry strictly worse), an optional `predicate` (`Exception -> bool`; only matching exceptions are caught, others propagate), and an optional async `on_caught` hook. It catches `Exception`; `BaseException` (cancellation) propagates, matching `RetryMiddleware`. On a catch it dispatches a new framework-emitted `FailureIsolatedEvent` (a distinct observer-event variant carrying `event_name`, the wrapped node's lineage identity, `pre_state` / `post_state`, and a `CaughtException` record of category plus message) onto the observer delivery queue; the bundled OTel and Langfuse observers render it as a marker span / observation. Compose it OUTER of `RetryMiddleware` for the "retry transients, degrade gracefully on exhaustion" pattern; in that composition `FailureIsolatedEvent.attempt_index` reports the wrapped node's final (exhausting) attempt rather than the post-retry-reset baseline. Additive: existing pipelines see no behavior change, and 0050 itself needed no pin bump (it was already within the v0.53.0 pin the cycle started from).
+- **Call-level retry on `Provider.complete()`** (proposal 0050, llm-provider §7). The provider's `complete()` gains an optional `retry: RetryConfig | None` parameter. When supplied, the wire call is retried in-call on transient provider errors per the config (classifier, backoff, `on_retry`, `max_attempts`), so a node issuing several LLM calls in a loop does not re-run the already-successful calls when a later call hits a transient failure. The request is built and validated once (pre-send validation errors are never retried), and the call stays terminal-only on the observability surface: exactly one `LlmCompletionEvent` (eventual success) or `LlmFailedEvent` (retry exhaustion or a non-transient error) fires per `complete()` call, with a single `call_id` shared across attempts. The per-attempt span surface (N per-attempt spans and the `openarmature.llm.attempt_index` attribute) is deferred to a future cycle; `conformance.toml` marks proposal 0050 `partial` accordingly; 0050 needed no pin bump of its own.
 
 ### Changed
 
diff --git a/docs/examples/langfuse-observability.md b/docs/examples/langfuse-observability.md
index 027bf83..00b31f0 100644
--- a/docs/examples/langfuse-observability.md
+++ b/docs/examples/langfuse-observability.md
@@ -86,7 +86,7 @@ prompt: mission-briefing v7
 
 ─── captured Langfuse trace ─────────────────────────────────
 Trace id=01234567-89ab-... name='answer_briefing'
- metadata={correlation_id='...', entry_node='answer_briefing', spec_version='0.38.0'}
+ metadata={correlation_id='...', entry_node='answer_briefing', spec_version=''}
 [span] 'answer_briefing' level=DEFAULT
 metadata={attempt_index=0, correlation_id='...', namespace=['answer_briefing'], step=0}
 [generation] 'openarmature.llm.complete' level=DEFAULT
diff --git a/docs/examples/production-observability.md b/docs/examples/production-observability.md
index cdeb084..60825dd 100644
--- a/docs/examples/production-observability.md
+++ b/docs/examples/production-observability.md
@@ -175,7 +175,7 @@ answer: The primary objective of Apollo 11 was ...
 model: gpt-4o-mini-2024-07-18
 
 --- captured OTel spans ---
- [openarmature.invocation] 1240.0ms openarmature.graph.entry_node='respond', openarmature.graph.spec_version='0.54.0', openarmature.implementation.name='openarmature-python', openarmature.implementation.version='0.13.0'
+ [openarmature.invocation] 1240.0ms openarmature.graph.entry_node='respond', openarmature.graph.spec_version='', openarmature.implementation.name='openarmature-python', openarmature.implementation.version=''
 [respond] 1235.0ms openarmature.node.name='respond', openarmature.user.tenantId='demo-acme', ...
 [openarmature.llm.complete] 1200.0ms openarmature.user.tenantId='demo-acme', gen_ai.system='openai', gen_ai.usage.input_tokens=42, ...
 [persist] 2.0ms openarmature.node.name='persist', openarmature.user.tenantId='demo-acme', ...