Merged
4 changes: 2 additions & 2 deletions CHANGELOG.md
@@ -8,8 +8,8 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The

### Added

-- **`FailureIsolationMiddleware`** (proposal 0050, pipeline-utilities §6.3). A third bundled middleware primitive alongside `RetryMiddleware` and `TimingMiddleware`. It catches exceptions escaping the wrapped node's inner chain and returns a configured degraded partial update, so a non-critical node can fail without aborting the whole invocation. Configuration: `degraded_update` (a static mapping or a `state -> partial_update` callable, resolved at catch time), `event_name` (required, no default, since a generic name makes downstream telemetry strictly worse), an optional `predicate` (`Exception -> bool`; only matching exceptions are caught, others propagate), and an optional async `on_caught` hook. It catches `Exception`; `BaseException` (cancellation) propagates, matching `RetryMiddleware`. On a catch it dispatches a new framework-emitted `FailureIsolatedEvent` (a distinct observer-event variant carrying `event_name`, the wrapped node's lineage identity, `pre_state` / `post_state`, and a `CaughtException` record of category plus message) onto the observer delivery queue; the bundled OTel and Langfuse observers render it as a marker span / observation. Compose it OUTER of `RetryMiddleware` for the "retry transients, degrade gracefully on exhaustion" pattern. Additive: existing pipelines see no behavior change, and the spec pin is unchanged (0050 is already within the v0.53.0 pin).
-- **Call-level retry on `Provider.complete()`** (proposal 0050, llm-provider §7). The provider's `complete()` gains an optional `retry: RetryConfig | None` parameter. When supplied, the wire call is retried in-call on transient provider errors per the config (classifier, backoff, `on_retry`, `max_attempts`), so a node issuing several LLM calls in a loop does not re-run the already-successful calls when a later call hits a transient failure. The request is built and validated once (pre-send validation errors are never retried), and the call stays terminal-only on the observability surface: exactly one `LlmCompletionEvent` (eventual success) or `LlmFailedEvent` (retry exhaustion or a non-transient error) fires per `complete()` call, with a single `call_id` shared across attempts. The per-attempt span surface (N per-attempt spans and the `openarmature.llm.attempt_index` attribute) is deferred to a future cycle; `conformance.toml` marks proposal 0050 `partial` accordingly. No spec-pin change.
+- **`FailureIsolationMiddleware`** (proposal 0050, pipeline-utilities §6.3). A third bundled middleware primitive alongside `RetryMiddleware` and `TimingMiddleware`. It catches exceptions escaping the wrapped node's inner chain and returns a configured degraded partial update, so a non-critical node can fail without aborting the whole invocation. Configuration: `degraded_update` (a static mapping or a `state -> partial_update` callable, resolved at catch time), `event_name` (required, no default, since a generic name makes downstream telemetry strictly worse), an optional `predicate` (`Exception -> bool`; only matching exceptions are caught, others propagate), and an optional async `on_caught` hook. It catches `Exception`; `BaseException` (cancellation) propagates, matching `RetryMiddleware`. On a catch it dispatches a new framework-emitted `FailureIsolatedEvent` (a distinct observer-event variant carrying `event_name`, the wrapped node's lineage identity, `pre_state` / `post_state`, and a `CaughtException` record of category plus message) onto the observer delivery queue; the bundled OTel and Langfuse observers render it as a marker span / observation. Compose it OUTER of `RetryMiddleware` for the "retry transients, degrade gracefully on exhaustion" pattern; in that composition `FailureIsolatedEvent.attempt_index` reports the wrapped node's final (exhausting) attempt rather than the post-retry-reset baseline. Additive: existing pipelines see no behavior change, and 0050 itself needed no pin bump (it was already within the v0.53.0 pin the cycle started from).
+- **Call-level retry on `Provider.complete()`** (proposal 0050, llm-provider §7). The provider's `complete()` gains an optional `retry: RetryConfig | None` parameter. When supplied, the wire call is retried in-call on transient provider errors per the config (classifier, backoff, `on_retry`, `max_attempts`), so a node issuing several LLM calls in a loop does not re-run the already-successful calls when a later call hits a transient failure. The request is built and validated once (pre-send validation errors are never retried), and the call stays terminal-only on the observability surface: exactly one `LlmCompletionEvent` (eventual success) or `LlmFailedEvent` (retry exhaustion or a non-transient error) fires per `complete()` call, with a single `call_id` shared across attempts. The per-attempt span surface (N per-attempt spans and the `openarmature.llm.attempt_index` attribute) is deferred to a future cycle; `conformance.toml` marks proposal 0050 `partial` accordingly; 0050 needed no pin bump of its own.

### Changed

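Reviewer note on the `CHANGELOG.md` hunk: the "retry transients, degrade gracefully on exhaustion" composition described in the `FailureIsolationMiddleware` entry can be illustrated with a minimal, library-free sketch. Every name below (`with_retry`, `with_failure_isolation`, `TransientError`, the `events` list) is hypothetical and stands in for the real middleware classes, whose constructor signatures are not shown in this diff. The sketch only mirrors the behavior the entry describes: the isolation wrapper sits outside the retry wrapper, catches `Exception` but not `BaseException`, resolves the degraded update at catch time, and records one named event.

```python
import asyncio
from typing import Any, Awaitable, Callable, Mapping, Optional, Union

State = Mapping[str, Any]
Update = Mapping[str, Any]
Node = Callable[[State], Awaitable[Update]]


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure."""


def with_retry(node: Node, max_attempts: int) -> Node:
    """Re-run the node on TransientError; re-raise once attempts are exhausted."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")

    async def wrapped(state: State) -> Update:
        for attempt in range(max_attempts):
            try:
                return await node(state)
            except TransientError:
                if attempt == max_attempts - 1:
                    raise
        raise AssertionError("unreachable")

    return wrapped


def with_failure_isolation(
    node: Node,
    degraded_update: Union[Update, Callable[[State], Update]],
    event_name: str,
    events: list,
    predicate: Optional[Callable[[Exception], bool]] = None,
) -> Node:
    """Catch Exception from the inner chain and return a degraded update."""

    async def wrapped(state: State) -> Update:
        try:
            return await node(state)
        except Exception as exc:  # BaseException (e.g. CancelledError) propagates
            if predicate is not None and not predicate(exc):
                raise  # non-matching exceptions are not isolated
            # Illustrative event record: name, exception category, message.
            events.append((event_name, type(exc).__name__, str(exc)))
            # Resolve the degraded update at catch time.
            if callable(degraded_update):
                return degraded_update(state)
            return degraded_update

    return wrapped


attempts: list = []


async def flaky_enrich(state: State) -> Update:
    """A non-critical node that always fails transiently in this demo."""
    attempts.append(1)
    raise TransientError("upstream unavailable")


events: list = []
node = with_failure_isolation(
    with_retry(flaky_enrich, max_attempts=3),  # isolation is OUTER of retry
    degraded_update=lambda state: {"enrichment": None, "query": state["query"]},
    event_name="enrichment_degraded",
    events=events,
)
result = asyncio.run(node({"query": "apollo 11"}))
print(result, len(attempts), events)
```

Because the isolation wrapper is outermost, it only sees the exception after the retry wrapper has given up, so the degraded update is produced once per invocation rather than once per attempt.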
2 changes: 1 addition & 1 deletion docs/examples/langfuse-observability.md
@@ -86,7 +86,7 @@ prompt: mission-briefing v7
─── captured Langfuse trace ─────────────────────────────────
Trace id=01234567-89ab-...
name='answer_briefing'
-metadata={correlation_id='...', entry_node='answer_briefing', spec_version='0.38.0'}
+metadata={correlation_id='...', entry_node='answer_briefing', spec_version='<spec-version>'}
[span] 'answer_briefing' level=DEFAULT
metadata={attempt_index=0, correlation_id='...', namespace=['answer_briefing'], step=0}
[generation] 'openarmature.llm.complete' level=DEFAULT
2 changes: 1 addition & 1 deletion docs/examples/production-observability.md
@@ -175,7 +175,7 @@ answer: The primary objective of Apollo 11 was ...
model: gpt-4o-mini-2024-07-18

--- captured OTel spans ---
-[openarmature.invocation] 1240.0ms openarmature.graph.entry_node='respond', openarmature.graph.spec_version='0.54.0', openarmature.implementation.name='openarmature-python', openarmature.implementation.version='0.13.0'
+[openarmature.invocation] 1240.0ms openarmature.graph.entry_node='respond', openarmature.graph.spec_version='<spec-version>', openarmature.implementation.name='openarmature-python', openarmature.implementation.version='<version>'
[respond] 1235.0ms openarmature.node.name='respond', openarmature.user.tenantId='demo-acme', ...
[openarmature.llm.complete] 1200.0ms openarmature.user.tenantId='demo-acme', gen_ai.system='openai', gen_ai.usage.input_tokens=42, ...
[persist] 2.0ms openarmature.node.name='persist', openarmature.user.tenantId='demo-acme', ...