
Implement 0065 failure-isolation cause fidelity #153

Merged: chris-colinsky merged 2 commits into main from fix/0065-cause-fidelity on Jun 12, 2026

@chris-colinsky (Member)

Summary

Implements proposal 0065 (failure-isolation cause fidelity, spec v0.55.0), the first of three v0.14.0 pieces. This is the behavior fix only; the conformance harness and the spec-pin bump to v0.55.0 (with fixture 064) follow in separate PRs.

FailureIsolationMiddleware previously reported caught_exception.category as the engine's masking node_exception whenever it ran at a non-node placement (instance middleware §9.7, branch middleware §11.7, or parent-node middleware), because at those sites the engine wraps the originating error as a node_exception carrier before the middleware catches it. The event now resolves through the carrier to the real cause.

What changed

  • New _resolve_cause walks the __cause__ chain, skips node_exception carrier wrappers (NodeException and its ParallelBranchesBranchFailed subtype, nested included), and returns the nearest non-carrier exception that carries a category. Both category and message come from that resolved cause so they describe one exception.
  • A deliberately re-categorized surface error wins over its deeper cause; an uncategorized surface error resolves to the categorized cause beneath it, so the reported category agrees with what the §6.1 retry classifier acted on.
  • Node-level placement is unchanged (no carrier present), and catch/degrade behavior is unchanged at every site. Only the event's reported cause changes.
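
The resolution rules in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual `_resolve_cause`: the class bodies are hypothetical stand-ins for the engine's types, the `category` attribute and the category strings are assumed shapes, and the fallback when nothing in the chain is categorized is an assumption. It also leaves out any guard against cyclic chains.

```python
class NodeException(Exception):
    """Stand-in for the engine's node_exception carrier (hypothetical shape)."""
    category = "node_exception"


class ParallelBranchesBranchFailed(NodeException):
    """Stand-in for the branch-failure carrier subtype."""


class CategorizedError(Exception):
    """Hypothetical error that carries a category, for illustration only."""

    def __init__(self, message, category):
        super().__init__(message)
        self.category = category


def resolve_cause_sketch(exc):
    """Return the nearest non-carrier exception in the __cause__ chain that has a category."""
    current = exc
    while current is not None:
        is_carrier = isinstance(current, NodeException)
        if not is_carrier and getattr(current, "category", None) is not None:
            return current
        current = current.__cause__
    # Assumption: with no categorized non-carrier in the chain, report the caught exception.
    return exc


def chain(outer, inner):
    """Link outer to inner the way `raise outer from inner` would."""
    outer.__cause__ = inner
    return outer


origin = CategorizedError("429 from provider", "provider_rate_limit")
# Nested carriers, as at a branch or instance placement: both are skipped.
caught = chain(
    ParallelBranchesBranchFailed("branch failed"),
    chain(NodeException("node failed"), origin),
)
resolved = resolve_cause_sketch(caught)
print(resolved.category, "|", resolved)  # provider_rate_limit | 429 from provider
```

Taking both the category and the message from `resolved` keeps the two fields describing the same exception. A categorized non-carrier at the surface is returned immediately, which matches the "re-categorized surface error wins" rule, while an uncategorized surface error such as a bare `ValueError` is passed over in favor of the categorized cause beneath it.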

The SHOULD-level requirement to report wrapped-instance/branch lineage (fan_out_index / branch_name at non-node placements) is deferred to a follow-up, since it needs the engine to surface per-instance identity to the wrapping-site middleware.

Notes

  • The "nearest categorized non-carrier" resolution goes slightly beyond 0065's carrier-only MUST (which fixture 064 pins). It satisfies the MUST and adds value in an area the proposal leaves unspecified; it will be proposed to the spec as the cross-implementation contract during the consolidated v0.14.0 review.
  • No spec-pin change in this PR; the bump to v0.55.0 lands with fixture 064 separately.

Validation

  • 21 unit tests in tests/unit/test_failure_isolation_middleware.py (15 existing + 6 new carrier cases) pass; ruff and pyright clean.

FailureIsolationMiddleware now resolves the FailureIsolatedEvent's
caught_exception through node_exception carrier wrappers to the nearest
categorized originating cause, instead of reporting the masking
node_exception at non-node placements (instance / branch / parent-node
middleware). Both category and message come from the resolved cause for
coherence, and the resolution agrees with what the retry classifier
acts on. Node-level placement is unchanged.
Copilot AI review requested due to automatic review settings June 12, 2026 01:47

Copilot AI left a comment


Pull request overview

Implements proposal 0065 “failure-isolation cause fidelity” by changing FailureIsolationMiddleware to resolve through graph-engine node_exception carrier wrappers and report the nearest categorized underlying cause in FailureIsolatedEvent.caught_exception (both category and message), with accompanying unit tests and changelog entry.

Changes:

  • Add _resolve_cause() to walk __cause__, skip NodeException-family carriers, and select the nearest categorized non-carrier cause.
  • Update failure-isolation event emission to use the resolved cause’s category/message for coherent reporting.
  • Add unit tests covering carrier wrappers (including nested and ParallelBranchesBranchFailed) and update the changelog.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/openarmature/graph/middleware/failure_isolation.py | Add cause-resolution logic and use it when populating FailureIsolatedEvent.caught_exception. |
| tests/unit/test_failure_isolation_middleware.py | Add tests validating category/message fidelity through node_exception carriers and nested causes. |
| CHANGELOG.md | Document the failure-isolation event cause-fidelity behavior change. |


Comment thread src/openarmature/graph/middleware/failure_isolation.py
Traverse only BaseException instances and track visited exceptions by
id, so a non-exception __cause__ ends the walk and a self-referential
chain terminates instead of hanging or raising in the degrade path.
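
A guarded walk of this kind might look like the sketch below. It is only an illustration of the technique the commit message names (an `isinstance` check plus an id-based visited set); the generator name `iter_causes` is hypothetical and is not taken from the PR.

```python
def iter_causes(exc):
    """Yield exc and each __cause__ beneath it, stopping safely on bad chains."""
    seen = set()  # ids of exceptions already visited
    current = exc
    # Stop when the link is not an exception (including None) or was already seen.
    while isinstance(current, BaseException) and id(current) not in seen:
        seen.add(id(current))
        yield current
        current = current.__cause__


a = RuntimeError("a")
b = RuntimeError("b")
a.__cause__ = b
b.__cause__ = a  # self-referential chain: a -> b -> a
print([str(e) for e in iter_causes(a)])  # ['a', 'b']
```

Tracking by `id` rather than by equality avoids relying on `__eq__` or `__hash__` of arbitrary exception subclasses, which matters in a degrade path that must not raise.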
@chris-colinsky chris-colinsky merged commit cc430e4 into main Jun 12, 2026
6 checks passed
@chris-colinsky chris-colinsky deleted the fix/0065-cause-fidelity branch June 12, 2026 15:03