Implement 0065 failure-isolation cause fidelity #153
Merged
FailureIsolationMiddleware now resolves the FailureIsolatedEvent's caught_exception through node_exception carrier wrappers to the nearest categorized originating cause, instead of reporting the masking node_exception at non-node placements (instance / branch / parent-node middleware). Both category and message come from the resolved cause for coherence, and the resolution agrees with what the retry classifier acts on. Node-level placement is unchanged.
Pull request overview
Implements proposal 0065 “failure-isolation cause fidelity” by changing FailureIsolationMiddleware to resolve through graph-engine node_exception carrier wrappers and report the nearest categorized underlying cause in FailureIsolatedEvent.caught_exception (both category and message), with accompanying unit tests and changelog entry.
Changes:
- Add _resolve_cause() to walk __cause__, skip NodeException-family carriers, and select the nearest categorized non-carrier cause.
- Update failure-isolation event emission to use the resolved cause’s category/message for coherent reporting.
- Add unit tests covering carrier wrappers (including nested and ParallelBranchesBranchFailed) and update the changelog.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/openarmature/graph/middleware/failure_isolation.py | Add cause-resolution logic and use it when populating FailureIsolatedEvent.caught_exception. |
| tests/unit/test_failure_isolation_middleware.py | Add tests validating category/message fidelity through node_exception carriers and nested causes. |
| CHANGELOG.md | Document the failure-isolation event cause-fidelity behavior change. |
Traverse only BaseException instances and track visited exceptions by id, so a non-exception __cause__ ends the walk and a self-referential chain terminates instead of hanging or raising in the degrade path.
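A minimal sketch of the two guards this commit message describes, separate from the carrier-skipping logic. The name iter_cause_chain and the OddCause class are hypothetical and exist only for illustration; the real helper may be structured differently.

```python
from typing import Iterator


def iter_cause_chain(exc: BaseException) -> Iterator[BaseException]:
    """Yield exc and its __cause__ ancestors, stopping safely on bad or cyclic links."""
    seen = set()  # ids of exceptions already visited
    current: object = exc
    # Stop when the link is not an exception (including None) or was already visited.
    while isinstance(current, BaseException) and id(current) not in seen:
        seen.add(id(current))
        yield current
        current = current.__cause__


# A cyclic chain: a -> b -> a. The walk visits each exception once and stops.
a = ValueError("a")
b = RuntimeError("b")
a.__cause__ = b
b.__cause__ = a
assert list(iter_cause_chain(a)) == [a, b]


class OddCause(Exception):
    """Hypothetical exception whose __cause__ is shadowed by a non-exception value."""
    __cause__ = "not an exception"


# The non-exception cause ends the walk after the first element.
assert len(list(iter_cause_chain(OddCause("x")))) == 1
```

Tracking by id rather than by equality avoids depending on any custom __eq__ or __hash__ an exception class might define, which matters in a degrade path that must not raise.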
Summary
Implements proposal 0065 (failure-isolation cause fidelity, spec v0.55.0), the first of three v0.14.0 pieces. This is the behavior fix only; the conformance harness and the spec-pin bump to v0.55.0 (with fixture 064) follow in separate PRs.
FailureIsolationMiddleware previously reported caught_exception.category as the engine's masking node_exception whenever it ran at a non-node placement (instance middleware §9.7, branch middleware §11.7, or parent-node middleware), because at those sites the engine wraps the originating error as a node_exception carrier before the middleware catches it. The event now resolves through the carrier to the real cause.
What changed
_resolve_cause walks the __cause__ chain, skips node_exception carrier wrappers (NodeException and its ParallelBranchesBranchFailed subtype, nested included), and returns the nearest non-carrier exception that carries a category. Both category and message come from that resolved cause so they describe one exception.
The wrapped-instance/branch lineage SHOULD (fan_out_index/branch_name at non-node placements) is deferred to a follow-up, since it needs the engine to surface per-instance identity to the wrapping-site middleware.
Notes
Validation
tests/unit/test_failure_isolation_middleware.py (15 existing + 6 new carrier cases) pass; ruff and pyright clean.