Summary
Refactor DefaultPolicyEngine so evaluate() (decide) and explain() (describe)
are driven by one rule-chain definition, eliminating the two hand-maintained
~150-line copies of the same logic.
Why this matters
The two methods must agree forever: an explanation that diverges from the actual
decision is worse than no explanation, because operators will trust it. Today any
rule added to evaluate() must be manually mirrored in explain() — a classic
drift bug waiting to happen in the security-critical module. The DSL engine already
demonstrates the separated pattern (policy_dsl.py vs policy_dsl_explain.py)
that keeps each file in budget, but the default engine duplicates logic, not just
location.
Current evidence
policy.py: def evaluate( at line 180 and def explain( at line 466, each walking the same conditions (safety class rules, roles, justification, rate limits) independently; the file is 652 lines (ISSUE 12).
policy_reasons.py holds the stable reason codes both paths must emit consistently.
External context
Not required for this issue.
Proposed implementation
- Extract each rule as a small object/function returning a structured outcome
(passed, reason_code, description, would_deny).
evaluate() short-circuits on first deny; explain() runs all rules and reports
each — same chain, two traversal modes.
- Property test: for randomized principals/capabilities/contexts,
explain()'s
predicted decision equals evaluate()'s actual decision (hypothesis-style or
parametrized grid; deterministic seeds).
- Sequence before ISSUE 12's
policy.py split so the split moves the unified chain.
AI-agent execution notes
- Inspect first:
policy.py (both methods, rate-limit interaction), policy_reasons.py, tests/test_policy.py.
- Critical edge case:
evaluate() records rate-limit usage; explain() must never mutate limiter state. The unified chain needs an explicit read-only mode.
- Preserve every reason code and message string exactly (error-message contract).
- This is security-grade code: behavior-preserving refactor, verified by the full existing policy test matrix.
Acceptance criteria
- One rule-chain definition drives both methods; the duplicated blocks are gone.
- An evaluate/explain agreement test passes across a broad input grid.
explain() provably does not mutate rate-limiter state (test).
- All existing policy tests pass unchanged.
Test plan
Existing tests/test_policy.py matrix + new agreement and no-mutation tests.
Run make ci.
Documentation plan
Update docs/agent-context/architecture.md policy section; CHANGELOG Changed
(internal).
Migration and compatibility notes
No public API change. Not expected to require migration.
Risks and tradeoffs
Refactoring security-critical code carries regression risk — mitigate with the
agreement property test landing before the refactor. Slight indirection cost in
readability, repaid by impossibility of drift.
Suggested labels
architecture, refactor, security
Summary
Refactor
DefaultPolicyEnginesoevaluate()(decide) andexplain()(describe)are driven by one rule-chain definition, eliminating the two hand-maintained
~150-line copies of the same logic.
Why this matters
The two methods must agree forever: an explanation that diverges from the actual
decision is worse than no explanation, because operators will trust it. Today any
rule added to
evaluate()must be manually mirrored inexplain()— a classicdrift bug waiting to happen in the security-critical module. The DSL engine already
demonstrates the separated pattern (
policy_dsl.pyvspolicy_dsl_explain.py)that keeps each file in budget, but the default engine duplicates logic, not just
location.
Current evidence
policy.py:def evaluate(at line 180 anddef explain(at line 466, each walking the same conditions (safety class rules, roles, justification, rate limits) independently; the file is 652 lines (ISSUE 12).policy_reasons.pyholds the stable reason codes both paths must emit consistently.External context
Not required for this issue.
Proposed implementation
(passed, reason_code, description, would_deny).evaluate()short-circuits on first deny;explain()runs all rules and reportseach — same chain, two traversal modes.
explain()'spredicted decision equals
evaluate()'s actual decision (hypothesis-style orparametrized grid; deterministic seeds).
policy.pysplit so the split moves the unified chain.AI-agent execution notes
policy.py(both methods, rate-limit interaction),policy_reasons.py,tests/test_policy.py.evaluate()records rate-limit usage;explain()must never mutate limiter state. The unified chain needs an explicit read-only mode.Acceptance criteria
explain()provably does not mutate rate-limiter state (test).Test plan
Existing
tests/test_policy.pymatrix + new agreement and no-mutation tests.Run
make ci.Documentation plan
Update
docs/agent-context/architecture.mdpolicy section; CHANGELOGChanged(internal).
Migration and compatibility notes
No public API change. Not expected to require migration.
Risks and tradeoffs
Refactoring security-critical code carries regression risk — mitigate with the
agreement property test landing before the refactor. Slight indirection cost in
readability, repaid by impossibility of drift.
Suggested labels
architecture, refactor, security