
feat: enforcement and intent assessor improvements (ADR A.5, A.8)#484

Merged

jwm4 merged 3 commits into main from feat/461-enforcement-intent-improvements on May 29, 2026


jwm4 commented May 29, 2026

Summary

  • Reprioritize DeterministicEnforcementAssessor scoring: agent hooks (.claude/settings.json) now score 60 pts (up from 30), git hooks (pre-commit/Husky) score 40 pts (down from 60), and the pass threshold drops from 60 to 40
  • Add design doc enforcement detection to DesignIntentAssessor: advisory enforcement in AGENTS.md (+10 pts) or deterministic enforcement via hooks/skills (+15 pts)
  • Add two recommended starter hooks to .claude/settings.json: auto-format on edit (black + isort) and destructive command blocker
  • Update test-assess skill cleanup to use find -delete instead of rm -rf (which the new hook blocks)

Implements Proposals A.5 and A.8 from the accepted ADR. This is the fourth of six implementation PRs.

Self-score change: 74.8 -> 75.5 (Gold). Agent hooks raise deterministic_enforcement from 50 to 100, which lifts the overall score across the Gold threshold.

A.5: Hook scoring reprioritization

The BP insight: "context file instructions are advisory; hooks are deterministic." Agent hooks always execute during agent workflows and cannot be bypassed, while git hooks can be skipped with --no-verify. New scoring:

| Signal | Before | After |
|---|---|---|
| .claude/settings.json with hooks | 30 pts | 60 pts |
| .pre-commit-config.yaml | 60 pts | 40 pts |
| .husky with hook scripts | 60 pts | 40 pts |
| Pass threshold | 60 | 40 |

Repos with only pre-commit still pass (40 >= 40). Repos with only agent hooks now pass (60 >= 40, was 30 < 60).
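The following is a minimal sketch of the A.5 scoring rules. It assumes:

- the point values in the table above;
- the review fix that requires at least one hook entry, not just a `hooks` key;
- that signals add up and are capped at 100, which matches the "agent hooks (60) + Husky (40) cap at 100" validation result.

The function names (`has_agent_hooks`, `has_husky_hooks`, `score_enforcement`) are hypothetical and are not the actual `DeterministicEnforcementAssessor` code in `src/agentready/assessors/testing.py`. Treating any regular file in `.husky` as a hook script is also an assumption.

```python
import json
import tempfile
from pathlib import Path

# Point values from the A.5 table; constant names are illustrative.
AGENT_HOOK_POINTS = 60
GIT_HOOK_POINTS = 40
PASS_THRESHOLD = 40
MAX_SCORE = 100


def has_agent_hooks(repo: Path) -> bool:
    """True if .claude/settings.json defines at least one hook entry."""
    settings = repo / ".claude" / "settings.json"
    if not settings.is_file():
        return False
    try:
        content = json.loads(settings.read_text())
    except json.JSONDecodeError:
        return False
    hooks = content.get("hooks") if isinstance(content, dict) else None
    if not isinstance(hooks, dict):
        return False
    # An empty {"hooks": {}} (or events with empty lists) earns no points.
    return any(isinstance(entries, list) and entries for entries in hooks.values())


def has_husky_hooks(repo: Path) -> bool:
    """True if .husky contains at least one regular file (subdirectories are ignored)."""
    husky = repo / ".husky"
    return husky.is_dir() and any(p.is_file() for p in husky.iterdir())


def score_enforcement(repo: Path) -> tuple[float, bool]:
    """Return (score, passed): signals are additive, then capped at MAX_SCORE."""
    score = 0
    if has_agent_hooks(repo):
        score += AGENT_HOOK_POINTS
    if (repo / ".pre-commit-config.yaml").is_file():
        score += GIT_HOOK_POINTS
    if has_husky_hooks(repo):
        score += GIT_HOOK_POINTS
    score = min(score, MAX_SCORE)
    return float(score), score >= PASS_THRESHOLD


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        (repo / ".pre-commit-config.yaml").write_text("repos: []\n")
        print(score_enforcement(repo))  # (40.0, True): pre-commit alone passes
        (repo / ".claude").mkdir()
        hook = {"type": "command", "command": "true"}
        (repo / ".claude" / "settings.json").write_text(
            json.dumps({"hooks": {"PreToolUse": [{"matcher": "Bash", "hooks": [hook]}]}})
        )
        print(score_enforcement(repo))  # (100.0, True): 60 + 40
```

Making the two signal kinds additive lets a repository with both agent hooks and git hooks reach the cap. Either kind alone still clears the new threshold of 40.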

A.8: Design intent enforcement detection

The assessor now checks whether design doc updates are enforced, not just whether design docs exist. Two levels:

  • Advisory (10 pts): AGENTS.md/CLAUDE.md contains rules requiring design doc updates with architectural changes
  • Deterministic (15 pts): Hooks or skills that enforce design doc updates

The higher of the two is awarded (not additive).
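The "higher of the two" rule can be sketched as a `max()` over the two levels. This sketch assumes:

- the advisory case is a simple line-level regex check on AGENTS.md/CLAUDE.md text;
- the deterministic case is passed in as a flag (for example, from a hook or skill check).

The patterns and function names are hypothetical. The real detection in `DesignIntentAssessor._check_design_enforcement` may differ.

```python
import re

ADVISORY_POINTS = 10
DETERMINISTIC_POINTS = 15

# Hypothetical patterns: a design-doc reference plus a requirement word on the same line.
DOC_REF = re.compile(r"design[ _-]?docs?|docs/design", re.IGNORECASE)
REQUIREMENT = re.compile(r"\b(must|required?|always)\b", re.IGNORECASE)


def advisory_enforcement(context_text: str) -> bool:
    """True if any line of AGENTS.md/CLAUDE.md text requires design doc updates."""
    return any(
        DOC_REF.search(line) and REQUIREMENT.search(line)
        for line in context_text.splitlines()
    )


def design_enforcement_bonus(context_text: str, deterministic: bool) -> int:
    """Award the higher of the two enforcement levels; they do not stack."""
    advisory = ADVISORY_POINTS if advisory_enforcement(context_text) else 0
    enforced = DETERMINISTIC_POINTS if deterministic else 0
    return max(advisory, enforced)  # 15 when both apply, never 25
```

A repository with both an AGENTS.md rule and a hook gets 15 points, not 25. The deterministic mechanism already subsumes the advisory one.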

Starter hooks

Added two hooks from the BP recommended starters to .claude/settings.json:

  1. Auto-format on edit: Runs black and isort after every Edit/Write
  2. Block destructive operations: Pattern-matches against rm -rf, DROP TABLE, --force
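The nested shape of these two hooks in `.claude/settings.json` can be sketched by building the config in Python and serializing it. Each event (`PostToolUse`, `PreToolUse`) holds a list of matcher objects. Each matcher object has its own `hooks` array of `{"type": "command", ...}` entries, which is the nested schema CodeRabbit's review asked the remediation text to use.

The following are assumptions for illustration, not necessarily what this PR adds:

- the matcher strings;
- both command strings;
- `.claude/hooks/block_destructive.py`, which is a hypothetical path.

```python
import json

# Hypothetical commands; the actual commands in this PR's settings.json may differ.
FORMAT_CMD = "black --quiet . && isort --quiet ."
GUARD_CMD = "python3 .claude/hooks/block_destructive.py"

settings = {
    "hooks": {
        # Auto-format: runs after an Edit or Write tool call completes.
        "PostToolUse": [
            {
                "matcher": "Edit|Write",
                "hooks": [{"type": "command", "command": FORMAT_CMD}],
            }
        ],
        # Destructive-command guard: runs before every Bash tool call;
        # exit status 2 from the command blocks the call.
        "PreToolUse": [
            {
                "matcher": "Bash",
                "hooks": [{"type": "command", "command": GUARD_CMD}],
            }
        ],
    }
}

print(json.dumps(settings, indent=2))
```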

Note on the destructive command blocker: This hook is a lightweight guardrail, not a comprehensive safety solution. An agent can achieve the same destructive effect through alternative commands that don't match the pattern (e.g., find -delete instead of rm -rf). In practice, many agents won't work around the restriction, so it's a net positive for catching obvious destructive operations, but it should not be relied on as a security boundary.
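To make that limitation concrete, here is a minimal sketch of a `PreToolUse` guard. It assumes Claude Code's hook contract:

- the hook receives the pending tool call as JSON on stdin, with the shell command under `tool_input.command`;
- exit status 2 blocks the call, and stderr is shown to the agent.

The regexes cover only the three patterns named above. They are illustrative and are not the PR's actual hook.

```python
import json
import re
import sys

# Only the three patterns named in the PR; illustrative, not the actual hook.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\brm\s+-rf\b"),
    re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE),
    re.compile(r"--force\b"),
]


def is_destructive(command: str) -> bool:
    """True if the shell command matches any blocked pattern."""
    return any(p.search(command) for p in DESTRUCTIVE_PATTERNS)


def hook_exit_code(raw_input: str) -> int:
    """Map a PreToolUse JSON payload to an exit status: 2 blocks, 0 allows."""
    try:
        payload = json.loads(raw_input or "{}")
    except json.JSONDecodeError:
        return 0  # fail open: a guardrail, not a security boundary
    tool_input = payload.get("tool_input") if isinstance(payload, dict) else None
    command = tool_input.get("command", "") if isinstance(tool_input, dict) else ""
    if isinstance(command, str) and is_destructive(command):
        print(f"Blocked potentially destructive command: {command}", file=sys.stderr)
        return 2
    return 0


# As the hook's entry point, the script would end with:
# sys.exit(hook_exit_code(sys.stdin.read()))
```

Because the check is a plain regex over the command string, `find /tmp/x -delete` gets exit status 0 (allowed). That is exactly the bypass the note describes. Failing open on malformed JSON likewise keeps the guard from blocking unrelated work.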

Related issues

  1. Remove redundant assessors, realign tiers, rebalance weights (ADR C.1-4, E.1-2) #458 - merged in #464 (refactor: remove redundant assessors, realign tiers, rebalance weights)
  2. Context file assessor improvements (ADR A.1, A.9, A.4) #459 - merged in #477 (feat: context file assessor improvements (ADR A.1, A.9, A.4))
  3. Test assessor enhancements (ADR A.2, A.3) #460 - merged in #481 (feat: test assessor enhancements for command documentation and organization (ADR A.2, A.3))
  4. This PR - Enforcement and intent assessor improvements (ADR A.5, A.8) #461
  5. Code quality assessor enhancements (ADR A.6, A.7) #462
  6. New assessors for architectural boundaries and threat models (ADR B.1, B.2) #463

Test plan

  • black . && isort . && ruff check . passes
  • pytest tests/unit/ passes (1122 passed, 17 skipped)
  • agentready assess . runs successfully (75.5/100 Gold)
  • 12 new tests for enforcement scoring and design intent enforcement detection
  • All existing tests pass unchanged (except 3 updated for new score values)

Closes #461

Posted by Bill Murdock with assistance from Claude Code.

Summary by CodeRabbit

  • New Features

    • Added automatic formatting after tool use and a safety check to block destructive shell commands.
  • Documentation

    • Revised enforcement docs and scoring/threshold guidance for deterministic agent hooks.
  • Bug Fixes

    • Safer test cleanup to avoid recursive removal of directories.
  • Tests

    • Expanded coverage for enforcement detection, scoring, and precedence.
  • Chores

    • Ignore local agent/tooling config files in VCS.

Reprioritize DeterministicEnforcementAssessor scoring so agent hooks
(60 pts) outrank bypassable git hooks (40 pts), and add design doc
enforcement detection to DesignIntentAssessor (advisory 10 pts,
deterministic 15 pts). Also adds recommended starter hooks to
.claude/settings.json and updates test-assess skill cleanup to
avoid triggering the new destructive-command blocker.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai Bot commented May 29, 2026

Warning

Review limit reached

@jwm4, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 22 minutes and 18 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: e9767417-4b6f-4195-af43-ad25e7719e24

📥 Commits

Reviewing files that changed from the base of the PR and between 7858491 and efa1843.

📒 Files selected for processing (4)
  • src/agentready/assessors/patterns.py
  • src/agentready/assessors/testing.py
  • tests/unit/test_assessors_patterns.py
  • tests/unit/test_assessors_testing.py
📝 Walkthrough


Adds deterministic agent hooks (formatting + destructive-command guard), reprioritizes enforcement scoring toward .claude/settings.json, extends DesignIntentAssessor to detect enforcement in hooks/skills/AGENTS.md, updates tests to assert exact scoring/precedence, and updates docs and gitignore.

Changes

Agent Hook Prioritization and Design Intent Enforcement

| Layer | File(s) | Summary |
|---|---|---|
| Agent Hook Safety & ignore updates | .claude/settings.json, .gitignore, .claude/skills/test-assess/SKILL.md | Adds hooks with a PreToolUse Bash guard blocking destructive patterns and PostToolUse formatting (black/isort); updates test-skill cleanup to avoid rm -rf; ignores local agent settings and tooling. |
| DesignIntentAssessor enforcement detection | src/agentready/assessors/patterns.py | Adds DesignIntentAssessor._check_design_enforcement, detecting deterministic enforcement (hooks/skills) and advisory enforcement (AGENTS.md/CLAUDE.md), returning a (score, evidence) tuple and applied inside assess(). |
| DeterministicEnforcementAssessor scoring changes | src/agentready/assessors/testing.py | Reweights scoring: .claude/settings.json agent hooks score higher than pre-commit and Husky; lowers pass threshold from 60 to 40; updates remediation wording/order. |
| DesignIntentAssessor tests | tests/unit/test_assessors_patterns.py | Adds tests for advisory (+10) and deterministic (+15) enforcement bonuses, precedence rules, skill-triggered deterministic bonus, and behavior when design docs are absent. |
| DeterministicEnforcementAssessor tests | tests/unit/test_assessors_testing.py | Tightens existing hook tests to exact scores and adds cases covering .claude/settings.json hook scoring, failing/pass boundaries, and combined configurations. |
| Attribute documentation | docs/attributes.md | Updates deterministic enforcement and design-intent sections to reflect agent-hook determinism, new scoring breakdowns, pass threshold, and explicit enforcement bonus rules (+10 advisory, +15 deterministic). |


Suggested labels

released

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | Title follows Conventional Commits format with 'feat' type and clearly describes the main changes: enforcement and intent assessor improvements with ADR references. |
| Linked Issues check | ✅ Passed | All coding requirements from #461 are met: DeterministicEnforcementAssessor scoring reprioritized (agent hooks 60 pts, pre-commit/Husky 40 pts, threshold 40), DesignIntentAssessor enhanced to detect enforcement via hooks/skills/AGENTS.md with bonus scoring (10/15 pts), documentation updated, and starter hooks added. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with ADR A.5/A.8 objectives: assessor scoring/logic updates, documentation revisions, hook implementations, skill cleanup, and test coverage additions. No unrelated changes detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 95.45% which is sufficient. The required threshold is 80.00%. |


github-actions Bot commented May 29, 2026

📈 Test Coverage Report

| Branch | Coverage |
|---|---|
| This PR | 73.4% |
| Main | 73.2% |
| Diff | ✅ +0.2% |

Coverage calculated from unit tests only

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai Bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/agentready/assessors/patterns.py`:
- Around line 355-356: The current check in patterns.py awards
deterministic_score when hooks exist and doc_ref_pattern.search(hooks_str)
matches, which gives +15 for mere mentions; change the condition to require both
a design-doc reference and an enforcement-intent keyword in hooks_str (e.g.,
"must", "must not", "require", "required", "ensure", "update", "check",
"verify", "enforce") by defining an intent_pattern and requiring both
intent_pattern.search(hooks_str.lower()) and
doc_ref_pattern.search(hooks_str.lower()) before setting deterministic_score
(refer to hooks, hooks_str, doc_ref_pattern, deterministic_score).

In `@src/agentready/assessors/testing.py`:
- Around line 797-806: The current check in the claude_settings block awards 60
points if the "hooks" key exists even when it's empty; change the condition that
awards points (the if that checks "hooks" in content) to verify the hooks object
is non-empty (e.g., content.get("hooks") is truthy and, for dict/list, has len()
> 0) before incrementing score and appending evidence; update the check around
claude_settings.exists(), content = json.loads(...), and the branch that
modifies score and evidence to only run when content["hooks"] contains at least
one entry.
- Around line 893-895: Update the remediation example and guidance text that
mentions .claude/settings.json so the PostToolUse entries use the nested hooks
schema expected by the repo: each PostToolUse should be an object like
{"hooks":[{"type":"command","command": ...}]} (i.e., include the hooks array and
type key), not a flat {"command": ...} entry; modify the example config
string(s) and the lines mentioning PostToolUse in the assessor (the example
block in testing.py that currently lists the two hooks) to show the nested
structure and note the "type":"command" wrapper so users copy a valid
.claude/settings.json configuration.

In `@tests/unit/test_assessors_patterns.py`:
- Around line 448-571: Add a new unit test that mirrors
test_deterministic_enforcement_bonus but uses a .claude/settings.json hooks
entry that references "docs/design" without enforcement wording (e.g., command
"check-design-doc.sh" or a benign matcher) and assert that
DesignIntentAssessor().assess(repo) does NOT grant the +15 deterministic bonus:
check finding.score remains 50.0 (or equals the dir-only baseline) and that no
evidence contains "deterministic"; place the test next to other tests like
test_deterministic_enforcement_bonus and reference DesignIntentAssessor,
.claude/settings.json, and PreToolUse in the test name and docstring for
clarity.

In `@tests/unit/test_assessors_testing.py`:
- Around line 976-994: The test test_agent_hooks_score_higher_than_precommit
currently reuses the same repo so precommit_finding inherits the agent hooks;
update the test to measure pre-commit in isolation by creating a fresh repo (or
removing the .claude settings) before calling _make_repo for precommit_finding:
i.e., ensure DeterministicEnforcementAssessor.assess is invoked on a repo that
only has .pre-commit-config.yaml (no .claude/settings.json) so agent_finding and
precommit_finding are independent and the assertions reflect the intended 60 vs
40 comparison.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: f301a2ef-3a06-4b7d-a0da-02ee22ca0dd1

📥 Commits

Reviewing files that changed from the base of the PR and between 333496d and d9295cc.

📒 Files selected for processing (7)
  • .claude/settings.json
  • .claude/skills/test-assess/SKILL.md
  • docs/attributes.md
  • src/agentready/assessors/patterns.py
  • src/agentready/assessors/testing.py
  • tests/unit/test_assessors_patterns.py
  • tests/unit/test_assessors_testing.py

- Require non-empty hook entries before awarding 60 pts (not just key presence)
- Require both design-doc reference AND enforcement verb for deterministic bonus
- Fix remediation example to use correct nested hook schema
- Isolate agent vs pre-commit scoring test with separate repo paths
- Add negative test for hooks mentioning design docs without enforcement verbs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

jwm4 commented May 29, 2026

Review

The scoring reprioritization (A.5) and design intent enforcement detection (A.8) are well-implemented. The two fix commits addressed CodeRabbit's round-1 feedback (enforcement verb pattern, empty hooks handling, nested hook schema in remediation). Test coverage is thorough with 12 new tests covering advisory, deterministic, precedence, skill-based detection, and edge cases.

Validated against real repos:

  • fastapi/fastapi (pre-commit only): deterministic_enforcement = 40/100 PASS. Correctly scores at the new threshold.
  • commercetools/merchant-center-application-kit (agent hooks + Husky): deterministic_enforcement = 100/100 PASS. Agent hooks (60) + Husky (40) correctly cap at 100.
  • ambient-code/agentready (main, pre-commit + settings without hooks): deterministic_enforcement = 50/100 PASS (40 + 10).

One suggestion (non-blocking):

The enforcement_verb_pattern in _check_design_enforcement (patterns.py:341-344) matches common words like "update", "check", "create" against the entire serialized hooks JSON blob. A real-world settings.json with multiple hooks will likely contain these words in unrelated commands (e.g., a formatter command containing "check"), so any repo that also mentions "design doc" anywhere in its hooks config could get the 15pt bonus regardless of whether a hook actually enforces design doc updates. Consider matching both patterns within the same hook entry rather than across the whole serialized blob. Low risk at 15pts on a T4 attribute, but worth tightening in a follow-up.
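One way to implement that follow-up, sketched under assumptions: walk the nested `hooks` schema, and award the bonus only when a single hook command matches both a design-doc reference and an enforcement verb. Points to note:

- The patterns are illustrative. They deliberately include generic verbs such as "check" and "update", like the reviewed pattern.
- The helper names are hypothetical.
- `blob_match` reproduces the whole-blob behavior for contrast.

```python
import json
import re
from typing import Iterator

# Illustrative patterns, not the exact ones in patterns.py.
DOC_REF = re.compile(r"design[ _-]?docs?|docs/design", re.IGNORECASE)
ENFORCEMENT_VERB = re.compile(
    r"\b(must|require[sd]?|ensure|verify|enforce|check|update)\b", re.IGNORECASE
)


def iter_hook_commands(settings: dict) -> Iterator[str]:
    """Yield every command string from the nested Claude Code hooks schema."""
    hooks = settings.get("hooks")
    if not isinstance(hooks, dict):
        return
    for entries in hooks.values():
        if not isinstance(entries, list):
            continue
        for entry in entries:
            inner = entry.get("hooks") if isinstance(entry, dict) else None
            if not isinstance(inner, list):
                continue
            for hook in inner:
                if isinstance(hook, dict) and isinstance(hook.get("command"), str):
                    yield hook["command"]


def hooks_enforce_design_docs(settings: dict) -> bool:
    """Per-entry match: both patterns must hit the same command string."""
    return any(
        DOC_REF.search(cmd) and ENFORCEMENT_VERB.search(cmd)
        for cmd in iter_hook_commands(settings)
    )


def blob_match(settings: dict) -> bool:
    """Whole-blob match (the behavior flagged above): patterns may hit different hooks."""
    blob = json.dumps(settings.get("hooks", {}))
    return bool(DOC_REF.search(blob) and ENFORCEMENT_VERB.search(blob))
```

Consider a settings file with a formatter hook running `ruff check .` and a separate hook that only prints `docs/design/README.md`. For that file, `blob_match` returns True, while `hooks_enforce_design_docs` returns False.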

Posted by Bill Murdock with assistance from Claude Code.

jwm4 merged commit efe7507 into main May 29, 2026
6 checks passed
jwm4 deleted the feat/461-enforcement-intent-improvements branch May 29, 2026 17:30
github-actions Bot pushed a commit that referenced this pull request May 29, 2026
# [2.46.0](v2.45.0...v2.46.0) (2026-05-29)

### Features

* enforcement and intent assessor improvements (ADR A.5, A.8) ([#484](#484)) ([efe7507](efe7507))
@github-actions

🎉 This PR is included in version 2.46.0 🎉

The release is available as a GitHub release.

Your semantic-release bot 📦🚀



Development

Successfully merging this pull request may close these issues.

Enforcement and intent assessor improvements (ADR A.5, A.8)

1 participant