feat: enforcement and intent assessor improvements (ADR A.5, A.8) by jwm4 · Pull Request #484 · ambient-code/agentready

jwm4 · 2026-05-29T13:53:22Z

Summary

- Reprioritize DeterministicEnforcementAssessor scoring: agent hooks (.claude/settings.json) now score 60 pts (up from 30), git hooks (pre-commit/Husky) score 40 pts (down from 60), and the pass threshold is lowered to 40
- Add design doc enforcement detection to DesignIntentAssessor: advisory enforcement in AGENTS.md (+10 pts) or deterministic enforcement via hooks/skills (+15 pts)
- Add two recommended starter hooks to .claude/settings.json: auto-format on edit (black + isort) and a destructive command blocker
- Update test-assess skill cleanup to use find -delete instead of rm -rf (which the new hook blocks)

Implements Proposals A.5 and A.8 from the accepted ADR. Fourth of six implementation PRs.

Self-score change: 74.8 -> 75.5 Gold (agent hooks bring deterministic_enforcement from 50 to 100, crossing the Gold threshold).

A.5: Hook scoring reprioritization

The BP insight: "context file instructions are advisory; hooks are deterministic." Agent hooks always execute during agent workflows and cannot be bypassed, while git hooks can be skipped with --no-verify. New scoring:

| Signal | Before | After |
|---|---|---|
| `.claude/settings.json` with hooks | 30 pts | 60 pts |
| `.pre-commit-config.yaml` | 60 pts | 40 pts |
| `.husky` with hook scripts | 60 pts | 40 pts |
| Pass threshold | 60 | 40 |

Repos with only pre-commit still pass (40 >= 40). Repos with only agent hooks now pass (60 >= 40, was 30 < 60).
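The table above can be read as a small scoring function. What follows is a minimal sketch, not the assessor's actual code: `score_enforcement` and its boolean inputs are hypothetical names, pre-commit and Husky are collapsed into a single git-hook signal because the PR does not say how the two combine, and the cap at 100 follows the validation note that agent hooks (60) plus Husky (40) cap at 100. Any partial credit for a settings file without hooks is left out.

```python
# Point values and threshold taken from the "After" column of the table.
AGENT_HOOK_POINTS = 60
GIT_HOOK_POINTS = 40
PASS_THRESHOLD = 40
MAX_SCORE = 100


def score_enforcement(has_agent_hooks: bool, has_git_hooks: bool) -> tuple[int, bool]:
    """Return (score, passed) for the two signal families.

    Assumption: signals are additive and capped at MAX_SCORE.
    """
    score = 0
    if has_agent_hooks:
        score += AGENT_HOOK_POINTS
    if has_git_hooks:
        score += GIT_HOOK_POINTS
    score = min(score, MAX_SCORE)
    return score, score >= PASS_THRESHOLD
```

Under these assumptions, a repo with only pre-commit lands exactly on the threshold, and a repo with only agent hooks passes with 60.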

A.8: Design intent enforcement detection

The assessor now checks whether design doc updates are enforced, not just whether design docs exist. Two levels:

- Advisory (10 pts): AGENTS.md/CLAUDE.md contains rules requiring design doc updates with architectural changes
- Deterministic (15 pts): Hooks or skills that enforce design doc updates

The higher of the two is awarded (not additive).
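As a minimal sketch of the "higher of the two" rule, assuming the two detections have already been reduced to booleans (the function name `enforcement_bonus` is hypothetical and is not the assessor's API):

```python
ADVISORY_BONUS = 10       # rules in AGENTS.md/CLAUDE.md
DETERMINISTIC_BONUS = 15  # hooks or skills


def enforcement_bonus(has_advisory: bool, has_deterministic: bool) -> int:
    """Award the larger applicable bonus; the two are never summed."""
    advisory = ADVISORY_BONUS if has_advisory else 0
    deterministic = DETERMINISTIC_BONUS if has_deterministic else 0
    return max(advisory, deterministic)
```

A repo with both kinds of enforcement therefore gets 15 points, not 25.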

Starter hooks

Added two hooks from the BP recommended starters to .claude/settings.json:

- Auto-format on edit: Runs black and isort after every Edit/Write
- Block destructive operations: Pattern-matches against rm -rf, DROP TABLE, --force

Note on the destructive command blocker: This hook is a lightweight guardrail, not a comprehensive safety solution. An agent can achieve the same destructive effect through alternative commands that don't match the pattern (e.g., find -delete instead of rm -rf). In practice, many agents won't work around the restriction, so it's a net positive for catching obvious destructive operations, but it should not be relied on as a security boundary.
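The limitation described in the note is easy to demonstrate. Below is a minimal sketch of pattern-based blocking, assuming a single regular expression over the three patterns named in this PR; the actual hook command in .claude/settings.json is not shown here, and `is_blocked` is a hypothetical helper.

```python
import re

# Assumed pattern: one alternation per item listed in the PR description.
DESTRUCTIVE = re.compile(r"\brm\s+-rf\b|\bDROP\s+TABLE\b|--force")


def is_blocked(command: str) -> bool:
    """Return True when the command text matches a known destructive pattern."""
    return DESTRUCTIVE.search(command) is not None


if __name__ == "__main__":
    for cmd in ("rm -rf build", "git push --force", "find build -delete"):
        print(cmd, "->", "blocked" if is_blocked(cmd) else "allowed")
```

The last command removes the same files as the first yet passes the check, which is why the hook is a guardrail rather than a security boundary.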

Related issues

- Remove redundant assessors, realign tiers, rebalance weights (ADR C.1-4, E.1-2) #458 - merged in refactor: remove redundant assessors, realign tiers, rebalance weights #464
- Context file assessor improvements (ADR A.1, A.9, A.4) #459 - merged in feat: context file assessor improvements (ADR A.1, A.9, A.4) #477
- Test assessor enhancements (ADR A.2, A.3) #460 - merged in feat: test assessor enhancements for command documentation and organization (ADR A.2, A.3) #481
- This PR - Enforcement and intent assessor improvements (ADR A.5, A.8) #461
- Code quality assessor enhancements (ADR A.6, A.7) #462
- New assessors for architectural boundaries and threat models (ADR B.1, B.2) #463

Test plan

- black . && isort . && ruff check . passes
- pytest tests/unit/ passes (1122 passed, 17 skipped)
- agentready assess . runs successfully (75.5/100 Gold)
- 12 new tests for enforcement scoring and design intent enforcement detection
- All existing tests pass unchanged (except 3 updated for new score values)

Closes #461

Posted by Bill Murdock with assistance from Claude Code.

Summary by CodeRabbit

New Features
- Added automatic formatting after tool use and a safety check to block destructive shell commands.
Documentation
- Revised enforcement docs and scoring/threshold guidance for deterministic agent hooks.
Bug Fixes
- Safer test cleanup to avoid recursive removal of directories.
Tests
- Expanded coverage for enforcement detection, scoring, and precedence.
Chores
- Ignore local agent/tooling config files in VCS.

Reprioritize DeterministicEnforcementAssessor scoring so agent hooks (60 pts) outrank bypassable git hooks (40 pts), and add design doc enforcement detection to DesignIntentAssessor (advisory 10 pts, deterministic 15 pts). Also adds recommended starter hooks to .claude/settings.json and updates test-assess skill cleanup to avoid triggering the new destructive-command blocker.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai · 2026-05-29T13:53:34Z


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: e9767417-4b6f-4195-af43-ad25e7719e24

📥 Commits

Reviewing files that changed from the base of the PR and between 7858491 and efa1843.

📒 Files selected for processing (4)

src/agentready/assessors/patterns.py
src/agentready/assessors/testing.py
tests/unit/test_assessors_patterns.py
tests/unit/test_assessors_testing.py

Walkthrough

Adds deterministic agent hooks (formatting + destructive-command guard), reprioritizes enforcement scoring toward .claude/settings.json, extends DesignIntentAssessor to detect enforcement in hooks/skills/AGENTS.md, updates tests to assert exact scoring/precedence, and updates docs and gitignore.

Changes

Agent Hook Prioritization and Design Intent Enforcement

| Layer / File(s) | Summary |
|---|---|
| Agent Hook Safety & ignore updates: `.claude/settings.json`, `.gitignore`, `.claude/skills/test-assess/SKILL.md` | Adds `hooks` with a `PreToolUse` Bash guard blocking destructive patterns and `PostToolUse` formatting (black/isort); updates test-skill cleanup to avoid `rm -rf`; ignores local agent settings and tooling. |
| DesignIntentAssessor enforcement detection: `src/agentready/assessors/patterns.py` | Adds `DesignIntentAssessor._check_design_enforcement`, detecting deterministic enforcement (hooks/skills) and advisory enforcement (AGENTS.md/CLAUDE.md), returning a (score, evidence) tuple and applied inside `assess()`. |
| DeterministicEnforcementAssessor scoring changes: `src/agentready/assessors/testing.py` | Reweights scoring: `.claude/settings.json` agent hooks score higher than pre-commit and Husky; lowers pass threshold from 60 to 40; updates remediation wording/order. |
| DesignIntentAssessor tests: `tests/unit/test_assessors_patterns.py` | Adds tests for advisory (+10) and deterministic (+15) enforcement bonuses, precedence rules, skill-triggered deterministic bonus, and behavior when design docs are absent. |
| DeterministicEnforcementAssessor tests: `tests/unit/test_assessors_testing.py` | Tightens existing hook tests to exact scores and adds cases covering `.claude/settings.json` hook scoring, failing/pass boundaries, and combined configurations. |
| Attribute documentation: `docs/attributes.md` | Updates deterministic enforcement and design-intent sections to reflect agent-hook determinism, new scoring breakdowns, pass threshold, and explicit enforcement bonus rules (+10 advisory, +15 deterministic). |

Possibly related PRs

ambient-code/agentready#467: Overlapping Husky/pre-commit detection and scoring adjustments in enforcement assessor.
ambient-code/agentready#382: Prior work introducing/renaming enforcement and intent assessor foundations used here.
ambient-code/agentready#478: Earlier changes to the test-assess skill that this PR further modifies.

Suggested labels

released

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | Title follows Conventional Commits format with 'feat' type and clearly describes the main changes: enforcement and intent assessor improvements with ADR references. |
| Linked Issues check | ✅ Passed | All coding requirements from `#461` are met: DeterministicEnforcementAssessor scoring reprioritized (agent hooks 60 pts, pre-commit/Husky 40 pts, threshold 40), DesignIntentAssessor enhanced to detect enforcement via hooks/skills/AGENTS.md with bonus scoring (10/15 pts), documentation updated, and starter hooks added. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with ADR A.5/A.8 objectives: assessor scoring/logic updates, documentation revisions, hook implementations, skill cleanup, and test coverage additions. No unrelated changes detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 95.45% which is sufficient. The required threshold is 80.00%. |


github-actions · 2026-05-29T13:55:31Z

📈 Test Coverage Report

| Branch | Coverage |
|---|---|
| This PR | 73.4% |
| Main | 73.2% |
| Diff | ✅ +0.2% |

Coverage calculated from unit tests only

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/agentready/assessors/patterns.py`:
- Around line 355-356: The current check in patterns.py awards
deterministic_score when hooks exist and doc_ref_pattern.search(hooks_str)
matches, which gives +15 for mere mentions; change the condition to require both
a design-doc reference and an enforcement-intent keyword in hooks_str (e.g.,
"must", "must not", "require", "required", "ensure", "update", "check",
"verify", "enforce") by defining an intent_pattern and requiring both
intent_pattern.search(hooks_str.lower()) and
doc_ref_pattern.search(hooks_str.lower()) before setting deterministic_score
(refer to hooks, hooks_str, doc_ref_pattern, deterministic_score).

In `@src/agentready/assessors/testing.py`:
- Around line 797-806: The current check in the claude_settings block awards 60
points if the "hooks" key exists even when it's empty; change the condition that
awards points (the if that checks "hooks" in content) to verify the hooks object
is non-empty (e.g., content.get("hooks") is truthy and, for dict/list, has len()
> 0) before incrementing score and appending evidence; update the check around
claude_settings.exists(), content = json.loads(...), and the branch that
modifies score and evidence to only run when content["hooks"] contains at least
one entry.
- Around line 893-895: Update the remediation example and guidance text that
mentions .claude/settings.json so the PostToolUse entries use the nested hooks
schema expected by the repo: each PostToolUse should be an object like
{"hooks":[{"type":"command","command": ...}]} (i.e., include the hooks array and
type key), not a flat {"command": ...} entry; modify the example config
string(s) and the lines mentioning PostToolUse in the assessor (the example
block in testing.py that currently lists the two hooks) to show the nested
structure and note the "type":"command" wrapper so users copy a valid
.claude/settings.json configuration.

In `@tests/unit/test_assessors_patterns.py`:
- Around line 448-571: Add a new unit that mirrors
test_deterministic_enforcement_bonus but uses a .claude/settings.json hooks
entry that references "docs/design" without enforcement wording (e.g., command
"check-design-doc.sh" or a benign matcher) and assert that
DesignIntentAssessor().assess(repo) does NOT grant the +15 deterministic bonus:
check finding.score remains 50.0 (or equals the dir-only baseline) and that no
evidence contains "deterministic"; place the test next to other tests like
test_deterministic_enforcement_bonus and reference DesignIntentAssessor,
.claude/settings.json, and PreToolUse in the test name and docstring for
clarity.

In `@tests/unit/test_assessors_testing.py`:
- Around line 976-994: The test test_agent_hooks_score_higher_than_precommit
currently reuses the same repo so precommit_finding inherits the agent hooks;
update the test to measure pre-commit in isolation by creating a fresh repo (or
removing the .claude settings) before calling _make_repo for precommit_finding:
i.e., ensure DeterministicEnforcementAssessor.assess is invoked on a repo that
only has .pre-commit-config.yaml (no .claude/settings.json) so agent_finding and
precommit_finding are independent and the assertions reflect the intended 60 vs
40 comparison.
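The empty-hooks finding for testing.py can be illustrated with a short check. This is a minimal sketch under stated assumptions, not the merged fix: `has_nonempty_hooks` is a hypothetical helper, and the sample configuration only mirrors the nested `{"hooks":[{"type":"command","command": ...}]}` shape quoted in the finding, with an illustrative `matcher` value and command string.

```python
import json


def has_nonempty_hooks(settings_text: str) -> bool:
    """Return True only when the settings contain a non-empty "hooks" value."""
    try:
        content = json.loads(settings_text)
    except json.JSONDecodeError:
        return False
    if not isinstance(content, dict):
        return False
    hooks = content.get("hooks")
    # Key presence alone is not enough: {} and [] must not earn points.
    return isinstance(hooks, (dict, list)) and len(hooks) > 0


# Illustrative settings using the nested hook schema from the finding.
SAMPLE_SETTINGS = json.dumps({
    "hooks": {
        "PostToolUse": [
            {
                "matcher": "Edit|Write",  # assumed value, for illustration
                "hooks": [{"type": "command", "command": "black . && isort ."}],
            }
        ]
    }
})
```

With this check, `{"hooks": {}}` no longer qualifies for the 60-point award, while the sample configuration does.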


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: f301a2ef-3a06-4b7d-a0da-02ee22ca0dd1

📥 Commits

Reviewing files that changed from the base of the PR and between 333496d and d9295cc.

📒 Files selected for processing (7)

.claude/settings.json
.claude/skills/test-assess/SKILL.md
docs/attributes.md
src/agentready/assessors/patterns.py
src/agentready/assessors/testing.py
tests/unit/test_assessors_patterns.py
tests/unit/test_assessors_testing.py

- Require non-empty hook entries before awarding 60 pts (not just key presence)
- Require both design-doc reference AND enforcement verb for deterministic bonus
- Fix remediation example to use correct nested hook schema
- Isolate agent vs pre-commit scoring test with separate repo paths
- Add negative test for hooks mentioning design docs without enforcement verbs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

jwm4 · 2026-05-29T16:58:40Z

Review

The scoring reprioritization (A.5) and design intent enforcement detection (A.8) are well-implemented. The two fix commits addressed CodeRabbit's round-1 feedback (enforcement verb pattern, empty hooks handling, nested hook schema in remediation). Test coverage is thorough with 12 new tests covering advisory, deterministic, precedence, skill-based detection, and edge cases.

Validated against real repos:

fastapi/fastapi (pre-commit only): deterministic_enforcement = 40/100 PASS. Correctly scores at the new threshold.
commercetools/merchant-center-application-kit (agent hooks + Husky): deterministic_enforcement = 100/100 PASS. Agent hooks (60) + Husky (40) correctly cap at 100.
ambient-code/agentready (main, pre-commit + settings without hooks): deterministic_enforcement = 50/100 PASS (40 + 10).

One suggestion (non-blocking):

The enforcement_verb_pattern in _check_design_enforcement (patterns.py:341-344) matches common words like "update", "check", "create" against the entire serialized hooks JSON blob. A real-world settings.json with multiple hooks will likely contain these words in unrelated commands (e.g., a formatter command containing "check"), so any repo that also mentions "design doc" anywhere in its hooks config could get the 15pt bonus regardless of whether a hook actually enforces design doc updates. Consider matching both patterns within the same hook entry rather than across the whole serialized blob. Low risk at 15pts on a T4 attribute, but worth tightening in a follow-up.
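The difference between blob-level and per-entry matching can be sketched as follows. This is an illustration under stated assumptions, not the code in patterns.py: both regular expressions are simplified stand-ins, the function names are hypothetical, and the hooks layout assumes the nested schema used elsewhere in this PR.

```python
import json
import re

# Simplified stand-ins for the assessor's patterns (assumptions).
DOC_REF = re.compile(r"docs/design|design[ _-]doc", re.IGNORECASE)
VERB = re.compile(r"\b(must|require[sd]?|ensure|update|check|verify|enforce)\b", re.IGNORECASE)


def iter_commands(hooks: dict):
    """Yield each hook command string from an event -> matchers -> hooks layout."""
    for matchers in hooks.values():
        for matcher in matchers:
            for hook in matcher.get("hooks", []):
                command = hook.get("command")
                if isinstance(command, str):
                    yield command


def blob_match(hooks: dict) -> bool:
    """Match both patterns anywhere in the serialized config (prone to false positives)."""
    blob = json.dumps(hooks)
    return bool(DOC_REF.search(blob) and VERB.search(blob))


def entry_match(hooks: dict) -> bool:
    """Require both patterns to match within the same hook command."""
    return any(DOC_REF.search(c) and VERB.search(c) for c in iter_commands(hooks))


# Two unrelated hooks: a linter containing "check" and a reminder mentioning docs/design.
UNRELATED = {
    "PostToolUse": [
        {"hooks": [{"type": "command", "command": "ruff check ."}]},
        {"hooks": [{"type": "command", "command": "echo see docs/design"}]},
    ]
}
```

For `UNRELATED`, the blob-level check reports enforcement because the verb and the design-doc reference both appear somewhere in the config, while the per-entry check does not, since no single command contains both.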

Posted by Bill Murdock with assistance from Claude Code.

# [2.46.0](v2.45.0...v2.46.0) (2026-05-29)

### Features

* enforcement and intent assessor improvements (ADR A.5, A.8) ([#484](#484)) ([efe7507](efe7507))

github-actions · 2026-05-29T17:30:52Z

🎉 This PR is included in version 2.46.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

chore: add .agents/ and .codex/ to .gitignore

7858491

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai Bot requested changes May 29, 2026

View reviewed changes

Comment thread src/agentready/assessors/patterns.py Outdated

Comment thread src/agentready/assessors/testing.py

Comment thread src/agentready/assessors/testing.py

Comment thread tests/unit/test_assessors_patterns.py

Comment thread tests/unit/test_assessors_testing.py

jwm4 merged commit efe7507 into main May 29, 2026
6 checks passed

jwm4 deleted the feat/461-enforcement-intent-improvements branch May 29, 2026 17:30

github-actions Bot pushed a commit that referenced this pull request May 29, 2026

chore(release): 2.46.0 [skip ci]

6470c9f


github-actions Bot added the released label May 29, 2026

jwm4 merged 3 commits into main from feat/461-enforcement-intent-improvements
