Skip to content

ci(claude-review): submit a formal PR review (verdict + summary) via the Reviews API#2391

Merged
causten merged 4 commits into
developfrom
users/umayadav/claude-review-formal-review
May 29, 2026
Merged

ci(claude-review): submit a formal PR review (verdict + summary) via the Reviews API#2391
causten merged 4 commits into
developfrom
users/umayadav/claude-review-formal-review

Conversation

@umangyadav
Copy link
Copy Markdown
Member

@umangyadav umangyadav commented May 29, 2026

Summary

Switch the bot's overall review from a stand-alone issue comment to a formal pull-request review (gh pr review --comment) carrying a new verdict field (APPROVE / REQUEST_CHANGES / COMMENT) plus the Markdown summary.

Every verdict is unconditionally submitted as --comment (advisory). This prevents the bot from ever ticking a branch-protection approving-review slot or leaving a sticky CHANGES_REQUESTED block (the gh-aw#27655 / PR #27662 wedge). The model's verdict is preserved in a computed body header. Full rationale in CLAUDE_AUTO_REVIEW.md §13.

Also adds defense-in-depth sanitizer re-checks (verdict enum, summary non-empty string, every string-typed field really is a string or null) to close a redaction-bypass class where a schema regression could partially leak values via jq's truncated error message.

Test plan

  • Sanitizer fixture suite: 148/148 PASS (121 → 148, +27 covering verdict / summary / field-type / redaction).
  • End-to-end smoke with stubbed gh: COMMENT initial, APPROVE re-review with all 3 thread-update types, REQUEST_CHANGES + suggestion fence, bad-verdict defensive arm, 422 soft-skip, 500 hard-failure resilience.
  • shellcheck clean on the modified scripts.
  • CI: validate on first claude-review label after merge.

…the Reviews API

Post the model's verdict + Markdown summary as a formal pull-request
review (`gh pr review --comment`) instead of a stand-alone issue
comment. The verdict is unconditionally submitted as `--comment`
(advisory) so the bot can never tick a branch-protection approving-
review slot or leave a sticky CHANGES_REQUESTED block (the
github/gh-aw#27655 + PR #27662 wedge for the same class of tool).

The review body has a deterministic header computed from `verdict` +
finding counts; the model owns only the `summary` section
(`## Scope` / `## Findings` / `## Notes` / `## CI status`). The header
label switches from `Findings:` to `New findings:` on re-review runs
so a fully-resolved PR cannot be misread as "always clean".

Defense-in-depth sanitizer re-checks of schema-expressible fields:
the `verdict` enum; that `summary` is a non-empty string; that every
other string-typed field (`inline_comments[].body` / `.suggestion`,
`thread_updates[].body`) actually is a string (or null). Closes a
redaction-bypass class where a future schema regression slipping a
non-string into the later `utf8bytelength` / `test` / `contains` /
`split` scans would partially leak the value via jq's truncated
error message. Fail-closed with a fixed-text error; no model-
controlled content reaches the public Actions log.

Sanitizer fixture suite: 121 -> 148 (verdict enum, summary
type, field-type checks, plus redaction pins on each rejection path).
Architecture, threat model, and porting checklist in
CLAUDE_AUTO_REVIEW.md sections 8, 12, 13.

Co-authored-by: Cursor <cursoragent@cursor.com>
@umangyadav umangyadav requested a review from causten as a code owner May 29, 2026 15:44
@rocmlir-pr-reviewer rocmlir-pr-reviewer Bot added the modifies-ci-paths PR modifies the Claude review CI security perimeter; audit before applying claude-review label May 29, 2026
@rocmlir-pr-reviewer
Copy link
Copy Markdown

⚠️ This PR modifies the Claude-review CI security perimeter

The following files in this PR control whether and how the
claude-review workflow protects its LLM Gateway secrets at
runtime (see .github/workflows/CLAUDE_AUTO_REVIEW.md):

  • .claude/skills/review-rocmlir-pr/SKILL.md
  • .claude/skills/update-pr-review/SKILL.md
  • .github/scripts/post_claude_review.sh
  • .github/scripts/sanitize_claude_actions.sh
  • .github/scripts/tests/test_sanitize.sh
  • .github/workflows/CLAUDE_AUTO_REVIEW.md
  • .github/workflows/claude_auto_review.yml

Before applying the claude-review label on this PR, please:

  1. Audit the diff in these paths line-by-line. A malicious or
    accidental change could disable the --allowedTools
    restriction, the overlay step, the sanitizer, or the
    review/post job split -- any of which would expose the
    secrets in env to the PR-modified workflow.
  2. If the changes are legitimate and you want a Claude review,
    do NOT apply the claude-review label. Instead, run via
    Actions → Claude Auto Review → Run workflow and enter
    this PR's number. The dispatch path runs from the trusted,
    code-owner-approved version of the workflow on
    develop, so a malicious PR-side
    modification cannot affect the run.

(This banner is automated. The modifies-ci-paths label
will be removed automatically if a future push removes the
perimeter modifications. The Layer-3 in-workflow guard will
additionally fail the claude-review label-triggered run
on this PR if the label is applied while perimeter changes
are present.)

…n-0 comments

Address PR review feedback (#2391):

1. The COMMENT-case comment on line 299 read "model verdict matches event",
   which could be misread as "the bot's submitted event sometimes varies".
   Reword to be explicit that the bot always submits as COMMENT, so a
   model verdict of COMMENT specifically needs no annotation.

2. The defensive *) arm's `return 0` is essential flow control (skipping
   the body-build / `gh pr review --comment` submission below), not just
   an exit-status statement. Without it a sanitizer-bypassed bad verdict
   would land verbatim in the rendered review header. Add a clarifying
   comment so a future reader doesn't mistake it for dead code.

No behavioural change. Smoke + sanitizer fixture suite (148/148) unchanged.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the Claude auto-review pipeline to submit its result as a formal pull-request review (via the Reviews API / gh pr review --comment) instead of a top-level PR conversation comment, while adding a model-emitted verdict field (APPROVE / REQUEST_CHANGES / COMMENT) that is displayed in a deterministic header but always submitted as an advisory COMMENT review. It also adds defense-in-depth sanitizer checks (verdict enum, non-empty summary, and string-typed field enforcement) plus expanded sanitizer fixtures.

Changes:

  • Switch posting behavior from “top-level summary comment” to “formal PR review submission” (always --comment), including a computed header showing verdict and finding counts.
  • Extend the structured output contract to require verdict and a non-empty Markdown summary, and mirror that across workflow schema + skills docs.
  • Harden sanitize_claude_actions.sh against schema-regression redaction bypasses and expand sanitizer test coverage accordingly.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
.github/workflows/claude_auto_review.yml Updates the model JSON schema to require verdict, updates prompt guidance, and renames the post step to reflect review submission.
.github/workflows/CLAUDE_AUTO_REVIEW.md Documents the new “formal review” behavior, COMMENT-only submission policy, and updated output contract/security rationale.
.github/scripts/tests/test_sanitize.sh Adds extensive fixtures for verdict/summary validation, type enforcement, and stderr redaction guarantees.
.github/scripts/sanitize_claude_actions.sh Adds verdict/summary validation and string-type defense-in-depth checks before later jq string operations.
.github/scripts/post_claude_review.sh Replaces the prior top-level summary comment with gh pr review --comment, adding a deterministic header and preserving marker attribution.
.claude/skills/update-pr-review/SKILL.md Updates the required output schema/rules to include verdict and Markdown summary sections.
.claude/skills/review-rocmlir-pr/SKILL.md Updates the output schema/rules to include verdict and the new structured Markdown summary expectations.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread .github/workflows/CLAUDE_AUTO_REVIEW.md Outdated
**Review body layout** (built by `post_claude_review.sh`, NOT the model):

```markdown
**Verdict:** APPROVE -- submitted as COMMENT (automated reviews are advisory) · **Findings:** N (C Critical, J Major, K Minor)
@rocmlir-pr-reviewer
Copy link
Copy Markdown

Scope

CI-only change: switches the bot's overall review from a stand-alone issue comment to a formal pull-request review (gh pr review --comment) carrying a new verdict field, and adds defense-in-depth sanitizer re-checks for the verdict enum, the summary non-empty-string contract, and the string-typedness of inline_comments[].body/.suggestion and thread_updates[].body. Touches 7 files under .github/ and .claude/skills/ only -- no MLIR / Rock / C++ surface.

Findings

No blocking issues found.

Notes

  • post_review correctly returns 0 (but sets HAD_FAILURE=1) on an unknown verdict, leaving inline comments and thread updates already posted and surfacing the failure at script exit -- the right behaviour given the post-step ordering.
  • The all(... ; . == null or type == "string") predicate is vacuously true on empty arrays, so an APPROVE with no inline comments and no thread updates still passes the new type re-check, as intended.
  • COMMENT-only submission is implemented as a single hardcoded rule in post_review with no env/workflow toggle, matching the §13 threat-model rationale (a misconfiguration cannot flip it).
  • One previously-noted Copilot comment (header doc example uses regular spaces / letter placeholders while the script outputs &nbsp;·&nbsp; / numeric counts) is doc-only and already raised in a prior thread; not duplicating it here.

CI status

No non-self CI failures. Jenkins / Build-and-Test / MIGraphX still pending; sanitizer-fixture-suite passing (148/148 per PR body).

umangyadav and others added 2 commits May 29, 2026 16:04
…ew, not thread_updates.length

Address PR review feedback (#2391): the post-review header label
switched from `Findings:` to `New findings:` when `thread_updates`
was non-empty. That heuristic missed a real re-review state: on a
re-run where every prior thread was suppressed by its dedup gate
(Scenario D non-regression -- still present, no human reply, last
bot reply was clarify/null; plus Scenarios A/B/C dedup-gated as
already-resolved or already-clarified), the model emits
inline_comments == [] AND thread_updates == [], the header reads
`Findings: 0`, and a maintainer can misread the PR as "always
clean" -- exactly what the New-findings: switch exists to prevent.

Fix: derive the signal in the pre-fetch step (which already has
prev_comments.json in hand) using the same BOT_LOGIN + marker +
root-comment filter the prompt's Step 1 uses, and stash
`is_re_review` on meta.json (already uploaded as an artifact). The
post job reads it via `.is_re_review // false`, so a workflow
rollback between runs is still safe.

The two derivations (pre-fetch jq and prompt Step 1) MUST stay
byte-for-byte identical; added a row in the doc §15 sync table.

Verified: - 4 smoke cases (initial / dedup-gated re-review / missing
    field fallback / re-review with updates emitted) all render
    the correct header.
  - 7 unit tests on the pre-fetch jq derivation against mock
    prev_comments.json shapes (empty, only humans, bot reply
    sub-thread, bot root w/o marker, bot root w/ marker, mixed,
    wrong bot identity) all PASS.
  - Sanitizer fixture suite still 148/148.
Co-authored-by: Cursor <cursoragent@cursor.com>
Address Copilot review feedback (#2391): the `markdown`-fenced example
of the review body layout drifted from what `post_claude_review.sh`
actually writes:
  - Dot separator: doc had `  ·  ` (2 ASCII spaces + middle-dot
    + 2 ASCII spaces); the script writes `&nbsp;·&nbsp;` (literal
    HTML entities around the middle-dot). The rendered output looks
    the same, but a maintainer grepping a real review body for ` · `
    would never match -- the doc's `markdown` fence claims to show
    source, so showing source-that-isn't is misleading.
  - Per-severity counts: doc used opaque placeholders `N (C Critical,
    J Major, K Minor)`; the script always writes concrete integers
    (`%d`). The `C/J/K` letters aren't mnemonic (`J`, `K` have no
    relation to "Major"/"Minor") and the APPROVE verdict in the
    example is inconsistent with non-zero counts (the decision rule
    says zero findings -> APPROVE).

Switch the example to a self-consistent REQUEST_CHANGES + 3 (1/1/1)
that also exercises the `submitted as COMMENT` annotation, and add
the helper sentence "Numbers shown are illustrative" so a future
reader doesn't read the integers as a literal template.

Verified byte-for-byte equality of the doc example against the
script's `printf` output. No code change; sanitizer suite still
148/148.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

@causten causten merged commit 80b3fe0 into develop May 29, 2026
7 of 16 checks passed
@causten causten deleted the users/umayadav/claude-review-formal-review branch May 29, 2026 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

modifies-ci-paths PR modifies the Claude review CI security perimeter; audit before applying claude-review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants