[PM-36854] feat: Add generating-review-storybook skill (POC)#111

Draft
SaintPatrck wants to merge 4 commits into main from feat/generating-review-storybook

SaintPatrck commented May 7, 2026

🎟️ Tracking

PM-36854

⚠️ POC — proof of concept exploring whether a static, self-contained storybook artifact can meaningfully reduce reviewer cognitive load on stacked AI-written PRs. Not intended for merge without further discussion on scope, hosting, and maintenance.

📔 Objective

Adds a new generating-review-storybook skill to the bitwarden-code-review plugin. The skill packages a stack of PRs (or commits) into a self-contained, double-clickable HTML walkthrough designed for humans reviewing AI-written code:

  • Verdict-first hierarchy — cover surfaces approve / approve-fix / block / pending before any diff
  • Narrative chapters — each PR is broken into 2–5 logical chapters with a narrative paragraph, not a flat alphabetical file list
  • Inline annotations as marginalia — Claude findings (from code-review-local) and optionally human reviewer threads (via gh api graphql reviewThreads) render at the diff line they reference
  • Copy-as-Markdown export — reviewer notes leave the storybook as a Markdown handoff

What ships

  • SKILL.md (1,087 words, ~106 lines) interviews the user, builds a config, and runs the scaffolder
  • scripts/scaffold.py — deterministic renderer; default output $CLAUDE_PLUGIN_DATA/storybooks/<slug>-<timestamp>/
  • scripts/capture_diffs.py — fetch PR/commit diffs as base64
  • scripts/parse_review_md.py — extract verdicts and findings from existing bitwarden-code-review summary files
  • scripts/fetch_pr_threads.py — pull existing GitHub review threads into the storybook comments[] shape (outdated skipped, resolved kept with _(resolved)_ suffix)
  • assets/template/ — bundled HTML/CSS/JS template
  • references/data-schema.md and references/customization.md
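As an illustration of the thread handling described for `fetch_pr_threads.py` (outdated threads skipped, resolved threads kept with a `_(resolved)_` suffix), here is a minimal sketch. It is not the script's actual code: the function name `threads_to_comments`, the flattened input shape, and the output keys (`file`, `line`, `author`, `body`) are assumptions made for this example.

```python
from typing import Any


def threads_to_comments(threads: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Convert review threads into storybook-style comment dicts.

    Hypothetical input: one dict per thread, already flattened to
    path / line / author / body plus isOutdated and isResolved flags.
    """
    comments = []
    for thread in threads:
        # Outdated threads no longer map to a line in the current diff.
        if thread.get("isOutdated"):
            continue
        body = thread["body"]
        # Resolved threads stay visible but are marked as resolved.
        if thread.get("isResolved"):
            body = f"{body} _(resolved)_"
        comments.append(
            {
                "file": thread["path"],
                "line": thread["line"],
                "author": thread["author"],
                "body": body,
            }
        )
    return comments
```

Keeping resolved threads, rather than dropping them, lets a reviewer see that a concern was already raised and settled at that line.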

Why a POC

The storybook design makes several bets that should be tested with real reviewers before this graduates:

  • (1) verdict-first hierarchy actually reduces triage time vs. a flat PR list,
  • (2) narrative chapters are worth the synthesis cost on Claude's end,
  • (3) static-site-as-artifact is the right delivery mode rather than e.g. a Slack thread or GitHub Check.

Feedback on these is the explicit goal of the POC.

Example

Example storybook based on bitwarden/android/pull/6863

storybook-andriod-6863.zip

📸 Screenshots

Screen.Recording.2026-05-07.at.4.57.20.PM.mp4

Packages a stack of PRs or commits into a self-contained HTML walkthrough
for humans reviewing AI-generated code. Verdict-first cover, severity-grouped
findings (with file:line locations and suggestions), per-PR diffs as drill-down
detail, inline notes, and copy-as-Markdown export. Optional pre-baking from
existing bitwarden-code-review summary files via parse_review_md.py.

Bumps plugin to 1.11.0; adds SKILL.md, three Python scripts (scaffold,
capture_diffs, parse_review_md), data-schema and customization references,
bundled HTML/CSS/JS template, and CHANGELOG entry.
… annotations in review storybooks

The objective is to enhance the `generating-review-storybook` skill to provide a structured, scene-based walkthrough of code changes. This replaces the flat, per-PR file list with a narrative-driven experience where findings and comments are rendered inline.

### Behavioral Changes
*   **Narrative-First Navigation:** Storybooks now partition PRs into logical "Chapters" and "Scenes," each with its own page, title, and descriptive narrative to guide the reviewer.
*   **Inline Annotations:** AI-generated findings and human comments are now rendered as "gloss" marginalia directly at the relevant diff lines, rather than being grouped in a separate section.
*   **Enhanced Triage:** The file index and table of contents now include "severity dots" (visual indicators) to show where critical issues or comments exist within specific files.
*   **Walkthrough Hierarchy:** The cover page now displays a structured hierarchy of chapters and scenes instead of a simple PR list.

### Specific Changes

#### Core Logic and Scaffolding
*   **`scaffold.py`**: Introduced `build_page_order` to map the nested PR/Chapter/Scene structure into a linear sequence of HTML pages. Added validation for the new `chapters` and `comments` schema fields.
*   **`app.js.tmpl`**: Refactored the rendering engine to support multi-page walkthroughs. Implemented `buildAnnotations` to group findings and comments by file and line number for inline rendering.
*   **`data-schema.md`**: Updated the configuration schema to include `chapters` (logical groupings with titles and narratives) and `comments` (human reviewer feedback).
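The flattening that `build_page_order` is described as doing could look roughly like the sketch below. This is an assumption-laden illustration, not the code in `scaffold.py`: the function name `build_page_order_sketch`, the `scenes` and `number` keys, and the page ID format are all hypothetical.

```python
from typing import Any


def build_page_order_sketch(prs: list[dict[str, Any]]) -> list[str]:
    """Flatten nested PR -> chapter -> scene data into a linear page list.

    The page IDs ("cover", "pr-<n>", "pr-<n>-ch<i>", ...) are invented
    for this example.
    """
    pages = ["cover"]
    for pr in prs:
        pr_id = f"pr-{pr['number']}"
        pages.append(pr_id)
        for c_idx, chapter in enumerate(pr.get("chapters", []), start=1):
            chapter_id = f"{pr_id}-ch{c_idx}"
            pages.append(chapter_id)
            # Scenes are optional; a chapter without scenes is a single page.
            for s_idx, _scene in enumerate(chapter.get("scenes", []), start=1):
                pages.append(f"{chapter_id}-s{s_idx}")
    return pages
```

A linear list like this makes previous/next navigation a matter of index arithmetic, which suits a static site with no router.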

#### UI and Styling
*   **`styles.css`**: Added comprehensive styles for the new walkthrough components, including `.chapter-intro`, `.scene-page`, and the `.gloss` annotation system. Replaced the old card-based finding styles with a marginalia-inspired design.
*   **File Row Enhancements**: Updated `buildFileRow` to render status dots that reflect the severity and quantity of annotations within a file.
*   **Inline Gloss**: Implemented `renderGloss` to display AI findings (severity-colored) and human comments (accent-colored) with author metadata and suggested fixes.
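The grouping and severity-dot logic described above lives in the JavaScript template. To keep the examples in one language, the sketch below expresses the same idea in Python. The severity names, their ranking, and the function names `group_annotations` and `file_dot` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Any, Optional

# Hypothetical severity scale; lower rank means more severe.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2}


def group_annotations(
    findings: list[dict[str, Any]], comments: list[dict[str, Any]]
) -> dict[tuple[str, int], list[dict[str, Any]]]:
    """Group findings and human comments by (file, line)."""
    grouped: dict[tuple[str, int], list[dict[str, Any]]] = defaultdict(list)
    for finding in findings:
        grouped[(finding["file"], finding["line"])].append({"kind": "finding", **finding})
    for comment in comments:
        grouped[(comment["file"], comment["line"])].append({"kind": "comment", **comment})
    return dict(grouped)


def file_dot(
    grouped: dict[tuple[str, int], list[dict[str, Any]]], path: str
) -> tuple[Optional[str], int]:
    """Return (worst finding severity or None, annotation count) for a file."""
    items = [a for (file, _line), anns in grouped.items() if file == path for a in anns]
    severities = [a["severity"] for a in items if a["kind"] == "finding"]
    worst = min(severities, key=SEVERITY_RANK.__getitem__, default=None)
    return worst, len(items)
```

Keying on `(file, line)` means a finding and a human comment on the same line land in the same bucket, so they can be rendered together at that diff line.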

#### Agent Guidance
*   **`SKILL.md`**: Updated the AI agent instructions to prioritize "synthesizing chapters." The agent is now directed to group files logically and provide narrative context that explains the *intent* of changes rather than just describing file modifications.
*   **`CHANGELOG.md`**: Noted the shift to a verdict-first hierarchy and inline marginalia for improved reviewer triage.
…en gh_repo

Address two findings from local review of the generating-review-storybook
skill:

- fetch_pr_threads.py was on disk but never referenced from SKILL.md, while
  the docs separately claimed real-PR-thread fetching was a "future
  integration." Wire it in as optional step 3b, update SKILL.md preamble,
  and reflect it as a shipping producer in references/data-schema.md.
- gh_repo was the only user-supplied config value flowing into the
  app.js.tmpl JS template literal without escaping or validation. Add an
  owner/name regex check in scaffold.load_config so a malformed value is
  rejected up front, matching the rigor already applied to PR numbers in
  capture_diffs.py.
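A minimal sketch of the kind of owner/name check described above. The exact pattern used in `scaffold.load_config` is not shown in this PR description, so the regex and the name `validate_gh_repo` are assumptions.

```python
import re

# Assumed pattern: owner is alphanumeric with inner hyphens; repo name
# allows letters, digits, dots, underscores, and hyphens.
GH_REPO_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?/[A-Za-z0-9._-]+")


def validate_gh_repo(value: object) -> str:
    """Return value unchanged if it looks like 'owner/name', else raise."""
    if not isinstance(value, str) or not GH_REPO_RE.fullmatch(value):
        raise ValueError(f"gh_repo must look like 'owner/name', got {value!r}")
    return value
```

Using `fullmatch` rather than `match` or `search` matters here: it rejects values with trailing characters such as backticks or `${...}`, which could otherwise break out of a JavaScript template literal.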