Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion internal/cli/cli_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -746,7 +746,10 @@ func TestShippedWorkflowSkillsValidate(t *testing.T) {
"dev-architecture", "dev-epics-stories", "dev-readiness-check",
"dev-sprint", "dev-retrospective",
}
for _, name := range workflowSkills {
// The shipped discipline skills (test-first, verification, review, refactor) must
// validate too — a regression guard for their required headings (e.g. ## Goal).
disciplineSkills := []string{"tdd-cycle", "verify-change", "pr-review", "safe-refactor"}
for _, name := range append(workflowSkills, disciplineSkills...) {
code, out, errOut := run(t, "skill", "validate", name, "--root", dir)
if code != 0 {
t.Errorf("shipped skill %q failed validation (code=%d):\n%s%s", name, code, out, errOut)
Expand Down
4 changes: 2 additions & 2 deletions internal/cli/update_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -149,8 +149,8 @@ func TestUpdatePreservesManagedSkillModelEffort(t *testing.T) {
skill := filepath.Join(dir, ".claude", "skills", "verify-change", "SKILL.md")
withOverrides := strings.Replace(
readFile(t, skill),
"description: Run the project's executable checks (tests, lint, typecheck, build) and produce evidence. Use before reporting a task complete or before opening a PR.\n---",
"description: Run the project's executable checks (tests, lint, typecheck, build) and produce evidence. Use before reporting a task complete or before opening a PR.\nmodel: sonnet\neffort: high\n---",
"description: Use before claiming a task is complete, a fix works, or tests pass — and before opening a PR. Turns assertions into evidence from the project's own checks.\n---",
"description: Use before claiming a task is complete, a fix works, or tests pass — and before opening a PR. Turns assertions into evidence from the project's own checks.\nmodel: sonnet\neffort: high\n---",
1,
)
if err := os.WriteFile(skill, []byte(withOverrides), 0o644); err != nil {
Expand Down
35 changes: 34 additions & 1 deletion internal/templater/templates/skills/pr-review/SKILL.md.tmpl
Original file line number Diff line number Diff line change
@@ -1,14 +1,26 @@
---
name: pr-review
description: Review the current diff, commit it once review is clean, and push (tests run at push time). Use when a task slice is complete and ready to commit and raise/update a PR.
description: Use when a task slice is complete and ready to be committed and raised (or updated) as a PR. Takes a reviewed slice from clean to pushed, with evidence.
---

# PR Review

## Goal

Take a completed slice from "reviewed" to "pushed", with evidence. The
adversarial review happens first; the commit is written only after review is
clean; tests run at the moment of pushing via the git `pre-push` hook.

## The Iron Law

> **The order is fixed and unskippable: review clean → commit → push (gate runs) → PR. No
> commit with an open Blocker; no `git push --no-verify`, ever.**

No exceptions:
- The reviewer sub-agents are read-only — you apply fixes, then re-review until clean.
- A red pre-push gate is fixed, not bypassed.
- A PR without spec links and verification evidence is incomplete.

## Default Operation Mode

Direct, gated. The order is fixed: **review → commit → push → PR**. Do not
Expand Down Expand Up @@ -48,6 +60,27 @@ commit before review is clean; do not push before the commit exists.
- When a GitHub MCP server is connected, prefer its tools for reading PR/issue/Actions
state (status, review threads, CI results) over shelling to `gh` — same data, typed.

## Rationalizations

Every one of these is a signal to stop, not a reason to skip. **Violating the letter of
the rule is violating its spirit.**

| Excuse | Reality |
|---|---|
| "The change is trivial, skip the reviewer." | Trivial changes carry the bugs you stopped looking for. Run code-reviewer. |
| "I'll fix the Blocker in a follow-up." | A Blocker blocks the commit. Resolve it first. |
| "The pre-push gate is flaky, I'll `--no-verify`." | Bypassing the gate pushes red. Read the failure and fix it. |
| "I'll write the PR body later." | A PR without spec links + evidence is incomplete. Fill it now. |

## Red Flags — STOP

- You're about to commit before review is clean.
- A Blocker is still open.
- You're reaching for `git push --no-verify`.
- Your PR body has no spec/task links or no verification output.

If any fire, stop and return to the fixed order above.

## Gotchas

- Never commit with an open Blocker, and never `git push --no-verify` to skip the gate.
Expand Down
35 changes: 34 additions & 1 deletion internal/templater/templates/skills/safe-refactor/SKILL.md.tmpl
Original file line number Diff line number Diff line change
@@ -1,13 +1,25 @@
---
name: safe-refactor
description: Restructure code without changing its observable behavior, under a green test suite. Use when improving structure/naming/duplication separately from feature work.
description: Use when improving structure, naming, or duplication separately from feature work — restructuring code without changing its observable behavior, under a green test suite.
---

# Safe Refactor

## Goal

Refactoring changes structure, never behavior. The test suite is the contract;
it must stay green at every step.

## The Iron Law

> **A refactor changes structure, never observable behavior. The suite is green before the
> first step and after every step; any behavior diff is a failure, not progress.**

No exceptions:
- No behavior change sneaks in under "refactor" — if you must change behavior, stop and open a separate task/spec.
- Red after a step means revert that step, not fix-forward.
- If it needs a new test for new behavior, it isn't a refactor.

## Default Operation Mode

Direct, incremental. Small steps, each verified.
Expand All @@ -32,6 +44,27 @@ Direct, incremental. Small steps, each verified.
- Run the full relevant suite. Behavior outputs must be identical to before.
- The public API/contract must be unchanged unless the task explicitly authorizes it.

## Rationalizations

Every one of these is a signal to stop, not a reason to skip. **Violating the letter of
the rule is violating its spirit.**

| Excuse | Reality |
|---|---|
| "While I'm in here, I'll fix this bug too." | That's a behavior change. Stop and open a separate task. |
| "The test went red but my change is better, I'll fix forward." | Red means you changed behavior. Revert the step. |
| "This needs a new test for the new behavior." | Then it isn't a refactor. New behavior = new spec/task. |
| "Coverage is thin but I know it's safe." | Without a safety net you can't prove parity. Add characterization tests first. |

## Red Flags — STOP

- You're editing behavior under the banner of "refactor".
- You're fixing forward after a step went red.
- You're adding tests for new behavior.
- You're touching unrelated code you happened to pass through.

If any fire, stop and return to the workflow above.

## Gotchas

- No behavior changes sneak in under "refactor". If you must change behavior, stop and open a separate task/spec.
Expand Down
39 changes: 38 additions & 1 deletion internal/templater/templates/skills/tdd-cycle/SKILL.md.tmpl
Original file line number Diff line number Diff line change
@@ -1,13 +1,27 @@
---
name: tdd-cycle
description: Execute one small red→green→refactor cycle for a single task from specs/*/tasks.md. Use for behavior tasks under the test-first flows (development_flow tdd or tdd-e2e); under the unit flow the implementer covers behavior with a same-task test instead of this RED-first cycle.
description: Use for a single behavior task under the test-first flows (development_flow tdd or tdd-e2e), when driving one task from specs/*/tasks.md to done. Not for the unit flow — there the implementer covers behavior with a same-task test, not this RED-first cycle.
---

# TDD Cycle

## Goal

Use this skill for **exactly one task at a time**, driving it through
red → green → refactor.

## The Iron Law

> **Under `tdd`/`tdd-e2e`, production code does not exist until a test has failed for the
> expected reason. The first artifact you produce is a failing test — and no task is marked
> `[x]` until `csdd spec test-report` has recorded it green.**

No exceptions:
- Don't write the implementation "just to see if it works" and back-fill the test.
- Don't keep untested code "as a reference" while you write the test — delete it and start test-first.
- Don't mark the task done now and record the evidence later.
- One task per cycle — don't batch.

## When this applies

This is the **test-first** cycle for specs whose `development_flow` is `tdd` or
Expand Down Expand Up @@ -80,6 +94,29 @@ test, not production code.
- Mark **only** the task you just completed — never another task. Leave blocked or
unfinished tasks unchecked so `tasks.md` (and the dashboard) reflect reality.

## Rationalizations

Every one of these is a signal to stop, not a reason to skip. **Violating the letter of
the rule is violating its spirit.**

| Excuse | Reality |
|---|---|
| "It's a one-line change, I'll add the test after." | A test written after the code passes immediately and proves nothing. Writing it first takes 30 seconds. |
| "I already manually checked it works." | Manual checks are ad-hoc and vanish; the recorded test is the contract that survives. |
| "Writing the test first is slower." | Slower than typing the code, faster than debugging untested code later. |
| "The test passed on the first run." | Then it tests existing behavior, not your change. Make it fail first. |
| "I'll batch these three tasks to save time." | Batching hides which test covers which behavior. One task per cycle. |

## Red Flags — STOP

- You're about to write implementation before a failing test exists.
- The test passed on its very first run.
- You can't quote the failure message from the RED step.
- You're thinking "I'll record the csdd evidence later."
- You're about to mark more than one task `[x]`.

If any fire, stop and return to the workflow above.

## Gotchas

- One task per invocation. Do not batch tasks "to save time".
Expand Down
36 changes: 35 additions & 1 deletion internal/templater/templates/skills/verify-change/SKILL.md.tmpl
Original file line number Diff line number Diff line change
@@ -1,13 +1,25 @@
---
name: verify-change
description: Run the project's executable checks (tests, lint, typecheck, build) and produce evidence. Use before reporting a task complete or before opening a PR.
description: Use before claiming a task is complete, a fix works, or tests pass — and before opening a PR. Turns assertions into evidence from the project's own checks.
---

# Verify Change

## Goal

Turn "I think it works" into evidence a reviewer can trust. This skill runs the
project's own checks and reports their real output.

## The Iron Law

> **No "done", "it works", or "tests pass" claim without fresh output from the project's
> real commands, read and confirmed in this session.**

No exceptions:
- A prior run doesn't count — edits since then may have broken it.
- "Should pass" is a prediction, not evidence.
- A skipped or failing check appears in the report. Never hide it.

## Default Operation Mode

Direct execution. The deliverable is a verification block with the exact
Expand Down Expand Up @@ -42,6 +54,28 @@ and top error flows through the Playwright MCP server. If the project has no
frontend, or the Playwright MCP server is not connected, state that this step was
skipped and why — do not fabricate a result.

## Rationalizations

Every one of these is a signal to stop, not a reason to skip. **Violating the letter of
the rule is violating its spirit.**

| Excuse | Reality |
|---|---|
| "It should pass now." | "Should" is a prediction, not evidence. Run it. |
| "It compiled, so it works." | Compiling is not passing. Different checks catch different failures. |
| "The linter passed, the build's fine." | Linter ≠ compiler ≠ tests. Run each one. |
| "I ran it a few edits ago." | Edits since then may have broken it. Fresh run only. |
| "I'm confident it's right." | Confidence is not output. Paste the output. |

## Red Flags — STOP

- You're reaching for "should", "probably", or "seems to".
- You feel satisfaction before running anything ("Done!", "Perfect!").
- You're about to open a PR without a fresh run.
- You're trusting a previous run, or an agent's "success", instead of output you just saw.

If any fire, stop and run the real command.

## Gotchas

- Use the project's actual commands; a wrong guessed command that "passes" is worse than none.
Expand Down
Loading