feat(design-audit): make reference-grounded redesign job-first, not aesthetic-first#124

Merged
drewstone merged 1 commit into main from feat/job-first-redesign-engine
Jun 23, 2026
@drewstone

Contributor

Problem

The reference-grounded redesign engine grounded every page in a world-class exemplar's visual DNA, and the ranker judged visual craft, so it optimized for "looks tasteful," not "serves the user's task." This regressed functional pages into generic brochures:

  • docs (python tutorial) → lost its table of contents, prev/next nav, and dense code reference for a hero + two marketing cards.
  • aggregator (HN) → dropped from 30 stories to 9 with editorial serif + whitespace.
  • dashboard (githubstatus) → shed services into a few spacious cards.

A blind LLM-judge panel scored these "redesigns" as decisive wins — because LLM judges share the aesthetic LLMs generate, and the rubric discounted density. The metric was an AI judging AI; the redesigns would make real apps worse.

Fix — job-first, not aesthetic-first

  • Generator (reference/generate/prompt.ts): persona art director → product designer. Hard rules in priority order: task-first → preserve functional affordances (never delete navigation/ToC/search to look cleaner) → preserve density where it is the value (docs/dashboards/feeds keep their item count) → right-size the intervention (never turn one kind of page into another) → the exemplar is visual craft only, never a structural template.
  • Functional contract: a per-page preservation block derived from the page's own measured DNA (nav-affordance count, layout density, archetype). It is concrete and data-driven, and it requires density only when the page is measured dense (a genuinely sparse page is never forced dense).
  • Ranker/judge (reference/judge/prompt.ts): scores task fitness + functional preservation before visual craft; a polished direction that strips nav or density loses.
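
As a rough illustration of the functional contract described above, the sketch below renders a preservation block from a page's measured DNA. It is not the code in reference/generate/prompt.ts: the `PageDnaSketch` shape, the function name `renderFunctionalContractSketch`, the assumption that `components.nav` is an array with one entry per nav affordance, and the exact wording of each line are all hypothetical. Only the field names (`layout.density`, `layout.archetype`, `components.nav`) and the `'sparse' | 'balanced' | 'dense'` union come from this PR's review.

```typescript
// Hypothetical, trimmed-down shapes; the real DesignDNA contract has more fields.
type Density = 'sparse' | 'balanced' | 'dense';

interface PageDnaSketch {
  layout: { density: Density; archetype: string };
  // Assumption: one array entry per measured navigation affordance.
  components: { nav: unknown[] };
}

// Pure and deterministic: the same DNA always yields the same contract text.
function renderFunctionalContractSketch(dna: PageDnaSketch): string {
  const lines: string[] = [
    'FUNCTIONAL CONTRACT (derived from this page, not from the exemplar):',
    `- Archetype: ${dna.layout.archetype}. Do not turn it into another kind of page.`,
  ];

  const navCount = dna.components.nav.length;
  if (navCount > 0) {
    lines.push(`- Keep all ${navCount} navigation affordances (nav, ToC, search).`);
  }

  // Density is demanded only when the page itself measures dense,
  // so a genuinely sparse page is never forced dense.
  if (dna.layout.density === 'dense') {
    lines.push('- This page is DENSE: keep at least as many items/rows.');
  }

  return lines.join('\n');
}
```

Because the density line is emitted only on a measured `'dense'` value, the quality of the contract depends directly on how that value is derived.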

Validation — re-ran the regressed pages with the fixed engine

Not an AI beauty panel — a check of whether the brief now keeps what makes each page work:

| page | before (aesthetic-first) | after (job-first) |
| --- | --- | --- |
| docs | deleted ToC; 2 marketing cards + hero | full ToC sidebar + breadcrumb + prev/next + dense code examples; craft applied within the docs structure |
| HN | 9 stories, editorial serif | all 30 stories, nav + voting kept, no motion, framed as "rapid scanning at max density" |
| dashboard | services spaced into cards | dense service grid, all services in the brief, real uptime values, nav kept |

Gates: tsc --noEmit clean · check:boundaries pass · full suite 1898 pass (2 pre-existing telemetry-rollup-remote nvm-shim failures, unrelated). Regression tests added across generator + judge.

Changeset: minor.

…esthetic-first

The engine grounded every page in an exemplar's visual DNA and judged on visual
craft, so it regressed functional pages into generic brochures (docs lost its
table of contents + density for marketing cards; an aggregator dropped 30 items
to 9; a dashboard shed services into spacious cards).

- generate/prompt.ts: persona art director -> product designer; hard rules in
  priority order — task-first, preserve affordances (never delete nav/ToC),
  preserve density where it is the value, right-size (don't reskin a tool into a
  landing page), exemplar is craft-only not a template. Plus a data-driven
  FUNCTIONAL CONTRACT derived from the page's own DNA (nav count, density,
  archetype) — density required only when the page is measured dense.
- judge/prompt.ts: score task fitness + functional preservation before visual
  craft; a polished direction that strips nav or density loses.
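
The judge's priority order can be pictured as a lexicographic comparison. In this PR the ordering lives in prompt text, not in code, so the sketch below is only an analogy: the `DirectionScore` shape, the 0-10 scales, and the function names are hypothetical.

```typescript
// Hypothetical score shape; the real judge output is defined by judge/prompt.ts.
interface DirectionScore {
  name: string;
  taskFitness: number;            // 0-10: does the page still serve the user's task?
  functionalPreservation: number; // 0-10: are nav, ToC and density kept?
  visualCraft: number;            // 0-10: polish
}

// Compare in priority order; visual craft only breaks ties on the first two.
function compareDirections(a: DirectionScore, b: DirectionScore): number {
  return (
    b.taskFitness - a.taskFitness ||
    b.functionalPreservation - a.functionalPreservation ||
    b.visualCraft - a.visualCraft
  );
}

// Returns a new array, best direction first; the input is left untouched.
function rankDirections(directions: DirectionScore[]): DirectionScore[] {
  return [...directions].sort(compareDirections);
}
```

Under this ordering a polished direction with weaker task fitness or preservation cannot outrank a plainer one that keeps the page working, which is the behaviour the prompt asks the judge to follow.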

Validated by re-running the regressed pages: docs keeps its ToC + nav + dense
code; HN keeps all 30 stories; the dashboard stays a dense service grid.

@tangletools tangletools left a comment

Contributor

✅ Auto-approved PR — a31d2d42

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-23T23:01:30Z

@tangletools tangletools left a comment

Contributor

🟡 Value Audit — sound-with-nits

| Verdict | sound-with-nits |
| Concerns | 2 (1 medium-concern, 1 low) |
| Heuristic | 0.0s |
| Duplication | 0.0s |
| Interrogation | 117.2s (2 bridge agents) |
| Total | 117.2s |

💰 Value — sound

Reframes the reference-grounded redesign engine from aesthetic-first to job-first via system-prompt rules and a data-driven functional contract, with matching judge priority order; tested, linted, and changeset'd.

  • What it does: Changes two prompt builders and their tests. In src/design/audit/reference/generate/prompt.ts, the generator persona shifts from "world-class art director" to "senior product designer", the exemplar is explicitly craft-only (not a structural template), and hard rules are reordered so task fitness and preserving functional affordances/density outrank visual polish. It also adds `renderFunctionalC
  • Goals it achieves: Stops the redesign engine from regressing functional pages (docs, dashboards, aggregators) into generic, sparse marketing pages when grounded against tasteful but structurally different exemplars. It makes "keep navigation/ToC/density" an explicit, measured constraint rather than an optional aesthetic preference, and re-aligns the LLM judge so it does not reward prettier-but-less-usable directions
  • Assessment: Good change. It is narrowly scoped to the prompt layer, builds directly on existing DesignDNA fields and the pure prompt-builder pattern already in the codebase, adds regression tests, includes a changeset, and passes pnpm lint and pnpm check:boundaries. The data-driven density gating (density === 'dense') avoids forcing sparse pages to become dense, which matches the stated problem.
  • Better / existing approach: none — this is the right approach. I checked the existing rubric/anchor system (src/design/audit/rubric/anchors/docs.yaml, src/design/audit/rubric/anchors/dashboard.yaml, src/design/audit/rubric/fragments/type-docs.md, src/design/audit/rubric/rollup-weights.ts) and it already encodes page-type priorities, but rubricBody is optional in both GenerationContext and JudgePairInput, so the
  • Model: opencode/kimi-for-coding/k2p7
  • Bridge attempts: 1

🎯 Usefulness — sound-with-nits

A well-integrated job-first reframe of the redesign+judge prompts that fixes a real AI-judging-AI regression; the one data-driven piece (density gate) is keyed to a signal that misclassifies the cited content-dense pages.

  • Integration: Fully reachable. buildDirectionPrompt is consumed by generate/generator.ts:78; buildPairwisePrompt/buildQualityPrompt by judge/text-judge.ts:46-47 and judge/vision-judge.ts:162-163. The new renderFunctionalContract is a private helper invoked inline at prompt.ts:193 — no new public surface, no dead code.
  • Fit with existing patterns: Follows the established pure-prompt-builder grain exactly. renderFunctionalContract mirrors the existing renderConstraints/renderExemplarBlock helpers (deterministic, pure, section-joined). All referenced DNA fields (layout.density, layout.archetype, components.nav) exist as required fields on DesignDNA (contracts.ts:271,277,288); Density is 'sparse'|'balanced'|'dense' (contracts.ts:85). No compet
  • Real-world viability: Mostly holds: persona reframe, unconditional system-prompt priority rules, nav preservation, and judge priority ordering are signal-independent and will fire on every call. The density gate does not — see finding.
  • Model: opencode/zai-coding-plan/glm-5.2
  • Bridge attempts: 1

🔎 Heuristic Signals

🟡 Cruft: todo added src/design/audit/reference/generate/prompt.ts

  • ' facts or use placeholders like "TODO" or "lorem ipsum".',

🎯 Usefulness Audit

🟠 Density gate is keyed to component-pattern variety, not information density — no-ops on the cited regressions [problem-fit]

renderFunctionalContract gates the 'This page is DENSE: keep at least as many items/rows' directive on density === 'dense' (prompt.ts:139). But layout.density is derived at dna/derive.ts:308 as deriveDensity(buttons+inputs+cards+nav) — and deriveDensity (line 109-118) is called WITHOUT whitespaceRatio (hardcoded undefined at derive.ts:343), so it runs purely off distinctComponentCount, which returns set.size of component FINGERPRINTS (line 226) = pattern variety, not instance count. A page needs
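
The gap between pattern variety and instance count in this finding can be shown with a small sketch. It does not reproduce dna/derive.ts: the `ComponentSketch` shape, both function names, and the `'story-row'` fingerprint are hypothetical, chosen to mirror the HN-style page cited in the PR.

```typescript
// Hypothetical component record; `fingerprint` stands for a structural signature.
interface ComponentSketch {
  fingerprint: string;
}

// Pattern variety: how many different kinds of component appear.
function distinctComponentCountSketch(components: ComponentSketch[]): number {
  return new Set(components.map((c) => c.fingerprint)).size;
}

// Instance count: how many components are actually on the page.
function instanceCountSketch(components: ComponentSketch[]): number {
  return components.length;
}

// An aggregator-like page: 30 rows that all share one structural pattern.
const aggregatorRows: ComponentSketch[] = Array.from({ length: 30 }, () => ({
  fingerprint: 'story-row',
}));

console.log(distinctComponentCountSketch(aggregatorRows)); // 1
console.log(instanceCountSketch(aggregatorRows)); // 30
```

A density signal built on the first number sees this page as having a single pattern, while the second number reflects the 30 items a reader actually scans.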


What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

| Pass | What it asks |
| --- | --- |
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |

Findings are concerns, not blocks — the human reviewer decides what to do with them.

value-audit · 20260623T232433Z

@drewstone drewstone merged commit a2055b2 into main Jun 23, 2026
5 checks passed
@drewstone drewstone deleted the feat/job-first-redesign-engine branch June 23, 2026 23:25