feat(design-audit): make reference-grounded redesign job-first, not aesthetic-first by drewstone · Pull Request #124 · tangle-network/browser-agent-driver

drewstone · 2026-06-23T23:01:23Z

Problem

The reference-grounded redesign engine grounded every page in a world-class exemplar's visual DNA, and the ranker judged visual craft, so it optimized for "looks tasteful," not "serves the user's task." That regressed functional pages into generic brochures:

docs (python tutorial) → lost its table of contents, prev/next nav, and dense code reference for a hero + two marketing cards.
aggregator (HN) → dropped from 30 stories to 9 with editorial serif + whitespace.
dashboard (githubstatus) → shed services into a few spacious cards.

A blind LLM-judge panel scored these "redesigns" as decisive wins — because LLM judges share the aesthetic LLMs generate, and the rubric discounted density. The metric was an AI judging AI; the redesigns would make real apps worse.

Fix — job-first, not aesthetic-first

Generator (reference/generate/prompt.ts): persona art director → product designer. Hard rules in priority order: task-first → preserve functional affordances (never delete navigation/ToC/search to look cleaner) → preserve density where it is the value (docs/dashboards/feeds keep their item count) → right-size the intervention (never turn one kind of page into another) → the exemplar is visual craft only, never a structural template.
Functional contract: a per-page preservation block derived from the page's own measured DNA (nav-affordance count, layout density, archetype) — concrete, data-driven, and density is required only when the page is measured dense (a genuinely sparse page is never forced dense).
Ranker/judge (reference/judge/prompt.ts): scores task fitness + functional preservation before visual craft; a polished direction that strips nav or density loses.
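
As a rough illustration of the functional contract described above, the preservation block could be assembled along these lines. This is a minimal sketch, not the code in this PR: the `PageDna` shape, the `renderFunctionalContractSketch` name, and the exact wording of each line are assumptions. Only the `layout.density`, `layout.archetype`, and `components.nav` fields and the `'sparse' | 'balanced' | 'dense'` union are taken from the discussion.

```typescript
// Hypothetical, simplified stand-in for the DesignDNA fields named in this PR.
type Density = 'sparse' | 'balanced' | 'dense';

interface PageDna {
  layout: { density: Density; archetype: string };
  // Assumption: one entry per detected navigation affordance.
  components: { nav: string[] };
}

// Pure, deterministic helper: builds a per-page preservation block from measured DNA.
function renderFunctionalContractSketch(dna: PageDna): string {
  const lines: string[] = ['FUNCTIONAL CONTRACT (must survive the redesign):'];

  lines.push(
    `- Page archetype: ${dna.layout.archetype}. Do not turn it into another kind of page.`,
  );

  const navCount = dna.components.nav.length;
  if (navCount > 0) {
    lines.push(`- Keep all ${navCount} navigation affordances (nav, ToC, search).`);
  }

  // Density is required only when the page is measured dense;
  // a sparse or balanced page gets no density rule.
  if (dna.layout.density === 'dense') {
    lines.push('- This page is DENSE: keep at least as many items/rows.');
  }

  return lines.join('\n');
}
```

Gating the density line on the measured value keeps the contract data-driven: a sparse landing page receives only the archetype and navigation rules.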

Validation — re-ran the regressed pages with the fixed engine

Not an AI beauty panel — a check of whether the brief now keeps what makes each page work:

| page | before (aesthetic-first) | after (job-first) |
| --- | --- | --- |
| docs | deleted ToC; 2 marketing cards + hero | full ToC sidebar + breadcrumb + prev/next + dense code examples; craft applied within the docs structure |
| HN | 9 stories, editorial serif | all 30 stories, nav + voting kept, no motion, framed as "rapid scanning at max density" |
| dashboard | services spaced into cards | dense service grid, all services in the brief, real uptime values, nav kept |
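
The "after" column follows from the judge ordering described in the Fix section: task fitness and functional preservation are compared before visual craft. The real judge is an LLM prompt, so the following is only a numeric analogy of that priority order. The `DirectionScore` shape, the 0 to 10 scale, and the function names are hypothetical.

```typescript
// Hypothetical numeric scores; the actual judge works from prompt text, not numbers.
interface DirectionScore {
  id: string;
  taskFitness: number; // assumed 0..10
  functionalPreservation: number; // assumed 0..10
  visualCraft: number; // assumed 0..10
}

// Lexicographic comparison: visual craft only breaks ties on the first two criteria.
function compareDirections(a: DirectionScore, b: DirectionScore): number {
  return (
    b.taskFitness - a.taskFitness ||
    b.functionalPreservation - a.functionalPreservation ||
    b.visualCraft - a.visualCraft
  );
}

// Returns a new array, best direction first; the input is left untouched.
function rankDirections(scores: DirectionScore[]): DirectionScore[] {
  return [...scores].sort(compareDirections);
}
```

Under this ordering, a polished direction that strips navigation or density cannot outrank a plainer one that keeps them, which is the behavior the PR asks of the ranker.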

Gates: tsc --noEmit clean · check:boundaries pass · full suite 1898 pass (2 pre-existing telemetry-rollup-remote nvm-shim failures, unrelated). Regression tests added across generator + judge.

Changeset: minor.

…esthetic-first

The engine grounded every page in an exemplar's visual DNA and judged on visual craft, so it regressed functional pages into generic brochures (docs lost its table of contents + density for marketing cards; an aggregator dropped 30 items to 9; a dashboard shed services into spacious cards).

- generate/prompt.ts: persona art director -> product designer; hard rules in priority order: task-first, preserve affordances (never delete nav/ToC), preserve density where it is the value, right-size (don't reskin a tool into a landing page), exemplar is craft-only not a template. Plus a data-driven FUNCTIONAL CONTRACT derived from the page's own DNA (nav count, density, archetype); density required only when the page is measured dense.
- judge/prompt.ts: score task fitness + functional preservation before visual craft; a polished direction that strips nav or density loses.

Validated by re-running the regressed pages: docs keeps its ToC + nav + dense code; HN keeps all 30 stories; the dashboard stays a dense service grid.

tangletools

✅ Auto-approved PR — `a31d2d42`

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

_{tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-23T23:01:30Z}

tangletools

🟡 Value Audit — sound-with-nits


| | |
| --- | --- |
| Verdict | sound-with-nits |
| Concerns | 2 (1 medium-concern, 1 low) |
| Heuristic | 0.0s |
| Duplication | 0.0s |
| Interrogation | 117.2s (2 bridge agents) |
| Total | 117.2s |

💰 Value — sound

Reframes the reference-grounded redesign engine from aesthetic-first to job-first via system-prompt rules and a data-driven functional contract, with matching judge priority order; tested, linted, and changeset'd.

What it does: Changes two prompt builders and their tests. In src/design/audit/reference/generate/prompt.ts, the generator persona shifts from "world-class art director" to "senior product designer", the exemplar is explicitly craft-only (not a structural template), and hard rules are reordered so task fitness and preserving functional affordances/density outrank visual polish. It also adds `renderFunctionalC
Goals it achieves: Stops the redesign engine from regressing functional pages (docs, dashboards, aggregators) into generic, sparse marketing pages when grounded against tasteful but structurally different exemplars. It makes "keep navigation/ToC/density" an explicit, measured constraint rather than an optional aesthetic preference, and re-aligns the LLM judge so it does not reward prettier-but-less-usable directions
Assessment: Good change. It is narrowly scoped to the prompt layer, builds directly on existing DesignDNA fields and the pure prompt-builder pattern already in the codebase, adds regression tests, includes a changeset, and passes pnpm lint and pnpm check:boundaries. The data-driven density gating (density === 'dense') avoids forcing sparse pages to become dense, which matches the stated problem.
Better / existing approach: none — this is the right approach. I checked the existing rubric/anchor system (src/design/audit/rubric/anchors/docs.yaml, src/design/audit/rubric/anchors/dashboard.yaml, src/design/audit/rubric/fragments/type-docs.md, src/design/audit/rubric/rollup-weights.ts) and it already encodes page-type priorities, but rubricBody is optional in both GenerationContext and JudgePairInput, so the
Model: opencode/kimi-for-coding/k2p7
Bridge attempts: 1

🎯 Usefulness — sound-with-nits

A well-integrated job-first reframe of the redesign+judge prompts that fixes a real AI-judging-AI regression; the one data-driven piece (density gate) is keyed to a signal that misclassifies the cited content-dense pages.

Integration: Fully reachable. buildDirectionPrompt is consumed by generate/generator.ts:78; buildPairwisePrompt/buildQualityPrompt by judge/text-judge.ts:46-47 and judge/vision-judge.ts:162-163. The new renderFunctionalContract is a private helper invoked inline at prompt.ts:193 — no new public surface, no dead code.
Fit with existing patterns: Follows the established pure-prompt-builder grain exactly. renderFunctionalContract mirrors the existing renderConstraints/renderExemplarBlock helpers (deterministic, pure, section-joined). All referenced DNA fields (layout.density, layout.archetype, components.nav) exist as required fields on DesignDNA (contracts.ts:271,277,288); Density is 'sparse'|'balanced'|'dense' (contracts.ts:85). No compet
Real-world viability: Mostly holds: persona reframe, unconditional system-prompt priority rules, nav preservation, and judge priority ordering are signal-independent and will fire on every call. The density gate does not — see finding.
Model: opencode/zai-coding-plan/glm-5.2
Bridge attempts: 1

🔎 Heuristic Signals

🟡 Cruft: todo added src/design/audit/reference/generate/prompt.ts

' facts or use placeholders like "TODO" or "lorem ipsum".',

🎯 Usefulness Audit

🟠 Density gate is keyed to component-pattern variety, not information density — no-ops on the cited regressions [problem-fit]

renderFunctionalContract gates the 'This page is DENSE: keep at least as many items/rows' directive on density === 'dense' (prompt.ts:139). But layout.density is derived at dna/derive.ts:308 as deriveDensity(buttons+inputs+cards+nav) — and deriveDensity (line 109-118) is called WITHOUT whitespaceRatio (hardcoded undefined at derive.ts:343), so it runs purely off distinctComponentCount, which returns set.size of component FINGERPRINTS (line 226) = pattern variety, not instance count. A page needs
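
The distinction this finding draws, pattern variety versus instance count, can be seen in a small sketch. It is not the code in dna/derive.ts: the `ComponentInstance` shape, the function names, and the `'story-row'` fingerprint are hypothetical, and the real thresholds inside deriveDensity are not reproduced here.

```typescript
// Hypothetical component record; the real fingerprint format is not shown in this PR.
interface ComponentInstance {
  kind: string;
  fingerprint: string;
}

// Counts distinct fingerprints: a measure of pattern variety.
function distinctPatternCount(items: ComponentInstance[]): number {
  return new Set(items.map((item) => item.fingerprint)).size;
}

// Counts every occurrence: closer to what a reader experiences as density.
function instanceCount(items: ComponentInstance[]): number {
  return items.length;
}

// A feed of 30 identical story rows: one pattern, thirty instances.
const feed: ComponentInstance[] = Array.from({ length: 30 }, () => ({
  kind: 'card',
  fingerprint: 'story-row',
}));

const patternVariety = distinctPatternCount(feed); // 1
const totalInstances = instanceCount(feed); // 30
```

A signal built only on the first number would treat a 30-row aggregator the same as a page with a single card, which is why the finding expects the density directive not to fire on the cited pages.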

What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

| Pass | What it asks |
| --- | --- |
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |

Findings are concerns, not blocks — the human reviewer decides what to do with them.

_{value-audit · 20260623T232433Z}

tangletools approved these changes Jun 23, 2026


tangletools reviewed Jun 23, 2026


drewstone merged commit a2055b2 into main Jun 23, 2026
5 checks passed

drewstone deleted the feat/job-first-redesign-engine branch June 23, 2026 23:25

github-actions Bot mentioned this pull request Jun 23, 2026

Release: version packages #121

Open

drewstone merged 1 commit into main from feat/job-first-redesign-engine
