You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Drop window-based compaction in favor of per-block size caps (8 KB for
assistant text, toolCall.input, and toolResult content; 2 KB aggressive
fallback). The window approach missed the dominant failure mode: the
latest assistant message carrying a 9 MB artifact dump inside the "keep
verbatim" window, which produced 3.97M-token requests and 400s.
New philosophy: history is intent tracking, not payload storage. Large
outputs get stubbed regardless of position — current state is always
recoverable via view().
Also resolves the conflicting instructions that produced the 9 MB
payload in the first place: base output-rules §"Artifact wrapper" and
workflow step 7 both require the model to emit the full design as
assistant text inside <artifact>, while the agent guidance says do NOT
emit it. Adds an explicit OVERRIDE block at the top of AGENTIC_TOOL_
GUIDANCE so the tool-mode branch wins unambiguously, and tightens the
final-turn note to spell out that pasting the file crashes the next
request.
- packages/core/src/context-prune.ts: rewrite around per-block caps
- packages/core/src/context-prune.test.ts: 7 tests for new contract
- packages/core/src/agent.ts: OVERRIDE block + final-turn warning
' 4. One short prose line reflecting on what landed ("Three KPIs in place — the deltas use mono tnum so they line up.").',
356
367
' 5. Tick the matching todo via `set_todos`.',
357
368
' That is **2 prose lines + 3 tool calls per turn**. Never batch multiple sections into a single str_replace; never run two str_replace tools in the same turn without a prose line in between.',
358
-
'4. **Polish passes — interactive depth.** At least ONE dedicated turn that ADDS interactive depth, not just cosmetic tweaks. Before `done`, make sure:',
359
-
' (a) ≥2 state changes actually wire up (tab switch in a section, accordion open, favorite toggle, dropdown on avatar, drawer "See details", inline-edit click)',
360
-
' (b) ≥1 list / grid / table has a believable empty state component defined (icon + reason + CTA), even if the current data is non-empty',
361
-
' (c) all buttons / cards have hover + press feedback (`transition: transform 120ms var(--ease-out), background-color 120ms`; press = `scale(0.96)`; cards lift 2px on hover)',
362
-
' (d) data is real-sounding, not Lorem: varied names, realistic numbers, relative dates ("3h ago")',
363
-
' Add an explicit `Interactive polish` todo item in `set_todos` so the user sees it ticked.',
364
-
'5. **Final turn — summary.** 2–4 sentences of natural-language prose explaining 2–3 design decisions worth noting (e.g. "Used three distinct surface tones for depth"). Do NOT re-emit the file content; the host extracts it from the virtual fs.',
369
+
'4. **Polish passes — interactive depth (MANDATORY, ≥2 dedicated turns).** The first polish turn wires interactions; the second adds small-detail craft. These are NOT optional — if the user sees static pixels where they expected live UI, the artifact fails. Before `done` every item on this list must be TRUE:',
370
+
' (a) **≥3 functional state changes** that a user can trigger and observe. Tab switch revealing a different view, accordion open/close, drawer slide-in, favorite/like toggle that persists, dropdown/menu expand, inline-edit, filter chip toggle, modal open. Pure hover effects do NOT count toward this three.',
371
+
' (b) **≥1 animated view/page transition** if there is any nav (tabs, sidebar, bottom bar, breadcrumbs). 180–260ms, opacity + small translate. A hard cut between views is a failure.',
372
+
' (c) **Every `<button>` and `<a>` does something.** No decorative buttons. Wire a state change, open a modal, fire a toast, or remove it. Login / Sign-up / CTA buttons on marketing pages may open a modal stub — still real, not dead.',
373
+
' (d) **Uniform hover + press + focus** across ALL clickable elements. Required cadence: `transition: transform 120ms var(--ease-out), background-color 120ms, box-shadow 160ms;` hover lifts 2px; press = `scale(0.96)`; focus = 2px offset ring in accent color (never rely on browser default outline).',
374
+
' (e) **≥3 small-detail "craft-surplus" touches** from the craft-directives catalog. Pick from: stateful counter/badge with pop animation, keyboard shortcut chip (`⌘K`, `/`, `esc`), inline-editable field, copy-to-clipboard with "Copied ✓" feedback, dismissible toast/banner, contextual tooltip with directional arrow, scroll-linked header shrink, relative-time tick ("3m ago"), segmented control with weighted active state, thoughtful empty-state SVG scene, expandable accordion inside a card, a deliberate visual rhythm-break section. Adding a gradient and shadow does NOT count.',
375
+
' (f) **≥1 empty-state variant** visible or coded (icon + one-sentence reason + CTA) on a list/grid/table, even when current data is non-empty.',
376
+
' (g) **Active nav indicator uses weight/shape, not color alone** — underline, inset background, side-accent bar, or pill — so color-blind users can tell where they are.',
377
+
' (h) Data reads real: varied names, non-round numbers (87 %, $14.2k), relative dates ("3h ago", "yesterday"), not Lorem / 100 % / Jan 1 2020.',
378
+
' Break this into TWO todo items: `Interactive wiring (state + transitions)` and `Craft surplus (small details)`. Tick them explicitly so the user can see both phases landed.',
379
+
'5. **Final turn — summary.** 2–4 sentences of natural-language prose explaining 2–3 design decisions worth noting (e.g. "Used three distinct surface tones for depth"). Do NOT re-emit the file content; the host extracts it from the virtual fs. Pasting the full file here wastes ~2M tokens on the next turn and will crash the request — this is a hard failure, not a style nit.',
365
380
'',
366
381
'### File output policy (STRICT)',
367
382
"- Use `str_replace_based_edit_tool` for ALL file content. Do NOT emit `<artifact>` tags or fenced ```jsx/```html blocks containing the source in your prose — the host extracts the artifact from the virtual fs and any inline source spams the user's chat.",
@@ -714,7 +729,7 @@ export async function generateViaAgent(
714
729
// Without this, assistant.toolCall.input + big view results grow O(N²)
715
730
// in LLM-facing size across a long tool-using run and blow past 1 M
716
731
// tokens. See context-prune.ts for the full strategy.
0 commit comments