feat: snippet skeletonization + content-aware rendering by denfry · Pull Request #17 · denfry/codebase-index

denfry · 2026-06-24T06:31:46Z

What & why

Ports the core idea behind headroomlabs-ai/headroom — separating structural tokens from compressible ones (its StructureMask + AST code_handler) — into the retrieval layer, adapted for a code-search tool rather than a generic compression proxy.

Retrieval snippets are now focus skeletons: imports, signatures, class headers, and the query-matching line are kept, while function and method bodies collapse to a marker such as ... 24 lines elided (read 88-134). As a result, more ranked results fit in the same token_budget. On a real source file the transform cut a snippet from 2211 to 896 tokens (a 59% reduction) while keeping every signature.
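To make the collapse step concrete, here is a minimal sketch of turning a per-line keep/elide mask into marker lines. It is not the PR's actual render_skeleton: the signature, the first_line parameter, and the exact marker wording are assumptions modelled on the example marker above.

```python
from typing import List, Sequence


def render_skeleton(lines: Sequence[str], keep: Sequence[bool], first_line: int = 1) -> str:
    """Collapse every run of elided lines into one marker line.

    Hypothetical sketch: `first_line` is the file line number of lines[0],
    so the marker can point the reader at the range to expand.
    """
    out: List[str] = []
    i, n = 0, len(lines)
    while i < n:
        if keep[i]:
            out.append(lines[i])
            i += 1
            continue
        # Find the end of this run of elided lines.
        j = i
        while j < n and not keep[j]:
            j += 1
        lo, hi = first_line + i, first_line + j - 1
        out.append(f"... {j - i} lines elided (read {lo}-{hi})")
        i = j
    return "\n".join(out)


snippet = ["import os", "def f(x):", "    a = 1", "    b = 2", "    return a + b"]
mask = [True, True, False, False, True]
print(render_skeleton(snippet, mask, first_line=10))
```

Working at line granularity means the mask can be produced by any classifier (AST, regex, or a focus-term match) and rendered by the same function.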

Key adaptations vs. headroom (it's a proxy; this is a retriever):

Focus skeleton — the matched line is never elided (a body line is often the answer).
Routing by detect_language(path) — no ML content detector; we already know the path.
Line-granularity mask — robust for partial window chunks; matches the codebase's line-range idiom.
Reversible — recommended_reads / line_start-line_end remain the expand path.
Lossless-safe — a savings guard (≥25%) and a tree-sitter→regex→raw fallback chain mean output is never worse than today; compactor=None / --raw reproduce current output byte-for-byte.
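The savings guard and fallback chain from the last point can be sketched roughly as follows. Everything here is illustrative: compact_with_guard, the Renderer type, and the 4-characters-per-token estimate are assumptions, and whether a renderer that misses the guard should fall through to the next renderer or straight to raw is a design choice this sketch resolves in favour of raw.

```python
from typing import Callable, Optional, Sequence, Tuple

# A renderer returns a skeleton, or None when it cannot handle the snippet.
Renderer = Callable[[str], Optional[str]]


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real token estimate (assumption: ~4 chars per token).
    return max(1, len(text) // 4)


def compact_with_guard(
    snippet: str,
    renderers: Sequence[Renderer],
    min_reduction: float = 0.25,
) -> Tuple[str, bool]:
    """Try renderers in order (e.g. AST, then regex); return (text, skeletonized)."""
    raw_tokens = estimate_tokens(snippet)
    for render in renderers:
        try:
            candidate = render(snippet)
        except Exception:
            continue  # parser failure: fall through to the next renderer
        if candidate is None:
            continue
        saved = 1.0 - estimate_tokens(candidate) / raw_tokens
        if saved >= min_reduction:
            return candidate, True
        break  # skeleton does not save enough: prefer the raw snippet
    return snippet, False
```

Because the raw snippet is always the final fallback, the output of such a guard is never larger than the input.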

How it works

pipeline.search builds a compactor (intent → context width; query → focus terms) and injects it into apply_budget. Per candidate: classify lines → render skeleton → redact → budget on the reduced token estimate. New skeleton.py routes code (AST via existing parse_file), markdown (headings), and structured config (key lines); everything else is untouched.
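The injection pattern described above might look roughly like this. The names Candidate, Compacted, and apply_budget come from the PR, but the fields, the token estimate, and the stop-at-first-overflow policy are assumptions made for illustration; redaction is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class Candidate:  # hypothetical fields
    path: str
    line_start: int
    line_end: int
    snippet: str


@dataclass
class Compacted:  # hypothetical fields
    text: str
    skeletonized: bool
    elided_lines: int


def estimate_tokens(text: str) -> int:
    # Assumption: ~4 characters per token.
    return max(1, len(text) // 4)


def apply_budget(
    candidates: Sequence[Candidate],
    token_budget: int,
    compactor: Optional[Callable[[Candidate], Compacted]] = None,
) -> List[Dict[str, object]]:
    """Admit ranked candidates until the budget is spent.

    With compactor=None each snippet passes through unchanged, which is
    how the raw path can reproduce the previous output.
    """
    results: List[Dict[str, object]] = []
    used = 0
    for cand in candidates:
        if compactor is None:
            compacted = Compacted(cand.snippet, False, 0)
        else:
            compacted = compactor(cand)
        # Budget on the reduced text, not the original snippet.
        cost = estimate_tokens(compacted.text)
        if used + cost > token_budget:
            break
        used += cost
        results.append({
            "path": cand.path,
            "line_start": cand.line_start,
            "line_end": cand.line_end,
            "snippet": compacted.text,
            "skeletonized": compacted.skeletonized,
            "elided_lines": compacted.elided_lines,
        })
    return results
```

Keeping the compactor as an optional callable leaves the budgeting code unaware of how skeletons are produced.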

Surface

New result fields: skeletonized: bool, elided_lines: int.
Config: retrieval.compact_snippets (default true), retrieval.compact_min_reduction (default 0.25) — retrieval-time only, no reindex.
Disable per-call: --raw (CLI search/explain) or raw: true (MCP search_code/explain_code).
SKILL.md updated (all synced copies) so the agent interprets the new fields.
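As a rough illustration of how a consumer could act on the new fields, the sketch below returns the line range to re-read when a result was skeletonized. The helper name expand_range is hypothetical, and it assumes each result also carries line_start and line_end, as the description's "expand path" suggests.

```python
from typing import Any, Mapping, Optional, Tuple


def expand_range(result: Mapping[str, Any]) -> Optional[Tuple[int, int]]:
    """Return (line_start, line_end) if the snippet hides lines, else None."""
    if result.get("skeletonized") and int(result.get("elided_lines", 0)) > 0:
        return int(result["line_start"]), int(result["line_end"])
    return None
```

Results without the new fields are treated as complete snippets, which matches the additive nature of the change.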

Tests

TDD throughout. tests/test_skeleton.py (render/classify/focus/guard/fallback/determinism), budget injection, pipeline on/off, config (incl. config_hash unchanged), CLI/MCP flag, regenerated CLI+MCP goldens (additive fields only). 440 passed / 2 skipped, 85.38% coverage; ruff + mypy clean.

Design & plan: docs/superpowers/specs/2026-06-24-snippet-skeletonization-design.md, docs/superpowers/plans/2026-06-24-snippet-skeletonization.md.

🤖 Generated with Claude Code

Ports headroom's AST structure handler + StructureMask idea into the retrieval layer: focus skeletons (signatures + matched lines, bodies elided) so more results fit the token budget. Reversible, content-aware (code/markdown/structured), retrieval-time only, raw-fallback safe. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d58c231c27


chatgpt-codex-connector · 2026-06-24T06:37:35Z

+    for t in _TERM_RE.findall(query):
+        tl = t.lower()
+        if len(tl) >= 3 and tl not in _STOPWORDS:
+            out.append(tl)


Preserve matches found through identifier subtokens

When a camelCase query retrieves a snake_case hit via the existing FTS subtoken expansion, this extractor only focus-keeps the unsplit, lowercased token (for example, refreshAccessToken is kept as a single term), so the actual matching body line containing refresh_access_token can still be elided. That breaks the feature's guarantee that the query-matching line is preserved, and cross-style identifier searches return skeletons that omit the line that caused the hit. Please expand query terms the same way retrieval does before applying focus.
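One possible shape for the suggested fix is sketched below. The retrieval layer's real subtoken expansion is not shown in this PR page, so the splitting regex, the stopword set, and the helper names are assumptions; only the length and stopword filter mirrors the diff above.

```python
import re
from typing import List

_TERM_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Splits camelCase / PascalCase / acronym runs into pieces (assumed scheme).
_CAMEL_RE = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")
_STOPWORDS = frozenset({"the", "and", "for", "with"})  # placeholder set


def split_identifier(token: str) -> List[str]:
    """Split an identifier on underscores and case boundaries, lowercased."""
    parts: List[str] = []
    for chunk in token.split("_"):
        parts.extend(m.lower() for m in _CAMEL_RE.findall(chunk))
    return parts


def focus_terms(query: str) -> List[str]:
    """Return focus terms: whole token, both joined styles, then subtokens."""
    out: List[str] = []
    seen = set()
    for t in _TERM_RE.findall(query):
        parts = split_identifier(t)
        for cand in [t.lower(), "_".join(parts), "".join(parts), *parts]:
            if len(cand) >= 3 and cand not in _STOPWORDS and cand not in seen:
                seen.add(cand)
                out.append(cand)
    return out


terms = focus_terms("refreshAccessToken")
line = "    return self.refresh_access_token(session)"
print(any(term in line.lower() for term in terms))
```

Matching on individual subtokens is looser than matching the full identifier, so a stricter variant could keep only the joined forms and accept fewer focus lines.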


chatgpt-codex-connector · 2026-06-24T06:37:35Z

+    Scans forward (bounded) for the line that opens the body so multi-line
+    signatures stay visible; defaults to ``start`` when nothing matches.
+    """
+    limit = min(end, start + _MAX_SIG_SCAN)


Preserve multi-line signatures past five lines

For declarations whose body opener is more than five lines after the start line, this cap makes _signature_end fall back to start; _classify_code then treats the rest of the declaration as body and elides parameter/type lines. This is common for long typed Python/TS/Go signatures and contradicts the skeleton contract that signatures stay visible, so either scan to the actual body opener within the symbol span or fall back to the raw snippet when the opener is not found.
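A sketch of the first option, scanning the whole symbol span for the body opener, is shown below. It is a heuristic written for illustration, not the PR's _signature_end: it tracks bracket depth so that a parameter default ending in "{" is not mistaken for the opener, and it returns None so the caller can fall back to the raw snippet. Brackets inside string literals or comments would confuse it.

```python
from typing import Optional, Sequence


def signature_end(lines: Sequence[str], start: int, end: int) -> Optional[int]:
    """Return the index of the line that opens the body, or None.

    Scans the full span [start, end] instead of a fixed number of lines.
    """
    depth = 0
    last = min(end, len(lines) - 1)
    for i in range(start, last + 1):
        text = lines[i].rstrip()
        opened = text.count("(") + text.count("[")
        closed = text.count(")") + text.count("]")
        depth += opened - closed
        # The opener ends with ':' (Python) or '{' (TS/Go) once the
        # parameter list has been closed.
        if depth <= 0 and text.endswith((":", "{")):
            return i
    return None


long_sig = [
    "def handler(",
    "    request: Request,",
    "    user: User,",
    "    session: Session,",
    "    retries: int = 3,",
    "    timeout: float = 1.0,",
    "    options: dict = {",
    "    },",
    ") -> Response:",
    "    return run(request)",
]
print(signature_end(long_sig, 0, len(long_sig) - 1))
```

A caller would keep lines start through the returned index as signature and treat a None result as the signal to emit the raw snippet.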


denfry and others added 11 commits June 24, 2026 08:22

docs: implementation plan for snippet skeletonization

a6c1f7d

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(skeleton): render_skeleton collapses keep/elide mask into markers

384a1f4

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(skeleton): code classifier + compact() with focus, guard, raw fallback

1bff1e2

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(skeleton): markdown heading + structured key classifiers

54768b3

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(skeleton): make_compactor factory with intent->context policy

8715ad1

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(budget): inject snippet compactor; emit skeletonized/elided_lines

4fb5164

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(pipeline): build snippet compactor; add compact config knobs

adc9273

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(cli,mcp): --raw / raw flag to disable snippet skeletonization

dc74e9b

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

docs: document snippet skeletonization fields and --raw flag

f566e5a

Edit canonical skill_template/SKILL.md + sync all copies; CHANGELOG entry. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

test: regenerate goldens for skeletonized fields; fix ruff/mypy

d58c231

Goldens gain additive skeletonized/elided_lines fields; budget.py types the compactor as Optional[Callable[[Candidate], Compacted]]; tidy test imports. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

denfry merged commit 08dd2ff into main Jun 24, 2026
10 checks passed

denfry deleted the feat/snippet-skeletonization branch June 24, 2026 06:35

chatgpt-codex-connector Bot reviewed Jun 24, 2026


denfry mentioned this pull request Jun 24, 2026

release: v1.6.0 #18

Merged
