feat(cli)!: make `codehub analyze` the one-command index (fast + scan + sbom + coverage-auto; summaries opt-in) by theagenticguy · Pull Request #110 · theagenticguy/opencodehub

theagenticguy · 2026-05-14T03:33:20Z

Summary

Reshapes codehub analyze so a bare invocation produces the full local
artifact set agents rely on — graph, SARIF, SBOM, coverage overlay
(when present), ownership, cochanges — while keeping the LLM phase
opt-in.

Before: codehub analyze built the graph; codehub scan and
--sbom/--coverage were separate steps the operator had to remember.
Agents calling list_findings or verdict on a fresh index got empty
tables because the SARIF hadn't been written yet.

After: one command, one .codehub/ folder with everything the MCP
surface reads. Zero AWS, zero Bedrock, zero LLM calls unless explicitly
opted in.

Default behavior (new)

Phase	Before	After	Opt-out
Graph (tree-sitter + SCIP + communities + processes + cochanges + ownership + dependencies + detectors)	on	on	—
scan (Priority-1 scanners → `.codehub/scan.sarif` → graph findings)	separate command	on	`--no-scan`
sbom (CycloneDX + SPDX from Dependency nodes)	off	on	`--no-sbom`
coverage (lcov / cobertura / jacoco / coverage.py overlay)	off	auto — probe known paths, enable if found	`--no-coverage`
summaries (Bedrock LLM narrative summaries)	on	off — fully opt-in	`--summaries` to enable
embeddings	off	off	`--embeddings` to enable

--offline flag continues to work: network-backed scanners
(osv-scanner, grype, npm/pip audit) self-skip, so the scan default
stays honest for air-gapped runs.
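
The opt-out column and the `--offline` note above can be read as one small decision function. The sketch below is illustrative only: `AnalyzeFlags`, `planPhases`, and the `needsNetwork` field are hypothetical names, not the CLI's actual internals. The scanner list is taken from the smoke-test log in this PR, and the network markers follow the statement that osv-scanner and grype are network-backed.

```typescript
// Hypothetical flag shape: `undefined` means the flag was not passed at all.
interface AnalyzeFlags {
  scan?: boolean;       // --no-scan sets false
  sbom?: boolean;       // --no-sbom sets false
  embeddings?: boolean; // --embeddings sets true
  offline?: boolean;    // --offline sets true
}

interface Scanner {
  name: string;
  needsNetwork: boolean; // assumption: how a scanner decides to self-skip
}

// Names come from the smoke-test log; the flags mirror the PR's description.
const SCANNERS: Scanner[] = [
  { name: "semgrep", needsNetwork: false },
  { name: "betterleaks", needsNetwork: false },
  { name: "osv-scanner", needsNetwork: true },
  { name: "detect-secrets", needsNetwork: false },
  { name: "grype", needsNetwork: true },
];

function planPhases(flags: AnalyzeFlags) {
  const scan = flags.scan !== false;            // default on
  const sbom = flags.sbom !== false;            // default on
  const embeddings = flags.embeddings === true; // default off
  // Under --offline, network-backed scanners drop out; the rest still run.
  const scanners = scan
    ? SCANNERS.filter((s) => !(flags.offline === true && s.needsNetwork)).map((s) => s.name)
    : [];
  return { scan, sbom, embeddings, scanners };
}
```

Under these assumptions, `planPhases({ offline: true })` keeps scan enabled but plans only the three local scanners, which is the sense in which the scan default stays honest for air-gapped runs.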

Why

A bare codehub analyze should be the dependable foundation of every
agent workflow. Before this change, calling list_findings or
verdict on a fresh analyze returned empty — the SARIF hadn't been
written yet. Operators either knew to chain codehub analyze && codehub scan, or silently got the degraded experience. Folding scan into
analyze makes the MCP surface work on day one.

sbom and coverage follow the same logic: the cost of producing
them is negligible when the data is already in the graph, and every
downstream audit wants them. Coverage auto-detect is the key wrinkle:
when no report exists we skip silently, so repos without tests don't
get warnings; forcing the phase on with --coverage still warns when no
report is found, which catches setup errors.
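
A minimal sketch of the auto-detect behaviour described above. The function names match the ones this PR introduces, but the signatures, the `Exists` predicate, and the `CoverageDecision` shape are assumptions made for illustration; a real implementation would presumably check the filesystem under the repo root. The candidate paths and their order come from the commit message for this PR.

```typescript
// Candidate report paths in probe order, as listed in the commit message.
const COVERAGE_CANDIDATES = [
  "coverage/lcov.info",
  "lcov.info",
  "coverage.xml",
  "build/reports/jacoco/test/jacocoTestReport.xml",
  "coverage.json",
] as const;

// Hypothetical predicate, injected so the sketch needs no filesystem access.
type Exists = (relativePath: string) => boolean;

// Returns the first candidate that exists, or undefined on a miss.
function detectCoverageReport(exists: Exists): string | undefined {
  return COVERAGE_CANDIDATES.find((p) => exists(p));
}

interface CoverageDecision {
  enabled: boolean;
  report: string | undefined;
  warn: boolean; // true only when --coverage was forced and nothing was found
}

function resolveCoverageEnabled(flag: boolean | undefined, exists: Exists): CoverageDecision {
  if (flag === false) {
    return { enabled: false, report: undefined, warn: false }; // --no-coverage
  }
  const report = detectCoverageReport(exists);
  if (flag === true) {
    return { enabled: true, report, warn: report === undefined }; // --coverage
  }
  // Flag absent: enable only when a report exists, and stay silent on a miss.
  return { enabled: report !== undefined, report, warn: false };
}
```

The three-state flag is what lets the silent default and the warning coexist: only an explicit `true` can produce `warn: true`.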

Commits

feat(cli)!: make codehub analyze fast by default; summaries are opt-in
— flips --summaries default from ON to OFF; adds
CODEHUB_BEDROCK_SUMMARIES=1 env opt-in; fixes a latent bug where
the CLI collapsed --summaries=true to undefined before
forwarding.
feat(cli)!: scan + sbom default on; coverage auto-detects reports
— folds runScan() into runAnalyze as a best-effort step; flips
--sbom default to ON; adds resolveCoverageEnabled +
detectCoverageReport for silent auto-detect.

Precedence — summaries

`CODEHUB_BEDROCK_DISABLED=1`	`flag=true`	`flag=false`	`CODEHUB_BEDROCK_SUMMARIES=1`	result
yes	any	any	any	off
no	yes	—	any	on
no	no	yes	any	off
no	unset	unset	yes	on
no	unset	unset	no	off (new default)
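
The precedence table can be expressed as three checks in order. This is a sketch of the table, not the shipped code: the PR names `resolveSummariesEnabled()` but does not show its parameters, so the signature below is an assumption.

```typescript
// Hypothetical signature. `flag` is true for --summaries, false for
// --no-summaries, and undefined when neither was passed.
function resolveSummariesEnabled(
  flag: boolean | undefined,
  env: Record<string, string | undefined>,
): boolean {
  // Row 1: the kill-switch wins over every opt-in form.
  if (env["CODEHUB_BEDROCK_DISABLED"] === "1") return false;
  // Rows 2 and 3: an explicit flag decides, regardless of the opt-in env var.
  if (flag !== undefined) return flag;
  // Rows 4 and 5: env opt-in, otherwise the new default (off).
  return env["CODEHUB_BEDROCK_SUMMARIES"] === "1";
}
```

In a Node process this could be called as `resolveSummariesEnabled(undefined, process.env)` for a bare invocation, which returns `false` unless the opt-in variable is set.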

Changes

resolveSummariesEnabled() — truth table flipped, new env opt-in.
resolveSbomEnabled() — trivial default-on.
resolveScanEnabled() — trivial default-on.
resolveCoverageEnabled() + detectCoverageReport() — auto-detect
lcov / cobertura / jacoco / coverage.py at the same candidate paths
the ingestion phase uses.
runAnalyze — invokes runScan() (from ./scan.js) as a best-effort
step at the end, logs findings count, writes .codehub/scan.sarif.
Scanner failure logs-and-continues so analyze never regresses the
graph over a flaky scanner.
CLI entry — forwards three-state sbom/coverage/scan opts, adds
--no-sbom / --no-coverage / --no-scan.
Tests — 26 new unit tests (3 × sbom, 3 × scan, 7 × detectCoverageReport,
4 × resolveCoverageEnabled, 9 × resolveSummariesEnabled).
Docs — reference/cli.md and guides/indexing-a-repo.md updated.
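
The "logs-and-continues" behaviour of the scan step in `runAnalyze` can be sketched as a wrapper that never lets a scanner failure propagate. Everything here is hypothetical: `runScanBestEffort`, the `ScanResult` shape, and the log wording are illustrative assumptions, not the CLI's actual code or output.

```typescript
// Assumed result shape; the real runScan() return type is not shown in this PR.
interface ScanResult {
  scanners: number;
  findings: number;
  sarifPath: string;
}

// Runs the scan step after the graph is built. A failure is logged and
// swallowed so the caller can still finish with the graph it already has.
async function runScanBestEffort(
  runScan: () => Promise<ScanResult>,
  log: (line: string) => void,
): Promise<ScanResult | undefined> {
  try {
    const result = await runScan();
    log(`scan: ${result.scanners} scanner(s), ${result.findings} finding(s), sarif=${result.sarifPath}`);
    return result;
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    log(`scan failed, continuing without findings: ${reason}`);
    return undefined;
  }
}
```

Returning `undefined` rather than rethrowing is the design choice that matters: the graph build has already succeeded by this point, so a flaky scanner costs only the findings, not the index.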

Verification

pnpm typecheck — green across 19 workspace projects.
pnpm lint — green. (The one info diagnostic is the stale Biome
$schema URL fixed in build(deps): unify dependabot bumps for npm + GitHub Actions #109.)
pnpm -r test — 2,044 tests pass (was 2,027 on main; +26 new /
-9 replaced).
Pre-push hook: typecheck + test passed. verdict flagged this PR
as dual_review (2 symbols / 4 communities / 28 symbols affected) —
expected for a breaking CLI change; surfaced here for reviewer
awareness, not a CI failure.

Smoke test on a fresh 2-function TS repo (bare `codehub analyze`)

codehub scan: running 5 scanner(s): semgrep, betterleaks, osv-scanner, detect-secrets, grype
codehub ingest-sarif: 5 findings, 5 edges from .codehub/scan.sarif
codehub analyze: scan — 5 scanner(s), 5 finding(s), sarif=.codehub/scan.sarif
codehub analyze: smoke-repo — 7 nodes, 6 edges, graph f66a5d27, total ~10s

.codehub/ now contains: graph.lbug, temporal.duckdb, meta.json,
scan.sarif, sbom.cyclonedx.json, sbom.spdx.json, scan-state.json,
parse-cache/. Previously it contained only graph.*, meta.json, and
parse-cache/.

--no-scan --no-sbom reproduces the pre-flip fast path (~727 ms).

Opt-in Bedrock path:

CODEHUB_BEDROCK_SUMMARIES=1 codehub analyze /tmp/smoke-repo --max-summaries 0 --verbose
codehub analyze: summarize — considered=2, skippedUnconfirmed=2, cacheHits=0,
                 summarized=0, wouldHaveSummarized=0, failed=0

BREAKING CHANGES

Bare codehub analyze no longer runs the summarize phase. Workflows
depending on implicit summarization must add --summaries or set
CODEHUB_BEDROCK_SUMMARIES=1. CODEHUB_BEDROCK_DISABLED=1
continues to work unchanged.
Bare codehub analyze now runs Priority-1 scanners and emits SBOMs.
Pass --no-scan / --no-sbom for the pre-flip graph-only behavior.
Workflows that previously chained codehub analyze && codehub scan
still work; the chained scan is now redundant.

Test plan

Resolver truth tables covered by unit tests.
Coverage auto-detect covered for all 5 candidate paths + miss +
priority order.
pnpm -r test — 2,044 passing, 0 fails.
Smoke-tested end-to-end: default, --no-scan --no-sbom,
--coverage with report, --summaries,
CODEHUB_BEDROCK_SUMMARIES=1, and kill-switch all behave as
specified.

`codehub analyze` with no flags now runs only the fast, local, deterministic pipeline: tree-sitter parse, SCIP resolution, graph composition, cochanges, ownership, and detectors. No Bedrock, no network hop, no AWS credentials required.

The Bedrock-backed `summarize` phase is opt-in. Opt in with one of:
- `--summaries` (per-invocation)
- `CODEHUB_BEDROCK_SUMMARIES=1` (environment / CI-wide)

`--no-summaries` and `CODEHUB_BEDROCK_DISABLED=1` still force the phase off; the kill-switch continues to win against both opt-in forms. Embeddings were already opt-in via `--embeddings`; this change aligns the LLM phase with the same model.

Why: a bare `codehub analyze` should not block on a network hop, spend on LLM tokens, or require AWS creds. Summaries can be expensive on large repos (the auto cap is 10% of callables; hundreds of Bedrock calls on a mid-sized monorepo). Making them opt-in matches the indexing contract the rest of the CLI already follows (sbom, coverage, embeddings, skills) and lets `codehub analyze` be the dependable foundation of every agent workflow that doesn't need narrative text.

Changes:
- `resolveSummariesEnabled()` truth table flipped. Unknown flag + no env → false. New env opt-in `CODEHUB_BEDROCK_SUMMARIES=1`. Test suite refactored (9 tests, full coverage of the combined precedence).
- CLI entry now forwards `--summaries=true` to `runAnalyze` (was previously collapsed to `undefined`, which masked a dead code path).
- Help text, CLI reference table, indexing guide, and configuration reference all updated. Pipeline-level doc unchanged: it already documented the `PipelineOptions.summaries` default as `false`; only the CLI wrapper's historical default had drifted.

BREAKING CHANGE: bare `codehub analyze` no longer runs the summarize phase. Workflows that depended on implicit summarization must add `--summaries` or set `CODEHUB_BEDROCK_SUMMARIES=1`.

Extends the fast-default rework so a bare `codehub analyze` produces the full local artifact set agents rely on:
- scan (NEW default ON): runs Priority-1 scanners at the end of analyze, writes `.codehub/scan.sarif`, and ingests findings into the graph. Makes `verdict`, `list_findings`, and `list_findings_delta` work on day one without a separate `codehub scan` step. Network-backed scanners (osv-scanner, grype, npm/pip audit) self-skip under `--offline`, so the on-default stays honest for air-gapped runs. Opt out with `--no-scan`; a scanner failure logs-and-continues so analyze never regresses the graph because of a flaky scanner.
- sbom (flipped to default ON): emitting CycloneDX + SPDX from the Dependency nodes the graph already has is cheap and universally wanted. Opt out with `--no-sbom`.
- coverage (flipped to default AUTO): probes `coverage/lcov.info`, `lcov.info`, `coverage.xml`, `build/reports/jacoco/test/jacocoTestReport.xml`, `coverage.json` in that order and enables the phase only when a report exists. Silent no-op otherwise (no spurious "no report found" warning on repos without tests). `--coverage` still force-enables and warns; `--no-coverage` force-disables.

New exports for tests: `resolveSbomEnabled`, `resolveScanEnabled`, `resolveCoverageEnabled`, `detectCoverageReport`.

17 new unit tests (2044 total, was 2027):
- 3 × resolveSbomEnabled (default on, explicit on, --no-sbom)
- 3 × resolveScanEnabled (default on, explicit on, --no-scan)
- 7 × detectCoverageReport (5 candidate paths + miss + priority order)
- 4 × resolveCoverageEnabled (explicit true/false, undefined+none, undefined+report-found)

Smoke-tested end-to-end on a throwaway 2-function TS repo:
- default `analyze` → runs scan (5 scanners, SARIF written, findings ingested), emits sbom.cyclonedx.json + sbom.spdx.json, detects no coverage report → silent no-op. Total ~10s.
- `analyze --no-scan --no-sbom` → 727 ms; pre-flip fast path preserved.
- `analyze` with `lcov.info` in repo → coverage phase auto-engages.

Docs: CLI reference table + indexing guide updated with the new defaults and the auto-detect candidate paths.

BREAKING CHANGE: bare `codehub analyze` now runs Priority-1 scanners and emits SBOMs. Pass `--no-scan` / `--no-sbom` for the pre-flip graph-only behavior. Workflows that previously invoked `codehub scan` separately still work; the post-analyze scan just makes the separate invocation optional.

🤖 Automated release via release-please

---

<details><summary>cli: 0.3.0</summary>

## [0.3.0](cli-v0.2.3...cli-v0.3.0) (2026-05-15)

### ⚠ BREAKING CHANGES

* **cli:** make `codehub analyze` the one-command index (fast + scan + sbom + coverage-auto; summaries opt-in) ([#110](#110))
* **plugin:** the five slash commands (/probe, /verdict, /owners, /audit-deps, /rename) shipped by the Claude Code plugin are gone with no backward compatibility. Slash commands as a plugin surface are deprecated; the same workflows are still available via:

### Features

* **cli:** make `codehub analyze` the one-command index (fast + scan + sbom + coverage-auto; summaries opt-in) ([#110](#110)) ([62bff2f](62bff2f))
* **plugin:** remove deprecated Claude Code slash commands ([5769fc1](5769fc1))

</details>

<details><summary>root: 0.4.0</summary>

## [0.4.0](root-v0.3.2...root-v0.4.0) (2026-05-15)

### ⚠ BREAKING CHANGES

* **cli:** make `codehub analyze` the one-command index (fast + scan + sbom + coverage-auto; summaries opt-in) ([#110](#110))
* **plugin:** the five slash commands (/probe, /verdict, /owners, /audit-deps, /rename) shipped by the Claude Code plugin are gone with no backward compatibility. Slash commands as a plugin surface are deprecated; the same workflows are still available via:

### Features

* **cli:** make `codehub analyze` the one-command index (fast + scan + sbom + coverage-auto; summaries opt-in) ([#110](#110)) ([62bff2f](62bff2f))
* **plugin:** remove deprecated Claude Code slash commands ([5769fc1](5769fc1))

</details>

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

theagenticguy changed the title ~~feat(cli)!: make codehub analyze fast by default; summaries are opt-in~~ feat(cli)!: make codehub analyze the one-command index (fast + scan + sbom + coverage-auto; summaries opt-in) May 14, 2026

theagenticguy added 2 commits May 14, 2026 14:25

theagenticguy force-pushed the feat/analyze-fast-default branch from f06d1f2 to 154cd99 Compare May 14, 2026 14:29

theagenticguy merged commit 62bff2f into main May 14, 2026
37 checks passed

theagenticguy deleted the feat/analyze-fast-default branch May 14, 2026 16:05

github-actions Bot mentioned this pull request May 14, 2026

chore: release main #97

Merged

theagenticguy merged 2 commits into main from feat/analyze-fast-default
