feat(cli)!: make codehub analyze the one-command index (fast + scan + sbom + coverage-auto; summaries opt-in) #110

Merged
theagenticguy merged 2 commits into main from feat/analyze-fast-default on May 14, 2026

@theagenticguy theagenticguy commented May 14, 2026

Summary

Reshapes codehub analyze so a bare invocation produces the full local
artifact set agents rely on — graph, SARIF, SBOM, coverage overlay
(when present), ownership, cochanges — while keeping the LLM phase
opt-in.

Before: codehub analyze built the graph; codehub scan and
--sbom/--coverage were separate steps the operator had to remember.
Agents calling list_findings or verdict on a fresh index got empty
tables because the SARIF hadn't been written yet.

After: one command, one .codehub/ folder with everything the MCP
surface reads. Zero AWS, zero Bedrock, zero LLM calls unless explicitly
opted in.

Default behavior (new)

| Phase | Before | After | Opt-out |
| --- | --- | --- | --- |
| Graph (tree-sitter + SCIP + communities + processes + cochanges + ownership + dependencies + detectors) | on | on | |
| scan (Priority-1 scanners → .codehub/scan.sarif → graph findings) | separate command | on | --no-scan |
| sbom (CycloneDX + SPDX from Dependency nodes) | off | on | --no-sbom |
| coverage (lcov / cobertura / jacoco / coverage.py overlay) | off | auto: probe known paths, enable if found | --no-coverage |
| summaries (Bedrock LLM narrative summaries) | on | off, fully opt-in | --summaries to enable |
| embeddings | off | off | --embeddings to enable |

--offline flag continues to work: network-backed scanners
(osv-scanner, grype, npm/pip audit) self-skip, so the scan default
stays honest for air-gapped runs.

Why

A bare codehub analyze should be the dependable foundation of every
agent workflow. Before this change, calling list_findings or
verdict on a fresh analyze returned empty — the SARIF hadn't been
written yet. Operators either knew to chain codehub analyze && codehub scan, or silently got the degraded experience. Folding scan into
analyze makes the MCP surface work on day one.

sbom and coverage follow the same logic: the cost of producing
them is negligible when the data is already in the graph, and every
downstream audit wants them. Coverage auto-detect is the key wrinkle —
we silently skip when no report exists so repos without tests don't
get warnings, and force-on (--coverage) still warns to catch setup
errors.

Commits

  1. feat(cli)!: make codehub analyze fast by default; summaries are opt-in
    — flips --summaries default from ON to OFF; adds
    CODEHUB_BEDROCK_SUMMARIES=1 env opt-in; fixes a latent bug where
    the CLI collapsed --summaries=true to undefined before
    forwarding.
  2. feat(cli)!: scan + sbom default on; coverage auto-detects reports
    — folds runScan() into runAnalyze as a best-effort step; flips
    --sbom default to ON; adds resolveCoverageEnabled +
    detectCoverageReport for silent auto-detect.

Precedence — summaries

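| CODEHUB_BEDROCK_DISABLED=1 | flag=true | flag=false | CODEHUB_BEDROCK_SUMMARIES=1 | result |
| --- | --- | --- | --- | --- |
| yes | any | any | any | off |
| no | yes | | any | on |
| no | no | yes | any | off |
| no | unset | unset | yes | on |
| no | unset | unset | no | off (new default) |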
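
The precedence table above can be read as three checks applied in order. The following is a minimal sketch of that ordering, not the resolver shipped in this PR: the function name comes from the Changes list, but the signature, the `Env` type, and the choice to pass the environment in explicitly are assumptions made for illustration.

```typescript
// Hypothetical shape: the real resolver's signature is not shown in this PR.
type Env = Record<string, string | undefined>;

/**
 * flag === true      -> --summaries was passed
 * flag === false     -> --no-summaries was passed
 * flag === undefined -> neither was passed
 */
function resolveSummariesEnabled(flag: boolean | undefined, env: Env): boolean {
  // 1. The kill-switch wins over the flag and the env opt-in.
  if (env["CODEHUB_BEDROCK_DISABLED"] === "1") return false;
  // 2. An explicit CLI flag beats the env opt-in in either direction.
  if (flag !== undefined) return flag;
  // 3. Otherwise only the env opt-in turns the phase on; the default is off.
  return env["CODEHUB_BEDROCK_SUMMARIES"] === "1";
}
```

Passing the environment as a parameter (rather than reading `process.env` inside the function) is a design choice of this sketch that keeps each row of the table testable in isolation.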

Changes

  • resolveSummariesEnabled() — truth table flipped, new env opt-in.
  • resolveSbomEnabled() — trivial default-on.
  • resolveScanEnabled() — trivial default-on.
  • resolveCoverageEnabled() + detectCoverageReport() — auto-detect
    lcov / cobertura / jacoco / coverage.py at the same candidate paths
    the ingestion phase uses.
  • runAnalyze — invokes runScan() (from ./scan.js) as a best-effort
    step at the end, logs findings count, writes .codehub/scan.sarif.
    Scanner failure logs-and-continues so analyze never regresses the
    graph over a flaky scanner.
  • CLI entry — forwards three-state sbom/coverage/scan opts, adds
    --no-sbom / --no-coverage / --no-scan.
  • Tests — 26 new unit tests (3 × sbom, 3 × scan, 7 × detectCoverageReport,
    4 × resolveCoverageEnabled, 9 × resolveSummariesEnabled).
  • Docs — reference/cli.md and guides/indexing-a-repo.md updated.
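
To illustrate the "trivial default-on" resolvers and the log-and-continue scan step described in the list above, here is a rough sketch. It is not the code from this PR: `resolveDefaultOn`, `runScanBestEffort`, the `ScanResult` shape, and the injected `runScan` / `log` callbacks are all hypothetical names chosen for the example.

```typescript
// Hypothetical result shape; the real runScan() return type is not shown in this PR.
type ScanResult = { scanners: number; findings: number; sarifPath: string };

// Three-state flag: undefined means neither --scan nor --no-scan was passed.
function resolveDefaultOn(flag: boolean | undefined): boolean {
  return flag ?? true;
}

// Runs the scan as a best-effort step: a scanner failure is logged and
// swallowed so the caller can still finish writing the graph.
async function runScanBestEffort(
  enabled: boolean,
  runScan: () => Promise<ScanResult>,
  log: (msg: string) => void,
): Promise<ScanResult | undefined> {
  if (!enabled) return undefined;
  try {
    const result = await runScan();
    log(`scan: ${result.scanners} scanner(s), ${result.findings} finding(s), sarif=${result.sarifPath}`);
    return result;
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    log(`scan failed, continuing without findings: ${reason}`);
    return undefined;
  }
}
```

Returning `undefined` on both the disabled and the failed path means the caller needs no scan-specific error handling, which is one way to keep a flaky scanner from failing the whole analyze run.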

Verification

  • pnpm typecheck — green across 19 workspace projects.
  • pnpm lint — green. (The one info diagnostic is the stale Biome
    $schema URL fixed in build(deps): unify dependabot bumps for npm + GitHub Actions #109.)
  • pnpm -r test: 2,044 tests pass (was 2,027 on main; +26 new /
    -9 replaced).
  • Pre-push hook: typecheck + test passed. verdict flagged this PR
    as dual_review (2 symbols / 4 communities / 28 symbols affected) —
    expected for a breaking CLI change; surfaced here for reviewer
    awareness, not a CI failure.

Smoke test on a fresh 2-function TS repo (bare codehub analyze)

codehub scan: running 5 scanner(s): semgrep, betterleaks, osv-scanner, detect-secrets, grype
codehub ingest-sarif: 5 findings, 5 edges from .codehub/scan.sarif
codehub analyze: scan — 5 scanner(s), 5 finding(s), sarif=.codehub/scan.sarif
codehub analyze: smoke-repo — 7 nodes, 6 edges, graph f66a5d27, total ~10s

.codehub/ now contains: graph.lbug, temporal.duckdb, meta.json,
scan.sarif, sbom.cyclonedx.json, sbom.spdx.json, scan-state.json,
parse-cache/. Previously it contained only graph.*, meta.json, and
parse-cache/.

--no-scan --no-sbom reproduces the pre-flip fast path (~727 ms).

Opt-in Bedrock path:

CODEHUB_BEDROCK_SUMMARIES=1 codehub analyze /tmp/smoke-repo --max-summaries 0 --verbose
codehub analyze: summarize — considered=2, skippedUnconfirmed=2, cacheHits=0,
                 summarized=0, wouldHaveSummarized=0, failed=0

BREAKING CHANGES

  • Bare codehub analyze no longer runs the summarize phase. Workflows
    depending on implicit summarization must add --summaries or set
    CODEHUB_BEDROCK_SUMMARIES=1. CODEHUB_BEDROCK_DISABLED=1
    continues to work unchanged.
  • Bare codehub analyze now runs Priority-1 scanners and emits SBOMs.
    Pass --no-scan / --no-sbom for the pre-flip graph-only behavior.
    Workflows that previously chained codehub analyze && codehub scan
    still work; the chained scan is now redundant.

Test plan

  • Resolver truth tables covered by unit tests.
  • Coverage auto-detect covered for all 5 candidate paths + miss +
    priority order.
  • pnpm -r test — 2,044 passing, 0 fails.
  • Smoke-tested end-to-end: default, --no-scan --no-sbom,
    --coverage with report, --summaries,
    CODEHUB_BEDROCK_SUMMARIES=1, and kill-switch all behave as
    specified.

@theagenticguy theagenticguy changed the title feat(cli)!: make codehub analyze fast by default; summaries are opt-in feat(cli)!: make codehub analyze the one-command index (fast + scan + sbom + coverage-auto; summaries opt-in) May 14, 2026

`codehub analyze` with no flags now runs only the fast, local,
deterministic pipeline — tree-sitter parse, SCIP resolution, graph
composition, cochanges, ownership, and detectors. No Bedrock, no
network hop, no AWS credentials required.

The Bedrock-backed `summarize` phase is opt-in. Opt in one of:
- `--summaries`  (per-invocation)
- `CODEHUB_BEDROCK_SUMMARIES=1`  (environment / CI-wide)

`--no-summaries` and `CODEHUB_BEDROCK_DISABLED=1` still force the phase
off; the kill-switch continues to win against both opt-in forms.

Embeddings were already opt-in via `--embeddings`; this change aligns
the LLM phase with the same model.

Why: a bare `codehub analyze` should not block on a network hop, spend
on LLM tokens, or require AWS creds. Summaries can be expensive on
large repos (the auto cap is 10% of callables; hundreds of Bedrock
calls on a mid-sized monorepo). Making them opt-in matches the
indexing contract the rest of the CLI already follows (sbom, coverage,
embeddings, skills) and lets `codehub analyze` be the dependable
foundation of every agent workflow that doesn't need narrative text.

Changes:
- `resolveSummariesEnabled()` truth table flipped. Unset flag + no
  env → false. New env opt-in `CODEHUB_BEDROCK_SUMMARIES=1`. Test suite
  refactored (9 tests, full coverage of the combined precedence).
- CLI entry now forwards `--summaries=true` to `runAnalyze` (was
  previously collapsed to `undefined`, which masked a dead code path).
- Help text, CLI reference table, indexing guide, and configuration
  reference all updated. Pipeline-level doc unchanged — it already
  documented the `PipelineOptions.summaries` default as `false`; only
  the CLI wrapper's historical default had drifted.

BREAKING CHANGE: bare `codehub analyze` no longer runs the summarize
phase. Workflows that depended on implicit summarization must add
`--summaries` or set `CODEHUB_BEDROCK_SUMMARIES=1`.

Extends the fast-default rework so a bare `codehub analyze` produces the
full local artifact set agents rely on:

- scan (NEW default ON): runs Priority-1 scanners at the end of analyze,
  writes `.codehub/scan.sarif`, and ingests findings into the graph.
  Makes `verdict`, `list_findings`, and `list_findings_delta` work on
  day one without a separate `codehub scan` step. Network-backed
  scanners (osv-scanner, grype, npm/pip audit) self-skip under
  `--offline`, so the on-default stays honest for air-gapped runs.
  Opt out with `--no-scan`; a scanner failure logs-and-continues so
  analyze never regresses the graph because of a flaky scanner.

- sbom (flipped to default ON): emitting CycloneDX + SPDX from the
  Dependency nodes the graph already has is cheap and universally
  wanted. Opt out with `--no-sbom`.

- coverage (flipped to default AUTO): probes `coverage/lcov.info`,
  `lcov.info`, `coverage.xml`,
  `build/reports/jacoco/test/jacocoTestReport.xml`, `coverage.json` in
  that order and enables the phase only when a report exists. Silent
  no-op otherwise (no spurious "no report found" warning on repos
  without tests). `--coverage` still force-enables and warns;
  `--no-coverage` force-disables.
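
The coverage auto-detect described above amounts to a first-match probe over a fixed path list plus a three-state flag. A minimal sketch follows, assuming the five candidate paths listed in the bullet; the function names match the exports named in this commit, but the signatures and the injectable `exists` parameter are assumptions, and the warning emitted by a forced `--coverage` with no report is left out.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// Candidate report paths, in the probe order given in the commit message.
const COVERAGE_CANDIDATES = [
  "coverage/lcov.info",
  "lcov.info",
  "coverage.xml",
  "build/reports/jacoco/test/jacocoTestReport.xml",
  "coverage.json",
] as const;

// Returns the first candidate (relative path) that exists under repoRoot.
// `exists` is injectable so the probe can be tested without touching disk.
function detectCoverageReport(
  repoRoot: string,
  exists: (path: string) => boolean = existsSync,
): string | undefined {
  return COVERAGE_CANDIDATES.find((rel) => exists(join(repoRoot, rel)));
}

// flag: true -> --coverage, false -> --no-coverage, undefined -> auto-detect.
function resolveCoverageEnabled(
  flag: boolean | undefined,
  repoRoot: string,
  exists: (path: string) => boolean = existsSync,
): boolean {
  if (flag !== undefined) return flag; // an explicit flag always wins
  return detectCoverageReport(repoRoot, exists) !== undefined;
}
```

With no flag and no report on disk, `resolveCoverageEnabled` returns `false` without logging anything, which corresponds to the silent no-op the commit message calls out.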

New exports for tests: `resolveSbomEnabled`, `resolveScanEnabled`,
`resolveCoverageEnabled`, `detectCoverageReport`.

17 new unit tests (2044 total, was 2027):
- 3 × resolveSbomEnabled (default on, explicit on, --no-sbom)
- 3 × resolveScanEnabled (default on, explicit on, --no-scan)
- 7 × detectCoverageReport (5 candidate paths + miss + priority order)
- 4 × resolveCoverageEnabled (explicit true/false, undefined+none,
       undefined+report-found)

Smoke-tested end-to-end on a throwaway 2-function TS repo:
- default `analyze` → runs scan (5 scanners, SARIF written, findings
  ingested), emits sbom.cyclonedx.json + sbom.spdx.json, detects no
  coverage report → silent no-op. Total ~10s.
- `analyze --no-scan --no-sbom` → 727 ms; pre-flip fast path preserved.
- `analyze` with `lcov.info` in repo → coverage phase auto-engages.

Docs: CLI reference table + indexing guide updated with the new
defaults and the auto-detect candidate paths.

BREAKING CHANGE: bare `codehub analyze` now runs Priority-1 scanners
and emits SBOMs. Pass `--no-scan` / `--no-sbom` for the pre-flip
graph-only behavior. Workflows that previously invoked `codehub scan`
separately still work; the post-analyze scan just makes the separate
invocation optional.

@theagenticguy theagenticguy force-pushed the feat/analyze-fast-default branch from f06d1f2 to 154cd99 Compare May 14, 2026 14:29
@theagenticguy theagenticguy merged commit 62bff2f into main May 14, 2026
37 checks passed
@theagenticguy theagenticguy deleted the feat/analyze-fast-default branch May 14, 2026 16:05
@github-actions github-actions Bot mentioned this pull request May 14, 2026
theagenticguy pushed a commit that referenced this pull request May 15, 2026
🤖 Automated release via release-please
---


<details><summary>cli: 0.3.0</summary>

## [0.3.0](cli-v0.2.3...cli-v0.3.0) (2026-05-15)


### ⚠ BREAKING CHANGES

* **cli:** make `codehub analyze` the one-command index (fast + scan +
sbom + coverage-auto; summaries opt-in)
([#110](#110))
* **plugin:** the five slash commands (/probe, /verdict, /owners,
/audit-deps, /rename) shipped by the Claude Code plugin are gone with no
backward compatibility. Slash commands as a plugin surface are
deprecated; the same workflows are still available via:

### Features

* **cli:** make `codehub analyze` the one-command index (fast + scan +
sbom + coverage-auto; summaries opt-in)
([#110](#110))
([62bff2f](62bff2f))
* **plugin:** remove deprecated Claude Code slash commands
([5769fc1](5769fc1))
</details>

<details><summary>root: 0.4.0</summary>

## [0.4.0](root-v0.3.2...root-v0.4.0) (2026-05-15)


### ⚠ BREAKING CHANGES

* **cli:** make `codehub analyze` the one-command index (fast + scan +
sbom + coverage-auto; summaries opt-in)
([#110](#110))
* **plugin:** the five slash commands (/probe, /verdict, /owners,
/audit-deps, /rename) shipped by the Claude Code plugin are gone with no
backward compatibility. Slash commands as a plugin surface are
deprecated; the same workflows are still available via:

### Features

* **cli:** make `codehub analyze` the one-command index (fast + scan +
sbom + coverage-auto; summaries opt-in)
([#110](#110))
([62bff2f](62bff2f))
* **plugin:** remove deprecated Claude Code slash commands
([5769fc1](5769fc1))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>