
feat!: WASM-only parser path; drop native tree-sitter from runtime#113

Merged
theagenticguy merged 3 commits into main from feat/wasm-only-parser-path
May 15, 2026

@theagenticguy
Owner

Summary

Make npm install -g @opencodehub/cli@latest bulletproof on every common multi-Node-installer setup (mise / nvm / Homebrew / Volta / corepack) across Node 20/22/24 on Linux + macOS. Two compounding install failures share one root cause: the published cli pulled 13 native tree-sitter-* grammar packages plus tree-sitter@0.25.0 core through @opencodehub/ingestion, including tree-sitter-swift@0.7.1 which transitively depended on tree-sitter-cli@0.23.2 whose postinstall fetched a platform binary from GitHub releases. A 504 from that endpoint broke every cold-cache install.

This PR makes WASM the only parser path at the published boundary. All 15 grammar .wasm blobs plus the web-tree-sitter runtime wasm are vendored at packages/ingestion/vendor/wasms/. Native tree-sitter and the 14 grammar packages are removed from runtime AND devDependencies — they're not workspace deps anymore. Re-vendoring grammars is a one-shot operation documented in scripts/build-vendor-wasms.sh.

Verified locally via scripts/verify-global-install.sh local: 9/9 gates green. Install completes in 16s (was 84s with native; budget 60s). Zero ERESOLVE warnings. Zero GHCR fetches. Zero tree-sitter-cli postinstall.

Failing install transcript that triggered this work

npm install -g @opencodehub/cli@latest
...
npm warn ERESOLVE overriding peer dependency  (× 7 grammars)
...
npm error code 1
npm error path /home/.../node_modules/@opencodehub/cli/node_modules/tree-sitter-cli
npm error command failed
npm error command sh -c node install.js
npm error Downloading https://github.com/tree-sitter/tree-sitter/releases/download/v0.23.2/tree-sitter-linux-x64.gz
npm error Download failed
npm error status: 504

BREAKING CHANGES

  • Parser runtime is WASM-only. OCH_NATIVE_PARSER env var and --native-parser CLI flag are removed. Setting the env var emits a one-shot stderr advisory and is then deleted from process.env. Passing the flag exits non-zero with commander's unknown option error.
  • engines.node lowered to >=20.0.0 (native ABI requirement removed).
  • @opencodehub/cli 0.3.0 → 0.4.0
  • @opencodehub/ingestion 0.3.2 → 0.4.0
  • @opencodehub/pack 0.1.3 → 0.2.0 (its dep on ingestion@0.4.0 is breaking)
  • @opencodehub/cobol-proleap 0.1.3 → 0.2.0 (same reason)
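
The env-var handling described in the first bullet (advise once on stderr, then delete the variable) could be sketched as follows. This is a minimal illustration, not the code from this PR: the helper name `retireNativeParserEnv` and the advisory wording are assumptions.

```typescript
// Hypothetical helper: the function name and message text are assumptions,
// only the OCH_NATIVE_PARSER variable name comes from the PR description.
const REMOVED_ENV_VAR = "OCH_NATIVE_PARSER";

function retireNativeParserEnv(
  env: Record<string, string | undefined>,
  warn: (message: string) => void,
): boolean {
  // Nothing to do when the variable is not set.
  if (env[REMOVED_ENV_VAR] === undefined) return false;

  // Advise the user, then delete the variable so later calls (and child
  // processes that inherit this env object) do not see it again.
  warn(`${REMOVED_ENV_VAR} is no longer supported; the parser runtime is WASM-only.`);
  delete env[REMOVED_ENV_VAR];
  return true;
}
```

In a CLI entry point this might be called as `retireNativeParserEnv(process.env, (m) => process.stderr.write(m + "\n"))`. Deleting the variable is what makes the advisory one-shot: a second call finds nothing to report.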

Highlights

  • wasm-fallback.ts → wasm-runtime.ts. Resolver collapsed to a flat Record<LanguageId, string> map. Parser.init({ locateFile }) pins the vendored runtime WASM. parse-worker.ts 308 → 144 lines. grammar-registry.ts 337 → 200 lines. ~600 LOC of native dispatch code deleted.
  • complexity.ts ported to web-tree-sitter. Cyclomatic complexity now runs on every install (was silently zero on Node 24 default).
  • verdict.test.ts adds a tier-flip fixture confirming the complexity → verdict pipeline still drives risk-tier consumers (verdict.ts:101,688).
  • New scripts/verify-global-install.sh packs every publishable workspace package and runs 5 hard gates + 4 smoke commands.
  • New .github/workflows/verify-global-install.yml runs the 9-cell install matrix (Linux/macOS × Node 20/22/24 × mise/nvm/Homebrew/Volta).
  • ADR 0013 marked superseded; ADR 0015 captures the new shape.
  • CHANGELOG entries added; CLAUDE.md, root README.md, and 14 docs-site pages scrubbed of OCH_NATIVE_PARSER / --native-parser references.
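
For readers unfamiliar with the resolver shape named in the first highlight, here is a rough sketch of a flat `Record<LanguageId, string>` map plus a `locateFile` callback. The `LanguageId` members, the grammar file names, and the helper names are illustrative assumptions. The real `wasm-runtime.ts` is not reproduced here.

```typescript
// Illustrative subset: the actual LanguageId union and file names may differ.
type LanguageId = "typescript" | "python" | "swift";

// Flat map from language to vendored grammar file, with no native/WASM branching.
const GRAMMAR_WASM: Record<LanguageId, string> = {
  typescript: "tree-sitter-typescript.wasm",
  python: "tree-sitter-python.wasm",
  swift: "tree-sitter-swift.wasm",
};

// Hypothetical helper: resolve a grammar to its path under the vendor directory.
function resolveGrammarWasm(vendorDir: string, language: LanguageId): string {
  return `${vendorDir}/${GRAMMAR_WASM[language]}`;
}

// Hypothetical helper: build a locateFile callback that ignores the default
// script directory and always points at the vendored copy of the requested file.
function makeLocateFile(vendorDir: string): (scriptName: string) => string {
  return (scriptName) => `${vendorDir}/${scriptName}`;
}
```

The callback returned by `makeLocateFile` is the kind of value that would be passed as `Parser.init({ locateFile })`, so the web-tree-sitter runtime loads its own `.wasm` from the vendored directory rather than from wherever the package happens to be installed.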

Provenance

  • Multi-explorer plan: planning/bulletproof-npm-install/{explorer-architectural,explorer-speed,explorer-simple,plan.md}. Three independent Opus 4.7 agents converged on the same target shape — high-confidence signal the architectural call is correct.
  • Durable lesson persisted: .erpaval/solutions/best-practices/workspace-tarball-pack-all-publishables.md — workspace clis whose deps reference other workspace packages need ALL publishable tarballs supplied to npm install -g, otherwise npm pulls registry copies and masks install-graph regressions.

Test plan

  • pnpm -r build && pnpm -r test green workspace-wide (after dist clean)
  • bash scripts/verify-global-install.sh local 9/9 gates pass
  • node packages/ingestion/scripts/verify-vendor-wasms.mjs exits 0
  • pnpm pack -C packages/cli produces tarball with @opencodehub/ingestion@0.4.0 pin
  • npm install -g <tarballs> runs codehub --version / --help / analyze / query cleanly with no GHCR fetches and 16-second wall time
  • CI 9-cell matrix (.github/workflows/verify-global-install.yml) — first run on this PR
  • Tarball-pack gates (release-please) bump all 4 packages (cli, ingestion, pack, cobol-proleap) in the same release

Make `npm install -g @opencodehub/cli@latest` bulletproof on every common
multi-Node-installer setup (mise / nvm / Homebrew / Volta / corepack) across
Node 20/22/24 on Linux + macOS. Two compounding install failures share one
root cause: the published cli pulled 13 native `tree-sitter-*` grammar
packages plus `tree-sitter@0.25.0` core through `@opencodehub/ingestion`,
including `tree-sitter-swift@0.7.1` which transitively depended on
`tree-sitter-cli@0.23.2` whose postinstall fetched a platform binary from
GitHub releases. A 504 from that endpoint broke every cold-cache install.

This PR makes WASM the only parser path at the published boundary. All 15
grammar `.wasm` blobs plus the `web-tree-sitter` runtime wasm are vendored
at `packages/ingestion/vendor/wasms/` (28 MB; net consumer download is
smaller than today because native deps used to drag ~50 MB of `.cc` source
+ `.node` prebuilds). Native `tree-sitter` and the 14 grammar packages are
removed from runtime AND devDependencies — they're not workspace deps
anymore. Re-vendoring grammars is a one-shot operation documented in
`scripts/build-vendor-wasms.sh`.

Verified locally via `scripts/verify-global-install.sh local`:
9/9 gates green. Install completes in 16s (was 84s with native; budget 60s).
Zero ERESOLVE warnings. Zero GHCR fetches. Zero `tree-sitter-cli` postinstall.

BREAKING CHANGES

* Parser runtime is WASM-only. `OCH_NATIVE_PARSER` env var and
  `--native-parser` CLI flag are removed. Setting the env var emits a
  one-shot stderr advisory and is then deleted from `process.env`. Passing
  the flag exits non-zero with commander's "unknown option" error.
* `engines.node` lowered to `>=20.0.0` (native ABI requirement removed).
* `@opencodehub/cli` 0.3.0 → 0.4.0
* `@opencodehub/ingestion` 0.3.2 → 0.4.0
* `@opencodehub/pack` 0.1.3 → 0.2.0 (its dep on ingestion@0.4.0 is breaking)
* `@opencodehub/cobol-proleap` 0.1.3 → 0.2.0 (same reason)

Highlights

* `wasm-fallback.ts` → `wasm-runtime.ts`. Resolver collapsed to a flat
  `Record<LanguageId, string>` map. `Parser.init({ locateFile })` pins the
  vendored runtime WASM. `parse-worker.ts` 308 → 144 lines. `grammar-registry.ts`
  337 → 200 lines. ~600 LOC of native dispatch code deleted.
* `complexity.ts` ported to `web-tree-sitter`. Cyclomatic complexity now
  runs on every install (was silently zero on Node 24 default).
* `verdict.test.ts` adds a tier-flip fixture confirming the
  `complexity → verdict` pipeline still drives risk-tier consumers.
* `scripts/verify-global-install.sh` packs every publishable workspace
  package and runs 5 hard gates + 4 smoke commands. The new
  `.github/workflows/verify-global-install.yml` runs the 9-cell matrix.
* ADR 0013 marked superseded; ADR 0015 captures the new shape.
* CHANGELOG entries added; CLAUDE.md, root README.md, and 14 docs-site
  pages scrubbed of `OCH_NATIVE_PARSER` / `--native-parser` references.

Provenance

* Multi-explorer plan: planning/bulletproof-npm-install/{explorer-*.md, plan.md}
* Durable lesson:
  .erpaval/solutions/best-practices/workspace-tarball-pack-all-publishables.md

Test plan

* [x] `pnpm -r build && pnpm -r test` green workspace-wide
* [x] `bash scripts/verify-global-install.sh local` 9/9 gates pass
* [x] `node packages/ingestion/scripts/verify-vendor-wasms.mjs` exits 0
* [x] `pnpm pack -C packages/cli` produces tarball with @opencodehub/ingestion@0.4.0 pin
* [x] `npm install -g <tarball>` runs `codehub --version` / `--help` /
  `analyze` / `query` cleanly with no GHCR fetches and 16-second wall time
* [ ] CI 9-cell matrix (Linux/macOS × Node 20/22/24 × mise/nvm/Homebrew/Volta) — pending CI run
Comment thread packages/ingestion/scripts/verify-vendor-wasms.mjs Fixed
`feat!:` in the previous commit triggers release-please to open a release PR
on merge that bumps versions and writes the CHANGELOG entries automatically.
Manually-bumped `version` fields drift from `.release-please-manifest.json`
(which release-please reads for state), and manually-prepended CHANGELOG
sections duplicate what release-please will generate.

Reverts:
- `packages/cli/package.json` 0.4.0 → 0.3.0
- `packages/ingestion/package.json` 0.4.0 → 0.3.2
- `packages/pack/package.json` 0.2.0 → 0.1.3
- `packages/cobol-proleap/package.json` 0.2.0 → 0.1.3
- `CHANGELOG.md`, `packages/cli/CHANGELOG.md`, `packages/ingestion/CHANGELOG.md`
  prepended sections

Source code, tests, docs, ADRs, and vendored WASMs from the previous commit
are unchanged. Release-please's `node-workspace` plugin will propagate the
ingestion bump through pack + cobol-proleap + cli on the release PR.
- Apply biome format to grammar-registry.ts, wasm-runtime.ts,
  wasm-grammar-resolution.test.ts, complexity.ts (CI lint gate).
- Replace existsSync→statSync→openSync triplet in
  packages/ingestion/scripts/verify-vendor-wasms.mjs with a single
  openSync per file plus error-handling. Fixes CodeQL high-severity
  filesystem-race-condition finding (TOCTOU) on line 59.

No behavior change. Verify still passes the 16 vendored WASM files.
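
The TOCTOU fix above follows a common pattern: open the file once, then ask questions of the descriptor rather than of the path. A minimal sketch of that pattern is shown below. The function name, the result type, and the use of `fstatSync` for the size check are assumptions, not the contents of `verify-vendor-wasms.mjs`.

```typescript
import { closeSync, fstatSync, openSync } from "node:fs";

// Hypothetical result type for illustration.
type WasmCheck = { ok: true; bytes: number } | { ok: false; reason: string };

function checkWasmFile(path: string): WasmCheck {
  let fd: number;
  try {
    // Single open: a missing or unreadable file surfaces here as an error,
    // so no separate existsSync/statSync call on the path is needed.
    fd = openSync(path, "r");
  } catch (err) {
    return { ok: false, reason: `cannot open: ${(err as Error).message}` };
  }
  try {
    // fstat inspects the already-open descriptor, so the file cannot be
    // swapped out between the check and the use.
    const stats = fstatSync(fd);
    if (!stats.isFile()) return { ok: false, reason: "not a regular file" };
    if (stats.size === 0) return { ok: false, reason: "empty file" };
    return { ok: true, bytes: stats.size };
  } finally {
    closeSync(fd);
  }
}
```

Because every check runs against the same descriptor, there is no window between "does it exist" and "open it" for another process to replace the file.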
@theagenticguy theagenticguy merged commit 0a9e0cb into main May 15, 2026
42 of 46 checks passed
@theagenticguy theagenticguy deleted the feat/wasm-only-parser-path branch May 15, 2026 14:38
@github-actions github-actions Bot mentioned this pull request May 15, 2026
theagenticguy added a commit that referenced this pull request May 15, 2026
🤖 Automated release via release-please
---


<details><summary>analysis: 0.2.0</summary>

## [0.2.0](analysis-v0.1.2...analysis-v0.2.0) (2026-05-15)


### ⚠ BREAKING CHANGES

* WASM-only parser path; drop native tree-sitter from runtime
([#113](#113))

### Features

* WASM-only parser path; drop native tree-sitter from runtime
([#113](#113))
([0a9e0cb](0a9e0cb))
</details>

<details><summary>cli: 0.4.0</summary>

## [0.4.0](cli-v0.3.0...cli-v0.4.0) (2026-05-15)


### ⚠ BREAKING CHANGES

* WASM-only parser path; drop native tree-sitter from runtime
([#113](#113))

### Features

* WASM-only parser path; drop native tree-sitter from runtime
([#113](#113))
([0a9e0cb](0a9e0cb))


### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @opencodehub/analysis bumped to 0.2.0
    * @opencodehub/ingestion bumped to 0.4.0
    * @opencodehub/mcp bumped to 0.3.2
    * @opencodehub/pack bumped to 0.1.4
</details>

<details><summary>cobol-proleap: 0.1.4</summary>

## [0.1.4](cobol-proleap-v0.1.3...cobol-proleap-v0.1.4) (2026-05-15)


### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @opencodehub/ingestion bumped to 0.4.0
</details>

<details><summary>ingestion: 0.4.0</summary>

## [0.4.0](ingestion-v0.3.2...ingestion-v0.4.0) (2026-05-15)


### ⚠ BREAKING CHANGES

* WASM-only parser path; drop native tree-sitter from runtime
([#113](#113))

### Features

* WASM-only parser path; drop native tree-sitter from runtime
([#113](#113))
([0a9e0cb](0a9e0cb))


### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @opencodehub/analysis bumped to 0.2.0
    * @opencodehub/scip-ingest bumped to 0.2.1
</details>

<details><summary>mcp: 0.3.2</summary>

## [0.3.2](mcp-v0.3.1...mcp-v0.3.2) (2026-05-15)


### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @opencodehub/analysis bumped to 0.2.0
    * @opencodehub/pack bumped to 0.1.4
</details>

<details><summary>pack: 0.1.4</summary>

## [0.1.4](pack-v0.1.3...pack-v0.1.4) (2026-05-15)


### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @opencodehub/analysis bumped to 0.2.0
    * @opencodehub/ingestion bumped to 0.4.0
</details>

<details><summary>scip-ingest: 0.2.1</summary>

## [0.2.1](scip-ingest-v0.2.0...scip-ingest-v0.2.1) (2026-05-15)


### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @opencodehub/analysis bumped to 0.2.0
</details>

<details><summary>root: 0.5.0</summary>

## [0.5.0](root-v0.4.0...root-v0.5.0) (2026-05-15)


### ⚠ BREAKING CHANGES

* WASM-only parser path; drop native tree-sitter from runtime
([#113](#113))

### Features

* WASM-only parser path; drop native tree-sitter from runtime
([#113](#113))
([0a9e0cb](0a9e0cb))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Laith Al-Saadoon <alsaadoonlaith@gmail.com>
theagenticguy added a commit that referenced this pull request May 15, 2026
## Summary

Top-level `permissions: contents: read` in
`.github/workflows/release-please.yml` was capping every job below it.
The `release` job (which fans out to `release.yml` via `workflow_call`
for the npm-publish step) declared `id-token: write` and `contents:
write` at its own job level, but those declarations are silently a no-op
when not in the top-level set.

Verified by inspecting the failing job's log header:
```
GITHUB_TOKEN Permissions
  Contents: read
  Metadata: read
```
`id-token: write` missing despite being declared at the job level in
release.yml.

This is the root cause of every recent npm-publish failure on this repo:
```
Skipped OIDC: ERR_PNPM_AUTH_TOKEN_EXCHANGE: Failed token exchange request
with body message: Unknown error (status code 404)
```

The 404 isn't from npm rejecting trust — it's from GitHub returning no
OIDC token because `id-token: write` wasn't actually granted to the
runner. Verified all 17 `@opencodehub/*` packages already have the
correct trust relationship (`release.yml` + `theagenticguy/opencodehub`)
configured on npmjs.com via `npm trust list`.

## Fix

Top-level `permissions:` is now the union of every permission used by
any job, including those reached transitively via `workflow_call`. Each
job continues to narrow to its own least-privilege subset, so
Scorecard's Token-Permissions check still passes.

## Test plan

- [x] `npm trust list @opencodehub/cli` → confirms trust relationship
exists with workflow `release.yml`
- [ ] Merge this PR. Push to main triggers `Release Please` workflow.
- [ ] release-please opens release PR with version bumps from PR #113's
`feat!:`
- [ ] Merge release PR → release.yml's npm-publish job → all 17 packages
publish with provenance.
- [ ] `npm view @opencodehub/cli version` returns `0.4.0`
- [ ] `npm install -g @opencodehub/cli@latest` works on a clean machine

## Provenance

Durable lesson persisted:
`.erpaval/solutions/conventions/workflow-call-permissions-ceiling.md`