Commit b473736

chore: drop ooxml-call dev harness; add data/README; surface structural tools in README
- Remove scripts/ooxml-call.ts: 23 query-layer tests cover the same dispatch path. The harness was load-bearing only while we were verifying e2e before tests existed.
- Add data/README.md describing what each subpath under data/ is for (sources.json committed manifest, xsd-cache gitignored cache, behavior-notes future curated content).
- Update README.md to list the structural tools alongside semantic search; both flavors share one MCP endpoint after the prod populate.
- Update CLAUDE.md and scripts/ingest-xsd/README.md to drop ooxml:call references; smoke testing now points at tests/mcp-server/.

44/0 still.
1 parent 076f096 commit b473736

6 files changed

Lines changed: 46 additions & 66 deletions

CLAUDE.md

Lines changed: 0 additions & 1 deletion
@@ -37,7 +37,6 @@ scripts/
   ingest-xsd/ ECMA XSDs -> schema graph (structural query corpus)
   sources-sync.ts data/sources.json -> reference_sources
   db-migrate.ts Apply db/migrations/*.sql in order
-  ooxml-call.ts Local CLI harness for the structural MCP tools
 db/
   schema.sql PostgreSQL + pgvector + XSD schema graph
   migrations/ Numbered, idempotent SQL migrations

README.md

Lines changed: 9 additions & 5 deletions
@@ -10,9 +10,10 @@ The OOXML spec, explained by people who actually implemented it.

 An interactive reference for ECMA-376 (Office Open XML) built by the [SuperDoc — DOCX editing and tooling](https://superdoc.dev) team. Every page combines XML structure, live rendered previews, and implementation notes that tell you what the spec doesn't.

-- **Live previews** — Edit XML and see it render in real-time. Every example is a working document.
-- **Implementation notes** — Where Word diverges from the spec, what will break your code, and what to do about it.
-- **Semantic spec search** — 18,000+ spec chunks searchable by meaning via MCP server.
+- **Live previews** - Edit XML and see it render in real-time. Every example is a working document.
+- **Implementation notes** - Where Word diverges from the spec, what will break your code, and what to do about it.
+- **Semantic spec search** - 18,000+ spec chunks searchable by meaning via MCP server.
+- **Structural schema lookup** - Element children, attributes, types, enums, namespaces. Same MCP server, deterministic answers from the parsed XSDs.

 ## Why?

@@ -22,13 +23,16 @@ We faced this at SuperDoc — building a document engine on native OOXML with no

 ## MCP Server

-Search the ECMA-376 spec with AI. Ask questions in natural language, get answers grounded in the actual specification.
+Ask questions in natural language and get answers grounded in the spec, or query the schema graph for precise structural answers.

 ```bash
 claude mcp add --transport http ecma-spec https://api.ooxml.dev/mcp
 ```

-Works with Claude Code, Cursor, and any MCP-compatible client. Three tools: `search_ecma_spec` (semantic search), `get_section` (by ID), and `list_parts` (browse structure).
+Works with Claude Code, Cursor, and any MCP-compatible client. Two flavors of tools share one server:
+
+- **Semantic** (over the spec PDF): `search_ecma_spec`, `get_section`, `list_parts`
+- **Structural** (over the parsed XSDs): `ooxml_lookup_element`, `ooxml_lookup_type`, `ooxml_children`, `ooxml_attributes`, `ooxml_enum`, `ooxml_namespace_info`

 ## Development

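For illustration, a structural lookup against the shared endpoint could be expressed as a JSON-RPC `tools/call` request, the method MCP defines for invoking a tool. The sketch below only builds the request body. The `{"qname": ...}` argument shape is taken from the harness invocations this commit removes from scripts/ingest-xsd/README.md, and the helper name `buildToolCall` is hypothetical. A real client also performs the MCP initialization handshake first, so treat this as a shape reference rather than a working client.

```typescript
// Hypothetical helper: builds the JSON-RPC 2.0 body for an MCP `tools/call` request.
// Tool names come from the README; the argument shape is an assumption.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Ask for the children of a table element.
const request = buildToolCall(1, "ooxml_children", { qname: "w:tbl" });
console.log(JSON.stringify(request));
```

The same helper covers the semantic tools as well, since both flavors are listed as tools on one server; only `name` and `arguments` change.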
data/README.md

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+# data/
+
+Repository data root. Three categories live here:
+
+- **`sources.json`** (committed): canonical source manifest. One entry per
+  artifact (ECMA-376 PDFs, ECMA Part 4 XSD zip, future MS-OI29500, etc.) with
+  url, edition, sha256, and a license note. `bun run sources:sync` upserts these
+  rows into the `reference_sources` table. Edit by hand; the sync script reads
+  it.
+
+- **`xsd-cache/`** (gitignored): local XSD download cache. Populated by
+  `bun run xsd:fetch`. Contents are not load-bearing for the schema graph
+  itself - the graph lives in Postgres - they're just the source artifacts
+  the ingest reads. Safe to delete; regenerated on the next fetch.
+
+- **`behavior-notes/`** (committed when populated): curated YAML files
+  documenting how Microsoft Office actually behaves vs. the spec. A future
+  ingest will load these into the `behavior_notes` table so structural tool
+  responses can carry "what Word actually does" alongside the schema-level
+  answer. Empty until that workflow lands.
+
+What does NOT live here:
+
+- Generated build output: `dist/`, `dev/data/extracted/`, `dev/data/chunks/`,
+  `dev/data/embedded/` (all under `dev/`, gitignored).
+- Database state: lives in Postgres; reproducible from the manifest +
+  ingest scripts.

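As a rough sketch of how the `sha256` field in a manifest entry could be used, the snippet below checks downloaded bytes against an entry before they are trusted. The `SourceEntry` field names, the `matchesManifest` helper, and the example URL are all assumptions for illustration; the actual layout of `sources.json` is not shown in this commit.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry shape based on the fields the README names
// (url, edition, sha256, license note). The real keys may differ.
interface SourceEntry {
  url: string;
  edition: string;
  sha256: string;
  license: string;
}

// Hex-encoded SHA-256 digest of a byte buffer.
function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// True when the bytes hash to the digest recorded in the manifest entry.
function matchesManifest(entry: SourceEntry, bytes: Uint8Array): boolean {
  return sha256Hex(bytes) === entry.sha256.toLowerCase();
}

// Placeholder entry: the digest is the SHA-256 of zero bytes, not a real artifact.
const exampleEntry: SourceEntry = {
  url: "https://example.invalid/placeholder.zip",
  edition: "placeholder",
  sha256: "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
  license: "placeholder",
};

console.log(matchesManifest(exampleEntry, new Uint8Array()));
```

A check like this would fit naturally after a fetch into `xsd-cache/`, since that directory is described as disposable and regenerated on demand.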
package.json

Lines changed: 0 additions & 1 deletion
@@ -28,7 +28,6 @@
     "pdf:setup": "pip install -r scripts/requirements.txt",
     "xsd:fetch": "bun scripts/ingest-xsd/fetch.ts",
     "xsd:ingest": "bun scripts/ingest-xsd/ingest.ts",
-    "ooxml:call": "bun scripts/ooxml-call.ts",
     "test": "export TEST_DATABASE_URL=${TEST_DATABASE_URL:-postgresql://postgres:postgres@localhost:5432/ecma_spec} && bun test tests/db/ && bun test tests/ingest-xsd/ && bun test tests/mcp-server/"
   },
   "devDependencies": {

scripts/ingest-xsd/README.md

Lines changed: 10 additions & 3 deletions
@@ -66,8 +66,15 @@ bun run xsd:ingest --schema-dir <path> --entrypoint <file> \

 ## Smoke-test the result

+The query layer is exercised by `tests/mcp-server/ooxml-queries.test.ts`
+against the same fixtures the ingest tests use. Run with:
+
 ```bash
-bun run ooxml:call ooxml_children '{"qname":"w:tbl"}'
-bun run ooxml:call ooxml_attributes '{"qname":"w:pBdr"}'
-bun run ooxml:call ooxml_enum '{"qname":"w:ST_Jc"}'
+bun test tests/mcp-server/
 ```
+
+To hit the live MCP, deploy the Worker and call the tools through any
+MCP client. For local poking against the dev DB, write a small bun
+script that imports `runOoxmlTool` from
+`apps/mcp-server/src/ooxml-tools.ts` with a `postgres.js`-backed sql
+function.

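The "small bun script" suggested for local poking might look roughly like the sketch below. Because the signature of `runOoxmlTool` is not shown in this commit, the sketch does not import it. Instead it takes a runner as a parameter, with the hypothetical type `ToolRunner`, and ships a stand-in `echoRunner` so it executes without a database. The `<tool> '<json-args>'` argument shape mirrors the removed `ooxml:call` commands.

```typescript
// Hypothetical types: the real signature of `runOoxmlTool` is not shown here.
type ToolArgs = Record<string, unknown>;
type ToolRunner = (name: string, args: ToolArgs) => Promise<unknown>;

// Parses `<tool> '<json-args>'`, the shape the removed harness commands used.
function parseInvocation(argv: string[]): { name: string; args: ToolArgs } {
  const [name, rawArgs = "{}"] = argv;
  if (!name) throw new Error("usage: <tool> '<json-args>'");
  const parsed: unknown = JSON.parse(rawArgs);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("tool arguments must be a JSON object");
  }
  return { name, args: parsed as ToolArgs };
}

// Runs one invocation through whatever runner the caller wires in
// and returns the result as pretty-printed JSON.
async function callTool(run: ToolRunner, argv: string[]): Promise<string> {
  const { name, args } = parseInvocation(argv);
  return JSON.stringify(await run(name, args), null, 2);
}

// Stand-in runner so the sketch executes without Postgres. A real script
// would wrap `runOoxmlTool` and a postgres.js-backed sql function instead.
const echoRunner: ToolRunner = async (name, args) => ({ tool: name, args });

callTool(echoRunner, ["ooxml_children", '{"qname":"w:tbl"}']).then(console.log);
```

Injecting the runner keeps the argument handling testable on its own and leaves the database wiring to the one place that knows the real function signature.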
scripts/ooxml-call.ts

Lines changed: 0 additions & 56 deletions
This file was deleted.
