Skip to content

Commit 99ec4eb

Browse files
committed
fix(mcp): address review correctness gaps + tighten test isolation
Five real issues from review against the WML schema graph; one cheap DX win folded in. Issue 1 - lookupSymbol returned local-only symbols by qname. After per-owner scoping landed, lookupElement("w:tblGrid") could return either CT_TblGridBase or CT_TblGrid depending on which row postgres picked first. Fixed: lookupSymbol now filters parent_symbol_id IS NULL, so it only returns top-level symbols addressable by qname. Local elements are reachable through getChildren on their owning type. Issues 3 + 4 - getAttributes mishandled inheritance. complexContent/restriction inherits attribute uses per XSD §3.4.2.2; only `use="prohibited"` drops them. The previous code walked the base only for `extension`, so every WML *Change type built on a restriction reported zero base attrs (id, author, date silently missing). The walk order also had base attrs emitted first, making the older docstring claim about "derived wins" wrong in practice. Fixed: derived attrs (and their attributeGroup refs) emit first, then the base is walked for both extension AND restriction; seenAttrs dedup makes derived redeclarations win. Two new tests pin both behaviors with a CT_TrackedBase / CT_TrackedRestricted / CT_OverrideDerived fixture. Issue 5 - stale unscoped local symbols on dev DBs. The migration that introduced parent_symbol_id never purged the pre-migration parent-NULL collapsed rows, so re-ingest left them alongside the new per-owner symbols. Fixed: ingest now purges everything this source previously wrote at the start of the transaction, then rewrites. Non-cascading FKs (xsd_inheritance_edges.base_symbol_id and friends) are cleaned explicitly first. Idempotency test updated to reflect the new semantics: every stat equals first.X across re-runs and DB row counts stay stable. Issue 2 - tests TRUNCATE through DATABASE_URL. A developer with DATABASE_URL pointed at Neon could wipe their schema graph by running `bun test`. Fixed: tests now require TEST_DATABASE_URL (no fallback) and refuse to run unless the hostname is local. Shared guard at tests/test-db.ts; package.json test script defaults TEST_DATABASE_URL to local Postgres. DX - ooxml_children's group fallback inlined a 28-line copy of lookupSymbol. Replaced with a 4-line lookupSymbol("group", ...) call, and dropped a dead-code branch in getChildrenRecursive that re-set source="inherited" on entries the recursive call had already labeled. 44/0 across db / ingest / mcp-server.
1 parent 5013eec commit 99ec4eb

10 files changed

Lines changed: 217 additions & 81 deletions

File tree

apps/mcp-server/src/ooxml-queries.ts

Lines changed: 34 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -87,7 +87,14 @@ export interface SymbolHit {
8787
sourceName: string | null;
8888
}
8989

90-
/** Look up a symbol by namespace + localName + kind in a given profile. */
90+
/**
91+
* Look up a top-level symbol by namespace + localName + kind in a given profile.
92+
*
93+
* Local element symbols (parent_symbol_id IS NOT NULL) are intentionally excluded:
94+
* an inline `<xsd:element name="X" type="...">` declared in two different
95+
* complexTypes is two distinct symbols whose identity depends on context. Reach
96+
* those through `getChildren(parentTypeSymbolId, profile)` instead.
97+
*/
9198
export async function lookupSymbol(
9299
sql: Sql,
93100
namespace: string,
@@ -105,6 +112,7 @@ export async function lookupSymbol(
105112
LEFT JOIN reference_sources src ON src.id = sp.source_id
106113
WHERE s.local_name = ${localName}
107114
AND s.kind = ${kind}
115+
AND s.parent_symbol_id IS NULL
108116
AND ns.uri = ${namespace}
109117
AND p.name = ${profile}
110118
LIMIT 1
@@ -331,15 +339,12 @@ async function getChildrenRecursive(
331339
const out: ChildEdge[] = [];
332340

333341
// Inheritance: extension prepends base content; restriction replaces it.
342+
// Recursing with isRoot=false sets source="inherited" inside the base call,
343+
// so we just push the entries through.
334344
const inherit = await getInheritanceEdge(sql, symbolId, profile);
335345
if (inherit && inherit.relation === "extension") {
336346
const base = await getChildrenRecursive(sql, inherit.baseId, profile, false);
337-
// Already-emitted entries in `base` already carry their owning type name;
338-
// flip their source to "inherited" relative to the root request.
339-
for (const c of base) {
340-
if (isRoot) c.source = "inherited";
341-
out.push(c);
342-
}
347+
out.push(...base);
343348
}
344349

345350
// Walk this type's own top-level compositors.
@@ -408,13 +413,17 @@ export interface AttrEntry {
408413
}
409414

410415
/**
411-
* Attributes on a type symbol. Walks inheritance per XSD semantics
412-
* (extension prepends base attrs; restriction replaces them) and recurses
413-
* through attributeGroup refs, including refs nested inside other
414-
* attributeGroups. Cycles are guarded by a visited-set.
416+
* Attributes on a type symbol, applying XSD §3.4.2.2 inheritance:
417+
* - extension: derived's own attribute uses are unioned with the base's.
418+
* - restriction: derived's attribute uses also union with the base's, with
419+
* the derived narrowing or prohibiting individual entries. Restriction
420+
* CANNOT silently drop a base attribute; only `use="prohibited"` does.
415421
*
416-
* Names are de-duplicated: a derived type's redeclaration of an inherited
417-
* attribute wins, so the first occurrence in walk order is what surfaces.
422+
* Walk order emits the derived type's own attributes first, then attributeGroup
423+
* refs the derived holds, then recurses into the base. Names are de-duplicated
424+
* by first occurrence, so a derived redeclaration wins and base attrs only
425+
* surface when the derived didn't override them. attributeGroup nesting is
426+
* walked recursively with a visited-set against cycles.
418427
*/
419428
export async function getAttributes(
420429
sql: Sql,
@@ -437,17 +446,11 @@ async function collectAttrsForType(
437446
seenAttrs: Set<string>,
438447
visitedGroups: Set<number>,
439448
): Promise<void> {
440-
// Per XSD: extension prepends base; restriction replaces. We always emit base
441-
// first when extending so derived declarations correctly override later.
442-
const inherit = await getInheritanceEdge(sql, symbolId, profile);
443-
if (inherit && inherit.relation === "extension") {
444-
await collectAttrsForType(sql, inherit.baseId, profile, false, out, seenAttrs, visitedGroups);
445-
}
446-
447449
const ownName = await getSymbolName(sql, symbolId);
448450

449451
// Direct attribute declarations on this symbol (whether complexType or
450-
// attributeGroup; both can carry xsd:attribute children).
452+
// attributeGroup; both can carry xsd:attribute children). Emit first so
453+
// derived redeclarations override base attrs found below.
451454
const directAttrs = await sql`
452455
SELECT a.local_name, a.attr_use, a.default_value, a.fixed_value, a.type_ref, a.order_index
453456
FROM xsd_attr_edges a
@@ -470,7 +473,8 @@ async function collectAttrsForType(
470473
});
471474
}
472475

473-
// attributeGroup refs hanging off this symbol; recurse into each.
476+
// attributeGroup refs the derived itself holds; recurse into each before
477+
// touching the base so a derived's group-bundled attr also wins.
474478
const agRefs = await sql`
475479
SELECT ge.group_symbol_id
476480
FROM xsd_group_edges ge
@@ -490,6 +494,14 @@ async function collectAttrsForType(
490494
visitedGroups,
491495
);
492496
}
497+
498+
// Inherited base attrs. Both extension and restriction inherit attribute uses
499+
// per XSD §3.4.2.2; restriction can override or prohibit but cannot drop
500+
// silently. Dedup by seenAttrs so the derived's redeclarations win.
501+
const inherit = await getInheritanceEdge(sql, symbolId, profile);
502+
if (inherit) {
503+
await collectAttrsForType(sql, inherit.baseId, profile, false, out, seenAttrs, visitedGroups);
504+
}
493505
}
494506

495507
async function collectAttrsFromAttributeGroup(

apps/mcp-server/src/ooxml-tools.ts

Lines changed: 10 additions & 28 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,7 @@ import {
2121
getEnums,
2222
getNamespaceInfo,
2323
lookupElement,
24+
lookupSymbol,
2425
lookupSymbolByTypeRef,
2526
lookupType,
2627
type NamespaceInfo,
@@ -212,34 +213,15 @@ export async function runOoxmlTool(
212213
if (elementSym?.typeRef) {
213214
typeSym = await lookupSymbolByTypeRef(sql, elementSym.typeRef, profile);
214215
} else if (!elementSym) {
215-
// Fall back to looking for a group with this name (so EG_PContent works).
216-
const grp = await sql`
217-
SELECT s.id, s.local_name, s.kind, s.vocabulary_id, s.type_ref,
218-
ns.uri AS namespace_uri, p.name AS profile_name, src.name AS source_name
219-
FROM xsd_symbols s
220-
JOIN xsd_symbol_profiles sp ON sp.symbol_id = s.id
221-
JOIN xsd_namespaces ns ON ns.id = sp.namespace_id
222-
JOIN xsd_profiles p ON p.id = sp.profile_id
223-
LEFT JOIN reference_sources src ON src.id = sp.source_id
224-
WHERE s.local_name = ${q.qname.localName}
225-
AND s.kind = 'group'
226-
AND ns.uri = ${q.qname.namespace}
227-
AND p.name = ${profile}
228-
LIMIT 1
229-
`;
230-
const r = grp[0];
231-
if (r) {
232-
typeSym = {
233-
id: r.id as number,
234-
vocabularyId: r.vocabulary_id as string,
235-
localName: r.local_name as string,
236-
kind: r.kind as string,
237-
typeRef: r.type_ref as string | null,
238-
namespaceUri: r.namespace_uri as string,
239-
profileName: r.profile_name as string,
240-
sourceName: r.source_name as string | null,
241-
};
242-
}
216+
// Fall back to looking for a named xsd:group with this qname (so
217+
// EG_PContent and friends are reachable directly).
218+
typeSym = await lookupSymbol(
219+
sql,
220+
q.qname.namespace,
221+
q.qname.localName,
222+
"group",
223+
profile,
224+
);
243225
}
244226
}
245227
if (!typeSym) {

package.json

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -29,7 +29,7 @@
2929
"xsd:fetch": "bun scripts/ingest-xsd/fetch.ts",
3030
"xsd:ingest": "bun scripts/ingest-xsd/ingest.ts",
3131
"ooxml:call": "bun scripts/ooxml-call.ts",
32-
"test": "bun test tests/db/ && bun test tests/ingest-xsd/ && bun test tests/mcp-server/"
32+
"test": "export TEST_DATABASE_URL=${TEST_DATABASE_URL:-postgresql://postgres:postgres@localhost:5432/ecma_spec} && bun test tests/db/ && bun test tests/ingest-xsd/ && bun test tests/mcp-server/"
3333
},
3434
"devDependencies": {
3535
"@biomejs/biome": "^2.3.13",

scripts/ingest-xsd/ingest.ts

Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -99,6 +99,40 @@ export async function ingestSchemaSet(opts: IngestSchemaSetOptions): Promise<Ing
9999
const profileId = await ensureProfile(sql, opts.profileName);
100100
const sourceId = await lookupSourceId(sql, opts.sourceName);
101101

102+
// Purge anything this source previously wrote so re-ingest is a clean
103+
// rewrite and tolerant of schema migrations that change the symbol
104+
// shape (e.g. the addition of parent_symbol_id for local-element
105+
// scoping).
106+
//
107+
// Several FKs reference xsd_symbols.id WITHOUT cascade
108+
// (xsd_inheritance_edges.base_symbol_id, xsd_child_edges.child_symbol_id,
109+
// xsd_attr_edges.attr_symbol_id, xsd_group_edges.group_symbol_id), so
110+
// those rows must be cleaned explicitly before the symbol delete. The
111+
// LHS FKs (parent_symbol_id, symbol_id) DO cascade, as does
112+
// xsd_symbol_profiles.symbol_id and behavior_notes.symbol_id. (When
113+
// curated behavior_notes start landing, switch to natural-key
114+
// reconciliation rather than cascade-delete.)
115+
await sql`
116+
DELETE FROM xsd_inheritance_edges
117+
WHERE base_symbol_id IN (SELECT symbol_id FROM xsd_symbol_profiles WHERE source_id = ${sourceId})
118+
`;
119+
await sql`
120+
DELETE FROM xsd_child_edges
121+
WHERE child_symbol_id IN (SELECT symbol_id FROM xsd_symbol_profiles WHERE source_id = ${sourceId})
122+
`;
123+
await sql`
124+
DELETE FROM xsd_attr_edges
125+
WHERE attr_symbol_id IN (SELECT symbol_id FROM xsd_symbol_profiles WHERE source_id = ${sourceId})
126+
`;
127+
await sql`
128+
DELETE FROM xsd_group_edges
129+
WHERE group_symbol_id IN (SELECT symbol_id FROM xsd_symbol_profiles WHERE source_id = ${sourceId})
130+
`;
131+
await sql`
132+
DELETE FROM xsd_symbols
133+
WHERE id IN (SELECT symbol_id FROM xsd_symbol_profiles WHERE source_id = ${sourceId})
134+
`;
135+
102136
// Pass 1: namespaces, symbols, profile memberships.
103137
const namespaceIds = new Map<string, number>();
104138
const symbolIds = new Map<string, number>(); // canonical (vocab|local|kind) -> id

tests/db/xsd-schema.test.ts

Lines changed: 3 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -12,10 +12,9 @@
1212
import { afterAll, beforeAll, beforeEach, expect, test } from "bun:test";
1313
import { createDbClient, type DbClient } from "../../packages/shared/src/db/index.ts";
1414

15-
const databaseUrl = process.env.DATABASE_URL;
16-
if (!databaseUrl) {
17-
throw new Error("Missing DATABASE_URL for integration tests");
18-
}
15+
import { getTestDatabaseUrl } from "../test-db.ts";
16+
17+
const databaseUrl = getTestDatabaseUrl();
1918

2019
let db: DbClient;
2120

tests/ingest-xsd/fixtures/main.xsd

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -108,5 +108,26 @@
108108
<xsd:element name="shared" type="xsd:string"/>
109109
</xsd:sequence>
110110
</xsd:complexType>
111+
<!-- Mirrors WML's *Change types: a complexContent/restriction that
112+
carries no attribute redeclarations should still inherit the base's
113+
attribute uses, not silently drop them. -->
114+
<xsd:complexType name="CT_TrackedBase">
115+
<xsd:attribute name="id" type="xsd:string" use="required"/>
116+
<xsd:attribute name="author" type="xsd:string"/>
117+
</xsd:complexType>
118+
<xsd:complexType name="CT_TrackedRestricted">
119+
<xsd:complexContent>
120+
<xsd:restriction base="CT_TrackedBase"/>
121+
</xsd:complexContent>
122+
</xsd:complexType>
123+
<!-- The restriction redeclares 'id' as optional and adds nothing else;
124+
'author' should still be inherited from the base. -->
125+
<xsd:complexType name="CT_OverrideDerived">
126+
<xsd:complexContent>
127+
<xsd:restriction base="CT_TrackedBase">
128+
<xsd:attribute name="id" type="xsd:string" use="optional"/>
129+
</xsd:restriction>
130+
</xsd:complexContent>
131+
</xsd:complexType>
111132
<xsd:element name="document" type="CT_Empty"/>
112133
</xsd:schema>

tests/ingest-xsd/ingest.test.ts

Lines changed: 25 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -15,10 +15,9 @@ const FIXTURES_DIR = join(import.meta.dir, "fixtures");
1515
const REAL_CACHE_DIR = "./data/xsd-cache/ecma-376-transitional";
1616
const realCacheReady = existsSync(join(REAL_CACHE_DIR, "wml.xsd"));
1717

18-
const databaseUrl = process.env.DATABASE_URL;
19-
if (!databaseUrl) {
20-
throw new Error("Missing DATABASE_URL for integration tests");
21-
}
18+
import { getTestDatabaseUrl } from "../test-db.ts";
19+
20+
const databaseUrl = getTestDatabaseUrl();
2221

2322
let db: DbClient;
2423

@@ -122,7 +121,9 @@ test("ingest writes inheritance edges for extension and restriction", async () =
122121
// ST_Jc restricts xsd:string (simpleType)
123122
// ST_OnOff restricts xsd:boolean
124123
// ST_String restricts xsd:string
125-
expect(stats.inheritanceEdgesInserted).toBe(6);
124+
// 6 from the original fixture + 2 new restrictions (CT_TrackedRestricted,
125+
// CT_OverrideDerived).
126+
expect(stats.inheritanceEdgesInserted).toBe(8);
126127
expect(stats.inheritanceUnresolved).toBe(0);
127128

128129
// Verify the CT_Extended → CT_Empty extension edge.
@@ -171,12 +172,14 @@ test("ingest is idempotent: re-running adds no new symbols/edges", async () => {
171172
db,
172173
});
173174

174-
expect(second.symbolsInserted).toBe(0);
175-
expect(second.symbolsExisting).toBeGreaterThan(0);
176-
expect(second.profileMembershipsInserted).toBe(0);
177-
expect(second.inheritanceEdgesInserted).toBe(0);
178-
// Content-model + attribute passes use delete-and-rewrite, so insert counts
179-
// equal the first run on every re-run; DB row counts stay stable.
175+
// Re-ingest purges everything this source previously wrote and re-creates
176+
// it, so every stat equals the first run and symbolsExisting stays at 0.
177+
// What matters for idempotency is that the DB row counts are stable across
178+
// runs (asserted below).
179+
expect(second.symbolsInserted).toBe(first.symbolsInserted);
180+
expect(second.symbolsExisting).toBe(0);
181+
expect(second.profileMembershipsInserted).toBe(first.profileMembershipsInserted);
182+
expect(second.inheritanceEdgesInserted).toBe(first.inheritanceEdgesInserted);
180183
expect(second.compositorsInserted).toBe(first.compositorsInserted);
181184
expect(second.childEdgesInserted).toBe(first.childEdgesInserted);
182185
expect(second.groupRefsInserted).toBe(first.groupRefsInserted);
@@ -332,14 +335,17 @@ test("ingest writes attributes, attributeGroup refs, and enum values", async ()
332335
});
333336

334337
// Fixture attributes:
335-
// CT_Para/bold (optional, type s:ST_OnOff)
336-
// CT_Extended/extra (optional, type xsd:string, under complexContent/extension)
337-
// AG_TableProps/cols (optional, type xsd:int)
338-
// CT_TableUser/caption (required, type xsd:string)
339-
// CT_RefTest/space (required, ref="s:space"; type/default copied from decl)
340-
// AG_Inner/innerAttr (optional, type xsd:string)
341-
// AG_Outer/outerAttr (optional, type xsd:string)
342-
expect(stats.attrEdgesInserted).toBe(7);
338+
// CT_Para/bold (optional, type s:ST_OnOff)
339+
// CT_Extended/extra (optional, type xsd:string, under complexContent/extension)
340+
// AG_TableProps/cols (optional, type xsd:int)
341+
// CT_TableUser/caption (required, type xsd:string)
342+
// CT_RefTest/space (required, ref="s:space"; type/default copied from decl)
343+
// AG_Inner/innerAttr (optional, type xsd:string)
344+
// AG_Outer/outerAttr (optional, type xsd:string)
345+
// CT_TrackedBase/id (required, type xsd:string)
346+
// CT_TrackedBase/author (optional, type xsd:string)
347+
// CT_OverrideDerived/id (optional override, type xsd:string)
348+
expect(stats.attrEdgesInserted).toBe(10);
343349
expect(stats.attrEdgesUnresolved).toBe(0);
344350

345351
// Fixture attributeGroup refs:

tests/ingest-xsd/parse-schema.test.ts

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -85,10 +85,10 @@ test("declarationsByQName indexes all top-level declarations across documents",
8585
const set = await parseSchemaSet({ schemaDir: FIXTURES_DIR, entrypoints: ["main.xsd"] });
8686

8787
const counts = countByKind(set.declarationsByQName);
88-
// main.xsd: 1 element, 13 complexType, 1 simpleType, 1 group, 3 attributeGroup
88+
// main.xsd: 1 element, 16 complexType, 1 simpleType, 1 group, 3 attributeGroup
8989
// shared.xsd: 2 simpleType, 1 attribute
9090
expect(counts.element).toBe(1);
91-
expect(counts.complexType).toBe(13);
91+
expect(counts.complexType).toBe(16);
9292
expect(counts.simpleType).toBe(3);
9393
expect(counts.group).toBe(1);
9494
expect(counts.attributeGroup).toBe(3);

0 commit comments

Comments
 (0)