Commit b313a9e

feat(xsd): default fetch URL + sha256 to data/sources.json
`bun run xsd:fetch` no longer requires --url. The script reads the URL and expected sha256 from data/sources.json's ecma-376-transitional entry by default; CLI flags and XSD_PART4_URL still override for testing a new edition before pinning it.

Common case becomes a single command:

    bun run xsd:fetch

The manifest is the canonical pin (already used to upsert reference_sources via sources:sync), so making it the default for the fetch script keeps a single source of truth instead of asking contributors to remember a long URL or paste it into a .env.

Docs (CLAUDE.md, scripts/ingest-xsd/README.md) updated to show the short form and explain how to override.
1 parent b473736 commit b313a9e
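
The override order described in the commit message can be sketched as a small pure function. This is a minimal illustration, not the script itself: `resolveFetchConfig`, `ManifestDefault`, and the `example.invalid` URLs are hypothetical names introduced here, and the manifest values are passed in rather than read from disk.

```typescript
// Hypothetical helper, not part of the commit: resolves the fetch settings
// in the order the commit message describes (env var for the URL, then CLI
// flags on top, then the manifest entry for anything still unset).
interface ManifestDefault {
  url: string | null;
  sha256: string | null;
}

interface Resolved {
  url: string | null;
  expectedSha256: string | null;
}

function resolveFetchConfig(
  argv: string[],
  env: Record<string, string | undefined>,
  manifest: ManifestDefault,
): Resolved {
  // The env var seeds the URL; an explicit --url flag replaces it.
  let url: string | null = env.XSD_PART4_URL ?? null;
  let expectedSha256: string | null = null;

  for (let i = 0; i < argv.length; i++) {
    if (argv[i] === "--url") url = argv[++i] ?? null;
    else if (argv[i] === "--expected-sha256") expectedSha256 = argv[++i] ?? null;
  }

  // Anything still unset (or empty) falls back to the manifest pin.
  return {
    url: url || manifest.url,
    expectedSha256: expectedSha256 || manifest.sha256,
  };
}

// Placeholder pin; the real values live in data/sources.json.
const pinned: ManifestDefault = {
  url: "https://example.invalid/part4.zip",
  sha256: "abc123",
};

const defaults = resolveFetchConfig([], {}, pinned);
const overridden = resolveFetchConfig(
  ["--url", "https://example.invalid/next.zip"],
  {},
  pinned,
);
```

Under this logic, overriding only the URL still picks up the pinned hash from the manifest, so a different zip would presumably fail verification unless `--expected-sha256` is passed as well.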

3 files changed

Lines changed: 62 additions & 21 deletions

CLAUDE.md

Lines changed: 1 addition & 1 deletion
@@ -145,7 +145,7 @@ ECMA Part 4 zip → fetch+verify (sha256) → parse → ingest (single transacti
 ```
 
 ```bash
-bun run xsd:fetch --url <part4-zip-url> --expected-sha256 <hex>
+bun run xsd:fetch # URL + sha256 from data/sources.json
 bun run xsd:ingest
 ```
 

scripts/ingest-xsd/README.md

Lines changed: 12 additions & 5 deletions
@@ -24,15 +24,22 @@ Transitional XSDs (`wml.xsd`, `dml-main.xsd`, `sml.xsd`, `pml.xsd`,
 `shared-*.xsd`, ...).
 
 ```bash
-bun run xsd:fetch \
-  --url 'https://ecma-international.org/wp-content/uploads/ECMA-376-4_5th_edition_december_2016.zip' \
-  --expected-sha256 'bd25da1109f73762356596918bf5ff8b74a1331642dba5f1c1d1dfc6bed34ecd'
+bun run xsd:fetch
 ```
 
-The script verifies the outer-zip sha256, extracts the inner zip, and lands
-the XSDs in `data/xsd-cache/ecma-376-transitional/`. The cache is gitignored;
+URL and sha256 are read from `data/sources.json`'s `ecma-376-transitional`
+entry (currently pinned to ECMA-376 5th edition, December 2016). The script
+verifies the outer-zip sha256, extracts the inner zip, and lands the XSDs
+in `data/xsd-cache/ecma-376-transitional/`. The cache is gitignored;
 nothing binary lands in the repo.
 
+To test a new edition before pinning it:
+
+```bash
+bun run xsd:fetch -- --url <other-url> # override URL
+bun run xsd:fetch -- --expected-sha256 <hex> # override hash
+```
+
 ## Ingest
 
 ```bash
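
The README text above says the script verifies the outer-zip sha256. A minimal sketch of such a check, using `createHash` from `node:crypto` (which `fetch.ts` imports), might look like the following. `verifySha256` is a hypothetical name, and treating a missing expected hash as "compute and report only" is an assumption rather than something the diff confirms.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: hashes the downloaded bytes and compares the result
// with the pinned hex digest. Returns the computed digest so a caller can
// print it. Assumption: a null expected hash skips the comparison.
function verifySha256(bytes: Uint8Array, expectedHex: string | null): string {
  const actual = createHash("sha256").update(bytes).digest("hex");
  if (expectedHex !== null && actual !== expectedHex.toLowerCase()) {
    throw new Error(`sha256 mismatch: expected ${expectedHex}, got ${actual}`);
  }
  return actual;
}

// Stand-in for the downloaded zip bytes.
const sample = new TextEncoder().encode("abc");
const digest = verifySha256(sample, null);
```

Comparing lowercase hex strings keeps the check tolerant of an uppercase digest pasted into the manifest or passed on the command line.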

scripts/ingest-xsd/fetch.ts

Lines changed: 49 additions & 15 deletions
@@ -6,21 +6,20 @@
  * which in turn contains the 26 Transitional XSDs (wml.xsd, dml-main.xsd,
  * sml.xsd, pml.xsd, shared-*.xsd, and friends).
  *
+ * URL and sha256 are read from data/sources.json's ecma-376-transitional
+ * entry by default. CLI flags and env vars override; useful for testing a
+ * new edition before pinning it in the manifest.
+ *
  * Cache layout:
  *   data/xsd-cache/
  *     _staging/ (outer + inner zip extraction scratch)
  *     ecma-376-transitional/ (final XSDs land here)
  *
  * Usage:
- *   bun scripts/ingest-xsd/fetch.ts --url <part4-zip-url>
- *   bun scripts/ingest-xsd/fetch.ts --url <url> --expected-sha256 <hex>
- *
- * Or via env:
- *   XSD_PART4_URL=<url> bun scripts/ingest-xsd/fetch.ts
- *
- * After a successful fetch the script prints the outer-zip sha256;
- * paste it into data/sources.json under the ecma-376-transitional entry
- * to pin reproducibility.
+ *   bun run xsd:fetch (manifest default)
+ *   bun run xsd:fetch -- --url <other-url> (override URL)
+ *   bun run xsd:fetch -- --expected-sha256 <hex> (override hash)
+ *   XSD_PART4_URL=<url> bun run xsd:fetch (override via env)
  */
 
 import { createHash } from "node:crypto";
@@ -39,22 +38,57 @@ interface Args {
   innerZip: string;
 }
 
-function parseArgs(): Args {
+interface SourceManifestEntry {
+  name: string;
+  url?: string;
+  sha256?: string | null;
+}
+
+interface SourceManifest {
+  sources: SourceManifestEntry[];
+}
+
+async function loadManifestDefault(): Promise<{ url: string | null; sha256: string | null }> {
+  try {
+    const raw = await Bun.file("./data/sources.json").text();
+    const manifest = JSON.parse(raw) as SourceManifest;
+    const ecma = manifest.sources?.find((s) => s.name === "ecma-376-transitional");
+    return {
+      url: ecma?.url ?? null,
+      sha256: ecma?.sha256 ?? null,
+    };
+  } catch {
+    return { url: null, sha256: null };
+  }
+}
+
+async function parseArgs(): Promise<Args> {
   const argv = process.argv.slice(2);
-  let url = process.env.XSD_PART4_URL ?? "";
+  let url: string | null = process.env.XSD_PART4_URL ?? null;
   let expectedSha256: string | null = null;
   let innerZip = DEFAULT_INNER_ZIP;
 
   for (let i = 0; i < argv.length; i++) {
     const a = argv[i];
-    if (a === "--url") url = argv[++i] ?? "";
+    if (a === "--url") url = argv[++i] ?? null;
     else if (a === "--expected-sha256") expectedSha256 = argv[++i] ?? null;
     else if (a === "--inner-zip") innerZip = argv[++i] ?? DEFAULT_INNER_ZIP;
   }
 
+  // Fall back to the manifest for any unset values. data/sources.json is
+  // the canonical pin; we treat it as the default config so the common case
+  // is just `bun run xsd:fetch`.
+  if (!url || !expectedSha256) {
+    const fromManifest = await loadManifestDefault();
+    if (!url) url = fromManifest.url;
+    if (!expectedSha256) expectedSha256 = fromManifest.sha256;
+  }
+
   if (!url) {
-    console.error("Missing --url (or XSD_PART4_URL env var).");
-    console.error("Pass the canonical ECMA-376 5th edition Part 4 zip URL.");
+    console.error(
+      "No URL configured. Set 'url' on the ecma-376-transitional entry in data/sources.json,",
+    );
+    console.error("or pass --url / XSD_PART4_URL.");
     process.exit(1);
   }
   return { url, expectedSha256, innerZip };
@@ -100,7 +134,7 @@ function findFile(dir: string, name: string): string | null {
 }
 
 async function main() {
-  const args = parseArgs();
+  const args = await parseArgs();
 
   await rm(STAGING_DIR, { recursive: true, force: true });
   await rm(FINAL_DIR, { recursive: true, force: true });
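
The manifest lookup added in `loadManifestDefault` above can be exercised without touching the filesystem by splitting out the parsing step. The sketch below is a hypothetical pure variant (`pickManifestDefault` is not in the commit) that takes the JSON text as an argument. The interfaces mirror the ones in the diff, and the sample manifest uses placeholder values, not the real pin.

```typescript
// Shapes copied from the SourceManifestEntry / SourceManifest interfaces
// introduced in the diff.
interface SourceManifestEntry {
  name: string;
  url?: string;
  sha256?: string | null;
}

interface SourceManifest {
  sources: SourceManifestEntry[];
}

// Hypothetical pure variant of loadManifestDefault: same lookup and the same
// "never throw, return nulls instead" behaviour, but the caller supplies the
// manifest text.
function pickManifestDefault(
  raw: string,
  name = "ecma-376-transitional",
): { url: string | null; sha256: string | null } {
  try {
    const manifest = JSON.parse(raw) as SourceManifest;
    const entry = manifest.sources?.find((s) => s.name === name);
    return { url: entry?.url ?? null, sha256: entry?.sha256 ?? null };
  } catch {
    // Malformed JSON or an unexpected shape: behave as if nothing is pinned.
    return { url: null, sha256: null };
  }
}

// Illustrative manifest text with placeholder values.
const sampleManifest = JSON.stringify({
  sources: [
    { name: "ecma-376-transitional", url: "https://example.invalid/part4.zip", sha256: null },
  ],
});

const found = pickManifestDefault(sampleManifest);
const broken = pickManifestDefault("{ not json");
```

Swallowing parse errors keeps the fetch script usable when the manifest is missing or malformed, at the cost of hiding the reason; in that case the only visible symptom is the "No URL configured" message from `parseArgs`.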
