diff --git a/docs/adr/SUTTA-007-pass-prompt-runner-layer.md b/docs/adr/SUTTA-007-pass-prompt-runner-layer.md
index 978818b9..9a13704c 100644
--- a/docs/adr/SUTTA-007-pass-prompt-runner-layer.md
+++ b/docs/adr/SUTTA-007-pass-prompt-runner-layer.md
@@ -87,14 +87,39 @@ Merging would couple production error-handling concerns to benchmark flexibility
 
 ---
 
-## Implementation Notes
+## Amendment — Reversed by CONSOLIDATION (2026-05-16)
 
-**Files:**
-- `services/suttaStudioPassPrompts.ts` (~723 LOC) — prompt schemas, builders, types, parsing
-- `services/suttaStudioPassRunners.ts` (~586 LOC) — per-pass async runner functions
-- Primary consumer: `scripts/sutta-studio/benchmark.ts`
+The "should not be merged" claim above was **reversed** by `docs/sutta-studio/CONSOLIDATION.md`
+(landed across 2026-03-08 → 2026-05-16, PRs #62 and #63). Operational experience showed the
+two stacks drifting in opposite directions — the bench-side prompts/schemas gained `wordRange`
+and `refrainId` fields that production needed but the production schema didn't enforce;
+production gained DPD wiring and structured-output telemetry the bench stack lacked.
 
-**Deviations from proposal:** None — this ADR was written to document existing code, not to propose changes. The two files remain as described above, intentionally separate from `services/compiler/` (the production pipeline).
+The cost of *not* merging turned out to be silent feature divergence at exactly the boundary
+where alignment matters most (schema contracts and prompt content). The current arrangement:
+
+- All prompt builders, schemas, pass functions, and the LLM caller live under
+  `services/sutta-studio/` (canonical single location).
+- The two files described here (`suttaStudioPassPrompts.ts`, `suttaStudioPassRunners.ts`) are
+  now thin re-export shims to be deleted in CONSOLIDATION Phase 4.
+- Benchmark flexibility is preserved via an injectable `LLMCaller` parameter on the canonical
+  pass functions, not by maintaining a parallel implementation.
+
+The architectural principle "production error-handling concerns differ from benchmark
+flexibility requirements" is still correct; it just doesn't require two separate codebases.
+It requires injection seams.
+
+---
+
+## Implementation Notes (original — now superseded by Amendment above)
+
+**Files (pre-CONSOLIDATION state, kept for historical context):**
+- `services/suttaStudioPassPrompts.ts` (~723 LOC at the time of writing; now a 47-line shim)
+- `services/suttaStudioPassRunners.ts` (~586 LOC at the time of writing; now a 35-line shim)
+- Primary consumer: `scripts/sutta-studio/benchmark.ts` (still true; now consumes via the shims)
+
+**Deviations from proposal:** See Amendment above. The original ADR was written to document
+existing code at a moment when the two stacks aligned; subsequent drift forced the merger.
 
 ---
 
diff --git a/docs/features/SUTTA_STUDIO.md b/docs/features/SUTTA_STUDIO.md
index a966dfb1..8d823db4 100644
--- a/docs/features/SUTTA_STUDIO.md
+++ b/docs/features/SUTTA_STUDIO.md
@@ -1,6 +1,11 @@
 # Sutta Studio
 
 > Natural-language-to-structured-study-material compiler for Pali suttas
+>
+> **Authoritative architecture doc:** `docs/sutta-studio/FEATURES.md`. This
+> file is the higher-level product overview kept for historical compatibility;
+> file paths and pipeline shape in this overview match `FEATURES.md` but the
+> details there are the source of truth.
 
 ## Overview
 
@@ -94,12 +99,32 @@ The compiler runs **5 optional passes** that refine structure incrementally:
 
 ### Compiler Service
 
-**File**: `services/suttaStudioCompiler.ts` (~1900 lines)
-
-- `compileSuttaStudioPacket(options)`: Main entry point
-- Routes through provider adapters (OpenRouter, OpenAI, Gemini)
-- Enforces 1-second minimum gap between LLM calls
-- All errors logged via `logPipelineEvent()` for debugging
+**Canonical location**: `services/sutta-studio/` (per CONSOLIDATION.md).
+
+Was: a single ~1900-line `services/suttaStudioCompiler.ts` monolith, decomposed
+in March 2026. That filename is now a 3-line re-export shim; do not edit it.
+
+Current tree:
+
+- `services/sutta-studio/prompts/` — one builder per pass (skeleton, anatomist,
+  lexicographer, weaver, typesetter, phase, morphology) + `index.ts` re-exports
+- `services/sutta-studio/passes/` — pure per-pass async functions with an
+  injectable `LLMCaller` seam (so benchmarks substitute their own caller)
+- `services/sutta-studio/grounding/` — providers for contested terms,
+  commentarial glosses (Vism TEI), translator-bank lookups
+- `services/sutta-studio/schemas.ts` — all 7 LLM response schemas (PR #62)
+- `services/sutta-studio/llm.ts` — `callCompilerLLM`, `callCompilerLLMText`,
+  `resolveCompilerProvider` (PR #63)
+- `services/sutta-studio/utils.ts` — boundary context, chunking, JSON parsing
+- `services/sutta-studio/postPasses/syllabify.ts` — Pali syllabification
+  (post-LLM enrichment)
+- `services/compiler/index.ts` — still concrete (the 773-line orchestrator);
+  CONSOLIDATION Phase 2d / PR D ports it to `services/sutta-studio/orchestrator.ts`
+
+The public entry point `compileSuttaStudioPacket(options)` is unchanged.
+It still routes through provider adapters (OpenRouter, OpenAI, Gemini),
+enforces a 1-second minimum gap between LLM calls, and logs all errors via
+`logPipelineEvent()`.
 
 ### Zustand State
 
@@ -168,9 +193,22 @@ The app uses Zustand for global state:
 
 | File | Purpose |
 |------|---------|
-| `types/suttaStudio.ts` | Type definitions |
-| `services/suttaStudioCompiler.ts` | Main compiler logic |
+| `types/suttaStudio.ts` | Type definitions (single source of truth) |
+| `services/sutta-studio/prompts/` | Per-pass prompt builders |
+| `services/sutta-studio/passes/` | Per-pass pure functions + injectable `LLMCaller` |
+| `services/sutta-studio/schemas.ts` | All 7 LLM response schemas |
+| `services/sutta-studio/llm.ts` | LLM caller (provider resolve, logging, structured outputs) |
+| `services/sutta-studio/grounding/` | Contested terms, commentarial glosses, translator bank |
+| `services/sutta-studio/utils.ts` | Boundary context, chunking, JSON parsing |
+| `services/sutta-studio/postPasses/syllabify.ts` | Pali syllabification post-pass |
+| `services/compiler/index.ts` | Orchestrator (transitional; Phase 2d / PR D moves it) |
+| `services/suttaStudioCompiler.ts` | Transitional shim — do not edit |
 | `config/suttaStudioPromptContext.ts` | Prompt context blocks |
 | `config/suttaStudioExamples.ts` | Example JSON for each pass |
 | `services/suttaStudioValidator.ts` | Validation logic |
-| `docs/adr/SUTTA-003-sutta-studio-mvp.md` | Architecture Decision Record |
+| `docs/sutta-studio/CONSOLIDATION.md` | Migration plan + per-phase status |
+| `docs/sutta-studio/GROUNDING.md` | Grounding architecture + provider contracts |
+| `docs/sutta-studio/FEATURES.md` | Current architecture (authoritative) |
+| `docs/adr/SUTTA-003-sutta-studio-mvp.md` | Architecture Decision Record (MVP) |
+| `docs/adr/SUTTA-007-pass-prompt-runner-layer.md` | ADR for runners (see Amendment) |
+| `docs/adr/SUTTA-008-grounded-curation-data-layer.md` | ADR for grounding provenance |
diff --git a/docs/sutta-studio/CONSOLIDATION.md b/docs/sutta-studio/CONSOLIDATION.md
index 5dd9ed1a..e8a2a387 100644
--- a/docs/sutta-studio/CONSOLIDATION.md
+++ b/docs/sutta-studio/CONSOLIDATION.md
@@ -281,7 +281,7 @@ Phase 4 is when (and only when) we update consumers to import from the new canon
 ## What does NOT change in this refactor
 
 - **The prompt content itself** (other than gaining V2 amendments in one place instead of two).
-- **The pipeline pass order** (Skeleton → Anatomist → Lexicographer → Weaver → Typesetter → Phase → Morphology).
+- **The pipeline pass order** (Skeleton → Anatomist → Lexicographer → **Grounding** → Weaver → Typesetter → Phase → Morphology). Grounding was inserted between Lexicographer and Weaver in 2026-05-14 (task #47, GROUNDING.md Phase 2.5); the consolidation refactor preserves this order.
 - **The compiler's public API signatures.** `compileSuttaStudioPacket(options)` keeps the exact same options.
 - **Benchmark output format / leaderboard schema.**
 - **The CLAUDE.md / AGENTS.md multi-agent coordination rules.**
diff --git a/docs/sutta-studio/GROUNDING.md b/docs/sutta-studio/GROUNDING.md
index 6750fea8..c8ac461e 100644
--- a/docs/sutta-studio/GROUNDING.md
+++ b/docs/sutta-studio/GROUNDING.md
@@ -85,10 +85,14 @@ data/sutta-studio/grounding/
 
 services/sutta-studio/grounding/
   contestedTermProvider.ts  # Reads contested-terms.json
-  commentarialProvider.ts  # Reads commentarial-glosses.json
-  translatorBankProvider.ts  # Wraps scBilaraVariants for per-verse lookups
-  urlMinter.ts  # Reads url-templates, minted URLs on citations
-  index.ts  # Unified facade
+  commentarialGlossProvider.ts  # Reads commentarial-glosses.json (Eudoxos Vism TEI)
+  translatorBank.ts  # Wraps scBilaraVariants for per-verse lookups
+  types.ts  # GroundedClaim, GroundingProvider, MatchStrategy, Match
+  index.ts  # Unified facade — buildDefaultProviders()
+
+# URL minting was not split into a dedicated module; it lives inline
+# in services/providers/citationHelpers.ts. The translator-bank provider
+# is wired into the compiler separately (not via buildDefaultProviders).
 
 services/sutta-studio/passes/
   grounding.ts  # Gap E — the new pass
diff --git a/docs/sutta-studio/IR.md b/docs/sutta-studio/IR.md
index 9dd6cd2b..c2522a14 100644
--- a/docs/sutta-studio/IR.md
+++ b/docs/sutta-studio/IR.md
@@ -1,9 +1,19 @@
 # Sutta Studio IR (Deep Loom) - MVP Schema
 
-> **Staleness warning:** This document describes the original MVP schema design.
-> The authoritative TypeScript types live in `types/suttaStudio.ts` (298 LOC).
-> When this doc and the types file conflict, the types file wins.
-> Last verified against code: 2026-03-05.
+> **⚠️ SUPERSEDED — historical reference only.**
+>
+> This document describes the original MVP schema (last verified 2026-03-05).
+> Since then the IR has been substantially extended: grounding provenance
+> (`Sense.epistemicBasis`, `sourceCitationIds`, `Provenance`, `ParallelRef`,
+> `CompoundType`) per SUTTA-008; the grounding pass per the FEATURES.md
+> pipeline; commentarial-gloss + translator-bank providers. Do not use this
+> file as source.
+>
+> **Authoritative sources:**
+> - TypeScript types: `types/suttaStudio.ts`
+> - Architecture + pipeline: `docs/sutta-studio/FEATURES.md`
+> - Provenance layer: `docs/adr/SUTTA-008-grounded-curation-data-layer.md`
+> - Transmission graph: `docs/sutta-studio/TEXT_GRAPH.md`
 
 ## Goals
 - Represent Pali source text as canonical segments (stable IDs).