From 9a13ff8c7631c3fad39eb501540daf064ac55922 Mon Sep 17 00:00:00 2001
From: Aditya
Date: Sat, 16 May 2026 21:58:13 -0400
Subject: [PATCH 1/2] docs(sutta-studio): close 2026-05-16 audit P1 findings
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The doc-audit skill scan of the Sutta Studio subsystem (2026-05-16, code
hash 11e7bb2) surfaced one P0 (Phase 3 caller divergence — closed by
PR #63) and 9 P1 findings. This PR closes 5 of those P1s. The remaining
4 are either already addressed inline by PR #63's CONSOLIDATION.md
status rewrite or are larger structural follow-ups not appropriate to
bundle here.

Fixes in this PR:

- GROUNDING.md §"File layout": corrected filenames to match reality —
  commentarialProvider → commentarialGlossProvider,
  translatorBankProvider → translatorBank, removed nonexistent
  urlMinter.ts (URL minting is inlined in
  services/providers/citationHelpers.ts), added types.ts entry, noted
  that translator-bank wires in separately, not via
  buildDefaultProviders.

- IR.md: replaced soft "staleness warning" with stronger SUPERSEDED
  banner. The 2026-03-05 verification date is 2.5 months stale; in the
  interim SUTTA-008 ratified Sense.epistemicBasis / sourceCitationIds /
  Provenance / ParallelRef / CompoundType and the grounding pass landed.
  New banner directs readers to types/suttaStudio.ts, FEATURES.md,
  SUTTA-008, TEXT_GRAPH.md rather than relying on the MVP schema as
  documentation.

- CONSOLIDATION.md §"What does NOT change": added Grounding step to the
  pass order. Earlier formulation omitted it because Grounding was
  inserted into the live compiler on 2026-05-14 (task #47), after
  CONSOLIDATION.md was originally drafted.

- SUTTA-007 ADR: added an Amendment section per CONVENTIONS.md §9. The
  original "should not be merged" claim was reversed by
  CONSOLIDATION.md, which merged the two stacks because operational
  experience showed silent drift at schema-contract and prompt-content
  boundaries.
  ADR is immutable, so the original text is preserved and the
  amendment explains the reversal + current state (both files are now
  shims).

- FEATURES.md §"Compiler Service" + §"Key Files Reference": rewrote to
  reflect the post-decomposition tree. The old §Compiler Service named
  a 1900-line file that's been a 3-line shim since March 2026; the
  §Key Files table omitted the entire services/sutta-studio/ canonical
  layer + the grounding providers + the
  CONSOLIDATION/GROUNDING/SUTTA-008 docs.

Deferred to follow-up PRs:

- Quote-corner integrations of the audit's P2/P3 findings (naming
  inconsistency in grounding/translatorBank.ts, postPasses/syllabify.ts
  GAP coverage, archived-status header on assembly-line-roadmap.md and
  sutta-studio-case-studies.md, P3 component coverage gaps).

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../adr/SUTTA-007-pass-prompt-runner-layer.md | 37 +++++++++++---
 docs/features/SUTTA_STUDIO.md                 | 51 +++++++++++++++----
 docs/sutta-studio/CONSOLIDATION.md            |  2 +-
 docs/sutta-studio/GROUNDING.md                | 12 +++--
 docs/sutta-studio/IR.md                       | 18 +++++--
 5 files changed, 96 insertions(+), 24 deletions(-)

diff --git a/docs/adr/SUTTA-007-pass-prompt-runner-layer.md b/docs/adr/SUTTA-007-pass-prompt-runner-layer.md
index 978818b9..9a13704c 100644
--- a/docs/adr/SUTTA-007-pass-prompt-runner-layer.md
+++ b/docs/adr/SUTTA-007-pass-prompt-runner-layer.md
@@ -87,14 +87,39 @@ Merging would couple production error-handling concerns to benchmark flexibility
 
 ---
 
-## Implementation Notes
+## Amendment — Reversed by CONSOLIDATION (2026-05-16)
 
-**Files:**
-- `services/suttaStudioPassPrompts.ts` (~723 LOC) — prompt schemas, builders, types, parsing
-- `services/suttaStudioPassRunners.ts` (~586 LOC) — per-pass async runner functions
-- Primary consumer: `scripts/sutta-studio/benchmark.ts`
+The "should not be merged" claim above was **reversed** by `docs/sutta-studio/CONSOLIDATION.md`
+(landed across 2026-03-08 → 2026-05-16, PRs #62 and #63). Operational experience showed the
+two stacks drifting in opposite directions — the bench-side prompts/schemas gained `wordRange`
+and `refrainId` fields that production needed but the production schema didn't enforce;
+production gained DPD wiring and structured-output telemetry the bench stack lacked.
 
-**Deviations from proposal:** None — this ADR was written to document existing code, not to propose changes. The two files remain as described above, intentionally separate from `services/compiler/` (the production pipeline).
+The cost of *not* merging turned out to be silent feature divergence at exactly the boundary
+where alignment matters most (schema contracts and prompt content). The current arrangement:
+
+- All prompt builders, schemas, pass functions, and the LLM caller live under
+  `services/sutta-studio/` (canonical single location).
+- The two files described here (`suttaStudioPassPrompts.ts`, `suttaStudioPassRunners.ts`) are
+  now thin re-export shims to be deleted in CONSOLIDATION Phase 4.
+- Benchmark flexibility is preserved via an injectable `LLMCaller` parameter on the canonical
+  pass functions, not by maintaining a parallel implementation.
+
+The architectural principle "production error-handling concerns differ from benchmark
+flexibility requirements" is still correct; it just doesn't require two separate codebases.
+It requires injection seams.
+
+---
+
+## Implementation Notes (original — now superseded by Amendment above)
+
+**Files (pre-CONSOLIDATION state, kept for historical context):**
+- `services/suttaStudioPassPrompts.ts` (~723 LOC at the time of writing; now a 47-line shim)
+- `services/suttaStudioPassRunners.ts` (~586 LOC at the time of writing; now a 35-line shim)
+- Primary consumer: `scripts/sutta-studio/benchmark.ts` (still true; now consumes via the shims)
+
+**Deviations from proposal:** See Amendment above. The original ADR was written to document
+existing code at a moment when the two stacks aligned; subsequent drift forced the merger.
 
 ---
 
diff --git a/docs/features/SUTTA_STUDIO.md b/docs/features/SUTTA_STUDIO.md
index a966dfb1..f2cf7ed3 100644
--- a/docs/features/SUTTA_STUDIO.md
+++ b/docs/features/SUTTA_STUDIO.md
@@ -94,12 +94,32 @@ The compiler runs **5 optional passes** that refine structure incrementally:
 
 ### Compiler Service
 
-**File**: `services/suttaStudioCompiler.ts` (~1900 lines)
-
-- `compileSuttaStudioPacket(options)`: Main entry point
-- Routes through provider adapters (OpenRouter, OpenAI, Gemini)
-- Enforces 1-second minimum gap between LLM calls
-- All errors logged via `logPipelineEvent()` for debugging
+**Canonical location**: `services/sutta-studio/` (per CONSOLIDATION.md).
+
+Was: a single ~1900-line `services/suttaStudioCompiler.ts` monolith, decomposed
+in March 2026. That filename is now a 3-line re-export shim; do not edit it.
+
+Current tree:
+
+- `services/sutta-studio/prompts/` — one builder per pass (skeleton, anatomist,
+  lexicographer, weaver, typesetter, phase, morphology) + `index.ts` re-exports
+- `services/sutta-studio/passes/` — pure per-pass async functions with an
+  injectable `LLMCaller` seam (so benchmarks substitute their own caller)
+- `services/sutta-studio/grounding/` — providers for contested terms,
+  commentarial glosses (Vism TEI), translator-bank lookups
+- `services/sutta-studio/schemas.ts` — all 7 LLM response schemas (PR #62)
+- `services/sutta-studio/llm.ts` — `callCompilerLLM`, `callCompilerLLMText`,
+  `resolveCompilerProvider` (PR #63)
+- `services/sutta-studio/utils.ts` — boundary context, chunking, JSON parsing
+- `services/sutta-studio/postPasses/syllabify.ts` — Pali syllabification
+  (post-LLM enrichment)
+- `services/compiler/index.ts` — still concrete (the 773-line orchestrator);
+  CONSOLIDATION Phase 2d / PR D ports it to `services/sutta-studio/orchestrator.ts`
+
+The public entry point `compileSuttaStudioPacket(options)` is unchanged.
+It still routes through provider adapters (OpenRouter, OpenAI, Gemini),
+enforces a 1-second minimum gap between LLM calls, and logs all errors via
+`logPipelineEvent()`.
 
 ### Zustand State
 
@@ -168,9 +188,22 @@ The app uses Zustand for global state:
 
 | File | Purpose |
 |------|---------|
-| `types/suttaStudio.ts` | Type definitions |
-| `services/suttaStudioCompiler.ts` | Main compiler logic |
+| `types/suttaStudio.ts` | Type definitions (single source of truth) |
+| `services/sutta-studio/prompts/` | Per-pass prompt builders |
+| `services/sutta-studio/passes/` | Per-pass pure functions + injectable `LLMCaller` |
+| `services/sutta-studio/schemas.ts` | All 7 LLM response schemas |
+| `services/sutta-studio/llm.ts` | LLM caller (provider resolve, logging, structured outputs) |
+| `services/sutta-studio/grounding/` | Contested terms, commentarial glosses, translator bank |
+| `services/sutta-studio/utils.ts` | Boundary context, chunking, JSON parsing |
+| `services/sutta-studio/postPasses/syllabify.ts` | Pali syllabification post-pass |
+| `services/compiler/index.ts` | Orchestrator (transitional; Phase 2d / PR D moves it) |
+| `services/suttaStudioCompiler.ts` | Transitional shim — do not edit |
 | `config/suttaStudioPromptContext.ts` | Prompt context blocks |
 | `config/suttaStudioExamples.ts` | Example JSON for each pass |
 | `services/suttaStudioValidator.ts` | Validation logic |
-| `docs/adr/SUTTA-003-sutta-studio-mvp.md` | Architecture Decision Record |
+| `docs/sutta-studio/CONSOLIDATION.md` | Migration plan + per-phase status |
+| `docs/sutta-studio/GROUNDING.md` | Grounding architecture + provider contracts |
+| `docs/sutta-studio/FEATURES.md` | Current architecture (authoritative) |
+| `docs/adr/SUTTA-003-sutta-studio-mvp.md` | Architecture Decision Record (MVP) |
+| `docs/adr/SUTTA-007-pass-prompt-runner-layer.md` | ADR for runners (see Amendment) |
+| `docs/adr/SUTTA-008-grounded-curation-data-layer.md` | ADR for grounding provenance |
diff --git a/docs/sutta-studio/CONSOLIDATION.md b/docs/sutta-studio/CONSOLIDATION.md
index 5dd9ed1a..e8a2a387 100644
--- a/docs/sutta-studio/CONSOLIDATION.md
+++ b/docs/sutta-studio/CONSOLIDATION.md
@@ -281,7 +281,7 @@ Phase 4 is when (and only when) we update consumers to import from the new canon
 ## What does NOT change in this refactor
 
 - **The prompt content itself** (other than gaining V2 amendments in one place instead of two).
-- **The pipeline pass order** (Skeleton → Anatomist → Lexicographer → Weaver → Typesetter → Phase → Morphology).
+- **The pipeline pass order** (Skeleton → Anatomist → Lexicographer → **Grounding** → Weaver → Typesetter → Phase → Morphology). Grounding was inserted between Lexicographer and Weaver on 2026-05-14 (task #47, GROUNDING.md Phase 2.5); the consolidation refactor preserves this order.
 - **The compiler's public API signatures.** `compileSuttaStudioPacket(options)` keeps the exact same options.
 - **Benchmark output format / leaderboard schema.**
 - **The CLAUDE.md / AGENTS.md multi-agent coordination rules.**
diff --git a/docs/sutta-studio/GROUNDING.md b/docs/sutta-studio/GROUNDING.md
index 6750fea8..c8ac461e 100644
--- a/docs/sutta-studio/GROUNDING.md
+++ b/docs/sutta-studio/GROUNDING.md
@@ -85,10 +85,14 @@ data/sutta-studio/grounding/
 
 services/sutta-studio/grounding/
   contestedTermProvider.ts        # Reads contested-terms.json
-  commentarialProvider.ts         # Reads commentarial-glosses.json
-  translatorBankProvider.ts       # Wraps scBilaraVariants for per-verse lookups
-  urlMinter.ts                    # Reads url-templates, minted URLs on citations
-  index.ts                        # Unified facade
+  commentarialGlossProvider.ts    # Reads commentarial-glosses.json (Eudoxos Vism TEI)
+  translatorBank.ts               # Wraps scBilaraVariants for per-verse lookups
+  types.ts                        # GroundedClaim, GroundingProvider, MatchStrategy, Match
+  index.ts                        # Unified facade — buildDefaultProviders()
+
+# URL minting was not split into a dedicated module; it lives inline
+# in services/providers/citationHelpers.ts. The translator-bank provider
+# is wired into the compiler separately (not via buildDefaultProviders).
 
 services/sutta-studio/passes/
   grounding.ts                    # Gap E — the new pass
diff --git a/docs/sutta-studio/IR.md b/docs/sutta-studio/IR.md
index 9dd6cd2b..c2522a14 100644
--- a/docs/sutta-studio/IR.md
+++ b/docs/sutta-studio/IR.md
@@ -1,9 +1,19 @@
 # Sutta Studio IR (Deep Loom) - MVP Schema
 
-> **Staleness warning:** This document describes the original MVP schema design.
-> The authoritative TypeScript types live in `types/suttaStudio.ts` (298 LOC).
-> When this doc and the types file conflict, the types file wins.
-> Last verified against code: 2026-03-05.
+> **⚠️ SUPERSEDED — historical reference only.**
+>
+> This document describes the original MVP schema (last verified 2026-03-05).
+> Since then the IR has been substantially extended: grounding provenance
+> (`Sense.epistemicBasis`, `sourceCitationIds`, `Provenance`, `ParallelRef`,
+> `CompoundType`) per SUTTA-008; the grounding pass per the FEATURES.md
+> pipeline; commentarial-gloss + translator-bank providers. Do not use this
+> file as source.
+>
+> **Authoritative sources:**
+> - TypeScript types: `types/suttaStudio.ts`
+> - Architecture + pipeline: `docs/sutta-studio/FEATURES.md`
+> - Provenance layer: `docs/adr/SUTTA-008-grounded-curation-data-layer.md`
+> - Transmission graph: `docs/sutta-studio/TEXT_GRAPH.md`
 
 ## Goals
 - Represent Pali source text as canonical segments (stable IDs).
From aa47ed24c24fceca7d9c01bf1a2b0581fcc1d822 Mon Sep 17 00:00:00 2001
From: Aditya
Date: Sat, 16 May 2026 22:26:30 -0400
Subject: [PATCH 2/2] docs(sutta-studio): mark SUTTA_STUDIO.md as historical, point at authoritative FEATURES.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Per review feedback on PR #65: this file's own §Key Files Reference
table declared docs/sutta-studio/FEATURES.md as the authoritative
architecture doc, leaving the file the audit was editing in
self-contradiction. Added a top-of-file banner directing readers to the
canonical doc.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/features/SUTTA_STUDIO.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/docs/features/SUTTA_STUDIO.md b/docs/features/SUTTA_STUDIO.md
index f2cf7ed3..8d823db4 100644
--- a/docs/features/SUTTA_STUDIO.md
+++ b/docs/features/SUTTA_STUDIO.md
@@ -1,6 +1,11 @@
 # Sutta Studio
 
 > Natural-language-to-structured-study-material compiler for Pali suttas
+>
+> **Authoritative architecture doc:** `docs/sutta-studio/FEATURES.md`. This
+> file is the higher-level product overview kept for historical compatibility;
+> file paths and pipeline shape in this overview match `FEATURES.md` but the
+> details there are the source of truth.
 
 ## Overview
 