Merged
37 changes: 31 additions & 6 deletions docs/adr/SUTTA-007-pass-prompt-runner-layer.md
@@ -87,14 +87,39 @@ Merging would couple production error-handling concerns to benchmark flexibility

---

## Implementation Notes
## Amendment — Reversed by CONSOLIDATION (2026-05-16)

**Files:**
- `services/suttaStudioPassPrompts.ts` (~723 LOC) — prompt schemas, builders, types, parsing
- `services/suttaStudioPassRunners.ts` (~586 LOC) — per-pass async runner functions
- Primary consumer: `scripts/sutta-studio/benchmark.ts`
The "should not be merged" claim above was **reversed** by `docs/sutta-studio/CONSOLIDATION.md`
(landed across 2026-03-08 → 2026-05-16, PRs #62 and #63). Operational experience showed the
two stacks drifting in opposite directions — the bench-side prompts/schemas gained `wordRange`
and `refrainId` fields that production needed but the production schema didn't enforce;
production gained DPD wiring and structured-output telemetry the bench stack lacked.

**Deviations from proposal:** None — this ADR was written to document existing code, not to propose changes. The two files remain as described above, intentionally separate from `services/compiler/` (the production pipeline).
The cost of *not* merging turned out to be silent feature divergence at exactly the boundary
where alignment matters most (schema contracts and prompt content). The current arrangement:

- All prompt builders, schemas, pass functions, and the LLM caller live under
`services/sutta-studio/` (canonical single location).
- The two files described here (`suttaStudioPassPrompts.ts`, `suttaStudioPassRunners.ts`) are
now thin re-export shims to be deleted in CONSOLIDATION Phase 4.
- Benchmark flexibility is preserved via an injectable `LLMCaller` parameter on the canonical
pass functions, not by maintaining a parallel implementation.

The architectural principle "production error-handling concerns differ from benchmark
flexibility requirements" is still correct; it just doesn't require two separate codebases.
It requires injection seams.
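
As a rough illustration of what such an injection seam could look like, here is a minimal sketch. The `LLMCaller` signature, `runSkeletonPass`, `SkeletonResult`, and both callers are hypothetical stand-ins for this example; the actual definitions under `services/sutta-studio/` may differ.

```typescript
// Assumed shape of the seam; the real `LLMCaller` type may take more arguments.
type LLMCaller = (prompt: string) => Promise<string>;

// Hypothetical result type, used only for this sketch.
interface SkeletonResult {
  segments: string[];
}

// Stand-in for the production caller (provider routing, telemetry, error handling).
const productionCaller: LLMCaller = async (_prompt) => {
  throw new Error("no network call in this sketch");
};

// One canonical pass implementation; the caller is injected rather than hard-wired.
async function runSkeletonPass(
  text: string,
  caller: LLMCaller = productionCaller,
): Promise<SkeletonResult> {
  const prompt = `Segment this passage:\n${text}`;
  const raw = await caller(prompt);
  return JSON.parse(raw) as SkeletonResult;
}

// A benchmark substitutes its own caller without forking the pass function.
const recordedPrompts: string[] = [];
const benchmarkCaller: LLMCaller = async (prompt) => {
  recordedPrompts.push(prompt); // record what the pass asked for
  return JSON.stringify({ segments: ["evaṃ me sutaṃ"] });
};
```

With this arrangement a benchmark would call `runSkeletonPass(text, benchmarkCaller)`, while production code omits the second argument and gets the default caller.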

---

## Implementation Notes (original — now superseded by Amendment above)

**Files (pre-CONSOLIDATION state, kept for historical context):**
- `services/suttaStudioPassPrompts.ts` (~723 LOC at the time of writing; now a 47-line shim)
- `services/suttaStudioPassRunners.ts` (~586 LOC at the time of writing; now a 35-line shim)
- Primary consumer: `scripts/sutta-studio/benchmark.ts` (still true; now consumes via the shims)

**Deviations from proposal:** See Amendment above. The original ADR was written to document
existing code at a moment when the two stacks aligned; subsequent drift forced the merger.

---

56 changes: 47 additions & 9 deletions docs/features/SUTTA_STUDIO.md
@@ -1,6 +1,11 @@
# Sutta Studio

> Natural-language-to-structured-study-material compiler for Pali suttas
>
> **Authoritative architecture doc:** `docs/sutta-studio/FEATURES.md`. This
> file is the higher-level product overview kept for historical compatibility;
> the file paths and pipeline shape here match `FEATURES.md`, but where the two
> differ, `FEATURES.md` is the source of truth.

## Overview

@@ -94,12 +99,32 @@ The compiler runs **5 optional passes** that refine structure incrementally:

### Compiler Service

**File**: `services/suttaStudioCompiler.ts` (~1900 lines)

- `compileSuttaStudioPacket(options)`: Main entry point
- Routes through provider adapters (OpenRouter, OpenAI, Gemini)
- Enforces 1-second minimum gap between LLM calls
- All errors logged via `logPipelineEvent()` for debugging
**Canonical location**: `services/sutta-studio/` (per CONSOLIDATION.md).

Previously this was a single ~1900-line `services/suttaStudioCompiler.ts` monolith,
decomposed in March 2026. That filename is now a 3-line re-export shim; do not edit it.

Current tree:

- `services/sutta-studio/prompts/` — one builder per pass (skeleton, anatomist,
lexicographer, weaver, typesetter, phase, morphology) + `index.ts` re-exports
- `services/sutta-studio/passes/` — pure per-pass async functions with an
injectable `LLMCaller` seam (so benchmarks substitute their own caller)
- `services/sutta-studio/grounding/` — providers for contested terms,
commentarial glosses (Vism TEI), translator-bank lookups
- `services/sutta-studio/schemas.ts` — all 7 LLM response schemas (PR #62)
- `services/sutta-studio/llm.ts` — `callCompilerLLM`, `callCompilerLLMText`,
`resolveCompilerProvider` (PR #63)
- `services/sutta-studio/utils.ts` — boundary context, chunking, JSON parsing
- `services/sutta-studio/postPasses/syllabify.ts` — Pali syllabification
(post-LLM enrichment)
- `services/compiler/index.ts` — not yet migrated; still holds the real 773-line
  orchestrator. CONSOLIDATION Phase 2d / PR D ports it to `services/sutta-studio/orchestrator.ts`

The public entry point `compileSuttaStudioPacket(options)` is unchanged.
It still routes through provider adapters (OpenRouter, OpenAI, Gemini),
enforces a 1-second minimum gap between LLM calls, and logs all errors via
`logPipelineEvent()`.
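
One way a minimum gap between calls could be enforced is sketched below. This is not the compiler's actual mechanism, which the overview does not describe; `msUntilNextCall`, `createGapGate`, and `MIN_GAP_MS` are hypothetical names, and the sketch assumes the gap is measured between call start times.

```typescript
// Hypothetical constant mirroring the 1-second gap described above.
const MIN_GAP_MS = 1000;

// Pure helper: how long to wait before the next call may start.
function msUntilNextCall(
  lastStartAt: number | null,
  now: number,
  minGapMs: number = MIN_GAP_MS,
): number {
  if (lastStartAt === null) return 0; // first call never waits
  return Math.max(0, lastStartAt + minGapMs - now);
}

// Wraps calls so their start times are at least `minGapMs` apart.
function createGapGate(
  minGapMs: number = MIN_GAP_MS,
  clock: () => number = Date.now,
) {
  let lastStartAt: number | null = null;
  return async function gated<T>(call: () => Promise<T>): Promise<T> {
    const now = clock();
    const waitMs = msUntilNextCall(lastStartAt, now, minGapMs);
    // Reserve the slot before awaiting so concurrent callers queue up
    // behind each other instead of all firing at the same moment.
    lastStartAt = now + waitMs;
    if (waitMs > 0) {
      await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
    }
    return call();
  };
}
```

Keeping the wait computation in a pure function makes the timing rule checkable without real timers; the injectable `clock` serves the same purpose for the gate itself.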

### Zustand State

@@ -168,9 +193,22 @@ The app uses Zustand for global state:

| File | Purpose |
|------|---------|
| `types/suttaStudio.ts` | Type definitions |
| `services/suttaStudioCompiler.ts` | Main compiler logic |
| `types/suttaStudio.ts` | Type definitions (single source of truth) |
| `services/sutta-studio/prompts/` | Per-pass prompt builders |
| `services/sutta-studio/passes/` | Per-pass pure functions + injectable `LLMCaller` |
| `services/sutta-studio/schemas.ts` | All 7 LLM response schemas |
| `services/sutta-studio/llm.ts` | LLM caller (provider resolve, logging, structured outputs) |
| `services/sutta-studio/grounding/` | Contested terms, commentarial glosses, translator bank |
| `services/sutta-studio/utils.ts` | Boundary context, chunking, JSON parsing |
| `services/sutta-studio/postPasses/syllabify.ts` | Pali syllabification post-pass |
| `services/compiler/index.ts` | Orchestrator (transitional; Phase 2d / PR D moves it) |
| `services/suttaStudioCompiler.ts` | Transitional shim — do not edit |
| `config/suttaStudioPromptContext.ts` | Prompt context blocks |
| `config/suttaStudioExamples.ts` | Example JSON for each pass |
| `services/suttaStudioValidator.ts` | Validation logic |
| `docs/adr/SUTTA-003-sutta-studio-mvp.md` | Architecture Decision Record |
| `docs/sutta-studio/CONSOLIDATION.md` | Migration plan + per-phase status |
| `docs/sutta-studio/GROUNDING.md` | Grounding architecture + provider contracts |
| `docs/sutta-studio/FEATURES.md` | Current architecture (authoritative) |
| `docs/adr/SUTTA-003-sutta-studio-mvp.md` | Architecture Decision Record (MVP) |
| `docs/adr/SUTTA-007-pass-prompt-runner-layer.md` | ADR for runners (see Amendment) |
| `docs/adr/SUTTA-008-grounded-curation-data-layer.md` | ADR for grounding provenance |
2 changes: 1 addition & 1 deletion docs/sutta-studio/CONSOLIDATION.md
@@ -281,7 +281,7 @@ Phase 4 is when (and only when) we update consumers to import from the new canon
## What does NOT change in this refactor

- **The prompt content itself** (other than gaining V2 amendments in one place instead of two).
- **The pipeline pass order** (Skeleton → Anatomist → Lexicographer → Weaver → Typesetter → Phase → Morphology).
- **The pipeline pass order** (Skeleton → Anatomist → Lexicographer → **Grounding** → Weaver → Typesetter → Phase → Morphology). Grounding was inserted between Lexicographer and Weaver on 2026-05-14 (task #47, GROUNDING.md Phase 2.5); the consolidation refactor preserves this order.
- **The compiler's public API signatures.** `compileSuttaStudioPacket(options)` keeps the exact same options.
- **Benchmark output format / leaderboard schema.**
- **The CLAUDE.md / AGENTS.md multi-agent coordination rules.**
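
For illustration only, the preserved pass order (including Grounding) could be pinned down as a typed constant so that an accidental reordering shows up in a test. `PASS_ORDER`, `PassName`, and `runsBefore` are hypothetical names for this sketch, not identifiers from the codebase.

```typescript
// Hypothetical constant capturing the pass order listed above.
const PASS_ORDER = [
  "skeleton",
  "anatomist",
  "lexicographer",
  "grounding", // sits between lexicographer and weaver
  "weaver",
  "typesetter",
  "phase",
  "morphology",
] as const;

type PassName = (typeof PASS_ORDER)[number];

// True when pass `a` is scheduled before pass `b`.
function runsBefore(a: PassName, b: PassName): boolean {
  return PASS_ORDER.indexOf(a) < PASS_ORDER.indexOf(b);
}
```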
12 changes: 8 additions & 4 deletions docs/sutta-studio/GROUNDING.md
@@ -85,10 +85,14 @@ data/sutta-studio/grounding/

services/sutta-studio/grounding/
contestedTermProvider.ts # Reads contested-terms.json
commentarialProvider.ts # Reads commentarial-glosses.json
translatorBankProvider.ts # Wraps scBilaraVariants for per-verse lookups
urlMinter.ts # Reads url-templates, minted URLs on citations
index.ts # Unified facade
commentarialGlossProvider.ts # Reads commentarial-glosses.json (Eudoxos Vism TEI)
translatorBank.ts # Wraps scBilaraVariants for per-verse lookups
types.ts # GroundedClaim, GroundingProvider, MatchStrategy, Match
index.ts # Unified facade — buildDefaultProviders()

# URL minting was not split into a dedicated module; it lives inline
# in services/providers/citationHelpers.ts. The translator-bank provider
# is wired into the compiler separately (not via buildDefaultProviders).

services/sutta-studio/passes/
grounding.ts # Gap E — the new pass
18 changes: 14 additions & 4 deletions docs/sutta-studio/IR.md
@@ -1,9 +1,19 @@
# Sutta Studio IR (Deep Loom) - MVP Schema

> **Staleness warning:** This document describes the original MVP schema design.
> The authoritative TypeScript types live in `types/suttaStudio.ts` (298 LOC).
> When this doc and the types file conflict, the types file wins.
> Last verified against code: 2026-03-05.
> **⚠️ SUPERSEDED — historical reference only.**
>
> This document describes the original MVP schema (last verified 2026-03-05).
> Since then the IR has been substantially extended: grounding provenance
> (`Sense.epistemicBasis`, `sourceCitationIds`, `Provenance`, `ParallelRef`,
> `CompoundType`) per SUTTA-008; the grounding pass per the FEATURES.md
> pipeline; and the commentarial-gloss and translator-bank providers. Do not
> treat this file as a source of truth.
>
> **Authoritative sources:**
> - TypeScript types: `types/suttaStudio.ts`
> - Architecture + pipeline: `docs/sutta-studio/FEATURES.md`
> - Provenance layer: `docs/adr/SUTTA-008-grounded-curation-data-layer.md`
> - Transmission graph: `docs/sutta-studio/TEXT_GRAPH.md`

## Goals
- Represent Pali source text as canonical segments (stable IDs).