feat(liturgy): Metta Sutta depth pass — prosodic splits + per-morpheme depth + mn10 register #77

Merged
anantham merged 23 commits into main from feat/opus-metta-depth
May 20, 2026

@anantham
Owner

Summary

The Metta Sutta now reads at MAPLE Heart Sutra depth. All 10 verses are split into 4 prosodic segments each (40 total), with full word-by-word alignment to all three witnesses (Amaravati, Sujato, Thanissaro) and per-morpheme tooltips in mn10's plain-prose register.

What changed

  • Prosodic splits: each of the 10 verses cut into its natural 4 chant-phrases at the comma/semicolon boundaries. The eye lands on one Pāli phrase at a time with its three English renderings beneath it.
  • Per-word alignTo arrays for every witness in every segment. The user's previous complaint that arrows fanned out wherever the alignment was missing is resolved — every English token has either a Pāli anchor or an explicit -1 (glue word).
  • Morpheme decomposition for compound Pāli words. New roots surfaced across the sutta: √kṛ (do), √śam (calm), √vac (speak), √i (go), √tuṣ (content), √bhṛ (bear), √vṛt (proceed), √gṛdh (greedy), √car (act), √jñā (know), √vad (speak — against), √bhū (be), √as (be), √dṛś (see), √vas (dwell), √iṣ (seek), √kub (cheat), √man (think), √ruṣ (anger), √han (strike), √rakṣ (protect), √mā (measure), √sthā (stand), √sad (sit), √śī (lie), √smṛ (remember — sati), √hṛ (carry), √pad (attain), √nī (lead), √gam (go).
  • Plain-register glosses following CURATION_PROTOCOL.md §3.4. No "gerundive ending", "instrumental case ending", "past participle". Replaced with concrete teaching using everyday analogies:
    • -yam tail: "like English -able in doable, but with a must flavour"
    • -aṁ tail: "pronounced as a soft nasal close, like um in hum"
    • -ena tail: "the doer of the action — English wedges in 'by'; Pāli changes the word's tail"
    • -eyya tail: "would/should (do this) — the wishing voice"
    • -esu tail: "in/among, with a plural"
  • Glue-word opacity (renderer change): English tokens with alignTo === -1 render at 0.55 opacity. mn10's "ghost word" pattern, eased a notch.
  • Per-morpheme arrow filter (renderer change): hovering a specific morpheme of a Pāli word narrows the arrow display to that morpheme's lines, not the whole word's fan.
  • Per-morpheme underline gaps (renderer change): each morpheme gets px-[2px] so adjacent underlines don't merge visually. Same trick mn10 uses on its segment spans.

Test plan

  • /liturgy/maple/metta-sutta — verify all 10 verses display 4 segments each
  • Hover Pāli words and morphemes — confirm tooltips read as plain prose (no jargon)
  • Hover an English glue word ("the", "by", "is") — confirm it's dimmed at 0.55 opacity vs full-strength content words
  • Hover a morpheme within a compound (e.g. kar in karaṇī of v1, or bhāv in bhāvaye of v7/v8) — confirm only that morpheme's arrows show
  • Confirm the alignment-audit test suite passes (1058 passing, including 200+ new entries for metta segments)

🤖 Generated with Claude Code

anantham added 13 commits May 19, 2026 14:56
…ic split

User feedback (Metta Sutta verse 1): when hovering a Pāli word the
arrows fanned out to every aligned English token regardless of which
morpheme the cursor was on. The per-morpheme tooltip felt decoupled
from the arrow you were trying to read. Two changes:

1. Renderer (TripleScriptWitness): when the hovered element carries
   data-morpheme-idx (the inner HoverSpan of a morpheme-split word),
   narrow the arrow filter to lines whose Line.morphemeIdx matches.
   The alignment-line computer already anchors arrow endpoints to
   morpheme positions when authored; this just exposes that scoping
   to the hover handler.

2. Metta Sutta verse 1: split the existing single bundled segment
   (Karaṇīyamatthakusalena ... anatimānī) into four prosodic phrases
   (v1a-v1d) matching the natural rhythm. Each phrase carries its
   own three-witness set (Amaravati / Sujato / Thanissaro) with
   word-by-word alignTo and full word data — pronunciation,
   etymology, gloss, morphemes (root verbs √kṛ, √śam, √vac, √man,
   √i, √as surfaced), and DPD citations. Compound karaṇīyam-attha-
   kusalena is hyphen-tokenised so each component hovers separately.
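A minimal sketch of the hover narrowing described in change 1 above. The `Line` and `HoverTarget` shapes and the `visibleLines` helper are hypothetical; only `Line.morphemeIdx` and `data-morpheme-idx` are named in the commit message, and the real `TripleScriptWitness` handler may be organised differently. Showing every line when nothing is hovered is also an assumption.

```typescript
// Hypothetical shapes: only `morphemeIdx` is taken from the commit message.
interface Line {
  paliIdx: number;       // Pāli word the arrow starts from
  englishIdx: number;    // English token the arrow ends at
  morphemeIdx?: number;  // set when the arrow is anchored to a single morpheme
}

interface HoverTarget {
  paliIdx: number;
  morphemeIdx?: number;  // parsed from data-morpheme-idx when the attribute is present
}

// Returns the arrows to draw for the current hover state.
function visibleLines(lines: Line[], hover: HoverTarget | null): Line[] {
  if (hover === null) return lines; // assumption: no hover means no filtering
  const forWord = lines.filter((l) => l.paliIdx === hover.paliIdx);
  // Whole-word hover: keep the full fan for that word.
  if (hover.morphemeIdx === undefined) return forWord;
  // Morpheme hover: keep only the lines anchored to that morpheme.
  return forWord.filter((l) => l.morphemeIdx === hover.morphemeIdx);
}

const exampleLines: Line[] = [
  { paliIdx: 0, englishIdx: 3, morphemeIdx: 0 },
  { paliIdx: 0, englishIdx: 1, morphemeIdx: 2 },
  { paliIdx: 1, englishIdx: 5 },
];
const narrowed = visibleLines(exampleLines, { paliIdx: 0, morphemeIdx: 0 });
console.log(narrowed.map((l) => l.englishIdx));
```

Hovering the word as a whole (no `morphemeIdx`) still returns both of its lines, so the behaviour before this change is preserved for words without a morpheme split.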
…kenization

Two Playwright-verified bugs in verse 1:

1. kusalena's morphemes (kusala + ena = 9 chars) didn't reconstruct
   the 8-char surface, so the splitter returned null and the word fell
   back to whole-word hover. Switch to kusal + ena (the sandhi-shortened
   stem) so the split round-trips and each piece becomes hoverable.

2. The default Latin tokenizer splits on the apostrophe in c'assa,
   yielding 'c' and 'assa' as two pali tokens. My alignTo treated
   c'assa as one token at idx 1, so 'mudu' and 'anatimānī' references
   pointed at the wrong pali surface. Add a tokens hint on the v1d
   pi-Latn script variant to keep c'assa as one hover unit, matching
   the alignTo indexing.
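The index shift in bug 2 can be reproduced with a small sketch. Both functions are hypothetical stand-ins: the project's default Latin tokenizer is not shown in this PR, so `defaultTokenize` only assumes it splits on whitespace and apostrophes, and `tokenize` assumes the `tokens` hint simply replaces the default result. The v1d text used here is the phrase as the commit message describes it.

```typescript
// Assumed default behaviour: split on whitespace and on the apostrophe.
function defaultTokenize(text: string): string[] {
  return text.split(/[\s']+/).filter((t) => t.length > 0);
}

// Assumed hint behaviour: an authored token list wins over the default split.
function tokenize(text: string, tokensHint?: string[]): string[] {
  return tokensHint ?? defaultTokenize(text);
}

const v1d = "suvaco c'assa mudu anatimānī";

// Without the hint, c'assa becomes two tokens and everything after it shifts by one.
const unhinted = tokenize(v1d);
// With the hint, c'assa stays a single hover unit at index 1.
const hinted = tokenize(v1d, ["suvaco", "c'assa", "mudu", "anatimānī"]);

console.log(unhinted.indexOf("mudu"), hinted.indexOf("mudu"));
```

An `alignTo` entry authored against the hinted indexing would land one token too early in the unhinted list, which matches the symptom described for 'mudu' and 'anatimānī'.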
Two pieces, both addressing user feedback to learn from mn10 + the
CURATION_PROTOCOL.md Plain-Register Check:

1. Renderer: EnglishLine now accepts alignTo and dims any English token
   whose alignTo entry is -1 to opacity 0.55. These are 'glue words' —
   English scaffolding that has no Pāli counterpart (mn10 calls them
   'ghost words' and renders them at 0.3; we settle higher because
   liturgy glue is more often unavoidable syntax than fully supplied
   content). The eye now lands on content words that actually map back.

2. Metta verse 1: rewrite morpheme + word glosses in plain prose
   register. CURATION_PROTOCOL.md §3.4 calls out 'gerundive ending',
   'instrumental case ending', 'past participle', 'accusative singular
   neuter' as diagnostic failure-tone words. Replaced with the mn10
   model — concrete teaching using everyday analogies (e.g. -yam tail
   = 'like English -able in doable, but with a must flavour'; -aṁ tail
   = 'pronounced like um in hum'; -ena tail = 'English wedges in by
   to do the same job; Pāli changes the word's tail').

   Also: attha gloss expanded to the full sense range the user
   supplied (benefit, welfare, good, purpose, aim, meaning) and the
   v1a Amaravati alignTo fixed so 'in' → -1 (glue) instead of 1 (attha)
   — removes the duplicate arrow you flagged from attha to 'in' AND
   'goodness'.
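A sketch of the glue-word dimming rule from change 1, assuming a hypothetical `tokenOpacities` helper; the real `EnglishLine` component presumably applies the value as a style or class rather than returning an array. The three-token fragment and its indices are illustrative, with 'in' marked as glue as in the v1a Amaravati fix.

```typescript
const GLUE = -1;            // alignTo value for English tokens with no Pāli anchor
const GLUE_OPACITY = 0.55;  // value stated in this PR (mn10 uses 0.3)

// Returns one opacity per English token, parallel to alignTo.
function tokenOpacities(tokens: string[], alignTo: number[]): number[] {
  if (tokens.length !== alignTo.length) {
    throw new Error(`alignTo has ${alignTo.length} entries for ${tokens.length} tokens`);
  }
  return alignTo.map((anchor) => (anchor === GLUE ? GLUE_OPACITY : 1));
}

// Illustrative fragment: 'skilled' -> kusalena (2), 'in' -> glue, 'goodness' -> attha (1).
const fragmentTokens = ["skilled", "in", "goodness"];
const fragmentAlignTo = [2, GLUE, 1];
const opacities = tokenOpacities(fragmentTokens, fragmentAlignTo);
console.log(opacities);
```

Throwing on a length mismatch is a design choice of this sketch: a short `alignTo` would otherwise leave trailing tokens at full strength without anyone noticing.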
…attern

Adjacent morpheme HoverSpans inside a Pāli word rendered their dotted
underlines butting against each other, so the eye saw one continuous
underline beneath the whole word. mn10 puts a 2px horizontal padding
on each segment so the underlines visibly break — kar · aṇī · yam
reads as three connected pieces. Borrowed verbatim.
Same model as verse 1: split into v2a-d at the natural caesurae,
alignTo per witness, full word data (pronunciation, etymology, gloss)
with morphemes broken out for every compound. New roots surfaced: √tuṣ
(content), √bhṛ (bear/support), √vṛt (proceed), √gṛdh (greedy).

Glosses written in the mn10 plain-prose register — no 'instrumental
case ending' or 'past participle'. Where a grammar concept needs
explanation (the -esu plural locative ending in kulesu), the technical
machinery gets glossed in the same breath: 'the -esu tail marks
in/among with a plural — like English in those families.'
v3a-d: 'Let them not do the slightest thing the wise would reprove
... may all beings be happy at heart.' New roots surfaced: √car
(act, conduct), √jñā (know — viññū), √vad (speak — upavadeyyuṁ),
√bhū (be — bhavantu/hontu), √as (be — sattā). The wishing voice
*hontu* / *bhavantu* introduced — the heart of the metta sutta
starts in this verse.
v4a-d sweeps every creature: 'Whatever living beings there are — none
excepted, weak or strong; long or large, medium, short, fine, thick.'
New compound *pāṇabhūtatthī* broken out (breath + existing + being-one).
*rassakāṇukathūlā* split into rassakā + ṇukā + thūlā so the three sizes
are individually hoverable.
… + refrain

v5a-d: 'seen or unseen, near or far, born or seeking-to-come-into-
being; may all beings be happy at heart'. New roots: √dṛś (see —
diṭṭhā/addiṭṭhā), √vas (dwell — vasanti), √iṣ (seek — sambhavesī).
v5d repeats the *sabbe sattā bhavantu sukhitattā* refrain (same
shape as v3d) so readers learn the rhythm.
…uffering

v6a-d: the moral-conduct stanza. New roots: √kub (cheat — nikubbetha),
√man (think — nātimaññetha, echoing anatimānī from v1), √ruṣ (anger —
byārosanā), √han (strike — paṭighasaññā via paṭi-gha strike-back =
aversion), √iṣ (wish — dukkhamiccheyya). The Pāli technical pair
*byārosanā* (hot anger) and *paṭighasaññā* (cold aversion) glossed
with their Buddhist-psychology context.
…f boundless heart

v7a-d: the famous *yathā… evampi…* simile. 'As a mother with her own
life protects her one-and-only child, even so let one cultivate a
boundless heart toward all beings.' New roots: √rakṣ (protect —
anurakkhe), √mā (measure — aparimāṇaṁ), √bhū causative (bhāveti, the
technical term for meditative *cultivation*). The four-piece compound
*eka-putta-(m)anu-rakkhe* broken out so each piece is hoverable.
…ections

v8a-d: 'And loving-kindness for the whole world; cultivate the
boundless heart above, below, and across; uncrowded, without grudge,
without enemy.' The directional sweep *uddhaṁ adho ca tiriyañca* and
the three negations *asambādhaṁ averaṁ asapattaṁ* each broken out.
v8b repeats the *mānasaṁ bhāvaye aparimāṇaṁ* refrain from v7d.
… abiding

v9a-d: 'Standing, walking, sitting, or lying down — as long as one is
alert — let one resolve on this mindfulness; this is what they call
the divine abiding, here.' New roots: √sthā (stand — tiṭṭhaṁ, also
adhiṭṭheyya the 'standing-upon, resolving'), √sad (sit — nisinno),
√śī (lie — sayāno), √smṛ (remember — satiṁ, mindfulness), √hṛ (carry,
dwell — vihāraṁ). The four bodily postures of monastic life
catalogued so the heart-cultivation has no resting place.
…, liberation

v10a-d: 'Not falling into views, virtuous and perfected in vision,
having dispelled greed for sense-pleasures, one never again lies in
a womb — so it is said.' Pairs *diṭṭhi* (held view, from √dṛś) with
*dassana* (direct seeing, same root) — same word turned from object
to instrument. Closing *iti* glossed as 'the Pāli closing
quotation-marks'. The whole metta sutta now at heart-sutra depth.
@vercel

vercel Bot commented May 19, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| lexicon-forge | Ready | Preview, Comment | May 20, 2026 4:54pm |

…pairing

The alignment-arrow renderer assigned English tokens to morphemes
purely by position (i-th English word mapped to a word → i-th morpheme
of that word). When English reorders a word's morphemes the arrows
cross: kusalena = kusal (skilled) + ena (by-an-agent), but Amaravati
renders it 'one ... skilled' — so the heuristic sent kusal's arrow to
'one' and ena's to 'skilled', both wrong.

Add an optional Witness.morphemeAlignTo array, parallel to alignTo.
morphemeAlignTo[i] names the morpheme index that English token i should
anchor to; null/absent falls back to the positional heuristic.
computeAlignmentLines honours it when present.

Authored for Metta verse 1a across all three witnesses:
- karaṇīyam: 'done' → √kṛ root (morpheme 0); 'is/what/should/be' → -yam
- kusalena: 'skilled' → kusal stem (0); 'one' → -ena ending (1)

Playwright-verified: hovering kusal now arrows to 'skilled', ena to
'one'. Other verses still use the heuristic — extend morphemeAlignTo
where a crossing is spotted.
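The fallback rule can be sketched as below. `Witness.alignTo` and `Witness.morphemeAlignTo` are named in the commit message; the `morphemeAnchors` helper, the `morphemeCounts` parameter, and the clamping of the positional guess to the last morpheme are assumptions of this sketch, not the actual `computeAlignmentLines` code.

```typescript
interface Witness {
  tokens: string[];
  alignTo: number[];                    // Pāli word index per English token, -1 for glue
  morphemeAlignTo?: (number | null)[];  // optional explicit morpheme index per token
}

// For each English token, returns the morpheme index its arrow anchors to,
// or null when the token is glue or the target word has no morphemes.
function morphemeAnchors(w: Witness, morphemeCounts: number[]): (number | null)[] {
  const seenPerWord = new Map<number, number>(); // word index -> tokens already mapped to it
  return w.alignTo.map((wordIdx, i) => {
    if (wordIdx < 0) return null;
    const ordinal = seenPerWord.get(wordIdx) ?? 0;
    seenPerWord.set(wordIdx, ordinal + 1);
    const explicit = w.morphemeAlignTo?.[i];
    if (explicit !== null && explicit !== undefined) return explicit;
    // Positional heuristic: i-th token mapped to a word takes its i-th morpheme.
    const count = morphemeCounts[wordIdx] ?? 0;
    return count === 0 ? null : Math.min(ordinal, count - 1);
  });
}

// kusalena = kusal (0) + ena (1); English order is 'one ... skilled'.
const counts = [2];
const heuristic = morphemeAnchors({ tokens: ["one", "skilled"], alignTo: [0, 0] }, counts);
const authored = morphemeAnchors(
  { tokens: ["one", "skilled"], alignTo: [0, 0], morphemeAlignTo: [1, 0] },
  counts,
);
console.log(heuristic, authored);
```

The heuristic result sends 'one' to kusal and 'skilled' to ena, which is the crossing described above; the authored array reverses it.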
Followed up the v1a fix with a full line-by-line pass: every witness
where two-plus English tokens mapped to a morpheme-bearing Pāli word
now carries an explicit morphemeAlignTo, so the arrows anchor on the
semantically correct morpheme instead of the positional guess.

41 witnesses annotated. The genuine reversals fixed include:
- gabbhaseyya: 'lie' → seyya (lying), 'womb' → gabbha
- brahmametaṁ: 'sublime/divine' → brahma, 'this' → metaṁ
- dukkhamiccheyya: 'harm/suffer' → dukkham, 'wish' → iccheyya
- sambhavesī: 'seeking' → esī, 'birth' → bhav
- sallahukavutti: 'living' → vutti, 'lightly' → sallahuka
- idhamāhu: 'they call' → māhu, 'here' → idha
- evampi: 'even' → pi, 'so' → evam
- ekaputtamanurakkhe: 'protect' → rakkhe, 'child' → putta, 'only' → eka

Plus many minor cases where a grammatical English word (of, for, in,
with, and) was landing on a meaning-stem morpheme — now redirected to
the ending morpheme or the stem as the sense requires.

Validated: all 41 morphemeAlignTo arrays length-match their alignTo
and every index is in range. Playwright-confirmed on v6d (dukkham now
arrows to 'harm', not 'wish'). 1095 liturgy tests pass.
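The validation described above (length match plus in-range indices) might look like the following. `validateMorphemeAlignTo` and `morphemeCounts` are hypothetical names, and flagging a morpheme index on a glue token is an extra check this sketch adds rather than one the commit message states.

```typescript
// Returns a list of human-readable problems; an empty list means the arrays agree.
function validateMorphemeAlignTo(
  alignTo: number[],
  morphemeAlignTo: (number | null)[],
  morphemeCounts: number[], // number of morphemes per Pāli word, by word index
): string[] {
  const errors: string[] = [];
  if (alignTo.length !== morphemeAlignTo.length) {
    errors.push(`length mismatch: alignTo=${alignTo.length}, morphemeAlignTo=${morphemeAlignTo.length}`);
    return errors;
  }
  morphemeAlignTo.forEach((morphemeIdx, i) => {
    if (morphemeIdx === null) return; // null defers to the positional heuristic
    const wordIdx = alignTo[i] ?? -1;
    if (wordIdx < 0) {
      errors.push(`token ${i}: morpheme index set on a glue word`); // sketch-only check
      return;
    }
    const count = morphemeCounts[wordIdx] ?? 0;
    if (!Number.isInteger(morphemeIdx) || morphemeIdx < 0 || morphemeIdx >= count) {
      errors.push(`token ${i}: morpheme ${morphemeIdx} out of range for word ${wordIdx} (${count} morphemes)`);
    }
  });
  return errors;
}

console.log(validateMorphemeAlignTo([0, 0], [1, 0], [2]));
```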
The relational-arrow SVG was painting on top of the transliteration
line, so the romanization was crossed by a green curve and hard to
read. Establish an explicit three-layer z-order on the segment:

  SVG alignment edges   z-0
  transliteration line  z-[5]  (+ bg-slate-950 to mask the edge)
  chant words           z-10

The transliteration now sits above the edge (edge tucks behind it,
re-emerging below) but still below the chant words — so word tooltips,
which live in the word's z-10 stacking context, continue to paint
above the transliteration. Same bg-mask trick the word spans already
use to hide edge strokes behind themselves.
…them

Four words had morpheme texts that didn't concatenate back to the
surface form, so splitByMorphemes returned null and the word fell
back to a single whole-word hover instead of per-morpheme spans:

- nātimaññetha: na+ati → nā+ti (the two short a's merge to long ā)
- nāññamaññassa: na+aññamañña → nā+ññamañña (same a+a → ā merge)
- rassakāṇukathūlā: ṇukā → ṇuka (final vowel shortens before thūlā)
- anupaggamma: gamma → ggamma (g doubles at the prefix-join)

Each morpheme gloss now explains the sandhi in plain register so the
reader understands why the piece is spelled the way it is. All four
words now render as separately-hoverable morphemes — verified
nātimaññetha splits into nā · ti · maññetha.
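The round-trip condition can be sketched as follows. `splitByMorphemes` is the name used in the commit message, but this body is a guess at its contract (return spans when the pieces concatenate to the surface, otherwise null); the `MorphemeSpan` shape and the NFC normalisation step are assumptions.

```typescript
interface MorphemeSpan {
  text: string;
  start: number; // offset into the normalised surface form
  end: number;   // exclusive
}

// Returns one span per morpheme, or null when the pieces do not rebuild the surface.
function splitByMorphemes(surface: string, morphemes: string[]): MorphemeSpan[] | null {
  // Normalise so precomposed and decomposed diacritics compare equal.
  const target = surface.normalize("NFC");
  const parts = morphemes.map((m) => m.normalize("NFC"));
  if (parts.length === 0 || parts.join("") !== target) return null;
  let offset = 0;
  return parts.map((text) => {
    const span = { text, start: offset, end: offset + text.length };
    offset = span.end;
    return span;
  });
}

// Dictionary pieces (na + ati) do not round-trip; the sandhi-adjusted pieces do.
const beforeFix = splitByMorphemes("nātimaññetha", ["na", "ati", "maññetha"]);
const afterFix = splitByMorphemes("nātimaññetha", ["nā", "ti", "maññetha"]);
console.log(beforeFix, afterFix?.map((s) => s.text).join(" · "));
```

Returning null rather than a best-effort split is what makes the failure silent in the renderer, which is why the later test file checks the same condition over every word.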
The gear-icon settings popover had no dismiss path except toggling the
gear again — clicking the chant body left it stuck open. Add a
mousedown listener (closes when the click lands outside the popover's
wrapper) plus Escape-to-close, both wired only while the popover is
open and torn down when it closes.
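A framework-free sketch of the dismiss wiring described above. `attachDismiss`, the narrow `Listenable` and `Container` interfaces, and `FakeDocument` are all hypothetical; they exist so the example runs without a DOM. In the browser the intended arguments would be `document` and the popover's wrapper element, and the returned teardown would run when the popover closes.

```typescript
interface DismissEvent { target?: unknown; key?: string }
type Handler = (event: DismissEvent) => void;

interface Listenable {
  addEventListener(type: string, handler: Handler): void;
  removeEventListener(type: string, handler: Handler): void;
}
interface Container { contains(node: unknown): boolean }

// Wires both dismiss paths and returns a teardown that removes them again.
function attachDismiss(doc: Listenable, wrapper: Container, close: () => void): () => void {
  const onMouseDown: Handler = (e) => {
    if (!wrapper.contains(e.target)) close(); // click landed outside the popover
  };
  const onKeyDown: Handler = (e) => {
    if (e.key === "Escape") close();
  };
  doc.addEventListener("mousedown", onMouseDown);
  doc.addEventListener("keydown", onKeyDown);
  return () => {
    doc.removeEventListener("mousedown", onMouseDown);
    doc.removeEventListener("keydown", onKeyDown);
  };
}

// Tiny stand-in for `document`, used only to exercise the sketch.
class FakeDocument implements Listenable {
  private entries: { type: string; handler: Handler }[] = [];
  addEventListener(type: string, handler: Handler): void {
    this.entries.push({ type, handler });
  }
  removeEventListener(type: string, handler: Handler): void {
    this.entries = this.entries.filter((e) => !(e.type === type && e.handler === handler));
  }
  dispatch(type: string, event: DismissEvent): void {
    this.entries.filter((e) => e.type === type).forEach((e) => e.handler(event));
  }
}
```

Calling `attachDismiss` only while the popover is open, and invoking the teardown on close, keeps the listeners from firing on every click for the lifetime of the page.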
anantham added 2 commits May 20, 2026 09:37
…umbers

Cross-reference glosses leaked internal segment IDs ('same word as
v7d', 'same root as *bhavantu* in v3d') into reader-facing tooltips —
meaningless to anyone reading the chant. Replaced all 26 with 'verse N'
phrasing. The IDs only ever made sense to the curator.
…he joins

QC sweep after the Metta Sutta pass found the same silent bug class
(morphemes don't reconstruct the surface → splitByMorphemes returns
null → word degrades to a single whole-word hover) in four more
chants. Nine words fixed:

  morning-chants  kāmesu              kāma → kām
  sho-sai         śāsanānām           ana → an
  heart-sutra     Āryāvalokiteśvaro   Ārya → Āry
  heart-sutra     cittāvaraṇa         citta → citt
  heart-sutra     viparyāsātikrānto   atikrānto → ātikrānto
  heart-sutra     Tryadhvavyavasthitāḥ  tri → try
  bodhi-heart-sutra: same three as heart-sutra (it mirrors that file)

Every fix is a Pāli/Sanskrit sandhi at the morpheme join — two vowels
merging (a+a→ā, a+ā→ā), a vowel shortening, or i→y before a vowel. The
morpheme gloss now names the sound-change so the spelling makes sense.

Audited exhaustively: all morphemes in every triple-script-witness
segment across both sanghas now reconstruct their surface form.
Also confirmed: no internal segment-ID leaks ('v7d'-style) in any
chant other than Metta (now fixed).
The Metta QC pass surfaced two failure modes that neither crash nor
fail a render — they just quietly degrade the reader, so nothing
caught them until a human noticed. New test file makes both loud:

1. Morpheme reconstruction — every word's morphemes[] (and per-script
   scriptMorphemes[]) must concatenate back to the surface form. If
   they don't, the renderer's splitByMorphemes returns null and the
   word silently loses its per-morpheme hovers/arrows. scriptMorphemes
   comparison strips token separators (space, Tibetan tsek) since the
   renderer splits per-token.

2. Segment-ID leaks — reader-facing gloss/etymology text must never
   contain the curator's internal segment-ID shorthand (v1a, v7d) —
   it's meaningless to a reader. Caught with /\bv\d+[a-z]\b/.

Runs across every word in every triple-script-witness segment in both
sanghas — a new chant is covered automatically. All green (the data
was fully fixed in the preceding commits).

Not covered: jargon glosses ('gerundive', 'accusative singular') —
that's a visible-not-silent issue with 84 known hits deferred to a
dedicated plain-register pass; a jargon-guard test should land
alongside that fix, not before it (would just be 84 red tests).
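The segment-ID check in item 2 reduces to one regular expression. The pattern is the one quoted above; the `findSegmentIdLeaks` wrapper is a hypothetical helper, not the test file's actual code.

```typescript
// Pattern quoted in the commit message, with the global flag added to collect every hit.
const SEGMENT_ID = /\bv\d+[a-z]\b/g;

// Returns every internal segment ID found in reader-facing text.
function findSegmentIdLeaks(text: string): string[] {
  return text.match(SEGMENT_ID) ?? [];
}

console.log(findSegmentIdLeaks("same root as *bhavantu* in v3d"));
console.log(findSegmentIdLeaks("same root as *bhavantu* in verse 3"));
```

The trailing letter is required, so a bare "v2" does not match and only the curator's verse-plus-segment form is flagged.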
…hant glosses + add regression guard

Earlier I deferred this as 'an 84-hit multi-hour pass'. That was
procrastination — the issue was identified, so it gets fixed.

Rewrote every grammar-jargon gloss/etymology across the whole liturgy
into the plain-prose register CURATION_PROTOCOL §3.4 prescribes —
*show* the grammatical idea, don't *name* it:
  'accusative (object of "I go to")'  →  'the object of "I go to"'
  'past participle of √vac'             →  'the "X-ed" form of √vac'
  'ablative ending'                     →  'the "-from" ending'
  'genitive plural'                     →  'the "of-those" ending'
  '(nominative)' appended as noise      →  dropped

Files touched: heart-sutra, bodhi-heart-sutra, morning-chants,
sho-sai, ti-sarana, bodhicitta-dedication, metta-sutta, om-mani,
way-of-compassion. Audit: 0 jargon hits remain anywhere.

Plus a third check in liturgy-data-quality.test.ts — a jargon
tripwire. It is not an absolute ban (the pay-rent rule still permits
a glossed term that earns its place); a legitimately-earned term goes
in JARGON_ALLOWLIST with a rationale. The guard already proved itself:
its case-insensitive scan caught 'Optative' in way-of-compassion's
bhāvaye gloss that a case-sensitive grep had missed.
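A sketch of a case-insensitive tripwire with an allowlist. `JARGON_ALLOWLIST` is named above, but its shape here (a map from `glossId:term` to a rationale), the `JARGON_TERMS` list, and the `findJargon` helper are assumptions; the term list is drawn only from words quoted in this PR and is likely incomplete.

```typescript
// Terms quoted in this PR as failure-tone words; the real list may differ.
const JARGON_TERMS = [
  "gerundive", "instrumental case", "past participle", "accusative",
  "ablative", "genitive", "nominative", "optative",
];

// Hypothetical shape: "glossId:term" -> why the term earns its place.
const JARGON_ALLOWLIST: Record<string, string> = {};

// Returns the jargon terms found in a gloss, ignoring case and allowlisted pairs.
function findJargon(glossId: string, text: string): string[] {
  const lower = text.toLowerCase();
  return JARGON_TERMS.filter(
    (term) => lower.includes(term) && !(`${glossId}:${term}` in JARGON_ALLOWLIST),
  );
}

// A capitalised term slips past a case-sensitive search but not the lowercased scan.
const sample = "Optative of bhāveti";
console.log(sample.includes("optative"), findJargon("example-gloss", sample));
```

Keying the allowlist by gloss and term, rather than by term alone, means one earned exception does not silently permit the same word everywhere else.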
Distils the Metta depth pass + cross-chant QC sweep into a failure-mode
taxonomy, organised by how each error announces itself — silent / loud
/ judgment. For every class: what it is, why it happens, where it was
found, the guard now in place, and the rule a future auto-generation
pipeline must follow to avoid producing it.

The through-line: the two dangerous classes (morpheme reconstruction,
internal-ID leak) were silent. They survived from authoring until a human
hovered the wrong word. The fix was to make them loud. A generator
must run, on its own output, every invariant the renderer silently
assumes, and degrade cleanly when it cannot satisfy one.
@anantham anantham merged commit f51fdff into main May 20, 2026
5 checks passed