feat(lesson): read-aloud (TTS) for lessons #2

Merged
astrapi69 merged 9 commits into main from feature/lesson-tts-read-aloud on Jun 3, 2026
@astrapi69
Owner

Lesson read-aloud (TTS)

Wires text-to-speech into the Lesson viewer, built entirely on the existing voice lib (lib/voice/speech-synthesis + voicePref). Every TTS surface self-hides when the browser lacks speechSynthesis or the user disabled TTS in Voice Settings, so flows that don't use it are unaffected.

What's included

| Commit | Feature |
| --- | --- |
| C1 | useReadAloud engine hook + inline ReadAloudButton (lucide speaker icon, pulse, language-aware voice, speed multiplier); speak() gains an onBoundary option |
| C2 | Wired into theory steps + all 5 exercise prompts via the dispatcher (ttsLang + codeMode); suppressed on code/formula content; markdownToSpeech strips markdown/code for clean speech |
| C3 | "Auto read-aloud" header toggle (persisted): speaks each step on display in the lesson's target language |
| C4 | Inline 0.5 / 0.75 / 1 / 1.25× speed controls (shown only during playback, remembered, restart-at-new-rate) + no-voice warning |
| C5 | Follow-along word highlight via onboundary (.tts-active accent wash; static underline under prefers-reduced-motion); theory swaps to a spanned read-along view while reading |
| C6 | lesson.tts.* i18n in all 8 catalogs (real umlauts in de) + R keyboard shortcut (ignored in inputs) |
| C7 | Continuous theory reading: "Read all" reads a run of consecutive theory steps as one utterance and auto-advances the viewer at each boundary, stopping at the next exercise |
| C8 | Floating mini-player (prev step / play-pause / next step / stop + "Step X of N theory steps") with step-based skip; pause/resume added to the engine |
| tests | Integration (page-level, all 5 exercise types + code suppression) + a Dexie smoke spec for the read-aloud surface |

Design decisions (flagging)

  1. No per-tile speaker buttons inside Matching/Picture tiles — those tiles are <button>s, and nesting a button is invalid HTML and would hijack the tile click. Pronunciation is carried by the prompt-level control, theory read-aloud, and auto-read. A non-button affordance (long-press, or a speaker in post-answer feedback) is a clean follow-up if wanted.
  2. Continuous mode uses step-level advance, not word-level highlight (word highlight stays a single-step feature) — keeps the concatenated-utterance offsets simple and the viewer behavior predictable.
  3. Mini-player ships step-based skip first (per the request's recommendation) — more useful for learning than an arbitrary 10s jump, and the Web Speech API can't seek. Time-based seek deferred.

Verification

  • 3128 Vitest tests pass, tsc clean, npm run build + Dexie build clean, backend i18n audit green.
  • The Dexie smoke spec compiles + wires (vite preview starts, runner reaches browser launch) but could not be executed in the authoring environment because the chromium binary can't be downloaded (network-restricted). Run it via make test-dexie-smoke — it joins the existing Dexie-mode gate.

https://claude.ai/code/session_019JZ1Ridnhg4hmv6AcSVdcV


Generated by Claude Code

claude added 9 commits June 2, 2026 18:38
Foundation for lesson read-aloud, built on the existing voice lib:

- speech-synthesis.ts: speak() gains an onBoundary option so callers
  can follow along word-by-word (used by the highlight in C5).
- useReadAloud: the lesson-level TTS engine — resolves the voice from
  the saved preference name or the closest match for the requested
  lang, applies saved rate/pitch x an inline speed multiplier
  (0.5/0.75/1/1.25x, remembered in localStorage), exposes speaking /
  activeId / boundaryIndex / voiceAvailable, and stops on unmount.
- ReadAloudButton: a compact speaker-icon (lucide Volume2/Square)
  play/stop toggle for one piece of text. Same visibility gates as
  SpeechButton (no support / TTS off / empty text -> no render);
  lang-aware voice; pulses while speaking (pulse disabled under
  prefers-reduced-motion).
- CSS for the button + accent pulse + reduced-motion guard.

Tests: speed persistence + offered set; button visibility gates,
speak/stop toggle, lang propagation, and rate x speed. aria-labels
use lesson.tts.* keys (full i18n lands in C6; t() fallbacks cover it).

https://claude.ai/code/session_019JZ1Ridnhg4hmv6AcSVdcV
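
The voice resolution and rate handling described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: `VoiceLike`, `pickVoiceSketch`, and `effectiveRate` are hypothetical names, and the fallback order (saved name, then exact language tag, then same primary language) is an assumption about what "closest match" means.

```typescript
// Hypothetical minimal shape; the browser's SpeechSynthesisVoice carries these two fields.
interface VoiceLike {
  name: string;
  lang: string; // BCP 47 tag, e.g. "fr-FR"
}

// Assumed order: saved preference name, exact language tag, same primary language.
function pickVoiceSketch(
  voices: readonly VoiceLike[],
  preferredName: string | null,
  lang: string,
): VoiceLike | null {
  if (preferredName) {
    const byName = voices.find((v) => v.name === preferredName);
    if (byName) return byName;
  }
  const wanted = lang.toLowerCase();
  const exact = voices.find((v) => v.lang.toLowerCase() === wanted);
  if (exact) return exact;
  const primary = wanted.split("-")[0];
  const sameLanguage = voices.find(
    (v) => v.lang.toLowerCase().split("-")[0] === primary,
  );
  return sameLanguage ?? null;
}

// The offered speed multipliers named in the commit message.
const OFFERED_SPEEDS = [0.5, 0.75, 1, 1.25] as const;

// Saved rate x inline speed, clamped to the 0.1..10 range the
// Web Speech API defines for SpeechSynthesisUtterance.rate.
function effectiveRate(savedRate: number, speed: number): number {
  return Math.min(10, Math.max(0.1, savedRate * speed));
}
```

A `null` result here would correspond to the `voiceAvailable=false` case, where playback falls back to the engine default.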
- ExerciseDispatcher threads ttsLang (the lesson target language) +
  codeMode to every renderer via the shared controlled-props
  contract; FreeText/Cloze keep their own codeMode too.
- All 5 exercise renderers render a prompt-level ReadAloudButton when
  a ttsLang is supplied, suppressed for code/formula content. Review +
  AdaptiveLesson pass no ttsLang, so they stay TTS-free.
- Theory steps gain a "Read aloud" control above the body; the body's
  Markdown is projected to clean speech text via the new
  markdownToSpeech helper (fenced code dropped, syntax stripped).
- CSS: .exercise-prompt-row (prompt + button on one line) and
  .lesson-theory-tts.

Per-tile speaker buttons inside the clickable Matching/Picture tiles
are intentionally NOT added — they are <button>s, and nesting a
button is invalid HTML and would hijack the tile click. The
prompt-level control + the summary answers (later commit) carry the
pronunciation value instead.

Tests: prompt button renders with ttsLang for free_text/matching/
cloze; suppressed under codeMode; absent without ttsLang. Existing
exercise + lesson + two-phase suites stay green.

https://claude.ai/code/session_019JZ1Ridnhg4hmv6AcSVdcV
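
For illustration, a Markdown-to-speech projection along the lines described above might look like this. It is a regex-based sketch under stated assumptions, not the actual markdownToSpeech helper: the function name is hypothetical, inline code is assumed to keep its content with only the backticks removed, and edge cases such as tables or nested emphasis are not handled.

```typescript
// Build the fence marker from single characters so this example can sit in a fenced block.
const FENCE = "`".repeat(3);
const FENCED_BLOCK = new RegExp(FENCE + "[\\s\\S]*?" + FENCE, "g");

// Hypothetical sketch: project Markdown to plain text suitable for speech.
function markdownToSpeechSketch(markdown: string): string {
  return markdown
    .replace(FENCED_BLOCK, " ") // fenced code blocks are dropped entirely
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, "$1") // image -> alt text
    .replace(/\[([^\]]*)\]\([^)]*\)/g, "$1") // link -> label
    .replace(/`([^`]*)`/g, "$1") // inline code -> its content (assumption)
    .replace(/^ {0,3}#{1,6}\s+/gm, "") // heading markers
    .replace(/^ {0,3}>[ \t]?/gm, "") // blockquote markers
    .replace(/^[ \t]*[-*+][ \t]+/gm, "") // bullet markers
    .replace(/\*\*(.+?)\*\*/g, "$1") // bold with asterisks
    .replace(/\*(.+?)\*/g, "$1") // italic with asterisks
    .replace(/(?<!\w)_(.+?)_(?!\w)/g, "$1") // italic with underscores, not snake_case
    .replace(/\s+/g, " ") // collapse whitespace for one clean utterance
    .trim();
}
```

Stripping happens before the text reaches the engine, so the spoken string and the follow-along text can share one source.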
- useReadAloud gains persisted auto-read prefs (readLessonAutoRead /
  writeLessonAutoRead; off by default).
- Lesson page drives the lesson-level engine: when auto-read is on it
  speaks each step on display in the lesson's target language —
  theory body (markdown stripped via markdownToSpeech) and exercise
  prompt; code/formula exercises are skipped. A ref guard makes the
  effect safe to re-run without re-speaking the same step; turning
  auto-read off stops playback and resets the guard.
- A pill toggle ("Auto read-aloud", aria-pressed) renders in the
  controls row under the progress bar when TTS is supported.

Tests (with a mocked speechSynthesis): toggle renders; theory body +
exercise prompt are spoken on display in the target language with
markdown stripped; nothing speaks when off; toggle flips aria-pressed
and persists. Existing lesson/exercise suites stay green (they run
without a synth mock, so the TTS UI stays hidden there).

https://claude.ai/code/session_019JZ1Ridnhg4hmv6AcSVdcV
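
The ref guard mentioned above could work roughly like this. The class below is a framework-free stand-in for a React ref, written as a sketch; `AutoReadGuard` and `shouldSpeak` are hypothetical names, and keying the guard by a step id is an assumption.

```typescript
// Hypothetical guard: remembers the last step spoken so a re-run of the
// effect for the same step does not speak it again.
class AutoReadGuard {
  private lastSpokenId: string | null = null;

  // Returns true only the first time an enabled effect sees a given step.
  shouldSpeak(autoReadOn: boolean, stepId: string): boolean {
    if (!autoReadOn) {
      // Turning auto-read off resets the guard, so re-enabling speaks again.
      this.lastSpokenId = null;
      return false;
    }
    if (this.lastSpokenId === stepId) return false;
    this.lastSpokenId = stepId;
    return true;
  }
}
```

In a component, the instance would presumably live in a ref so it survives re-renders without triggering them.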
- useReadAloud mirrors speed + speaking into refs and remembers the
  last spoken text, so setSpeed() can restart the current stream at
  the new rate immediately (the Web Speech API has no live rate
  change). speak() now reads the rate from the speed ref.
- Lesson controls row shows a 0.5/0.75/1/1.25x speed group ONLY while
  a stream is playing; the choice persists (readLessonSpeed) and the
  active speed is aria-pressed. Voice selection is already language-
  aware (the lang prop -> pickVoice); when the target language has no
  installed voice the engine reports voiceAvailable=false and the row
  surfaces a friendly "no voice for {language}" notice (playback still
  runs with the engine default).

Tests: speed control hidden while idle, shown during playback with all
four speeds; picking a speed persists it, restarts the read at the new
rate, and marks the active button aria-pressed.

https://claude.ai/code/session_019JZ1Ridnhg4hmv6AcSVdcV
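
Because the rate of an utterance that is already playing cannot be changed, the restart-at-new-rate behavior can be sketched as cancel plus re-speak. The example uses an injected `SynthLike` interface (hypothetical) instead of the real `speechSynthesis` object so it runs outside a browser; `RestartingReader` is likewise an illustrative name, not the hook from this PR.

```typescript
// Hypothetical seam over the speech engine so the sketch is testable.
interface SynthLike {
  speak(text: string, rate: number): void;
  cancel(): void;
}

class RestartingReader {
  private readonly synth: SynthLike;
  private readonly baseRate: number;
  private speed = 1;
  private lastText: string | null = null;
  private speaking = false;

  constructor(synth: SynthLike, baseRate: number) {
    this.synth = synth;
    this.baseRate = baseRate;
  }

  speak(text: string): void {
    this.synth.cancel(); // drop whatever is queued or playing
    this.lastText = text; // remembered so setSpeed can restart it
    this.speaking = true;
    this.synth.speak(text, this.baseRate * this.speed);
  }

  stop(): void {
    this.synth.cancel();
    this.speaking = false;
  }

  // While playing, restart the current text at the new rate; while idle,
  // just store the speed for the next speak().
  setSpeed(speed: number): void {
    this.speed = speed;
    if (this.speaking && this.lastText !== null) this.speak(this.lastText);
  }
}
```

One consequence of this approach is that a speed change restarts the text from the beginning rather than from the current word.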
- New ReadAlongText: renders the plain speech text as word spans and
  applies .tts-active to the word whose char range contains the
  engine's onboundary charIndex (exported tokenizeForReadAlong +
  activeTokenIndex are pure + unit-tested).
- TheoryStep is now engine-driven: its Read aloud / Stop button calls
  the shared useReadAloud engine (so BOTH manual clicks and auto-read
  emit word boundaries), keyed by a per-step utterance id. While that
  step is being read the rich Markdown is swapped for the follow-along
  view; Markdown returns when idle.
- Auto-read uses the same theory-{id} utterance id so the highlight
  also tracks during auto-read.
- CSS: .tts-active accent wash with a 120ms transition; under
  prefers-reduced-motion the wash + transition are dropped for a
  static accent underline.

Tests: tokenizer + activeTokenIndex ranges; active-word render;
manual theory button reads + swaps in the follow-along view + flips to
Stop; auto-read renders the follow-along view. Existing lesson suite
green (Markdown still shows when not reading).

https://claude.ai/code/session_019JZ1Ridnhg4hmv6AcSVdcV
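
A minimal sketch of the two pure helpers named above, assuming tokens are whitespace-separated runs with an exclusive end offset. The `Sketch` suffixes mark these as illustrations; the real tokenizeForReadAlong and activeTokenIndex may split or index differently.

```typescript
interface ReadAlongToken {
  text: string;
  start: number; // inclusive char offset in the speech text
  end: number; // exclusive char offset
}

// Split the plain speech text into word tokens with their char ranges.
function tokenizeForReadAlongSketch(text: string): ReadAlongToken[] {
  const tokens: ReadAlongToken[] = [];
  const wordPattern = /\S+/g;
  let match: RegExpExecArray | null;
  while ((match = wordPattern.exec(text)) !== null) {
    tokens.push({
      text: match[0],
      start: match.index,
      end: match.index + match[0].length,
    });
  }
  return tokens;
}

// Map a boundary charIndex to the token whose range contains it; -1 when
// nothing is being read or the index falls on whitespace.
function activeTokenIndexSketch(
  tokens: readonly ReadAlongToken[],
  charIndex: number | null,
): number {
  if (charIndex === null) return -1;
  return tokens.findIndex((t) => charIndex >= t.start && charIndex < t.end);
}
```

A view component would then render one span per token and add `.tts-active` to the span at the returned index.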
- New lesson.tts.* keys in all 8 catalogs (read_aloud, stop,
  auto_read, speed, no_voice {language}, reading), real umlauts in de;
  synced to the frontend JSON. The button/toggle/speed/no-voice UI now
  resolves real strings instead of English fallbacks.
- Keyboard shortcut: pressing "R" (no modifier, not in an input /
  textarea / select / contenteditable) toggles read-aloud of the
  current step via the engine. Auto-read + shortcut share one
  currentStepSpeech() payload builder (theory body / non-code prompt,
  with the theory-{id} utterance id so the highlight tracks).

Tests: R reads the current step + swaps in the follow-along, second R
stops; R is ignored while typing in an input. Backend i18n audit green.

https://claude.ai/code/session_019JZ1Ridnhg4hmv6AcSVdcV
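
The shortcut filter described above might be expressed as a small predicate. This is a sketch: `KeyLike` and `TargetLike` are hypothetical structural subsets of `KeyboardEvent` and `HTMLElement` so the function can run without a DOM, and treating Shift as a non-blocking modifier is an assumption.

```typescript
// Structural subsets of KeyboardEvent / HTMLElement (hypothetical names).
interface KeyLike {
  key: string;
  ctrlKey: boolean;
  metaKey: boolean;
  altKey: boolean;
}

interface TargetLike {
  tagName: string;
  isContentEditable: boolean;
}

const TYPING_TAGS = ["INPUT", "TEXTAREA", "SELECT"];

// True when a keydown should toggle read-aloud of the current step.
function shouldToggleReadAloud(ev: KeyLike, target: TargetLike | null): boolean {
  if (ev.key.toLowerCase() !== "r") return false;
  if (ev.ctrlKey || ev.metaKey || ev.altKey) return false; // leave Ctrl+R etc. alone
  if (target === null) return true;
  if (target.isContentEditable) return false;
  return !TYPING_TAGS.includes(target.tagName.toUpperCase());
}
```

Keeping the check pure means a keydown listener only has to pass the event and its target through.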
- New pure helpers collectTheoryRun (concatenate a run of consecutive
  theory steps into ONE utterance text + per-step char offsets) and
  runStepForChar (map a boundary charIndex back to a step index).
- Lesson "Read all" button (shown only on a theory step that begins a
  run of >=2) speaks the whole run as one utterance and auto-advances
  the viewer as the engine crosses each step boundary, stopping at the
  next exercise. Clicking again (or the run ending) stops + clears.
- New lesson.tts.read_all in all 8 catalogs (real umlauts in de).

Tests: markdownToSpeech (headings/emphasis/inline-code stripped, code
blocks dropped, links/images collapsed, empty cases); collectTheoryRun
run boundaries + offsets; runStepForChar mapping; Read all renders only
when a run exists, speaks the concatenation, and auto-advances on a
boundary event.

https://claude.ai/code/session_019JZ1Ridnhg4hmv6AcSVdcV
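
The run collection and offset mapping described above can be sketched as two pure functions. The shapes below (`StepLike`, `TheoryRun`), the single-space separator, and the `Sketch` function names are assumptions made for illustration; the real collectTheoryRun and runStepForChar may differ in signature and return type.

```typescript
interface StepLike {
  kind: "theory" | "exercise";
  speech: string; // already projected to plain speech text
}

interface TheoryRun {
  text: string; // the single concatenated utterance
  offsets: number[]; // char offset where each step of the run begins
  firstIndex: number; // lesson index of the first step in the run
}

const RUN_SEPARATOR = " "; // assumption: steps are joined with one space

// Collect consecutive theory steps starting at `from`; null unless the run has >= 2 steps.
function collectTheoryRunSketch(
  steps: readonly StepLike[],
  from: number,
): TheoryRun | null {
  const offsets: number[] = [];
  let text = "";
  for (let i = from; i < steps.length; i++) {
    const step = steps[i];
    if (!step || step.kind !== "theory") break; // the run stops at the next exercise
    if (text.length > 0) text += RUN_SEPARATOR;
    offsets.push(text.length);
    text += step.speech;
  }
  return offsets.length >= 2 ? { text, offsets, firstIndex: from } : null;
}

// Map a boundary charIndex back to the lesson index of the step being read.
function runStepForCharSketch(run: TheoryRun, charIndex: number): number {
  let local = 0;
  run.offsets.forEach((offset, i) => {
    if (charIndex >= offset) local = i;
  });
  return run.firstIndex + local;
}
```

With these, an onBoundary handler only needs to compare the mapped index with the step currently shown and advance the viewer when they differ.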
Ships the recommended step-based controls first: step skip is more
useful for learning than a timed jump, and the Web Speech API can't
seek. Time-based seek is deferred.

- useReadAloud gains pause()/resume()/paused (the engine stays
  "speaking" while paused).
- New theoryBlockAround helper: the contiguous theory block around a
  step + its 1-based position/total, for the "Step X of N theory
  steps" readout and prev/next availability.
- LessonTtsMiniPlayer: a floating bottom bar shown while the engine is
  active — previous theory step (re-read) / play-pause / next theory
  step / stop + position readout. Pure + presentational.
- Lesson wires it: prev/next call readTheoryStepAt (navigate + re-read
  with the theory-{id} utterance so the follow-along highlight tracks);
  play/pause toggles the engine; stop ends playback.
- New lesson.tts.{play,pause,prev_step,next_step,step_position} in all
  8 catalogs (real umlauts in de). CSS for the floating pill bar.

Tests: theoryBlockAround block/position/null cases; mini-player render
+ all four callbacks + edge disabling + paused state; Lesson page hides
the player until reading, shows it with the block position, next
re-reads the next step, play/pause toggles pause. Full suite 3121
green; build clean; i18n audit green.

https://claude.ai/code/session_019JZ1Ridnhg4hmv6AcSVdcV
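
A helper like theoryBlockAround could be sketched as below, assuming it only needs each step's kind. The `TheoryBlock` shape and the function name are hypothetical; the sketch returns null when the step at `index` is not a theory step, which is one plausible reading of the "null cases" in the tests.

```typescript
type StepKind = "theory" | "exercise";

interface TheoryBlock {
  start: number; // lesson index of the first theory step in the block
  end: number; // lesson index of the last theory step in the block
  position: number; // 1-based position of the current step within the block
  total: number; // number of theory steps in the block
}

// Find the contiguous theory block around `index`, or null if that step is not theory.
function theoryBlockAroundSketch(
  kinds: readonly StepKind[],
  index: number,
): TheoryBlock | null {
  if (kinds[index] !== "theory") return null;
  let start = index;
  while (start > 0 && kinds[start - 1] === "theory") start--;
  let end = index;
  while (end + 1 < kinds.length && kinds[end + 1] === "theory") end++;
  return { start, end, position: index - start + 1, total: end - start + 1 };
}
```

The mini-player can derive its readout from `position` and `total`, and disable prev/next when `position` is 1 or equals `total`.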
Integration (Vitest, page level): renders the REAL Lesson page (real
ExerciseDispatcher + renderers; only useLesson/getStorage/synth mocked)
at each exercise step and pins that the prompt read-aloud button is
threaded through for all 5 exercise types, suppressed for a code
exercise, and absent entirely when TTS is disabled in Settings.

Smoke (Playwright, Dexie build, no backend): downloads fr-a1-from-en,
opens 01-greetings and exercises the read-aloud surface end to end —
theory control reads + swaps in the follow-along + shows the floating
mini-player; mini-player Stop ends playback; the "R" shortcut toggles
read-aloud; auto-read speaks each step on display. speechSynthesis is
injected via addInitScript so the run doesn't depend on installed
voices (without any, utterances would otherwise end immediately).

Note: the smoke spec compiles + wires (vite preview starts, the runner
reaches browser launch) but could not be executed in this environment
because the chromium binary can't be downloaded (network-restricted).
Run it with `make test-dexie-smoke` (it joins the existing Dexie gate).

https://claude.ai/code/session_019JZ1Ridnhg4hmv6AcSVdcV
@astrapi69 astrapi69 merged commit dd35f62 into main Jun 3, 2026
8 of 10 checks passed
@astrapi69 astrapi69 deleted the feature/lesson-tts-read-aloud branch June 3, 2026 06:35
2 participants