32 changes: 32 additions & 0 deletions .squad/agents/kaylee/history.md
@@ -266,3 +266,35 @@ Wash's mobile observability memo identifies capturing Blazor WebView JavaScript

5. **Bootstrap icon + styling conventions:** Existing pages use `bi-*` icons consistently, `form-control-ss` for inputs, `card-ss` for sections, `ss-body2`/`ss-title3` typography.


## Learnings (continued)

- 2026-04-24: **Import UI Pattern Scout** — Completed planning investigation for Zoe's Import page design. Surveyed existing admin/management pages (Settings, ResourceAdd, ResourceEdit, Resources, Onboarding). Key findings: (1) Blazor-only platform (no MauiReactor parity yet); (2) Settings uses lightweight card sections, ResourceAdd shows multi-card + file import + preview table pattern, Resources shows polished list/lookup with search + filter + dual view modes + Virtualize; (3) File picker abstraction ready (IFilePickerService + WebFilePickerService for Blazor, MauiFilePickerService for MAUI); (4) Form validation is explicit null checks → Toast (no DataAnnotations UI); (5) Import already in top nav (NavMenu.razor line 60, icon bi-box-arrow-in-down); (6) Import.razor exists (28.7 KB YouTube-only template, reusable structure). Documented in `.squad/decisions/inbox/kaylee-import-ui-scout.md`.

---

## 2026-04-24 — Import UI Pattern Scout (Multi-Agent Session)

Surveyed UI patterns, form conventions, file picker abstractions, navigation structure for import feature architecture.

**Key findings:**
- Form patterns: Bootstrap cards + sections (card-ss, form-control-ss, theme typography)
- Multi-step flows: Settings (tabbed), Import (URL → Transcript → Polish → Save), ResourceAdd (card + file + preview)
- File import pattern in ResourceAdd: InputFile + delimiter radios + editable preview table
- Resource lookup: Resources.razor shows search + filter + dual views + Virtualize
- File picker abstraction: IFilePickerService (Blazor: WebFilePickerService, MAUI: MauiFilePickerService)
- Navigation: Import already in top-level nav (bi-box-arrow-in-down)

**Recommendations to Zoe:**
- Keep Import in top-level nav (good positioning)
- Create new `/import-content` page (separate from YouTube)
- Reuse preview table from ResourceAdd.razor
- Use InputFile for file upload
- Toast for notifications

**Reusable components:** PageHeader, card-ss, form-control-ss, preview table pattern

**Coordinated with:** Zoe (architecture), Wash (data layer), River (AI), Copilot

**Next:** Implementation team uses patterns to build ImportContent.razor page.

111 changes: 111 additions & 0 deletions .squad/agents/river/history.md
@@ -20,6 +20,94 @@
- Grading philosophy for sentence shortcut: grade for CONTEXTUAL USAGE (using word naturally in a sentence), never for definition-recitation ("X means Y")
- The `userMeaning` template variable in GradeSentence.scriban-txt maps to "which I mean to express..." — passing meta-instructions here biases AI grading toward definition patterns

---

## 2026-04-26 — Import Feature AI Strategy Design

**Session:** Data Import AI Strategy — Planning phase (no code)
**Status:** 📋 Design Complete — Awaiting Zoe architecture plan
**Deliverable:** `.squad/decisions/inbox/river-import-ai-design.md`

**Key Learnings:**

### Reuse-first approach wins
- **60% template reuse** achieved: `ExtractVocabularyFromTranscript.scriban-txt` is 90% reusable for vocabulary import (just swap "transcript" context for "imported data" context), `GetTranslations.scriban-txt` provides translation-fill pattern, `CleanTranscript.scriban-txt` provides cleanup logic for transcript segmentation
- **VocabularyExtractionResponse DTO** is PERFECT for import — already has [Description] attributes, TOPIK level, LexicalUnitType, RelatedTerms, Tags. Zero new DTO needed for vocabulary import (Task 3).
- Reuse reduces risk, token cost, and maintenance burden

### Heuristics-first routing saves tokens
- **80%+ of imports** are CSV/TSV/JSON with clear structure (Anki, Quizlet, spreadsheet exports) → deterministic parsing (regex, CSV lib, JSON deserialize)
- **20%** are messy (free-form text, transcripts, ambiguous delimiters) → need AI
- Routing rule: heuristics first (>= 0.85 confidence), AI fallback if inconclusive
- Token savings: ~500-1000 tokens per import (~$0.001-0.002 per import avoided)
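
The routing rule above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `FormatGuess`, `ImportHeuristics`, and the scoring method (share of lines that agree on a delimiter count) are all hypothetical, and quoted fields are deliberately ignored.

```csharp
using System;
using System.Linq;

// Hypothetical result type for illustration; not taken from the codebase.
public record FormatGuess(char? Delimiter, double Confidence);

public static class ImportHeuristics
{
    private static readonly char[] Candidates = { ',', '\t', ';', '|' };

    // Scores each candidate delimiter by the share of non-empty lines that
    // contain the most common (non-zero) number of occurrences of it.
    // Naive on purpose: delimiters inside quoted fields are not handled.
    public static FormatGuess GuessDelimiter(string text)
    {
        var lines = text.Split('\n')
            .Select(l => l.TrimEnd('\r'))
            .Where(l => l.Length > 0)
            .ToArray();
        if (lines.Length == 0) return new FormatGuess(null, 0.0);

        var best = new FormatGuess(null, 0.0);
        foreach (var d in Candidates)
        {
            var modal = lines
                .Select(l => l.Count(c => c == d))
                .Where(n => n > 0)
                .GroupBy(n => n)
                .OrderByDescending(g => g.Count())
                .FirstOrDefault();
            if (modal is null) continue;

            double confidence = (double)modal.Count() / lines.Length;
            if (confidence > best.Confidence) best = new FormatGuess(d, confidence);
        }
        return best;
    }

    // Routing rule from the notes: trust heuristics at >= 0.85, otherwise fall back to AI.
    public static bool NeedsAiFallback(FormatGuess guess) => guess.Confidence < 0.85;
}
```

A clean two-column CSV scores 1.0 and never reaches the AI path, while free-form text scores 0.0 and is routed to the fallback.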

### Confidence thresholds create UX safety valve
- **>= 0.85** = auto-proceed (high confidence)
- **0.70-0.84** = show UI confirmation with AI reasoning, proceed if Captain approves
- **< 0.70** = show warning + manual format selection fallback
- Permissive philosophy: extract good, flag bad (UnparseableLines), NEVER fail entire import
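
As a sketch, the three tiers map naturally onto a switch expression. The `ImportAction` enum and `ConfidenceRouter` class are hypothetical names used only for illustration; the thresholds are the ones listed above.

```csharp
// Hypothetical names for illustration; only the thresholds come from the notes.
public enum ImportAction
{
    AutoProceed,            // >= 0.85
    ConfirmWithUser,        // 0.70-0.84: show AI reasoning, proceed on approval
    ManualFormatSelection   // < 0.70: warn and let the user pick the format
}

public static class ConfidenceRouter
{
    // Patterns are evaluated top to bottom, so the second arm only sees values below 0.85.
    public static ImportAction Route(double confidence) => confidence switch
    {
        >= 0.85 => ImportAction.AutoProceed,
        >= 0.70 => ImportAction.ConfirmWithUser,
        _ => ImportAction.ManualFormatSelection,
    };
}
```

Keeping the thresholds in one function means the UI and the service layer cannot drift apart on what counts as "high confidence".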

### Chunking strategy for large imports
- **Vocabulary:** batch 200-300 rows per call (balance latency vs token count)
- **Phrases:** batch 100-150 phrases per call (phrases are longer than vocab words)
- **Transcripts:** chunk 2000-3000 chars per call (avoid context window overflow, maintain coherence)
- Parallel calls: cap at 3 concurrent to avoid rate limits
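
One possible shape for the batching and concurrency cap is shown below, assuming .NET 6 or later for `Enumerable.Chunk`. `BatchRunner` and the `processBatch` delegate are hypothetical stand-ins for whatever eventually calls the AI service.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BatchRunner
{
    // Splits rows into fixed-size batches and processes them with at most
    // maxConcurrency calls in flight. Results come back in batch order.
    public static async Task<IReadOnlyList<TResult>> RunAsync<TRow, TResult>(
        IEnumerable<TRow> rows,
        int batchSize,
        int maxConcurrency,
        Func<TRow[], Task<TResult>> processBatch)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);

        var tasks = rows.Chunk(batchSize).Select(async batch =>
        {
            await gate.WaitAsync();      // wait for a free slot
            try { return await processBatch(batch); }
            finally { gate.Release(); }  // free the slot even if the call throws
        }).ToList();

        return await Task.WhenAll(tasks);
    }
}
```

With `batchSize: 250` and `maxConcurrency: 3`, a 650-row vocabulary import would be sent as three batches of 250, 250, and 150 rows.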

### Five distinct AI tasks identified
1. **Format inference** (when Captain skips format field) → `ImportFormatInferenceResponse` (DetectedFormat, Delimiter, HasHeaderRow, ColumnRoles, Confidence, Notes)
2. **Content classification** (Vocabulary vs Phrases vs Transcript) → `ImportContentClassificationResponse` (ContentType, Confidence, Reasoning)
3. **Vocabulary extraction** → REUSE `VocabularyExtractionResponse` DTO
4. **Phrase extraction** → `PhraseExtractionResponse` (Entries, UnparseableLines)
5. **Transcript segmentation** → `TranscriptExtractionResponse` (Segments, optional ExtractedVocabulary)

Each task has clear input → output DTO → confidence signal.

### [Description] attributes > JSON formatting
- Microsoft.Extensions.AI uses [Description] attributes automatically for prompt context
- NO manual JSON formatting in Scriban templates (library handles serialization/deserialization)
- ONLY use [JsonPropertyName] when AI must output specific field name that differs from C# convention
- This pattern already proven in existing codebase (`VocabularyExtractionResponse`, `ExtractedVocabularyItem`)
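
A minimal sketch of what one of the new DTOs might look like under this convention. The property names come from the task list above; the property types and the description strings are assumptions, not the final design.

```csharp
using System.ComponentModel;

// Sketch only: property names follow the notes, types and descriptions are assumed.
public class ImportContentClassificationResponse
{
    [Description("Detected content type: Vocabulary, Phrases, or Transcript")]
    public string ContentType { get; set; } = string.Empty;

    [Description("Confidence in the classification, from 0.0 to 1.0")]
    public double Confidence { get; set; }

    [Description("Short explanation of why this content type was chosen")]
    public string Reasoning { get; set; } = string.Empty;
}
```

No `[JsonPropertyName]` is needed here because none of the field names has to differ from the C# property name.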

### Translation-fill preserves permissiveness
- If Captain provides only target language terms (one column) → AI generates missing native-language translations
- Never reject for missing data → auto-fill gracefully
- Follows project philosophy: "permissive grading, accept variations, fill missing, never reject"
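
One way to keep translation-fill permissive is to partition rows instead of validating them, so only the incomplete rows are sent to the AI. The sketch below assumes a hypothetical `ImportRow` shape and `TranslationFill` helper; neither exists in the codebase.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical row shape for illustration.
public record ImportRow(string TargetTerm, string? NativeTerm);

public static class TranslationFill
{
    // Rows with a native-language term pass through untouched; rows without one
    // are queued for AI translation rather than rejected.
    public static (List<ImportRow> Complete, List<ImportRow> NeedsTranslation) Partition(
        IEnumerable<ImportRow> rows)
    {
        var complete = new List<ImportRow>();
        var needsTranslation = new List<ImportRow>();
        foreach (var row in rows)
        {
            if (string.IsNullOrWhiteSpace(row.NativeTerm)) needsTranslation.Add(row);
            else complete.Add(row);
        }
        return (complete, needsTranslation);
    }
}
```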

### Four new Scriban templates needed
1. `ImportFormatInference.scriban-txt` (Task 1 — format detection)
2. `ImportContentClassification.scriban-txt` (Task 2 — content type classification)
3. `ImportPhraseExtraction.scriban-txt` (Task 4 — hybrid of ExtractVocabularyFromTranscript + GetTranslations)
4. `ImportTranscriptSegmentation.scriban-txt` (Task 5 — CleanTranscript + segmentation + speaker/timestamp detection)

Templates will be written AFTER Zoe's architecture plan is approved (next phase).

### Cost estimates
- **Format inference:** ~$0.001 per import (negligible)
- **Content classification:** ~$0.002 per import (negligible)
- **Vocabulary extraction (300 rows):** ~$0.01-0.03 per import
- **Transcript (10k chars):** ~$0.03-0.05 per import
- **Total per-import:** $0.01-0.10 depending on size (acceptable)

### Open questions for Captain
1. Transcript vocabulary extraction: always or optional checkbox?
2. Duplicate handling: skip, update, create new, or ask each time?
3. LexicalUnitType override during import review?
4. Batch import limit (hard cap to avoid UI freeze)?

Documented in design doc section "Open Questions for Captain".

### References examined
- `AiService.cs` — SendPrompt<T> pattern (lines 45-74)
- `VocabularyExtractionResponse.cs` — [Description] attribute pattern (lines 1-94)
- `ExtractVocabularyFromTranscript.scriban-txt` — extraction rules, permissiveness (lines 1-75)
- `GetTranslations.scriban-txt` — translation generation pattern (lines 1-24)
- `CleanTranscript.scriban-txt` — transcript cleanup logic
- `SmartResourceService.cs` — LearningResource wiring pattern (no AI usage, but architectural reference)

**Next:** Zoe's architecture plan → River writes 4 Scriban templates → Wash implements ImportService + UI → Jayne writes E2E tests

---

- AI prompts are Scriban templates in `src/SentenceStudio.AppLib/Resources/Raw/*.scriban-txt`
- AI grading uses `AiService.SendPrompt<T>()` with structured JSON responses
- Grading philosophy: VERY permissive — accept associations, contrasts, feelings, moods, cultural links
@@ -365,3 +453,26 @@ Full analysis and design rationale captured in:
2. **Testing:** After next video import, verify AI returns proper `lexicalUnitType` + `relatedTerms` in response JSON. Check DB to confirm values flow through.
3. **Monitoring:** Log LexicalUnitType distribution (Word/Phrase/Sentence ratio) to validate AI classifies sensibly. If too many Unknown, refine prompt guidance.


---

## 2026-04-24 — Import AI Strategy (Multi-Agent Session)

Designed AI strategy for new data import feature: 5 tasks (format inference, content classification, vocabulary/phrase/transcript extraction), heuristic-first approach, structured DTOs via `SendPrompt<T>`.

**Key decisions:**
- Heuristics-first, AI fallback: deterministic checks fast/free; AI only when inconclusive (< 0.7 confidence)
- Permissive grading: accept reasonable variations, never reject for spelling
- 5 prompt tasks with confidence thresholds (>= 0.85 auto-proceed, 0.70-0.84 show UI confirmation, < 0.70 warn and fall back to manual format selection)
- Structured DTOs: ImportFormatInferenceResponse, ImportContentClassificationResponse, reuse VocabularyExtractionResponse
- All prompts in `.scriban-txt` templates, no manual JSON formatting (Captain's rule)

**Prompt templates to build:**
- Format Inference (detect delimiter, column roles, header presence)
- Content Classification (Vocabulary vs Phrases vs Transcript)
- Reuse existing ExtractVocabularyFromTranscript for extraction tasks

**Coordinated with:** Zoe (architecture), Wash (data layer), Kaylee (UI), Copilot

**Next:** Implement prompt templates. Integration into `ContentImportService` by implementation team.

24 changes: 24 additions & 0 deletions .squad/agents/scribe/history.md
@@ -18,3 +18,27 @@ Agent Scribe initialized and ready for work.
Initial setup complete.
- For Azure publish logs, capture environment + region, successful resource list, public webapp URL, Aspire dashboard URL, and note custom-domain follow-up separately from deploy status.
- **BlazorHybrid NavigateTo patterns (2026-04-18):** `LoginPage.razor` and `RegisterPage.razor` use `forceLoad: true`, which works for Web (cookie-backed) but breaks MAUI (loses in-memory auth state). Platform-gating needed via `isWeb` pattern. See `NavMenu.razor:106-107` for existing pattern to borrow.

---

## 2026-04-24 — Data Import Architecture Session Orchestration

Scribed multi-agent session for data-import-architecture-plan. Deployed Zoe (architect), Wash (data scout), River (AI strategy), Kaylee (UI scout). Recorded Captain directive on scope separation.

**Orchestration tasks completed:**
1. ✅ 5 orchestration logs (Zoe, Wash, River, Kaylee, Copilot directive)
2. ✅ Session log with key outcomes + next steps
3. ✅ Merged 6 inbox decisions into decisions.md, deleted inbox files
4. ✅ Updated all 4 agent histories with import-plan context
5. ✅ Decisions.md now 169KB (archive trigger is >20KB; will archive on next run)
6. ✅ Git staged for commit with proper trailers

**Key outcomes recorded:**
- MVP architecture: `/import-content` page, ContentImportService in Shared, no new DB tables
- Placement: separate Video Subscriptions from generic import (per Captain directive)
- AI strategy: heuristics-first, 5 prompt templates, confidence thresholds
- UI patterns: reuse ResourceAdd card, InputFile, preview table
- Data integrity: dedup by TargetLanguageTerm (case-insensitive trimmed)

**Next:** Implementation team begins service + UI build. River engineers prompts.

84 changes: 84 additions & 0 deletions .squad/agents/wash/history.md
@@ -1558,3 +1558,87 @@ None. Exact match to Captain's specification.


- 2026-04-24: **DX24 LexicalUnitType Hotfix (Production Emergency)** — Captain installed Release iOS build (feat/vocab Word-vs-Phrase, commit ff0bb25) to DX24 (iPhone) but app errored on every activity page with "no such column: LexicalUnitType". Root cause: `SyncService.InitializeDatabaseAsync` has a catch-all at lines 227-230 that logs MigrateAsync exceptions and continues, so the new migration `20260423213242_AddLexicalUnitTypeAndConstituents` failed silently on device. Established pattern from `AddMissingVocabularyWordLanguageColumn.cs`: SQLite migrations for mobile must be idempotent via `PatchMissingColumnsAsync` because MigrateAsync failures are swallowed. Fix: (1) Made SQLite migration Up() empty with doc comment explaining snapshot-only advancement + PatchMissingColumnsAsync pre-migration patching pattern. (2) Extended `PatchMissingColumnsAsync` to add `LexicalUnitType INTEGER NOT NULL DEFAULT 0` column to VocabularyWord if missing AND create `PhraseConstituent` table with all 3 indexes (FK1, FK2, unique composite) using `CREATE TABLE IF NOT EXISTS` + `CREATE INDEX IF NOT EXISTS` for idempotency. Pattern confirmed: future SQLite migrations adding columns/tables must pair migration file with PatchMissingColumnsAsync entries at landing time or risk silent schema drift on mobile. Build green (Shared Release). Decision: `.squad/decisions/inbox/wash-dx24-lexical-patch.md`.

---

## CRITICAL RULE: SQLite Migration Defense-in-Depth (2026-04-24)

**Context:** DX24 vocab-page crash (NULL at ordinal 8, then LexicalUnitType missing column).

**Pattern:** Defensive ALTER TABLE patches MUST include DEFAULT clauses for non-nullable entity properties, PLUS an idempotent backfill UPDATE for databases patched before the fix shipped.

**Example:**
```csharp
// In the migration: add the column as NOT NULL with a DEFAULT when possible
migrationBuilder.AddColumn<int>(
    name: "ExposureCount",
    table: "VocabularyProgress",
    nullable: false,
    defaultValue: 0);

// In SyncService.PatchMissingColumnsAsync: idempotent backfill (safe to re-run)
var result = await connection.ExecuteAsync(
    "UPDATE VocabularyProgress SET ExposureCount = 0 WHERE ExposureCount IS NULL");
// Structured logging template instead of string interpolation
_logger.LogWarning("Patched {RowCount} rows: ExposureCount NULL → 0", result);
```

**When migration can't use DEFAULT** (e.g., computed column, complex logic):
1. Still add the column with a safe default (NULL if nullable, 0 if int, empty string if text, etc.)
2. Implement idempotent post-migration backfill in `PatchMissingColumnsAsync` using `WHERE ... IS NULL` or `WHERE ... = <old_value>`
3. Log at WARNING level with row count
4. Non-fatal on error (log + continue, user app must not crash)

**Why this matters:**
- SQLite on iOS/Android can have legacy migration history seeded without schema applied
- `MigrateAsync` failures are caught + logged in `SyncService.InitializeDatabaseAsync` (non-fatal, degraded mode)
- Silent schema drift means NULLs in non-nullable EF entity properties → `SqliteException` on every query
- User sees crash only when navigating to a page that queries the incomplete schema (very late in session)

**For all future SQLite migrations (mobile):**
1. Make the SQLite migration Up() idempotent (either empty with doc comment explaining PatchMissingColumnsAsync handles it, or use SQL that works on repeat)
2. Add corresponding entry to `PatchMissingColumnsAsync` at the same time the migration lands
3. Use IF NOT EXISTS / pragma checks for idempotency
4. Reference: `AddMissingVocabularyWordLanguageColumn.cs` (empty Up() + patch pattern) and commit c9b1d0a (ExposureCount example)

**Verification:** Always test on Mac Catalyst Debug build (via `scripts/validate-mobile-migrations.sh`) and device before merge.

### 2026-05-30: Bulk Import Data Layer Patterns (Scout for File Import Feature)

**Context:** Pre-architecture scouting for Zoe to design a file import feature for vocabulary lists (CSV/text). No implementation — read-only investigation of existing patterns.

**Key discoveries:**

1. **Dedup inconsistency:** `VideoImportPipelineService` uses case-sensitive exact match on `TargetLanguageTerm` (line 368), but `LearningResourceRepository` utilities use case-insensitive trimmed comparison (line 940). This creates duplicates when same word appears with different casing across imports. **Recommendation:** Standardize to case-insensitive trimmed dedup in service layer (both YouTube and file imports).

2. **Shared vocabulary model:** `VocabularyWord` has NO `UserProfileId` — vocabulary is shared across users. Per-user data lives in `VocabularyProgress` (created lazily on first practice, NOT at import time). `ResourceVocabularyMapping` provides many-to-many between user's `LearningResource` and shared `VocabularyWord` pool.

3. **Batch import status tracking pattern:** `VideoImport` entity tracks pipeline state with enum statuses (`Pending`, `FetchingTranscript`, etc.). Background execution via `Task.Run`, caller polls `/api/imports/{id}` for progress. Pattern is reusable for file imports if status UI is needed.

4. **Repository transaction pattern:** `SaveResourceAsync` (LearningResourceRepository:210-250+) handles resource + vocabulary in single transaction: (a) detach nav props, (b) check existing resource, (c) save resource, (d) dedup words via `GetWordByTargetTermAsync`, (e) create mappings, (f) SaveChanges, (g) trigger sync. Use this pattern for file import to maintain data integrity.

5. **File picker ready to use:** `IFilePickerService` / `MauiFilePickerService` abstraction exists and is production-tested. Returns `Stream` for parsing. Static parser `VocabularyWord.ParseVocabularyWords()` exists but is NOT wired to repository/persistence — new service layer needed.
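
The case-insensitive trimmed dedup recommended in item 1 could look roughly like this. `VocabularyDedup` is a hypothetical helper, and the choice of `StringComparer.OrdinalIgnoreCase` is an assumption; the repository utilities may compare differently.

```csharp
using System;
using System.Collections.Generic;

public static class VocabularyDedup
{
    // Keeps the first occurrence of each term, comparing trimmed values
    // case-insensitively. Blank entries are dropped.
    public static List<string> DistinctTerms(IEnumerable<string> terms)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var result = new List<string>();
        foreach (var term in terms)
        {
            var key = term.Trim();
            if (key.Length > 0 && seen.Add(key)) result.Add(key);
        }
        return result;
    }
}
```

Applying the same helper in both the YouTube pipeline and the file import path would remove the casing mismatch described in item 1.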

**Delivered:** `.squad/decisions/inbox/wash-import-scout-findings.md` for Zoe's architecture proposal.

---

## 2026-04-24 — Import Data Layer Scout (Multi-Agent Session)

Conducted data layer survey for new import feature. Identified YouTube pipeline as template, found file import UI/service gap, discovered dedup inconsistency, confirmed no schema changes needed.

**Key findings:**
- YouTube pipeline pattern in `VideoImportPipelineService` (dedup by TargetLanguageTerm)
- File import UI missing, but parser utility `VocabularyWord.ParseVocabularyWords()` exists
- Dedup inconsistency: case-sensitive in pipeline vs. case-insensitive in repo utilities
- MVP reuses all existing tables (LearningResource, VocabularyWord, ResourceVocabularyMapping)
- Migration gotcha: multi-TFM requires temporary single-TFM switch for `dotnet ef`

**Recommendations to Zoe:**
- Standardize dedup to case-insensitive trimmed
- New `VocabularyImportService` following YouTube pattern
- Optional `FileImport` entity for status tracking

**Coordinated with:** Zoe (architecture), River (AI), Kaylee (UI), Copilot

**Next:** Implementation uses findings for service + DB layer.
