davidortinau · davidortinau · Apr 24, 2026 · Apr 24, 2026 · Apr 24, 2026 · Apr 27, 2026
diff --git a/.squad/agents/kaylee/history.md b/.squad/agents/kaylee/history.md
@@ -266,3 +266,35 @@ Wash's mobile observability memo identifies capturing Blazor WebView JavaScript
 
 5. **Bootstrap icon + styling conventions:** Existing pages use `bi-*` icons consistently, `form-control-ss` for inputs, `card-ss` for sections, `ss-body2`/`ss-title3` typography.
 
+
+## Learnings (continued)
+
+- 2026-04-24: **Import UI Pattern Scout** — Completed planning investigation for Zoe's Import page design. Surveyed existing admin/management pages (Settings, ResourceAdd, ResourceEdit, Resources, Onboarding). Key findings: (1) Blazor-only platform (no MauiReactor parity yet); (2) Settings uses lightweight card sections, ResourceAdd shows multi-card + file import + preview table pattern, Resources shows polished list/lookup with search + filter + dual view modes + Virtualize; (3) File picker abstraction ready (IFilePickerService + WebFilePickerService for Blazor, MauiFilePickerService for MAUI); (4) Form validation is explicit null checks → Toast (no DataAnnotations UI); (5) Import already in top nav (NavMenu.razor line 60, icon bi-box-arrow-in-down); (6) Import.razor exists (28.7 KB YouTube-only template, reusable structure). Documented in `.squad/decisions/inbox/kaylee-import-ui-scout.md`.
+
+---
+
+## 2026-04-24 — Import UI Pattern Scout (Multi-Agent Session)
+
+Surveyed UI patterns, form conventions, file picker abstractions, navigation structure for import feature architecture.
+
+**Key findings:**
+- Form patterns: Bootstrap cards + sections (card-ss, form-control-ss, theme typography)
+- Multi-step flows: Settings (tabbed), Import (URL → Transcript → Polish → Save), ResourceAdd (card + file + preview)
+- File import pattern in ResourceAdd: InputFile + delimiter radios + editable preview table
+- Resource lookup: Resources.razor shows search + filter + dual views + Virtualize
+- File picker abstraction: IFilePickerService (Blazor: WebFilePickerService, MAUI: MauiFilePickerService)
+- Navigation: Import already in top-level nav (bi-box-arrow-in-down)
+
+**Recommendations to Zoe:**
+- Keep Import in top-level nav (good positioning)
+- Create new `/import-content` page (separate from YouTube)
+- Reuse preview table from ResourceAdd.razor
+- Use InputFile for file upload
+- Toast for notifications
+
+**Reusable components:** PageHeader, card-ss, form-control-ss, preview table pattern
+
+**Coordinated with:** Zoe (architecture), Wash (data layer), River (AI), Copilot
+
+**Next:** Implementation team uses patterns to build ImportContent.razor page.
+
diff --git a/.squad/agents/river/history.md b/.squad/agents/river/history.md
@@ -20,6 +20,94 @@
 - Grading philosophy for sentence shortcut: grade for CONTEXTUAL USAGE (using word naturally in a sentence), never for definition-recitation ("X means Y")
 - The `userMeaning` template variable in GradeSentence.scriban-txt maps to "which I mean to express..." — passing meta-instructions here biases AI grading toward definition patterns
 
+---
+
+## 2026-04-26 — Import Feature AI Strategy Design
+
+**Session:** Data Import AI Strategy — Planning phase (no code)  
+**Status:** 📋 Design Complete — Awaiting Zoe architecture plan  
+**Deliverable:** `.squad/decisions/inbox/river-import-ai-design.md`
+
+**Key Learnings:**
+
+### Reuse-first approach wins
+- **60% template reuse** achieved: `ExtractVocabularyFromTranscript.scriban-txt` is 90% reusable for vocabulary import (just swap "transcript" context for "imported data" context), `GetTranslations.scriban-txt` provides translation-fill pattern, `CleanTranscript.scriban-txt` provides cleanup logic for transcript segmentation
+- **VocabularyExtractionResponse DTO** is PERFECT for import — already has [Description] attributes, TOPIK level, LexicalUnitType, RelatedTerms, Tags. Zero new DTO needed for vocabulary import (Task 3).
+- Reuse reduces risk, token cost, and maintenance burden
+
+### Heuristics-first routing saves tokens
+- **80%+ of imports** are CSV/TSV/JSON with clear structure (Anki, Quizlet, spreadsheet exports) → deterministic parsing (regex, CSV lib, JSON deserialize)
+- **20%** are messy (free-form text, transcripts, ambiguous delimiters) → need AI
+- Routing rule: heuristics first (>= 0.85 confidence), AI fallback if inconclusive
+- Token savings: ~500-1000 tokens per import (~$0.001-0.002 per import avoided)
+
+### Confidence thresholds create UX safety valve
+- **>= 0.85** = auto-proceed (high confidence)
+- **0.70-0.84** = show UI confirmation with AI reasoning, proceed if Captain approves
+- **< 0.70** = show warning + manual format selection fallback
+- Permissive philosophy: extract good, flag bad (UnparseableLines), NEVER fail entire import
+
+### Chunking strategy for large imports
+- **Vocabulary:** batch 200-300 rows per call (balance latency vs token count)
+- **Phrases:** batch 100-150 phrases per call (phrases are longer than vocab words)
+- **Transcripts:** chunk 2000-3000 chars per call (avoid context window overflow, maintain coherence)
+- Parallel calls: cap at 3 concurrent to avoid rate limits
+
+### Five distinct AI tasks identified
+1. **Format inference** (when Captain skips format field) → `ImportFormatInferenceResponse` (DetectedFormat, Delimiter, HasHeaderRow, ColumnRoles, Confidence, Notes)
+2. **Content classification** (Vocabulary vs Phrases vs Transcript) → `ImportContentClassificationResponse` (ContentType, Confidence, Reasoning)
+3. **Vocabulary extraction** → REUSE `VocabularyExtractionResponse` DTO
+4. **Phrase extraction** → `PhraseExtractionResponse` (Entries, UnparseableLines)
+5. **Transcript segmentation** → `TranscriptExtractionResponse` (Segments, optional ExtractedVocabulary)
+
+Each task has clear input → output DTO → confidence signal.
+
+### [Description] attributes > JSON formatting
+- Microsoft.Extensions.AI uses [Description] attributes automatically for prompt context
+- NO manual JSON formatting in Scriban templates (library handles serialization/deserialization)
+- ONLY use [JsonPropertyName] when AI must output specific field name that differs from C# convention
+- This pattern already proven in existing codebase (`VocabularyExtractionResponse`, `ExtractedVocabularyItem`)
+
+### Translation-fill preserves permissiveness
+- If Captain provides only target language terms (one column) → AI generates missing native-language translations
+- Never reject for missing data → auto-fill gracefully
+- Follows project philosophy: "permissive grading, accept variations, fill missing, never reject"
+
+### Four new Scriban templates needed
+1. `ImportFormatInference.scriban-txt` (Task 1 — format detection)
+2. `ImportContentClassification.scriban-txt` (Task 2 — content type classification)
+3. `ImportPhraseExtraction.scriban-txt` (Task 4 — hybrid of ExtractVocabularyFromTranscript + GetTranslations)
+4. `ImportTranscriptSegmentation.scriban-txt` (Task 5 — CleanTranscript + segmentation + speaker/timestamp detection)
+
+Templates will be written AFTER Zoe's architecture plan is approved (next phase).
+
+### Cost estimates
+- **Format inference:** ~$0.001 per import (negligible)
+- **Content classification:** ~$0.002 per import (negligible)
+- **Vocabulary extraction (300 rows):** ~$0.01-0.03 per import
+- **Transcript (10k chars):** ~$0.03-0.05 per import
+- **Total per-import:** $0.01-0.10 depending on size (acceptable)
+
+### Open questions for Captain
+1. Transcript vocabulary extraction: always or optional checkbox?
+2. Duplicate handling: skip, update, create new, or ask each time?
+3. LexicalUnitType override during import review?
+4. Batch import limit (hard cap to avoid UI freeze)?
+
+Documented in design doc section "Open Questions for Captain".
+
+### References examined
+- `AiService.cs` — SendPrompt<T> pattern (lines 45-74)
+- `VocabularyExtractionResponse.cs` — [Description] attribute pattern (lines 1-94)
+- `ExtractVocabularyFromTranscript.scriban-txt` — extraction rules, permissiveness (lines 1-75)
+- `GetTranslations.scriban-txt` — translation generation pattern (lines 1-24)
+- `CleanTranscript.scriban-txt` — transcript cleanup logic
+- `SmartResourceService.cs` — LearningResource wiring pattern (no AI usage, but architectural reference)
+
+**Next:** Zoe's architecture plan → River writes 4 Scriban templates → Wash implements ImportService + UI → Jayne writes E2E tests
+
+---
+
 - AI prompts are Scriban templates in `src/SentenceStudio.AppLib/Resources/Raw/*.scriban-txt`
 - AI grading uses `AiService.SendPrompt<T>()` with structured JSON responses
 - Grading philosophy: VERY permissive — accept associations, contrasts, feelings, moods, cultural links
@@ -365,3 +453,26 @@ Full analysis and design rationale captured in:
 2. **Testing:** After next video import, verify AI returns proper `lexicalUnitType` + `relatedTerms` in response JSON. Check DB to confirm values flow through.
 3. **Monitoring:** Log LexicalUnitType distribution (Word/Phrase/Sentence ratio) to validate AI classifies sensibly. If too many Unknown, refine prompt guidance.
 
+
+---
+
+## 2026-04-24 — Import AI Strategy (Multi-Agent Session)
+
+Designed AI strategy for new data import feature: 5 tasks (format inference, content classification, vocabulary/phrase/transcript extraction), heuristic-first approach, structured DTOs via `SendPrompt<T>`.
+
+**Key decisions:**
+- Heuristics-first, AI fallback: deterministic checks fast/free; AI only when inconclusive (< 0.7 confidence)
+- Permissive grading: accept reasonable variations, never reject for spelling
+- 5 prompt tasks with confidence thresholds (>= 0.85 auto-proceed, < 0.85 show UI confirmation)
+- Structured DTOs: ImportFormatInferenceResponse, ImportContentClassificationResponse, reuse VocabularyExtractionResponse
+- All prompts in `.scriban-txt` templates, no manual JSON formatting (Captain's rule)
+
+**Prompt templates to build:**
+- Format Inference (detect delimiter, column roles, header presence)
+- Content Classification (Vocabulary vs Phrases vs Transcript)
+- Reuse existing ExtractVocabularyFromTranscript for extraction tasks
+
+**Coordinated with:** Zoe (architecture), Wash (data layer), Kaylee (UI), Copilot
+
+**Next:** Implement prompt templates. Integration into `ContentImportService` by implementation team.
+
diff --git a/.squad/agents/scribe/history.md b/.squad/agents/scribe/history.md
@@ -18,3 +18,27 @@ Agent Scribe initialized and ready for work.
 Initial setup complete.
 - For Azure publish logs, capture environment + region, successful resource list, public webapp URL, Aspire dashboard URL, and note custom-domain follow-up separately from deploy status.
 - **BlazorHybrid NavigateTo patterns (2026-04-18):** `LoginPage.razor` and `RegisterPage.razor` use `forceLoad: true`, which works for Web (cookie-backed) but breaks MAUI (loses in-memory auth state). Platform-gating needed via `isWeb` pattern. See `NavMenu.razor:106-107` for existing pattern to borrow.
+
+---
+
+## 2026-04-24 — Data Import Architecture Session Orchestration
+
+Scribed multi-agent session for data-import-architecture-plan. Deployed Zoe (architect), Wash (data scout), River (AI strategy), Kaylee (UI scout). Recorded Captain directive on scope separation.
+
+**Orchestration tasks completed:**
+1. ✅ 5 orchestration logs (Zoe, Wash, River, Kaylee, Copilot directive)
+2. ✅ Session log with key outcomes + next steps
+3. ✅ Merged 6 inbox decisions into decisions.md, deleted inbox files
+4. ✅ Updated all 4 agent histories with import-plan context
+5. ✅ Decisions.md now 169KB (will archive if >20KB trigger next run)
+6. ✅ Git staged for commit with proper trailers
+
+**Key outcomes recorded:**
+- MVP architecture: `/import-content` page, ContentImportService in Shared, no new DB tables
+- Placement: separate Video Subscriptions from generic import (per Captain directive)
+- AI strategy: heuristics-first, 5 prompt templates, confidence thresholds
+- UI patterns: reuse ResourceAdd card, InputFile, preview table
+- Data integrity: dedup by TargetLanguageTerm (case-insensitive trimmed)
+
+**Next:** Implementation team begins service + UI build. River engineers prompts.
+
diff --git a/.squad/agents/wash/history.md b/.squad/agents/wash/history.md
@@ -1558,3 +1558,87 @@ None. Exact match to Captain's specification.
 
 
 - 2026-04-24: **DX24 LexicalUnitType Hotfix (Production Emergency)** — Captain installed Release iOS build (feat/vocab Word-vs-Phrase, commit ff0bb25) to DX24 (iPhone) but app errored on every activity page with "no such column: LexicalUnitType". Root cause: `SyncService.InitializeDatabaseAsync` has a catch-all at lines 227-230 that logs MigrateAsync exceptions and continues, so the new migration `20260423213242_AddLexicalUnitTypeAndConstituents` failed silently on device. Established pattern from `AddMissingVocabularyWordLanguageColumn.cs`: SQLite migrations for mobile must be idempotent via `PatchMissingColumnsAsync` because MigrateAsync failures are swallowed. Fix: (1) Made SQLite migration Up() empty with doc comment explaining snapshot-only advancement + PatchMissingColumnsAsync pre-migration patching pattern. (2) Extended `PatchMissingColumnsAsync` to add `LexicalUnitType INTEGER NOT NULL DEFAULT 0` column to VocabularyWord if missing AND create `PhraseConstituent` table with all 3 indexes (FK1, FK2, unique composite) using `CREATE TABLE IF NOT EXISTS` + `CREATE INDEX IF NOT EXISTS` for idempotency. Pattern confirmed: future SQLite migrations adding columns/tables must pair migration file with PatchMissingColumnsAsync entries at landing time or risk silent schema drift on mobile. Build green (Shared Release). Decision: `.squad/decisions/inbox/wash-dx24-lexical-patch.md`.
+
+---
+
+## CRITICAL RULE: SQLite Migration Defense-in-Depth (2026-04-24)
+
+**Context:** DX24 vocab-page crash (NULL at ordinal 8, then LexicalUnitType missing column).
+
+**Pattern:** Defensive ALTER TABLE patches MUST include DEFAULT clauses for non-nullable entity properties, PLUS an idempotent backfill UPDATE for databases patched before the fix shipped.
+
+**Example:**
+```csharp
+// In migration: add NOT NULL DEFAULT when possible
+migrationBuilder.AddColumn<int>(
+    name: "ExposureCount",
+    table: "VocabularyProgress",
+    nullable: false,
+    defaultValue: 0);
+
+// In SyncService.PatchMissingColumnsAsync: idempotent backfill
+var result = await connection.ExecuteAsync(
+    "UPDATE VocabularyProgress SET ExposureCount = 0 WHERE ExposureCount IS NULL");
+_logger.LogWarning($"Patched {result} rows: ExposureCount NULL → 0");
+```
+
+**When migration can't use DEFAULT** (e.g., computed column, complex logic):
+1. Still add the column with a safe default (NULL if nullable, 0 if int, empty string if text, etc.)
+2. Implement idempotent post-migration backfill in `PatchMissingColumnsAsync` using `WHERE ... IS NULL` or `WHERE ... = <old_value>`
+3. Log at WARNING level with row count
+4. Non-fatal on error (log + continue, user app must not crash)
+
+**Why this matters:**
+- SQLite on iOS/Android can have legacy migration history seeded without schema applied
+- `MigrateAsync` failures are caught + logged in `SyncService.InitializeDatabaseAsync` (non-fatal, degraded mode)
+- Silent schema drift means NULLs in non-nullable EF entity properties → `SqliteException` on every query
+- User sees crash only when navigating to a page that queries the incomplete schema (very late in session)
+
+**For all future SQLite migrations (mobile):**
+1. Make the SQLite migration Up() idempotent (either empty with doc comment explaining PatchMissingColumnsAsync handles it, or use SQL that works on repeat)
+2. Add corresponding entry to `PatchMissingColumnsAsync` at the same time the migration lands
+3. Use IF NOT EXISTS / pragma checks for idempotency
+4. Reference: `AddMissingVocabularyWordLanguageColumn.cs` (empty Up() + patch pattern) and commit c9b1d0a (ExposureCount example)
+
+**Verification:** Always test on Mac Catalyst Debug build (via `scripts/validate-mobile-migrations.sh`) and device before merge.
+
+### 2026-05-30: Bulk Import Data Layer Patterns (Scout for File Import Feature)
+
+**Context:** Pre-architecture scouting for Zoe to design a file import feature for vocabulary lists (CSV/text). No implementation — read-only investigation of existing patterns.
+
+**Key discoveries:**
+
+1. **Dedup inconsistency:** `VideoImportPipelineService` uses case-sensitive exact match on `TargetLanguageTerm` (line 368), but `LearningResourceRepository` utilities use case-insensitive trimmed comparison (line 940). This creates duplicates when same word appears with different casing across imports. **Recommendation:** Standardize to case-insensitive trimmed dedup in service layer (both YouTube and file imports).
+
+2. **Shared vocabulary model:** `VocabularyWord` has NO `UserProfileId` — vocabulary is shared across users. Per-user data lives in `VocabularyProgress` (created lazily on first practice, NOT at import time). `ResourceVocabularyMapping` provides many-to-many between user's `LearningResource` and shared `VocabularyWord` pool.
+
+3. **Batch import status tracking pattern:** `VideoImport` entity tracks pipeline state with enum statuses (`Pending`, `FetchingTranscript`, etc.). Background execution via `Task.Run`, caller polls `/api/imports/{id}` for progress. Pattern is reusable for file imports if status UI is needed.
+
+4. **Repository transaction pattern:** `SaveResourceAsync` (LearningResourceRepository:210-250+) handles resource + vocabulary in single transaction: (a) detach nav props, (b) check existing resource, (c) save resource, (d) dedup words via `GetWordByTargetTermAsync`, (e) create mappings, (f) SaveChanges, (g) trigger sync. Use this pattern for file import to maintain data integrity.
+
+5. **File picker ready to use:** `IFilePickerService` / `MauiFilePickerService` abstraction exists and is production-tested. Returns `Stream` for parsing. Static parser `VocabularyWord.ParseVocabularyWords()` exists but is NOT wired to repository/persistence — new service layer needed.
+
+**Delivered:** `.squad/decisions/inbox/wash-import-scout-findings.md` for Zoe's architecture proposal.
+
+---
+
+## 2026-04-24 — Import Data Layer Scout (Multi-Agent Session)
+
+Conducted data layer survey for new import feature. Identified YouTube pipeline as template, found file import UI/service gap, discovered dedup inconsistency, confirmed no schema changes needed.
+
+**Key findings:**
+- YouTube pipeline pattern in `VideoImportPipelineService` (dedup by TargetLanguageTerm)
+- File import UI missing, but parser utility `VocabularyWord.ParseVocabularyWords()` exists
+- Dedup inconsistency: case-sensitive in pipeline vs. case-insensitive in repo utilities
+- MVP reuses all existing tables (LearningResource, VocabularyWord, ResourceVocabularyMapping)
+- Migration gotcha: multi-TFM requires temporary single-TFM switch for `dotnet ef`
+
+**Recommendations to Zoe:**
+- Standardize dedup to case-insensitive trimmed
+- New `VocabularyImportService` following YouTube pattern
+- Optional `FileImport` entity for status tracking
+
+**Coordinated with:** Zoe (architecture), River (AI), Kaylee (UI), Copilot
+
+**Next:** Implementation uses findings for service + DB layer.
+