Epic: E4 — Testing + Polishing + TRA
Role: PM + Engineer
User Story
As a PM + Engineer, I want to define and run a guardrail test suite for Glow CI's LLM outputs so that we can catch and refine failure modes before pilot launch.
Context
Guardrail testing focuses on two categories of LLM failure specific to Glow CI:
- Relevance failure — surfacing the wrong content to the wrong teacher at the wrong moment (e.g. wellbeing guidance triggered for an academic context)
- Sensitivity failure — surfacing content that is technically correct but contextually inappropriate given the student's situation (e.g. heavy intervention guidance for a mild case)
Test cases should be defined with the Knowledge Base Steering Committee and West Zone Sups, who have domain knowledge of what appropriate vs inappropriate outputs look like.
Acceptance Criteria
Dependencies
📄 PRD: E4 — Glow CI PRD
Epic: E4 — Testing + Polishing + TRA
Role: PM + Engineer
User Story
As a PM + Engineer, I want to define and run a guardrail test suite for Glow CI's LLM outputs so that we can catch and refine failure modes before pilot launch.
Context
Guardrail testing focuses on two categories of LLM failure specific to Glow CI:
Test cases should be defined with the Knowledge Base Steering Committee and West Zone Sups, who have domain knowledge of what appropriate vs inappropriate outputs look like.
Acceptance Criteria
Dependencies
📄 PRD: E4 — Glow CI PRD