[4.1] LLM Guardrails Testing & Refinement #58

**Epic:** E4 — Testing + Polishing + TRA
**Role:** PM + Engineer

## User Story
As a PM + Engineer, I want to define and run a guardrail test suite for Glow CI's LLM outputs so that we can catch and refine failure modes before pilot launch.

## Context
Guardrail testing focuses on two categories of LLM failure specific to Glow CI:
1. **Relevance failure**: surfacing content that does not fit the teacher or the moment (e.g. wellbeing guidance triggered in an academic context)
2. **Sensitivity failure** — surfacing content that is technically correct but contextually inappropriate given the student's situation (e.g. heavy intervention guidance for a mild case)

Test cases should be defined with the Knowledge Base Steering Committee and West Zone Sups, who have domain knowledge of what appropriate vs inappropriate outputs look like.
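
As a starting point for review with the committee, the two failure categories could be captured as structured test cases. The sketch below is illustrative only: `GuardrailCase`, its field names, and the content tags (`wellbeing_guidance`, `heavy_intervention`) are hypothetical and not part of Glow CI or the evals platform.

```python
from dataclasses import dataclass
from enum import Enum


class FailureCategory(Enum):
    RELEVANCE = "relevance"      # content does not fit the teacher or moment
    SENSITIVITY = "sensitivity"  # correct but contextually inappropriate


@dataclass(frozen=True)
class GuardrailCase:
    # All field names are assumptions made for illustration.
    case_id: str
    category: FailureCategory
    teacher_context: str                # e.g. "academic" or "wellbeing"
    student_situation: str              # e.g. "mild case"
    must_not_surface: tuple[str, ...]   # content tags that count as a failure
    signed_off_by: str = ""             # empty until a reviewer signs off


def violates(case: GuardrailCase, surfaced_tags: set[str]) -> bool:
    """Return True if any forbidden content tag appears in the output."""
    return any(tag in surfaced_tags for tag in case.must_not_surface)


# One example per category, taken from the scenarios described above.
cases = [
    GuardrailCase(
        case_id="REL-001",
        category=FailureCategory.RELEVANCE,
        teacher_context="academic",
        student_situation="none flagged",
        must_not_surface=("wellbeing_guidance",),
    ),
    GuardrailCase(
        case_id="SEN-001",
        category=FailureCategory.SENSITIVITY,
        teacher_context="wellbeing",
        student_situation="mild case",
        must_not_surface=("heavy_intervention",),
    ),
]
```

Keeping the forbidden content as explicit tags makes each case something a domain reviewer can read and sign off on without needing to run the pipeline.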

## Acceptance Criteria
- [ ] Guardrail test cases written covering relevance failure and sensitivity failure scenarios
- [ ] Test cases reviewed and signed off by Knowledge Base Steering Committee / West Zone Sups
- [ ] All test cases passing before pilot launch (0 out-of-scope or inappropriate responses)
- [ ] Refinement loop documented: how failures are escalated and how prompts and retrieval are tuned in response
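
The sign-off and zero-failure criteria above could be checked mechanically before launch. This is a minimal sketch under assumptions: `CaseResult` and `launch_gate` are hypothetical names, and the real evals platform may report results in a different shape.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseResult:
    # Hypothetical result record; not the evals platform's actual format.
    case_id: str
    category: str    # "relevance" or "sensitivity"
    passed: bool
    signed_off: bool


def launch_gate(results: list[CaseResult]) -> tuple[bool, dict[str, int]]:
    """Return (ready, failures_by_category).

    Ready only if there is at least one result, every case is signed off,
    and no case failed.
    """
    failures = Counter(r.category for r in results if not r.passed)
    all_signed_off = all(r.signed_off for r in results)
    ready = bool(results) and all_signed_off and not failures
    return ready, dict(failures)


results = [
    CaseResult("REL-001", "relevance", passed=True, signed_off=True),
    CaseResult("SEN-001", "sensitivity", passed=False, signed_off=True),
]
print(launch_gate(results))  # (False, {'sensitivity': 1})
```

The per-category failure counts give the refinement loop a first signal about where to look, though deciding between a prompt change and a retrieval change would still need human review.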

## Dependencies
- Requires: #54 — AI Evaluations setup (evals platform used to run guardrail tests)
- Requires: E1 RAG pipeline and E2 MicroFE to be functional for end-to-end test runs

---
📄 **PRD:** [E4 — Glow CI PRD](https://github.com/String-sg/tw-context-intelligence/blob/main/Glow%20CI%20PRD.md)
