[1.3] Knowledge base ingestion pipeline (GCS → chunking + embedding → Vertex AI Vector Search)

**Epic:** E1 — TW RAG + Model Service
**Priority:** P0
**Role:** Engineer

## User Story
As an Engineer, I want to build an end-to-end ingestion pipeline that reads guidance materials from GCS, chunks and embeds them, and indexes them in Vertex AI Vector Search, so that the RAG layer can retrieve relevant passages based on student/teacher context.

## Context
Combines stories 1.7 (GCS ingestion connection) and 1.8 (chunking + embedding pipeline). GCS is the source of truth for guidance materials. Documents are read from the bucket, chunked with configurable size and overlap, embedded via Vertex AI, and indexed in Vector Search using `@google-cloud/aiplatform` (Vertex RAG Engine). Chunk size and embedding model choice both affect retrieval quality, so tune them during implementation.
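
To make "configurable size/overlap" concrete, here is a minimal sketch of fixed-size chunking with overlap. It is an illustration, not the pipeline's implementation: `ChunkOptions` and `chunkText` are hypothetical names, and sizes are counted in characters for simplicity, whereas the real pipeline may count tokens or delegate chunking to Vertex RAG Engine's own import settings.

```typescript
// Hypothetical options type; names are illustrative, not from the project.
interface ChunkOptions {
  chunkSize: number; // maximum characters per chunk (assumption: characters, not tokens)
  overlap: number;   // characters shared between consecutive chunks
}

// Split text into fixed-size windows that each start (chunkSize - overlap)
// characters after the previous one.
function chunkText(text: string, options: ChunkOptions): string[] {
  const { chunkSize, overlap } = options;
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  if (!Number.isInteger(overlap) || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be an integer in [0, chunkSize)");
  }

  const step = chunkSize - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk has reached the end of the text, so no trailing
    // chunk is emitted that contains only already-covered overlap.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

For example, `chunkText("abcdefghij", { chunkSize: 4, overlap: 1 })` returns `["abcd", "defg", "ghij"]`. A larger overlap reduces the chance that a relevant passage is split across a chunk boundary, at the cost of more chunks to embed and index.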

## Acceptance Criteria
- [ ] Documents of each supported type (PDF, Word, PPTX) are read successfully from the GCS bucket
- [ ] Files imported into Vertex RAG Engine corpus
- [ ] Documents chunked with configurable chunk size and overlap
- [ ] Chunks embedded using Vertex AI embeddings model and indexed in Vertex AI Vector Search
- [ ] End-to-end pipeline tested: GCS document → Vector Search index
- [ ] Retrieval quality tested with sample queries representative of student/teacher contexts
- [ ] Connection and processing errors handled gracefully with logging
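
One possible shape for the file-type and error-handling criteria above is a loop that skips unsupported files and logs failures per document, so one bad file does not abort the run. This is a sketch under assumptions: `ingestAll`, `isSupported`, and `IngestResult` are hypothetical names, "Word" is assumed to mean `.docx`, and `ingestOne` stands in for the actual read, chunk, embed, and index step, which is not shown here.

```typescript
// Assumption: "Word" means .docx; extend the list if .doc must be supported.
const SUPPORTED_EXTENSIONS = [".pdf", ".docx", ".pptx"];

// Hypothetical per-document outcome record.
interface IngestResult {
  uri: string;
  status: "indexed" | "skipped" | "failed";
  reason?: string;
}

// Case-insensitive check of the object name's extension.
function isSupported(uri: string): boolean {
  const lower = uri.toLowerCase();
  return SUPPORTED_EXTENSIONS.some((ext) => lower.endsWith(ext));
}

// Process each document independently; failures are logged and recorded
// instead of being rethrown. `ingestOne` is a placeholder for the real
// GCS -> chunk -> embed -> index step.
async function ingestAll(
  uris: string[],
  ingestOne: (uri: string) => Promise<void>,
  log: (message: string) => void = console.error,
): Promise<IngestResult[]> {
  const results: IngestResult[] = [];
  for (const uri of uris) {
    if (!isSupported(uri)) {
      log(`skipping ${uri}: unsupported file type`);
      results.push({ uri, status: "skipped", reason: "unsupported file type" });
      continue;
    }
    try {
      await ingestOne(uri);
      results.push({ uri, status: "indexed" });
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      log(`ingest failed for ${uri}: ${reason}`);
      results.push({ uri, status: "failed", reason });
    }
  }
  return results;
}
```

The returned list gives the end-to-end test something to assert on: every input URI appears exactly once, with a status and, for skipped or failed documents, a logged reason.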

## Dependencies
- Blocked by: #11 (1.6) — cloud storage setup
- Unblocks: #16 (1.11) — context assembly layer

> *Combines #12 (1.7) and #13 (1.8)*

---
📄 **PRD:** [Part 1 — Glow CI PRD](https://github.com/String-sg/tw-context-intelligence/blob/main/Glow%20CI%20PRD.md#part-1-technical-integration--contextual-data--rag--llm)
