RAG-01 | PII-Mask + Embed + Index #34

Description

@Manish281005

Task: RAG-01 | PII-Mask + Embed + Index

Estimated Hours: 3h | Owner: MK | Priority: P0 | Status: Completed

Definition:
Retrieve the last 180 days of sent mail via an MS Graph delta query, mask PII with Presidio, embed the masked text using text-embedding-ada-002, and index the results into Azure AI Search (with ChromaDB as a fallback). This creates the foundation for RAG precedent retrieval.
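
The retrieval step could look roughly like the sketch below. It is a minimal illustration, not the code from this task: `iter_sent_mail`, `fetch_page`, and `SENT_DELTA_URL` are hypothetical names, the starting URL assumes the well-known `sentitems` folder, and the 180-day cutoff is applied client-side on `sentDateTime`. The HTTP call is injected as a callable so the paging logic can be exercised without network access.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterator, Optional

# Assumed starting point: delta query over the well-known Sent Items folder.
SENT_DELTA_URL = (
    "https://graph.microsoft.com/v1.0/me/mailFolders/sentitems/messages/delta"
)


def iter_sent_mail(
    fetch_page: Callable[[str], dict],
    start_url: str = SENT_DELTA_URL,
    days: int = 180,
    now: Optional[datetime] = None,
) -> Iterator[dict]:
    """Yield messages sent within the last `days` days, following delta paging.

    `fetch_page` is a hypothetical callable that takes a URL and returns the
    parsed JSON body of the response as a dict.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    url: Optional[str] = start_url
    while url:
        page = fetch_page(url)
        for msg in page.get("value", []):
            # Graph timestamps end in "Z"; normalise for fromisoformat.
            sent = datetime.fromisoformat(msg["sentDateTime"].replace("Z", "+00:00"))
            if sent >= cutoff:
                yield msg
        # Intermediate pages carry @odata.nextLink; the final page carries
        # @odata.deltaLink instead, which ends this round of paging.
        url = page.get("@odata.nextLink")
```

In a real run, `fetch_page` would wrap an authenticated HTTP GET. The `@odata.deltaLink` returned on the last page can be stored so that a later run fetches only changes.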

Note

Real text-embedding-ada-002 embeddings, Graph API sent-email retrieval, and batch processing are fully wired up, with deterministic fallback support.
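
One way a deterministic fallback could work is sketched below. This is an assumption about the approach, not the implementation used here: `fallback_embedding` is a hypothetical helper that derives a repeatable vector from SHA-256 hashes of the text, sized to the 1536 dimensions that text-embedding-ada-002 returns. The vectors carry no semantic meaning, so this only suits offline runs and tests.

```python
import hashlib
import math

EMBEDDING_DIM = 1536  # output size of text-embedding-ada-002


def fallback_embedding(text: str, dim: int = EMBEDDING_DIM) -> list[float]:
    """Derive a unit-length pseudo-embedding from SHA-256 hashes of the text."""
    values: list[float] = []
    counter = 0
    while len(values) < dim:
        # Each digest gives 32 bytes; the counter makes successive digests differ.
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        # Map each byte from [0, 255] to [-1, 1].
        values.extend(b / 127.5 - 1.0 for b in digest)
        counter += 1
    values = values[:dim]
    # Normalise so cosine similarity and dot product agree.
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    return [v / norm for v in values]
```

The same input always yields the same vector, which keeps index contents reproducible when the real embedding service is unavailable.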

Acceptance Criteria:

  • Graph delta query retrieves ≥50 sent emails
  • PII masking applied to body before embedding (names, email addresses, phone numbers, card numbers)
  • Embeddings generated using the text-embedding-ada-002 model
  • Index created with schema: {email_id, sender, timestamp, subject, masked_body, embedding}
  • Index searchable by subject + masked_body (ranked by cosine similarity of embeddings)
  • Batch process: 50 emails at a time
  • Indexing completes in <5 minutes for a typical dataset
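
The masking, batching, schema, and search criteria above can be sketched together as follows. This is an illustrative in-memory version under stated assumptions: the regex patterns are a simple stand-in for Presidio (they do not detect names, which need an NER-based recognizer), the list-of-dicts index stands in for Azure AI Search or ChromaDB, and `mask_pii`, `batched`, `build_index`, `search`, and `embed_batch` are hypothetical names.

```python
import math
import re
from typing import Callable, Iterator

BATCH_SIZE = 50  # emails per embedding call, per the acceptance criteria

# Regex stand-ins for three PII types; order matters (cards before phones).
_PII_PATTERNS = [
    ("<EMAIL>", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("<CARD>", re.compile(r"\b\d(?:[ -]?\d){12,15}\b")),
    ("<PHONE>", re.compile(r"\+?\d[\d\s().-]{7,}\d")),
]


def mask_pii(text: str) -> str:
    """Replace each matched PII span with a placeholder token."""
    for token, pattern in _PII_PATTERNS:
        text = pattern.sub(token, text)
    return text


def batched(items: list, size: int = BATCH_SIZE) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_index(
    emails: list[dict],
    embed_batch: Callable[[list[str]], list[list[float]]],
) -> list[dict]:
    """Mask, embed, and store emails using the schema from the criteria."""
    index = []
    for batch in batched(emails):
        masked = [mask_pii(e["body"]) for e in batch]
        vectors = embed_batch(masked)  # one embedding call per batch
        for e, body, vec in zip(batch, masked, vectors):
            index.append({
                "email_id": e["id"],
                "sender": e["sender"],
                "timestamp": e["timestamp"],
                "subject": e["subject"],
                "masked_body": body,
                "embedding": vec,
            })
    return index


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(index: list[dict], query_vec: list[float], top_k: int = 3) -> list[dict]:
    """Return the `top_k` records most similar to `query_vec`."""
    ranked = sorted(index, key=lambda r: cosine(r["embedding"], query_vec), reverse=True)
    return ranked[:top_k]
```

Passing the embedding function in as `embed_batch` keeps the sketch independent of any particular client; only the masked body is ever handed to it, so raw PII does not reach the embedding step.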
