
feat: add NDCG metric for context ranking evaluation#247

Open
hashwnath wants to merge 1 commit into
TonicAI:mainfrom
hashwnath:feat/230-ndcg-metric

@hashwnath

Summary

Closes #230

  • Adds NDCGMetric class that evaluates the ranking quality of retrieved contexts using Normalized Discounted Cumulative Gain (NDCG)
  • Uses the existing context_relevancy_call LLM helper to assign a binary relevance label to each context item, then computes NDCG from the positions of the relevant items in the retrieved order
  • Follows the same class pattern as RetrievalPrecisionMetric (same requirements, prompt, and calculate_metric structure)

How it works

NDCG measures whether relevant contexts appear earlier in the retrieved list:

  • DCG = sum(rel_i / log2(i + 1)) over positions i = 1..n (1-indexed), where rel_i is 1 if the context at position i is relevant and 0 otherwise
  • IDCG = DCG of the ideal ranking (all relevant items sorted first)
  • NDCG = DCG / IDCG, ranging from 0.0 to 1.0

A score of 1.0 means every relevant context is ranked ahead of every irrelevant one. A lower score means at least one relevant item appears later than it would in the ideal ordering.
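
As a worked example of the formulas above, take three retrieved contexts with relevance labels (1, 0, 1). These labels are made up for illustration and do not come from the PR's tests.

```latex
\begin{aligned}
\mathrm{DCG}  &= \frac{1}{\log_2 2} + \frac{0}{\log_2 3} + \frac{1}{\log_2 4} = 1 + 0 + 0.5 = 1.5 \\
\mathrm{IDCG} &= \frac{1}{\log_2 2} + \frac{1}{\log_2 3} + \frac{0}{\log_2 4} \approx 1 + 0.6309 = 1.6309 \\
\mathrm{NDCG} &= \frac{\mathrm{DCG}}{\mathrm{IDCG}} \approx \frac{1.5}{1.6309} \approx 0.9197
\end{aligned}
```

The score falls below 1.0 because the second relevant context sits at position 3 instead of position 2.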

Edge cases handled

  • An empty context list raises ValueError
  • If every context is irrelevant, the metric returns 0.0 instead of dividing by an IDCG of 0
  • A single context item scores 1.0 if it is relevant and 0.0 otherwise
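
The calculation and the edge cases above can be sketched in a few lines of Python. This is a minimal illustration, not the code in this PR: the function name `ndcg_from_binary_relevance` and its signature are hypothetical, and it takes relevance labels directly instead of calling the LLM helper.

```python
import math
from typing import Sequence


def ndcg_from_binary_relevance(relevance: Sequence[bool]) -> float:
    """NDCG for a ranked list of binary relevance labels.

    Hypothetical helper for illustration; not the PR's compute_ndcg.
    """
    if len(relevance) == 0:
        # Edge case: nothing was retrieved, so the ranking is undefined.
        raise ValueError("context list must not be empty")

    def dcg(labels: Sequence[bool]) -> float:
        # Positions are 1-indexed, so the discount at position i is log2(i + 1).
        return sum(
            float(rel) / math.log2(pos + 1)
            for pos, rel in enumerate(labels, start=1)
        )

    # Ideal ranking: all relevant items sorted first.
    idcg = dcg(sorted(relevance, reverse=True))
    if idcg == 0.0:
        # Edge case: no relevant items at all, so return 0.0 instead of dividing by zero.
        return 0.0
    return dcg(relevance) / idcg
```

For instance, `ndcg_from_binary_relevance([True, False, True])` evaluates to roughly 0.92, while a list whose relevant items all come first evaluates to 1.0.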

Test plan

  • Unit test for compute_ndcg static method with known mathematical values (perfect ranking, worst ranking, mixed)
  • Integration tests via ValidateScorer following existing parametrized test patterns (all relevant, all irrelevant, single item cases)
  • test_ndcg_compute_ndcg passes locally

Generated with Claude Code

Add NDCGMetric that measures the ranking quality of retrieved contexts
using Normalized Discounted Cumulative Gain. This evaluates whether
relevant contexts appear earlier in the retrieved context list.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Support ndcg metric for contexts ranking