feat: add NDCG metric for context ranking evaluation by hashwnath · Pull Request #247 · TonicAI/tonic_validate

hashwnath · 2026-06-01T02:29:44Z

Summary

Closes #230

Adds an NDCGMetric class that evaluates the ranking quality of retrieved contexts using Normalized Discounted Cumulative Gain (NDCG)
Uses the existing context_relevancy_call LLM helper to judge each context item as relevant or not (binary relevance), then computes NDCG from the positions of the relevant items
Follows the same class pattern as RetrievalPrecisionMetric (same requirements, prompt, and calculate_metric structure)
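A minimal sketch of that flow outside the library, for illustration only: each context gets a binary relevance judgement, and the resulting 0/1 list (kept in retrieval order) is scored with NDCG. The `judge` callable and `ndcg_for_contexts` are hypothetical names; `judge` stands in for `context_relevancy_call`, whose real signature and return type are not shown in this PR, and the keyword-overlap `keyword_judge` is a toy replacement for the LLM call.

```python
import math
from typing import Callable, List

# Hypothetical stand-in for an LLM relevance check such as context_relevancy_call:
# takes (question, context) and returns True if the context is judged relevant.
Judge = Callable[[str, str], bool]


def ndcg_for_contexts(question: str, contexts: List[str], judge: Judge) -> float:
    """Judge each context for binary relevance, then score the ordering with NDCG."""
    if not contexts:
        raise ValueError("contexts must not be empty")

    # 1 if the judge says the context is relevant, else 0, kept in retrieval order.
    relevances = [1 if judge(question, c) else 0 for c in contexts]

    # DCG over 1-indexed positions i: rel_i / log2(i + 1).
    dcg = sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))
    # IDCG: the same sum over the ideal ordering (all relevant items first).
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(i + 1) for i, rel in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


def keyword_judge(question: str, context: str) -> bool:
    # Toy judge for illustration only: relevant if any longer question word appears in the context.
    words = {w.lower().strip("?.,") for w in question.split()}
    return any(w in context.lower() for w in words if len(w) > 3)


question = "Who wrote Hamlet?"
contexts = ["The weather in Paris", "Hamlet was written by Shakespeare"]
# The only relevant context sits at position 2, so the score is 1 / log2(3), about 0.631.
print(ndcg_for_contexts(question, contexts, keyword_judge))
```

Keeping the judge as an injected callable separates the (expensive, nondeterministic) relevance calls from the purely arithmetic ranking score, which is also what makes a standalone `compute_ndcg` easy to unit test.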

How it works

NDCG measures whether relevant contexts appear earlier in the retrieved list:

DCG = sum(rel_i / log2(i + 1)) for each position i (1-indexed)
IDCG = DCG of the ideal ranking (all relevant items sorted first)
NDCG = DCG / IDCG, ranging from 0.0 to 1.0
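The three formulas translate almost line for line into Python. This is a sketch assuming relevances arrive as a list of 0/1 integers in retrieval order; the name `compute_ndcg` mirrors the static method mentioned in the test plan, but the signature here is an assumption, not the PR's code.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[int]) -> float:
    # DCG = sum(rel_i / log2(i + 1)) with 1-indexed positions, so position 1 has weight 1.0.
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))


def compute_ndcg(relevances: Sequence[int]) -> float:
    """NDCG = DCG / IDCG for a ranked list of binary relevance labels."""
    if len(relevances) == 0:
        raise ValueError("relevances must not be empty")
    # IDCG: DCG of the ideal ranking, i.e. every relevant item moved to the front.
    idcg = dcg(sorted(relevances, reverse=True))
    if idcg == 0:
        # No relevant items at all: return 0.0 instead of dividing by zero.
        return 0.0
    return dcg(relevances) / idcg


print(compute_ndcg([1, 1, 0]))  # perfect ranking: 1.0
print(compute_ndcg([0, 1, 1]))  # relevant items pushed to the end: below 1.0
```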

A score of 1.0 means relevant contexts are optimally ranked. A lower score means relevant items appear later than ideal.
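As a worked check of the formulas, take three retrieved contexts where the first and third are relevant (relevance vector [1, 0, 1]); the ideal ordering is [1, 1, 0]:

```latex
\begin{aligned}
\mathrm{DCG}  &= \frac{1}{\log_2 2} + \frac{0}{\log_2 3} + \frac{1}{\log_2 4} = 1 + 0 + 0.5 = 1.5 \\
\mathrm{IDCG} &= \frac{1}{\log_2 2} + \frac{1}{\log_2 3} + \frac{0}{\log_2 4} \approx 1 + 0.631 + 0 = 1.631 \\
\mathrm{NDCG} &= \frac{\mathrm{DCG}}{\mathrm{IDCG}} \approx \frac{1.5}{1.631} \approx 0.920
\end{aligned}
```

Moving the second relevant context from position 3 up to position 2 would make the ranking ideal and raise the score to 1.0.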

Edge cases handled

Empty context list raises ValueError
An all-irrelevant context list returns 0.0 (IDCG is 0, so the DCG / IDCG ratio would otherwise be undefined)
A single context item is supported (1.0 if it is relevant, 0.0 if not)

Test plan

Unit test for compute_ndcg static method with known mathematical values (perfect ranking, worst ranking, mixed)
Integration tests via ValidateScorer following existing parametrized test patterns (all relevant, all irrelevant, single item cases)
test_ndcg_compute_ndcg passes locally
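The unit test described above might look roughly like this. It is a sketch, not the PR's test file: `compute_ndcg` is redefined inline so the example runs on its own (the test plan describes it as a static method, presumably on NDCGMetric, and its import path is not shown here), and the tests use plain asserts, which pytest collects as-is. Expected values are worked out by hand from the formulas in "How it works".

```python
import math


def compute_ndcg(relevances):
    # Inline stand-in for the PR's compute_ndcg static method (signature assumed).
    if not relevances:
        raise ValueError("relevances must not be empty")

    def dcg(rels):
        # 1-indexed positions: rel_i / log2(i + 1)
        return sum(r / math.log2(i + 1) for i, r in enumerate(rels, start=1))

    idcg = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / idcg if idcg > 0 else 0.0


# (relevances, expected NDCG) pairs derived from the DCG/IDCG formulas.
CASES = [
    ([1, 1, 0], 1.0),  # perfect ranking
    ([0, 1, 1], (1 / math.log2(3) + 0.5) / (1 + 1 / math.log2(3))),  # worst ranking of two relevant items
    ([1, 0, 1], 1.5 / (1 + 1 / math.log2(3))),  # mixed
    ([0, 0, 0], 0.0),  # all irrelevant: IDCG is 0
    ([1], 1.0),  # single relevant item
    ([0], 0.0),  # single irrelevant item
]


def test_compute_ndcg_known_values():
    for relevances, expected in CASES:
        # abs_tol lets the 0.0 cases compare cleanly; rel_tol covers the rest.
        assert math.isclose(compute_ndcg(relevances), expected, abs_tol=1e-12), relevances


def test_compute_ndcg_empty_raises():
    try:
        compute_ndcg([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for an empty list")
```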

Generated with Claude Code

Add NDCGMetric that measures the ranking quality of retrieved contexts using Normalized Discounted Cumulative Gain. This evaluates whether relevant contexts appear earlier in the retrieved context list.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

hashwnath wants to merge 1 commit into TonicAI:main from hashwnath:feat/230-ndcg-metric
