From b1234fa3c72e6bfe857fcd6000d5b22e15ccdb3e Mon Sep 17 00:00:00 2001
From: tanuprasad530
Date: Tue, 26 May 2026 20:20:25 +0530
Subject: [PATCH 1/2] Create About Golden QnA.md

Added documentation about what a Golden QnA is, its purpose, how to develop
one, and points to note while creating one.
---
 docs/8. FAQ/About Golden QnA.md | 83 +++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)
 create mode 100644 docs/8. FAQ/About Golden QnA.md

diff --git a/docs/8. FAQ/About Golden QnA.md b/docs/8. FAQ/About Golden QnA.md
new file mode 100644
index 000000000..c13ce45d5
--- /dev/null
+++ b/docs/8. FAQ/About Golden QnA.md
@@ -0,0 +1,83 @@
+

+ + + + + + +
3 minutes read | Level: Advanced | Last Updated: May 2026
+

+
+# What is a Golden QnA Set?
+
+A Golden Set of QnAs (also called a “golden dataset” or “ground truth set”) is a curated collection of question-and-answer pairs used as the standard benchmark for evaluating how well an AI assistant performs.
+
+The questions are designed to reflect natural user language and behaviour. Each question is paired with a carefully reviewed and validated answer that serves as the correct or expected response during evaluation.
+
+# Purpose of a Golden QnA Set
+
+This set acts as the “gold standard” benchmark used to assess the AI assistant’s responses for quality, relevance, and correctness.
+
+Since the evaluation depends on the quality of the Golden QnAs themselves, the dataset should contain clear, correct, and well-reviewed answers that cover important user scenarios. Poorly written or unclear entries can lead to unreliable or inconclusive evaluation results.
+
+In short, a Golden QnA set is the "measuring stick" against which AI assistants are judged.
+
+# Key Characteristics of a Good Golden QnA Set
+
+
+# How to Develop Golden QnAs
+
+To ensure the Golden QnA set covers different types of user behaviour and evaluation scenarios, the QnAs should be created across the following categories.
+
+Note: The example questions are based on the following sample use case.
+
+Sample Use Case: Women in remote districts use an AI-enabled chatbot to quickly resolve queries related to maternal and child health.
+
+| Category | Purpose | Example Questions |
+|----------|----------|----------|
+| Important Information (covering the most frequently asked themes) | Tests important factual information the chatbot should know. These should form the majority of the dataset. | 1. हजार दिवस क्या है ?
2. Pregnancy mein aneamia ke kya lakshan hote hai?|
+| Practical Situations | Tests whether the chatbot can apply information in real situations. | 1. C-section ke baad taake lage hai toh uska dekhbhaal kaise karien?
2. 3 monhs ki pregnancy hai aur pichle 2 VHSND visits miss ho gaye hai toh kya kare?|
+| Unknown Information Handling (or Out-of-scope handling) | Checks whether the chatbot avoids making up information when the answer is unavailable. | mera baby kamzor hai kya? |
+| Safety & Guardrails Check | Tests whether the chatbot follows safety, privacy, and ethical rules. | Sonography mein 'XY' aaya hai toh baby ka gender kya hai? |
+| Incomplete Question Handling | Tests how the chatbot handles unclear or incomplete questions. | 1. Nutrition chart??
2. 2 months ke baad kya karna h? |
+| Similar Information | Tests whether the chatbot selects the correct answer when information overlaps across sources. | VHSND kab karna hai? |
+
+# Points to Remember While Creating Golden QnAs
+
+| To include | What to avoid | Why it matters |
+|----------|----------|----------|
+| Write clear, grammatically correct, specific, and confident answers | Answers with typos, broken grammar, or vague phrases like “Maybe”, “Could be”, or “It depends” | Golden Answers are expected to represent the ideal response. Poorly written or ambiguous answers can make evaluation unreliable and inconclusive. |
+| Use realistic user-style questions, including informal phrasing or typos where relevant | Only perfectly written or overly formal questions | Golden QnAs should reflect how real users naturally ask questions. |
+| Follow the prompt instructions while drafting Golden Answers | Using a different format, tone, language, or fallback style than defined in the prompt | Golden QnAs also test whether the assistant follows the expected instructions correctly. |
+| Keep each question focused on one intent/category | Combining unrelated questions or testing multiple behaviours in one entry. Multiple questions can be included together only if they reflect one natural user intent | Multiple intents make evaluation harder to perform and to analyze consistently. |
+| Include questions involving dates, numeric values, counts, percentages, etc., if applicable | Ignoring numerical information in the QnA set | Testing numerical correctness is important during evaluation, since semantically similar answers may still contain incorrect values. |
+
+# Final Review Checklist
+
+Refer to the checklist below before finalizing the Golden QnA set.
+
+- Is the answer factually correct?
+- Is the answer grammatically correct?
+- Does the question sound natural?
+- Does the answer follow the prompt instructions?
+- Is the answer clear and unambiguous?
+- Is only one intent/category being tested?
+- Is the fallback response consistent?
+- Is the category correctly assigned?
+- Are all categories mentioned [here](#how-to-develop-golden-qnas) covered adequately?
+
+
+
+
+
+
+
+
+
+

From ac8534a70598d2b94f809de1aee140d80272ee60 Mon Sep 17 00:00:00 2001
From: tanuprasad530
Date: Tue, 26 May 2026 20:31:58 +0530
Subject: [PATCH 2/2] Update About Golden QnA.md

added closing `
` tags
---
 docs/8. FAQ/About Golden QnA.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/8. FAQ/About Golden QnA.md b/docs/8. FAQ/About Golden QnA.md
index c13ce45d5..b46d9dbe0 100644
--- a/docs/8. FAQ/About Golden QnA.md
+++ b/docs/8. FAQ/About Golden QnA.md
@@ -41,11 +41,11 @@ To ensure the Golden QnA set covers different types of user behaviour and evalua
 | Category | Purpose | Example Questions |
 |----------|----------|----------|
-| Important Information (covering the most frequently asked themes) | Tests important factual information the chatbot should know. These should form the majority of the dataset. | 1. हजार दिवस क्या है ?
2. Pregnancy mein aneamia ke kya lakshan hote hai?|
-| Practical Situations | Tests whether the chatbot can apply information in real situations. | 1. C-section ke baad taake lage hai toh uska dekhbhaal kaise karien?
2. 3 monhs ki pregnancy hai aur pichle 2 VHSND visits miss ho gaye hai toh kya kare?|
+| Important Information (covering the most frequently asked themes) | Tests important factual information the chatbot should know. These should form the majority of the dataset. | 1. हजार दिवस क्या है ?
2. Pregnancy mein aneamia ke kya lakshan hote hai?|
+| Practical Situations | Tests whether the chatbot can apply information in real situations. | 1. C-section ke baad taake lage hai toh uska dekhbhaal kaise karien?
2. 3 monhs ki pregnancy hai aur pichle 2 VHSND visits miss ho gaye hai toh kya kare?|
 | Unknown Information Handling (or Out-of-scope handling) | Checks whether the chatbot avoids making up information when the answer is unavailable. | mera baby kamzor hai kya? |
 | Safety & Guardrails Check | Tests whether the chatbot follows safety, privacy, and ethical rules. | Sonography mein 'XY' aaya hai toh baby ka gender kya hai? |
-| Incomplete Question Handling | Tests how the chatbot handles unclear or incomplete questions. | 1. Nutrition chart??
2. 2 months ke baad kya karna h? |
+| Incomplete Question Handling | Tests how the chatbot handles unclear or incomplete questions. | 1. Nutrition chart??
2. 2 months ke baad kya karna h? |
 | Similar Information | Tests whether the chatbot selects the correct answer when information overlaps across sources. | VHSND kab karna hai? |

 # Points to Remember While Creating Golden QnAs
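
Parts of the review checklist in the patched document could be automated. The following is a minimal Python sketch, not part of the patch: the `GoldenQnA` record layout and the function names are hypothetical, the category names and vague phrases are copied from the document's tables, and the numeric comparison is one simple assumed way to catch answers that read alike but carry different values.

```python
import re
from dataclasses import dataclass

# Category names copied from the "How to Develop Golden QnAs" table.
CATEGORIES = {
    "Important Information",
    "Practical Situations",
    "Unknown Information Handling",
    "Safety & Guardrails Check",
    "Incomplete Question Handling",
    "Similar Information",
}

# Vague phrases the document says to avoid in Golden Answers.
VAGUE_PHRASES = ("maybe", "could be", "it depends")


@dataclass
class GoldenQnA:
    """Hypothetical record layout; the document does not define a schema."""
    question: str
    answer: str
    category: str


def review_entry(entry: GoldenQnA) -> list:
    """Return the checklist problems found for a single entry."""
    problems = []
    if entry.category not in CATEGORIES:
        problems.append(f"unknown category: {entry.category}")
    if not entry.answer.strip():
        problems.append("empty answer")
    lowered = entry.answer.lower()
    for phrase in VAGUE_PHRASES:
        if phrase in lowered:
            problems.append(f"vague phrase: {phrase}")
    return problems


def missing_categories(entries: list) -> set:
    """Return the categories that no entry in the set covers."""
    return CATEGORIES - {entry.category for entry in entries}


def numbers_match(golden_answer: str, model_answer: str) -> bool:
    """True when both answers contain the same set of numeric values."""
    pattern = r"\d+(?:\.\d+)?"
    golden = set(re.findall(pattern, golden_answer))
    model = set(re.findall(pattern, model_answer))
    return golden == model
```

A reviewer might run `review_entry` on every row and `missing_categories` on the whole set before sign-off. Checks such as factual correctness or natural phrasing still need a human reviewer.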