feat: improve dv-query skill with advanced OData filters and limits #67
Open
aadharshkannan wants to merge 2 commits into
Adds an "Advanced OData filters & limits" section to dv-query/SKILL.md documenting supported-but-undocumented OData operations the model got wrong against the production skill:
- In/NotIn/ContainValues REQUIRE parameter aliases (@p1/@p2); inline arrays return 400. (Model stably used unsupported inline arrays 3/3.)
- Cannot $orderby parent rows by a related/expanded field; sort client-side. (Model stably used unsupported related-field orderby 3/3.)
- lambda any()/all(), MultiSelect ContainValues, relative-date functions, $expand nested options (nested-expand restriction applies), $count 5000 cap (RetrieveTotalRecordCount/paging), aggregate 50k segment-and-combine.

Includes 9 harder SkillOpt eval items (evals/skillopt/dv_query_advanced.jsonl) in the Goldilocks zone (baseline hard=0.75). The two genuine gaps above each recover fail->pass with the documentation edit, with no regression on the existing dv-query eval set.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
aadharshkannan added a commit to aadharshkannan/Dataverse-skills that referenced this pull request on Jun 2, 2026
Move evals/skillopt/dv_query_advanced.jsonl into the consolidated eval-set PR so all SkillOpt eval data lives in one place; PR microsoft#67 keeps only the dv-query SKILL.md change it ships. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ft#66 Drop evals/skillopt/dv_query_advanced.jsonl here; it now lives in PR microsoft#66 alongside the other SkillOpt eval sets. This PR ships only the dv-query SKILL.md change. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
saurabhrb approved these changes on Jun 3, 2026
arorashivam96 approved these changes on Jun 10, 2026
TL;DR
A strong coding model (gpt-5.5), when given only the current dv-query skill, stably writes two Dataverse OData queries that fail at runtime. This PR documents the correct syntax in dv-query/SKILL.md and ships the eval set that proves the gap and the fix. Measured lift on the new "hard" eval set: hard 0.75 → 0.875, with no regression on the existing dv-query suite.

This was not hand-authored guesswork: every line added here is backed by a reproducible eval that the production skill fails and the edited skill passes.
1. Why this fix? (the evidence)
I built 8 harder eval items targeting supported-but-undocumented OData operations and ran them against the current production dv-query skill. Baseline: hard = 0.75 (6/8), deliberately in the "Goldilocks zone" (not saturated, so there's signal). Two items failed, and both failures are genuine runtime bugs, not stylistic nitpicks:

Gap A: In/NotIn/ContainValues require parameter aliases

Prompt (paraphrased): filter a table where a column is in a large list of values, server-side, single query.

What the model produced (3/3 runs, production skill): an inline array inside the function call.

Judge verdict: P1 claim failed, score 0.00.

Inline arrays inside these functions return HTTP 400. The correct form requires parameter aliases (@p1/@p2) passed as separate query parameters.

The current skill never says this, so the model has no way to know. Stable failure: 3/3 runs.
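As a rough illustration of the alias form described for Gap A, the sketch below builds the query string for an In filter. It assumes the function is addressed as Microsoft.Dynamics.CRM.In with PropertyName/PropertyValues arguments, and that the alias values are a single-quoted column name and a JSON-style array. The helper name build_in_filter_query and the sample column and values are hypothetical, so treat this as a sketch to check against the SKILL.md section rather than as the skill's own text.

```python
import json
from urllib.parse import quote


def build_in_filter_query(column: str, values: list[str]) -> str:
    """Build a query string that passes In() arguments via parameter aliases.

    Assumption: the function signature and alias encoding follow the
    @p1/@p2 pattern named in this PR; they are not copied from SKILL.md.
    """
    # The filter only references the aliases, never an inline array.
    filter_expr = "Microsoft.Dynamics.CRM.In(PropertyName=@p1,PropertyValues=@p2)"
    # @p1 carries the column name as a single-quoted string literal.
    p1 = quote(f"'{column}'", safe="'")
    # @p2 carries the value list as a JSON array, percent-encoded for the URL.
    p2 = quote(json.dumps(values, separators=(",", ":")), safe="")
    return f"$filter={filter_expr}&@p1={p1}&@p2={p2}"
```

Calling build_in_filter_query("name", ["Contoso", "Fabrikam"]) yields a string that can be appended after the "?" of an entity-set URL. Keeping the list in @p2 means a long list never appears inside the function call itself.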
Gap B: cannot $orderby parent rows by a related/expanded field

Prompt (paraphrased): list opportunities ordered by their parent account's name.

What the model produced (3/3 runs, production skill): an $orderby on a related field.

Judge verdict: P1 claim failed, score 0.00.

Dataverse does not support ordering parent rows by a related/expanded field. The correct approach is to $expand the field and sort client-side. Stable failure: 3/3 runs on the original skill.

2. How SkillOpt derived this
SkillOpt evaluates a Copilot plugin holistically and isolates the contribution of a single skill:
- … auth.py/scripts) so the agent behaves exactly as in production.
- … SKILL.md (here, dv-query); everything else is held constant.
- … must_contain/must_load_skill checks.
- hard = 1 only if all P1 claims ≥ 0.7 AND deterministic checks pass.

Crucially, the prompts never name the OData function; they describe the user's intent ("at least one related row matching…", "in a large list of values"). This tests whether the skill teaches the model the right tool, not whether the model can pattern-match a keyword.
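The pass rule stated above (hard = 1 only if all P1 claims ≥ 0.7 AND deterministic checks pass) can be sketched as follows. This is a minimal illustration, not SkillOpt's implementation: the function names hard_score and suite_hard_rate are hypothetical, and averaging per-item scores is an assumption inferred from the "0.75 (6/8)" figure in this description.

```python
def hard_score(p1_claim_scores: list[float],
               deterministic_checks_passed: bool,
               threshold: float = 0.7) -> int:
    """Return 1 only if every P1 claim meets the threshold and checks pass."""
    # Note: all() is True for an empty list, so an item with no P1 claims
    # is decided by the deterministic checks alone in this sketch.
    all_claims_ok = all(score >= threshold for score in p1_claim_scores)
    return 1 if all_claims_ok and deterministic_checks_passed else 0


def suite_hard_rate(item_scores: list[int]) -> float:
    """Assumed aggregation: the fraction of items with hard == 1."""
    return sum(item_scores) / len(item_scores)
```

Under this reading, a single P1 claim scored 0.00 (as in Gaps A and B) forces hard = 0 for that item regardless of the deterministic checks, and six passing items out of eight give 0.75.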
3. Result (before → after)
qa_in_function_large_list (Gap A): fail → pass
probeH1_orderby_related (Gap B): fail → pass

The lone miss in the existing suite (probeA1_lookup_custom) also fails on the original skill (A/B: 0/3 original vs 1/3 edited): pre-existing model variance, not caused by this PR.

4. What changed

.github/plugins/dataverse/skills/dv-query/SKILL.md: new "Advanced OData filters & limits" section. Beyond the two proven gaps above, it consolidates correct, portable syntax for adjacent operations the skill omitted:
- any()/all() over collection nav properties (with the empty-collection vacuous-truth caveat)
- ContainValues/DoesNotContainValues for MultiSelect choice columns
- Relative-date functions (Today, LastXDays, …) and their alias form
- $expand options ($select/$filter/$orderby/$top) and the nested-expand restriction
- $count 5,000 cap → RetrieveTotalRecordCount / paging

evals/skillopt/dv_query_advanced.jsonl: the 9 eval items (8 advanced + the orderby probe) in the same SkillOpt format as the existing dv_query.jsonl, so the gap and the fix are independently reproducible.

5. How to reproduce
The SkillOpt config + generator are in SkillOpt PR #1. Run:
Point --skill at the production skill to see 0.75; at this PR's skill to see 0.875/8-of-8.