feat: improve dv-query skill with advanced OData filters and limits by aadharshkannan · Pull Request #67 · microsoft/Dataverse-skills

aadharshkannan · 2026-06-01T17:51:52Z

TL;DR

A strong coding model (gpt-5.5), given only the current dv-query skill, consistently writes two Dataverse OData queries that fail at runtime. This PR documents the correct syntax in dv-query/SKILL.md and ships the eval set that demonstrates the gap and the fix. Measured lift on the new "hard" eval set: 0.75 → 0.875, with no regression on the existing dv-query suite.

This was not hand-authored guesswork — every line added here is backed by a reproducible eval that the production skill fails and the edited skill passes.

1. Why this fix? (the evidence)

I built 8 harder eval items targeting supported-but-undocumented OData operations and ran them against the current production dv-query skill. Baseline: hard = 0.75 (6/8) — deliberately in the "Goldilocks zone" (not saturated, so there's signal). Two items failed, and both failures are genuine runtime bugs, not stylistic nitpicks:

Gap A — `In` / `NotIn` / `ContainValues` require parameter aliases

Prompt (paraphrased): filter a table where a column is in a large list of values, server-side, single query.

What the model produced (3/3 runs, production skill):

filter="Microsoft.Dynamics.CRM.In(PropertyName='accountid',PropertyValues=[...])"

Judge verdict (P1 claim failed, score 0.00):

The response uses literal PropertyName='accountid' and PropertyValues=[...], not parameter aliases like @p1 and @p2.

Inline arrays inside these functions return HTTP 400. The correct form requires parameter aliases passed as separate query parameters:

$filter=Microsoft.Dynamics.CRM.In(PropertyName=@p1,PropertyValues=@p2)&@p1='statuscode'&@p2=[1,2,3]

The current skill never says this, so the model has no way to know. Stable failure: 3/3 runs.
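A minimal Python sketch of building the aliased form, assuming integer values as in the example above. The helper name `build_in_filter_params` is hypothetical and not part of the skill, and the sketch does not cover how string values should be serialized inside the array.

```python
import json
from urllib.parse import urlencode


def build_in_filter_params(column: str, values: list) -> dict:
    """Return query parameters for an In filter that uses parameter aliases.

    Hypothetical helper: only exercised here with integer values.
    """
    return {
        "$filter": "Microsoft.Dynamics.CRM.In(PropertyName=@p1,PropertyValues=@p2)",
        # The column name travels as a quoted string in its own alias.
        "@p1": f"'{column}'",
        # Compact JSON array, e.g. [1,2,3], passed as a separate alias.
        "@p2": json.dumps(values, separators=(",", ":")),
    }


params = build_in_filter_params("statuscode", [1, 2, 3])

# Keep OData punctuation unescaped so the result is easy to compare with the
# example above; an HTTP client may percent-encode some of these characters.
query_string = urlencode(params, safe="$@'[],()=.")
print(query_string)
```

Keeping the aliases as separate dictionary entries means the list can grow without touching the `$filter` expression itself.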

Gap B — cannot `$orderby` parent rows by a related/expanded field

Prompt (paraphrased): list opportunities ordered by their parent account's name.

What the model produced (3/3 runs, production skill):

orderby=["parentaccountid/name asc", "name asc"]

Judge verdict (P1 claim failed, score 0.00):

The response places the related field directly in $orderby instead of sorting client-side.

Dataverse does not support ordering parent rows by a related/expanded field. The correct approach is to $expand the field and sort client-side. Stable failure: 3/3 runs on the original skill.
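The client-side sort could look roughly like the sketch below. It assumes each returned opportunity carries the expanded account as a nested object under `parentaccountid` (inferred from the `parentaccountid/name` path above); the helper name and the sample rows are hypothetical.

```python
def sort_by_parent_account_name(rows):
    """Sort opportunity rows by expanded parent account name, then by own name."""

    def sort_key(row):
        # The expanded lookup may be null when an opportunity has no parent.
        parent = row.get("parentaccountid") or {}
        return (
            (parent.get("name") or "").casefold(),
            (row.get("name") or "").casefold(),
        )

    return sorted(rows, key=sort_key)


# Hypothetical rows shaped like an $expand=parentaccountid($select=name) result.
rows = [
    {"name": "Deal B", "parentaccountid": {"name": "Contoso"}},
    {"name": "Deal A", "parentaccountid": {"name": "Contoso"}},
    {"name": "Deal C", "parentaccountid": {"name": "Adventure Works"}},
    {"name": "Deal D", "parentaccountid": None},
]

ordered = [row["name"] for row in sort_by_parent_account_name(rows)]
print(ordered)  # ['Deal D', 'Deal C', 'Deal A', 'Deal B']
```

Rows without a parent account sort first here because their key is the empty string; a caller who wants them last would need a different key.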

I A/B-tested this item 3× on the original skill and 3× on the edited skill to rule out that my edit introduced the behavior. It passes 0/3 on the original skill, independent of my change, which confirms a pre-existing gap rather than a regression.

2. How SkillOpt derived this

SkillOpt evaluates a Copilot plugin holistically and isolates the contribution of a single skill:

Materialize the full plugin (all skills present, real auth.py/scripts) so the agent behaves exactly as in production.
Swap in one candidate SKILL.md (here, dv-query) — everything else is held constant.
Run each eval item through the target model (gpt-5.5) end-to-end: it writes and reasons about real Dataverse code.
Score with an LLM judge (gpt-5.4-mini) against a hidden answer-key of prioritized claims, plus deterministic must_contain / must_load_skill checks. hard = 1 only if all P1 claims ≥ 0.7 AND deterministic checks pass.
Hill-climb: edit only the offending skill, re-run, and require fail → pass on the target item with no regression elsewhere.
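The scoring rule in the fourth step can be sketched as follows. This is an illustration of the stated rule only, not SkillOpt's actual code; the function names and the input shapes (a list of P1 claim scores and a list of boolean check results) are assumptions.

```python
P1_THRESHOLD = 0.7  # threshold quoted in the scoring step above


def hard_score(p1_claim_scores, deterministic_checks):
    """Return 1 only if every P1 claim meets the threshold and all checks pass."""
    all_p1_pass = all(score >= P1_THRESHOLD for score in p1_claim_scores)
    return 1 if all_p1_pass and all(deterministic_checks) else 0


def hard_rate(item_scores):
    """Average the per-item hard scores over an eval set."""
    return sum(item_scores) / len(item_scores)


# A single failed P1 claim zeroes the item even if everything else passes.
print(hard_score([0.9, 0.7], [True, True]))   # 1
print(hard_score([0.0, 1.0], [True, True]))   # 0
print(hard_score([1.0], [True, False]))       # 0

# Six passing items out of eight gives the 0.75 baseline reported above.
print(hard_rate([1] * 6 + [0] * 2))           # 0.75
```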

Crucially, the prompts never name the OData function — they describe the user's intent ("at least one related row matching…", "in a large list of values"). This tests whether the skill teaches the model the right tool, not whether the model can pattern-match a keyword.

3. Result (before → after)

| Eval set | Production skill | Edited skill |
| --- | --- | --- |
| Advanced OData (8 items) | 0.75 (6/8) | 0.875 → 8/8 |
| `qa_in_function_large_list` (Gap A) | fail 3/3 | pass 3/3 |
| `probeH1_orderby_related` (Gap B) | fail 3/3 | pass 3/3 |
| Existing dv-query suite (11 items) | 10/11 | 10/11 (no regression) |

The lone miss in the existing suite (probeA1_lookup_custom) also fails on the original skill (A/B passes: 0/3 original vs 1/3 edited), so it reflects pre-existing model variance and is not caused by this PR.

4. What changed

.github/plugins/dataverse/skills/dv-query/SKILL.md — new "Advanced OData filters & limits" section. Beyond the two proven gaps above, it consolidates correct, portable syntax for adjacent operations the skill omitted:

lambda any() / all() over collection nav properties (with the empty-collection vacuous-truth caveat)
ContainValues / DoesNotContainValues for MultiSelect choice columns
relative-date functions (Today, LastXDays, …) and their alias form
single-level $expand options ($select/$filter/$orderby/$top) and the nested-expand restriction
the $count 5,000 cap → RetrieveTotalRecordCount / paging
aggregate 50k limit → segment-and-combine workaround
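As an illustration of the paging fallback for counts beyond the cap, here is a hedged sketch. It assumes each response holds its rows under `value` and the next-page URL under `@odata.nextLink`; `fetch_page` is a hypothetical stand-in for an authenticated GET that returns parsed JSON, and the URLs are placeholders. It is not taken from the skill.

```python
from typing import Callable, Optional


def count_rows_by_paging(first_url: str, fetch_page: Callable[[str], dict]) -> int:
    """Count rows by following next-page links until none remains."""
    total = 0
    url: Optional[str] = first_url
    while url:
        page = fetch_page(url)
        total += len(page.get("value", []))
        # A missing next link means this was the last page.
        url = page.get("@odata.nextLink")
    return total


# Hypothetical in-memory pages standing in for real Web API responses.
fake_pages = {
    "page1": {"value": [{}, {}, {}], "@odata.nextLink": "page2"},
    "page2": {"value": [{}, {}]},
}

total = count_rows_by_paging("page1", fake_pages.__getitem__)
print(total)  # 5
```

Passing the fetch function in keeps the loop independent of any particular HTTP client or auth helper.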

evals/skillopt/dv_query_advanced.jsonl — the 9 eval items (8 advanced + the orderby probe) in the same SkillOpt format as the existing dv_query.jsonl, so the gap and the fix are independently reproducible.

5. How to reproduce

The SkillOpt config + generator are in SkillOpt PR #1. Run:

python scripts/eval_only.py --config configs/dataverse/dv_query_advanced.yaml \
  --skill <path-to-candidate-dv-query/SKILL.md> --split test \
  --split_dir data/dataverse/dv_query_advanced --out_root outputs/run

Point --skill at the production skill to see 0.75; at this PR's skill to see 0.875/8-of-8.


Adds an "Advanced OData filters & limits" section to dv-query/SKILL.md documenting supported-but-undocumented OData operations the model got wrong against the production skill:

- In/NotIn/ContainValues REQUIRE parameter aliases (@p1/@p2); inline arrays return 400. (Model stably used unsupported inline arrays 3/3.)
- Cannot $orderby parent rows by a related/expanded field; sort client-side. (Model stably used unsupported related-field orderby 3/3.)
- lambda any()/all(), MultiSelect ContainValues, relative-date functions, $expand nested options (with the nested-expand restriction), $count 5000 cap (RetrieveTotalRecordCount/paging), aggregate 50k segment-and-combine.

Includes 9 harder SkillOpt eval items (evals/skillopt/dv_query_advanced.jsonl) in the Goldilocks zone (baseline hard=0.75). The two genuine gaps above each recover fail->pass with the documentation edit, with no regression on the existing dv-query eval set.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Move evals/skillopt/dv_query_advanced.jsonl into the consolidated eval-set PR so all SkillOpt eval data lives in one place; PR microsoft#67 keeps only the dv-query SKILL.md change it ships. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ft#66 Drop evals/skillopt/dv_query_advanced.jsonl here; it now lives in PR microsoft#66 alongside the other SkillOpt eval sets. This PR ships only the dv-query SKILL.md change. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

aadharshkannan requested a review from a team June 1, 2026 17:51

aadharshkannan changed the title ~~Improve dv-query skill: advanced OData filters & limits~~ feat: improve dv-query skill with advanced OData filters & limits Jun 2, 2026

aadharshkannan changed the title ~~feat: improve dv-query skill with advanced OData filters & limits~~ feat: improve dv-query skill with advanced OData filters and limits Jun 2, 2026

saurabhrb approved these changes Jun 3, 2026


arorashivam96 approved these changes Jun 10, 2026


aadharshkannan wants to merge 2 commits into microsoft:main from aadharshkannan:skillopt/dv-query-odata-improvements


3 participants
