BAD PR 1 (DO NOT MERGE): regress dv-data SKILL bulk-create guidance by saurabhrb · Pull Request #73 · microsoft/Dataverse-skills

saurabhrb · 2026-06-02T20:55:45Z

Intentional regression, branched off PR #72 (harness). Replaces the SDK bulk-create section in the dv-data SKILL with a per-record loop antipattern. Expected to FAIL: data_002_bulk_create and data_003_skill_contract. testFile=dv_data.biceval.json. To be closed unmerged after evidence is captured.

…/query/solution; convert dv_data to new format

Adds 3-4 tests per skill using the new generic deterministic-evaluator format introduced in LocalEvalRunner: each test file now enables CortexConfigurations:Common/DeterministicAssertionEvaluator with settings.supported_verbs=CONTAINS,NOT_CONTAINS,SKILL_LOADED alongside the correctness.prompty semantic judge. Ports the dev/evalsV0 baseline tests (connect_001, metadata_001, overview_001, query_001, solution_001) and adds 2-3 natural follow-up tests per skill covering env-file contract, no-hardcoded-secrets, schema create + lookup relationship, filtered/aggregate reads, and solution unpack/import/routing-trap. dv_data.biceval.json is updated in place to add the deterministic evaluator.

…t graded

Per LER PR-393 author guidance: DeterministicAssertionEvaluator only grades verb-prefixed assertions (CONTAINS/NOT_CONTAINS/SKILL_LOADED), and correctness.prompty scores against expected_response without seeing individual assertions. Without LMChecklist, natural-language assertions (those without a verb prefix) are unscored. Adds LMChecklist (Common/SEVAL/LMChecklist.prompty) to all six test files using the exact name + passing_score=3 + priority=1 shape the LER author published. Loader registration is already proven against the new format in Dataverse-skills PR #71's draft validation runs (LocalEvalRunner builds 20289122 and 20290419).
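The split between verb-prefixed and natural-language assertions described above can be illustrated with a small sketch. This is not the DeterministicAssertionEvaluator implementation: the `VERB: argument` syntax, the `grade` function, and the `loaded_skills` list are all assumptions made for illustration. Only the three verb names come from the commit message.

```python
from typing import List, Optional, Tuple

# Verb names taken from the commit message; the "VERB: argument" syntax is assumed.
SUPPORTED_VERBS = ("CONTAINS", "NOT_CONTAINS", "SKILL_LOADED")


def split_verb(assertion: str) -> Tuple[Optional[str], str]:
    """Return (verb, argument) for a verb-prefixed assertion, else (None, text)."""
    text = assertion.strip()
    for verb in SUPPORTED_VERBS:
        if text.startswith(verb + ":"):
            return verb, text[len(verb) + 1:].strip()
    return None, text


def grade(assertion: str, response: str, loaded_skills: List[str]) -> Optional[bool]:
    """Grade verb-prefixed assertions; return None for anything else.

    None means "unscored here": a natural-language assertion would need a
    separate judge (the role LMChecklist plays in the PR).
    """
    verb, arg = split_verb(assertion)
    if verb == "CONTAINS":
        return arg in response
    if verb == "NOT_CONTAINS":
        return arg not in response
    if verb == "SKILL_LOADED":
        return arg in loaded_skills
    return None
```

Under these assumptions, `grade("The response recommends a bulk API", response, [])` returns `None` rather than a pass or fail, which is the gap the LMChecklist addition is meant to close.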

…ionService for local-eval mode

Bad-PR validation harness. Routes around the pre-existing LER->BICEP ASCII-headers bug by forcing local evaluation: LMChecklist.prompty grades each assertion (verb-prefixed and natural-language alike) via CAPI/AOAI locally. Drops the DeterministicAssertionEvaluator entry because in DisableEvaluationService mode it would fall through to a local file lookup and crash (same root issue LER PR #416 fixes for the service-enabled path). Harness PR is draft and closes unmerged; PR #70 (test files) and PR #416 (LER fixes) are untouched.

…iles for local-eval mode

…ing range

Intentional regression for bad-PR gate validation. Replaces the SDK bulk-create section (CreateMultiple + adaptive chunking helpers) with per-record-loop antipattern and 'no batch API' phrasing. Expected to fail: data_002_bulk_create (PRIORITY_1 bulk/list assertion) and data_003_skill_contract (PRIORITY_1 skill-content assertion). All other tests should still pass.
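For readers unfamiliar with the contrast this regression removes, here is a minimal sketch of a per-record loop next to a chunked bulk create that halves its chunk size on failure. It does not reproduce the SKILL's actual helpers or any real Dataverse SDK call: `FakeClient`, `create_one`, `create_many`, and the `RuntimeError` failure mode are hypothetical stand-ins, and the real CreateMultiple-based guidance may differ.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def per_record_loop(create_one: Callable[[Record], str], records: List[Record]) -> List[str]:
    """Antipattern: one request per record."""
    return [create_one(r) for r in records]


def bulk_create(
    create_many: Callable[[List[Record]], List[str]],
    records: List[Record],
    chunk_size: int = 1000,
) -> List[str]:
    """Send records in chunks; halve the chunk size whenever a chunk fails."""
    ids: List[str] = []
    start = 0
    while start < len(records):
        chunk = records[start:start + chunk_size]
        try:
            ids.extend(create_many(chunk))
        except RuntimeError:
            if chunk_size == 1:
                raise  # a single record fails on its own; nothing left to shrink
            chunk_size = max(1, chunk_size // 2)
            continue  # retry the same position with a smaller chunk
        start += len(chunk)
    return ids


class FakeClient:
    """Hypothetical in-memory stand-in that counts requests (not a real client)."""

    def __init__(self, max_batch: int) -> None:
        self.max_batch = max_batch
        self.requests = 0
        self._next = 0

    def _new_id(self) -> str:
        self._next += 1
        return f"id-{self._next}"

    def create_one(self, record: Record) -> str:
        self.requests += 1
        return self._new_id()

    def create_many(self, batch: List[Record]) -> List[str]:
        self.requests += 1
        if len(batch) > self.max_batch:
            raise RuntimeError("batch too large")
        return [self._new_id() for _ in batch]
```

With ten records and a fake limit of four per batch, the loop issues ten requests, while `bulk_create` starting at `chunk_size=8` issues four (one rejected attempt, then three accepted chunks).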

saurabhrb · 2026-06-03T03:24:00Z

Closing -- validation harness / demo PR, no longer needed. Real coverage now lives in PR #70 (deterministic eval tests) and PR #68 (Cursor + auth unification).

Saurabh Badenkal added 6 commits June 1, 2026 22:38

harness: drop DeterministicAssertionEvaluator from remaining 5 test files for local-eval mode

44c07c0

harness: lower LMChecklist passing_score 3 to 1 to match its 0/1 scoring range

b705f6f
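The reason for the passing_score change in the last commit can be shown with a little arithmetic. The sketch below assumes the pass rule is `score >= passing_score`, which the PR does not state; `passes` is a hypothetical helper, and only the 0/1 range and the thresholds 3 and 1 come from the commit title.

```python
def passes(score: int, passing_score: int) -> bool:
    """Assumed pass rule: a score passes when it meets the threshold."""
    return score >= passing_score


# LMChecklist scores are described as 0 or 1 in the commit title.
possible_scores = (0, 1)

# Which scores could ever pass at each threshold?
reachable_at_3 = [s for s in possible_scores if passes(s, 3)]
reachable_at_1 = [s for s in possible_scores if passes(s, 1)]
```

Under that assumption no score in the 0/1 range can reach a threshold of 3, so every assertion would fail regardless of quality; a threshold of 1 lets a score of 1 pass.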

saurabhrb force-pushed the users/saurabhrb/badpr-1-dv-data-regression branch from ae22659 to b566a81 Compare June 2, 2026 22:09

saurabhrb mentioned this pull request Jun 3, 2026

add: eval tests using new deterministic-evaluator format (3-4 per skill) #70

Open

saurabhrb closed this Jun 3, 2026

saurabhrb deleted the users/saurabhrb/badpr-1-dv-data-regression branch June 3, 2026 03:24

saurabhrb wants to merge 6 commits into main from users/saurabhrb/badpr-1-dv-data-regression
