[Draft - DO NOT MERGE] validate bullets-fix sufficiency for LER ASCII-headers bug (#418)#79

Closed
saurabhrb wants to merge 6 commits into main from users/saurabhrb/validate-bullets-fix

Conversation

@saurabhrb

Contributor

[Draft - DO NOT MERGE] Validate bullets-fix is sufficient for LER ASCII-headers bug (#418)

Throwaway harness. Single purpose: answer "does the consumer-side bullets-fix on PR #70 alone resolve LER issue #418, or do we also need the LER-side header sanitization to ship?"

Setup

This branch combines everything needed to exercise the BICEP HTTP path end-to-end:

| Knob | Value | Why |
| --- | --- | --- |
| LocalEvalRunner ref | refs/heads/users/sbadenkal/service-only-deterministic-assertion (PR #416 branch) | Needed so DeterministicAssertionEvaluator is on the ServiceOnlyEvaluatorNames allow-list. Without #416 the run dies at evaluator load before reaching the BICEP send. |
| useBuildFromSource | true + sourceRepoPath: '$(Build.SourcesDirectory)/LocalEvalRunner' | Builds LER from the PR #416 branch. |
| Pipeline name: | bullets replaced with ASCII - (same as PR #70's fix) | Removes the only known consumer-side non-ASCII source. |
| --feature DisableEvaluationService | removed | This is the whole point: let the LER → BICEP POST /offlineEvaluation/async actually happen. |
| dv_data.biceval.json | DeterministicAssertionEvaluator entry restored (with settings.supported_verbs) | Forces the service-only-evaluator routing so the BICEP request body actually contains the new format. |
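
The pipeline-name change in the table amounts to swapping each U+2022 bullet for an ASCII hyphen. A minimal Python sketch of that substitution follows. The pipeline name used here is hypothetical (the real one lives in the .azdo YAML and is not reproduced in this PR description), and `to_ascii_name` is an illustrative helper, not part of LER or the pipeline.

```python
BULLET = "\u2022"  # the bullet character the pipeline name is said to have contained


def to_ascii_name(name: str) -> str:
    """Replace U+2022 bullets with ASCII hyphens and verify the result is pure ASCII."""
    cleaned = name.replace(BULLET, "-")
    if not cleaned.isascii():
        # Some other non-ASCII character is present; surface it rather than hide it.
        offenders = sorted({ch for ch in cleaned if ord(ch) > 127})
        raise ValueError(f"non-ASCII characters remain: {offenders!r}")
    return cleaned


# Hypothetical pipeline name, for illustration only.
example = "Dataverse-skills \u2022 eval \u2022 run 42"
print(to_ascii_name(example))
```

Raising on any leftover non-ASCII character, rather than silently dropping it, keeps the check useful as a guard if another such character is added to the name later.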

Expected outcomes (binary)

  • Pipeline green (or only assertion-level failures, no exceptions) → bullets were the only non-ASCII source. PR #70 + LER #416 alone unblock the gate. LER #418 stays valuable as defense-in-depth for other consumers but isn't on the critical path for Dataverse-skills.
  • Same "Request headers must contain only ASCII characters" exception → a non-ASCII byte is leaking from somewhere other than runCorrelationId (likely a default User-Agent, a framework header, or something injected upstream). LER #418 must ship before Dataverse-skills can use the BICEP path.
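
If the second outcome occurs, the next step is finding which header carries the non-ASCII byte. A rough Python sketch of such a scan is below. It is not LER code: the header names and values are hypothetical, and `non_ascii_headers` is an illustrative helper.

```python
from typing import Dict, List, Tuple


def non_ascii_headers(headers: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (header name, offending characters) for each header that is not pure ASCII."""
    findings = []
    for name, value in headers.items():
        # Check both the name and the value; either could carry the stray character.
        bad = "".join(ch for ch in f"{name}{value}" if ord(ch) > 127)
        if bad:
            findings.append((name, bad))
    return findings


# Hypothetical request headers; names and values are illustrative only.
sample = {
    "x-run-correlation-id": "Dataverse-skills - eval - 123",
    "User-Agent": "runner/1.0 (build \u2022 main)",
}
for name, bad in non_ascii_headers(sample):
    print(f"{name}: {[hex(ord(c)) for c in bad]}")
```

Printing the code points rather than the raw characters makes lookalikes (for example a non-breaking space) visible in the log.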

Either result is publishable evidence — I'll post it on PR #70's description and on issue #418 as a comment.

After validation

Branch and PR close unmerged; main of Dataverse-skills never sees these .azdo changes. PR #70 carries only the production-safe bullets-fix.

Saurabh Badenkal added 6 commits June 1, 2026 22:38
…/query/solution; convert dv_data to new format

Adds 3-4 tests per skill using the new generic deterministic-evaluator format introduced in LocalEvalRunner: each test file now enables CortexConfigurations:Common/DeterministicAssertionEvaluator with settings.supported_verbs=CONTAINS,NOT_CONTAINS,SKILL_LOADED alongside the correctness.prompty semantic judge. Ports the dev/evalsV0 baseline tests (connect_001, metadata_001, overview_001, query_001, solution_001) and adds 2-3 natural follow-up tests per skill covering env-file contract, no-hardcoded-secrets, schema create + lookup relationship, filtered/aggregate reads, and solution unpack/import/routing-trap. dv_data.biceval.json is updated in place to add the deterministic evaluator.
…t graded

Per LER PR-393 author guidance: DeterministicAssertionEvaluator only grades verb-prefixed assertions (CONTAINS/NOT_CONTAINS/SKILL_LOADED), and correctness.prompty scores against expected_response without seeing individual assertions. Without LMChecklist, natural-language assertions (those without a verb prefix) are unscored. Adds LMChecklist (Common/SEVAL/LMChecklist.prompty) to all six test files using the exact name + passing_score=3 + priority=1 shape the LER author published. Loader registration is already proven against the new format in Dataverse-skills PR #71's draft validation runs (LocalEvalRunner builds 20289122 and 20290419).
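
The grading split described above can be sketched in Python. This assumes the verb appears as the first whitespace-delimited token of an assertion, optionally followed by a colon. That syntax is an assumption for illustration, not the LER parser, and `split_assertions` is a hypothetical helper.

```python
from typing import Dict, List

# Verbs named in the commit messages above.
SUPPORTED_VERBS = ("CONTAINS", "NOT_CONTAINS", "SKILL_LOADED")


def split_assertions(assertions: List[str]) -> Dict[str, List[str]]:
    """Group assertions into verb-prefixed ones and natural-language ones."""
    groups: Dict[str, List[str]] = {"verb_prefixed": [], "natural_language": []}
    for text in assertions:
        parts = text.strip().split(maxsplit=1)
        # Assumed syntax: the verb is the first token, with an optional trailing colon.
        first_token = parts[0].rstrip(":") if parts else ""
        key = "verb_prefixed" if first_token in SUPPORTED_VERBS else "natural_language"
        groups[key].append(text)
    return groups


# Hypothetical assertions, for illustration only.
result = split_assertions([
    "CONTAINS pac auth create",
    "NOT_CONTAINS: password=",
    "The response explains why the env file is required.",
])
print(result["natural_language"])
```

Under the rule quoted from the LER author, only the `verb_prefixed` group would be scored by DeterministicAssertionEvaluator; the `natural_language` group is what LMChecklist is added to cover.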
…ionService for local-eval mode

Bad-PR validation harness. Routes around the pre-existing LER->BICEP ASCII-headers bug by forcing local evaluation: LMChecklist.prompty grades each assertion (verb-prefixed and natural-language alike) via CAPI/AOAI locally. Drops the DeterministicAssertionEvaluator entry because in DisableEvaluationService mode it would fall through to a local file lookup and crash (same root issue LER PR #416 fixes for the service-enabled path). Harness PR is draft and closes unmerged; PR #70 (test files) and PR #416 (LER fixes) are untouched.
…d + BICEP HTTP path enabled

Empirical test for LER issue #418. Strips U+2022 bullets from pipeline name (matches PR #70 fix), restores DeterministicAssertionEvaluator entry in dv_data.biceval.json, and removes the DisableEvaluationService feature flag so the LER -> BICEP POST happens for real. If pipeline passes -> consumer-side bullets fix is sufficient and LER #418 becomes defense-in-depth. If pipeline fails with the same ASCII headers exception -> the non-ASCII byte is coming from a header other than the correlation ID and LER #418 must ship.
@saurabhrb closed this Jun 8, 2026