fix: close out the eight-issue audit (validator, test runner, skill runner, generator)#26
Conversation
The validator checked fixture.input field names against contracts.inputs but had no mirror check for expect blocks, so a typo in an expected output field passed validation and only surfaced as a confusing runtime mismatch. Adds the mirror cross-check (comparison operators excepted) plus regression fixtures. The strict-output-skill fixture doubles as the target for the strict_output work in #20. Fixes #18
retry: "yes", retry: "1", retry: 1.5 and friends were silently coerced to zero retries at execution time via || 0, losing the author's intent. The validator now requires a non-negative integer; the runner uses ?? 0 so only absence defaults, never truthiness. Same bug class as logic-md \#64 (recovery configuration silently lost between spec and execution). Fixes #21
The existence check passed any graph whose references all resolve, so an A -> B -> A cycle validated ok: true and only the test runner caught it at execution. Consumers that validate without executing (IDE plugins, lint-only CI, the MCP covenant_validate tool) were blind. Adds DFS cycle detection reporting the cycle path, and extends the existence check to array-valued depends_on, which the spec schema allows but the check previously mishandled. Fixes #24
quality.gates was entirely unvalidated; grep for gates in validator.js returned nothing while generate.js prompts users to author them. Adds validateQualityGates mirroring validateFixtures: array shape, required id and check, action in the retry|fail enum, integer max_retries, string on_exhaustion and description, and operation cross-checked against interface.surface. Normative fields taken from docs/COVENANT.md, which uses id/check/action/max_retries/on_exhaustion rather than the name/severity/on_fail guessed in the issue. Fixes #19
A fixture can now set strict_output: true to fail when the actual output carries fields absent from contracts.outputs (leaked secrets, undeclared side-effect counters). Default stays partial matching, non-breaking; the default-mode question is deferred to the v1.1 spec discussion. Landing this surfaced two latent problems, both fixed here: 1. executeFixture read this.expect, which is never set, so fixture.expect was never enforced at all. Now reads fixture.expect. 2. partialEqual was partial at the top level but demanded deep equality below, contradicting the documented partial-match semantics. Nested plain objects now match partially at every level (arrays and primitives still require exact equality). With expect enforcement live, the docx-generation example expected file_path literals the contract-driven simulator cannot honour (it synthesizes string defaults; echoing output_path into file_path would need skill-specific knowledge). Those two expect lines are dropped; validation_result.valid remains the real assertion. The validator also type-checks strict_output as boolean. Fixes #20
hasCircularDependencies silently skipped any dependency name absent from the fixture list, so a typo in depends_on became no dependency and the fixture ran as if its dep was satisfied. Unreachable via the validated path, but the runner can be invoked on raw fixture arrays (programmatic callers, MCP tools), where it reported green instead of erroring. Now throws with the offending fixture and dependency named, same posture as the existing cycle throw. Companion to the validator-side cycle fix. Fixes #25
…t error A subprocess that exited 0 but emitted non-JSON stdout (stray console.log before the payload, double-encoded stringify) was reported as process runner failed: exit 0, pointing the author away from the real cause. The parse error is now surfaced with the first 200 chars of raw stdout. Fixes #22
fs.stat(...).catch(() => false) treated EACCES, ENOTDIR and EIO as nonexistence, so a permission error during skill init silently fell into create-directory logic and the secondary error pointed at the wrong root cause. Both stat sites now return false only on ENOENT and rethrow everything else. Same family as the skill-runner stdout fix: swallowed errors narrow the root-cause signal. Fixes #23
|
Caution Review failedPull request was closed or merged during review 📝 WalkthroughWalkthroughThis PR addresses seven interconnected validation and error-handling issues across the COVENANT validator, test runner, and CLI. Changes narrow error handling to surface real failures, enhance the validator to enforce fixture schema contracts and quality gates, and improve the test runner to support strict output mode with better error attribution. ChangesValidation and Test Execution Enhancements
Sequence DiagramssequenceDiagram
participant Validator
participant TestRunner
participant Execution
Validator->>Validator: Parse and extract fixtures
Validator->>Validator: Validate fixture.expect against contracts.outputs
Validator->>Validator: Type-check retry, strict_output
Validator->>Validator: Validate quality.gates structure and operations
Validator->>Validator: Detect cycles in depends_on graph
Validator-->>TestRunner: Report validation errors or pass
alt Validation passes
TestRunner->>TestRunner: Extract fixtures with contractsOutputs
TestRunner->>TestRunner: Build topological sort (reject unknown deps)
TestRunner->>Execution: Execute fixtures in dependency order
Execution->>Execution: Run fixture operation
Execution->>TestRunner: Return actual output
TestRunner->>TestRunner: Validate output vs fixture.expect
alt strict_output enabled
TestRunner->>TestRunner: Check for undeclared fields
TestRunner-->>Execution: Fail if leaked fields
end
TestRunner-->>TestRunner: Report pass/fail with error details
else Validation fails
Validator-->>TestRunner: Reject covenant
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Closes out all eight findings from @antnewman's validator / test-runner / skill-runner / generator audit. One commit per issue, each with a regression fixture or unit test.
Validator
fixture.expectis now cross-checked againstcontracts.outputs, mirroring the existing input check. Comparison operators excepted.quality.gatesis now validated: array shape, requiredidandcheck,actioninretry | fail, integermax_retries, stringon_exhaustion, andoperationcross-checked againstinterface.surface. Normative fields were taken fromdocs/COVENANT.md(id/check/action/max_retries/on_exhaustion), not thename/severity/on_failguessed in the issue.fixture.retrymust be a non-negative integer; the runner now uses?? 0instead of|| 0.depends_oncycle detection at validation time, reporting the cycle path. The existence check also now handles array-valueddepends_on, which the spec schema allows but the old check mishandled.Test runner
strict_output: trueper fixture fails the run when actual output carries fields absent fromcontracts.outputs. Default behaviour unchanged (non-breaking); the default-mode question is deferred to the v1.1 spec discussion. Landing this surfaced two latent bugs, fixed in the same commit:executeFixturereadthis.expect(never set), sofixture.expectwas never enforced at all; andpartialEqualwas partial only at the top level, contradicting the documented partial-match semantics. The docx-generation example expectedfile_pathliterals the contract-driven simulator cannot honour; those lines are dropped andvalidation_result.validremains the real assertion.hasCircularDependenciesnow throws on adepends_onnaming an unknown fixture instead of silently treating it as no dependency.DX papercuts
process runner failed: exit 0.fs.stat(...).catch(() => false)sites ingenerate.jsnarrowed to ENOENT; permission and I/O errors propagate.Verification
npm run test:fixtures: 4/4 example skills greennpm run test:conformance: 15/15Fixes #18
Fixes #19
Fixes #20
Fixes #21
Fixes #22
Fixes #23
Fixes #24
Fixes #25
Summary by cubic
Closes out audit issues #18–#25 across the validator, test runner, skill runner, and generator. Adds strict output checks, stronger validation, and clearer error messages for safer, more predictable runs.
New Features
strict_output: trueper fixture to fail on undeclared output fields; default behavior unchanged (test runner: validateOutput partial-match allows undeclared output fields #20).quality.gates: requiredid/check,actioninretry|fail, integermax_retries, stringon_exhaustion, andoperationcross-checked withinterface.surface(validator: quality.gates section entirely unvalidated #19).Bug Fixes
fixture.expectagainstcontracts.outputs; enforceretryas a non-negative integer; detectdepends_oncycles and support array-valueddepends_on(validator: fixture.expect shape not cross-checked against contracts.outputs #18, validator + test runner: fixture.retry silently coerced via "|| 0" #21, validator: depends_on cycle detection missing (runner is single point of failure) #24).fixture.expect(fixesthis.expectbug) and make partial matching deep; throw on unknowndepends_oninstead of silently ignoring it (test runner: validateOutput partial-match allows undeclared output fields #20, test runner: hasCircularDependencies silently ignores deps not in fixture list #25).fs.statcatch toENOENTso permission/I/O errors aren’t masked (generate.js: fs.stat(...).catch(() => false) treats permission errors as nonexistence #23).Written for commit 2c42f33. Summary will update on new commits.
Summary by CodeRabbit
New Features
Bug Fixes
Tests