fix(tests): use sys.executable in doc-accuracy subprocess calls (unblocks the whole improve/goal pipeline) #346
Merged
…cess calls
The baseline-validation gate (.venv/bin/pytest -q on the fresh workspace clone)
failed because tests/unit/test_documentation_accuracy.py shelled out with bare
['python', '-m', 'pytest', ...]. In the workspace env that 'python' has no pytest,
so stdout was empty and 'assert integration in markers_output' failed. CI only
passed because there bare 'python' is the pytest interpreter. That single failure
exited pytest 1, failing baseline validation and blocking EVERY improve/goal task
('N of N stages failed').
Replace all 7 bare-python subprocess calls with sys.executable (the interpreter
running the test, which always has pytest). Full tree now 9377 passed / 0 failed.
Root-caused via the failure-diagnostics capture added in #345.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
cdccc4b to 58cb540
ProtocolWarden added a commit that referenced this pull request on Jun 20, 2026
…fusal) (#347)

With the baseline blocker fixed (#346), tasks reached the agent stages and the #345 diagnostics surfaced the next failure: the planner stage got a prose refusal 'ContextGuard requires CL_ANCHOR to be set ... run eval $(cl session start ...)' instead of a JSON plan. OC's CLAUDE.md ContextGuard requires every Claude session targeting OC to be anchored; operations-center.sh sets CL_ANCHOR on the fleet, but build_allowlist_env (#340) stripped it (not in _ENV_PASSTHROUGH), re-breaking the #311 unblock. This is the same regression class as the #344 PATH bug.

Add CL_ANCHOR/CL_HOME/CL_SESSION_ID to the passthrough (forwarded only when present) so the executor agent stays anchored and cl_dispatch_wrap hydrate/capture is not silently disabled.

Co-authored-by: ProtocolWarden <ProtocolWarden@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
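The passthrough behaviour described in that commit (forward an allowlisted variable only when it is present) can be illustrated with a minimal sketch. This is not the repository's build_allowlist_env: the function name, the tuple name, and the non-CL_ entries in the allowlist below are assumptions made for illustration, since the real _ENV_PASSTHROUGH contents are not shown here.

```python
import os
from typing import Mapping, Optional

# Hypothetical allowlist for illustration only. The three CL_* names come from
# the commit message; PATH and HOME are assumed placeholders.
ENV_PASSTHROUGH_SKETCH = ("PATH", "HOME", "CL_ANCHOR", "CL_HOME", "CL_SESSION_ID")


def build_allowlist_env_sketch(
    source: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Return a child-process environment containing only allowlisted names.

    A name is forwarded only when it is present in the source environment,
    so an unset CL_ANCHOR is simply omitted rather than set to an empty value.
    """
    env = os.environ if source is None else source
    return {name: env[name] for name in ENV_PASSTHROUGH_SKETCH if name in env}


# Example: the anchor is kept, the unrelated variable is dropped.
child_env = build_allowlist_env_sketch(
    {"PATH": "/usr/bin", "CL_ANCHOR": "anchor-1", "UNRELATED": "x"}
)
```

With an allowlist of this shape, any variable a downstream tool depends on has to be named explicitly, which is why a missing entry shows up as a behaviour change in the child process rather than as an error in the launcher.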
Summary
Root-caused via the failure-diagnostics capture added in #345. Every improve/goal task was failing with "N of N stages failed" because the executor's baseline-validation gate (.venv/bin/pytest -q, run on the fresh workspace clone) exited 1, and the single failing test was tests/unit/test_documentation_accuracy.py::test_marker_configuration_in_pytest.

Root cause

That test (and 6 siblings) shells out with bare ["python", "-m", "pytest", ...]. In the workspace environment, the python on PATH has no pytest, so stdout is empty and the assertion reduces to assert 'integration' in ''. CI passes only because bare python there is the pytest-equipped interpreter. That one failure made pytest exit 1, so baseline validation failed and the executor refused to proceed on every task.

Fix

Add import sys and replace all 7 bare-python subprocess calls with sys.executable (the interpreter actually running the test, which always has pytest).

Verification
Must land in main (baseline validation clones main). Closes the recurring #264/#265 failures.
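
For readers unfamiliar with the pattern, the change from bare python to sys.executable can be sketched as follows. The helper names are hypothetical and the --markers argument is an assumption based on the markers_output assertion mentioned in the commit message; the actual test code is not reproduced here.

```python
import subprocess
import sys


def interpreter_cmd(*args: str) -> list[str]:
    """Build a command line that runs under the current interpreter.

    sys.executable is the absolute path of the interpreter executing this
    code, so the child process sees the same installed packages instead of
    whatever `python` happens to resolve to on PATH.
    """
    return [sys.executable, *args]


def run_with_current_interpreter(*args: str) -> subprocess.CompletedProcess:
    """Run the current interpreter with the given arguments, capturing text output."""
    return subprocess.run(
        interpreter_cmd(*args),
        capture_output=True,
        text=True,
        check=False,  # let the caller inspect returncode and stdout itself
    )


# Before (fragile): ["python", "-m", "pytest", "--markers"]
# After (assumed arguments, for illustration):
markers_cmd = interpreter_cmd("-m", "pytest", "--markers")
```

Because the command is built from the running interpreter, a test that shells out this way behaves the same under .venv/bin/pytest as it does in CI, regardless of which python comes first on PATH.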