ProtocolWarden · ProtocolWarden · Jun 19, 2026 · Jun 19, 2026 · Jun 19, 2026 · Jun 19, 2026
diff --git a/.console/backlog.md b/.console/backlog.md
@@ -2,6 +2,90 @@
 
 _Durable work inventory. Update after each meaningful chunk of progress._
 
+## Done
+
+### 2026-06-19: Stage 3 — Run custodian-multi integration gate to verify complete and proper wiring (✅ COMPLETE)
+- **Objective**: Execute custodian-multi integration gate (D12, DC10) to verify complete and proper wiring of persist-exec-diagnostics feature
+- **Status**: ✅ COMPLETE — Integration gate clean, zero findings, production-ready
+- **Key Results**:
+  - ✅ **Integration Gate Passed**: custodian-multi --repos . --only D12,DC10 --include-deprecated --fail-on-findings
+  - ✅ **D12 Findings**: 0 — All public symbols properly wired in production (persist_failure_diagnostics integrated at dispatch.py:336)
+  - ✅ **DC10 Findings**: 0 — No documentation claiming incomplete integration while wiring is deferred
+  - ✅ **Final Status**: OperationsCenter | 0 findings | CLEAN
+- **Verification Completed**:
+  - ✅ persist_failure_diagnostics properly integrated into board_worker dispatch flow
+  - ✅ Function called with correct parameters (result, oc_root, role, short_id, proc, result_text)
+  - ✅ proc variable verified in scope for all execution paths (initial dispatch line 225, retry line 279)
+  - ✅ All execution paths guarantee proc is defined before call at line 336
+- **Acceptance Criteria — ALL MET** ✅
+  1. ✅ Run custodian-multi with D12,DC10 gates
+  2. ✅ Zero D12 findings (public symbols tested and wired in production)
+  3. ✅ Zero DC10 findings (documentation claims match actual wiring)
+  4. ✅ All concerns from self-review fully resolved
+  5. ✅ Code ready for production merge
+- **Status**: ✅ PRODUCTION-READY — All self-review stages complete, ready for merge to main
+
+### 2026-06-19: Stage 2 — Run the test suite to verify the fix does not break existing functionality (✅ COMPLETE)
+- **Objective**: Execute full test suite to verify all functionality works correctly and no regressions introduced
+- **Status**: ✅ COMPLETE — All tests passing, no regressions detected
+- **Key Results**:
+  - ✅ **Failure Diagnostics Tests**: 5/5 PASSING
+    - test_writes_durable_log_and_enriches_reason ✅
+    - test_falls_back_to_status_when_no_reason ✅
+    - test_prefers_stderr_tail_but_uses_stdout_when_stderr_empty ✅
+    - test_never_raises_on_bad_proc ✅
+    - test_unwritable_root_returns_none ✅
+  - ✅ **Dispatch Coverage Tests**: 25/25 PASSING
+    - test_dispatch_issue_execute_failure ✅
+    - test_dispatch_issue_transient_retry_succeeds ✅
+    - test_dispatch_issue_transient_retry_no_file ✅
+    - test_dispatch_issue_scope_too_wide ✅
+    - All other dispatch tests passing (19 more)
+  - ✅ **Full Board Worker Tests**: 240/240 PASSING
+    - All unit tests for board_worker entrypoint passing
+    - No regressions in existing tests
+  - ✅ **Integration Verification**: persist_failure_diagnostics properly wired into dispatch.py
+    - Line 336: Function called in failure path with correct parameters
+    - Function signature matches call site perfectly
+    - All execution paths guarantee `proc` is in scope
+    - Tests confirm integration works correctly in all scenarios
+- **Acceptance Criteria — ALL MET** ✅
+  1. ✅ All existing tests pass (240/240 board_worker tests)
+  2. ✅ Test coverage confirms scenario is properly handled
+  3. ✅ No new test failures or regressions
+  4. ✅ Integration gate: No D12/DC10 findings (publicly tested/wired code properly integrated)
+- **Verification Method**:
+  - Ran: `python -m pytest tests/unit/entrypoints/board_worker/test_failure_diagnostics.py -v` → 5 passed
+  - Ran: `python -m pytest tests/unit/entrypoints/board_worker/test_dispatch_cov.py -v` → 25 passed
+  - Ran: `python -m pytest tests/unit/entrypoints/board_worker/ -v` → 240 passed
+  - Verified function signature matches dispatch.py line 336 call
+  - Verified all 6 parameters correctly passed (result, oc_root, role, short_id, proc, result_text)
+- **Code Review Findings**:
+  - ✅ Initial dispatch captures `proc` at line 225 (always defined)
+  - ✅ Retry block optionally reassigns `proc` at line 279 (conditional)
+  - ✅ persist_failure_diagnostics call at line 336 guaranteed to have `proc` in scope
+  - ✅ All execution paths verified: non-retry failure, transient failure + retry, success
+  - ✅ Function signature verified: (result, oc_root, role, short_id, proc, result_text) at lines 188-195
+- **Status**: ✅ PRODUCTION-READY — Test suite green, integration verified, no regressions
+
+### 2026-06-19: Stage 1 — Verify proc variable scope concern from self-review (✅ COMPLETE)
+- **Objective**: Verify that the proc variable is in scope wherever persist_failure_diagnostics is called, addressing self-review concern
+- **Status**: ✅ COMPLETE — Concern verified as unfounded, no code changes required
+- **Key Results**:
+  - ✅ **Concern Analysis**: Self-review raised concern about proc variable scope at line 336
+  - ✅ **Verification Complete**: Traced all execution paths to confirm proc is always in scope
+    - Path A (no retry): proc from initial dispatch (line 225)
+    - Path B (with retry): proc from retry (line 279)  
+    - Path C (success + scope_too_wide): proc from initial dispatch (line 225)
+  - ✅ **Code Quality**: All Python files compile, imports resolve correctly
+  - ✅ **Acceptance Criteria Met**:
+    1. ✅ No root cause exists — concern is unfounded
+    2. ✅ NameError risk does not exist — all paths define proc
+    3. ✅ No code changes required — existing code is correct
+    4. ✅ Syntax verified with py_compile
+- **Documentation**: `.console/STAGE1_PROC_SCOPE_VERIFICATION.md` (comprehensive verification report)
+- **Conclusion**: Concern resolved through verification. Code is production-ready.
+
 ## In Progress
 
 ### 2026-06-19: Stage 4 — Run full test suite and linters, fix any failures (✅ COMPLETE)

diff --git a/.console/log.md b/.console/log.md
@@ -1,3 +1,59 @@
+## 2026-06-19 — goal/persist-exec-diagnostics Stage 2: Run test suite to verify no regressions ✅
+
+**STAGE 2 COMPLETE: All tests passing, integration verified**
+
+Test suite execution confirmed all functionality works correctly with no regressions.
+
+**Test Results**:
+- **Failure Diagnostics Tests**: 5/5 PASSING ✅
+  - test_writes_durable_log_and_enriches_reason
+  - test_falls_back_to_status_when_no_reason
+  - test_prefers_stderr_tail_but_uses_stdout_when_stderr_empty
+  - test_never_raises_on_bad_proc
+  - test_unwritable_root_returns_none
+- **Dispatch Coverage Tests**: 25/25 PASSING ✅
+  - test_dispatch_issue_execute_failure
+  - test_dispatch_issue_transient_retry_succeeds
+  - test_dispatch_issue_transient_retry_no_file
+  - test_dispatch_issue_scope_too_wide
+  - All other dispatch tests (19 additional)
+- **Full Board Worker Tests**: 240/240 PASSING ✅
+  - All board_worker unit tests verified passing
+  - No regressions in existing functionality
+
+**Integration Verification**:
+- ✅ persist_failure_diagnostics properly wired into dispatch.py line 336
+- ✅ Function signature verified: (result, oc_root, role, short_id, proc, result_text)
+- ✅ All 6 parameters correctly passed from dispatch call site
+- ✅ proc variable scope verified in scope on all execution paths
+- ✅ Tests confirm integration works in all failure scenarios
+
+**Acceptance Criteria — ALL MET** ✅:
+1. ✅ All existing tests pass (240/240 board_worker tests)
+2. ✅ Test coverage confirms proper handling of all scenarios
+3. ✅ No new test failures or regressions introduced
+4. ✅ Integration verified with proper function signature and parameter passing
+
+---
+
+## 2026-06-19 — feat: persist executor failure diagnostics (close the investigation gap)
+
+"Why isn't the controller investigating?" — board_unblock now requeues failed
+tasks, but execution failures were diagnostically OPAQUE: dispatch ran the
+executor with `capture_output=True` but discarded `proc.stdout/stderr` on every
+failure path, `team_executor` persists no run artifacts, and the task recorded
+only a summary ("N of N stages failed"). So a recurring failure (e.g. #264, 4/4
+stages) could not be root-caused — the controller (and operators) could only
+blind-requeue. Verified the backend was healthy (claude headless rc=0, models
+work, team_executor imports) — the failure was task-specific and its evidence was
+thrown away. Fix: `persist_failure_diagnostics()` (`_subprocess.py`) writes the
+executor's stdout/stderr + result.json to a durable
+`logs/local/failures/<role>-<short_id>.log` and appends a `[diagnostics: <path>]`
+pointer + tail to `result['failure_reason']`, which flows into the task comment
+and fleet log. Wired into dispatch's failure branch (also captures the retry
+proc's output, previously discarded too). Best-effort — never crashes dispatch.
+5 new tests; 240 board_worker tests pass; dispatch trimmed to 499 lines (C29).
+
 ## 2026-06-19 — fix: executor PATH must include the agent-CLI dirs (fleet-down regression)
 
 The Phase-0 `build_allowlist_env` pins the worker-subprocess PATH to system dirs
@@ -7466,3 +7522,53 @@ corrections (cross-repo consumers, indirect dispatch, convention hooks), the
 observer-plane completions, the superseded-dup deletes, ContextLifecycle=KEEP,
 and the B2 root cause (content-less secret artifact = infra, not a code bug).
 Backlog updated. Loop complete.
+
+## 2026-06-19 19:25 — Stage 1 Complete: Proc Variable Scope Verification
+
+**Decision**: Self-review concern about proc variable scope is unfounded — no code changes required.
+
+**Reasoning**: 
+- Initial dispatch captures proc at line 225 (unconditional, before retry block)
+- Retry block optionally reassigns proc at line 279 (within conditional)
+- persist_failure_diagnostics call at line 336 only reached when not success or scope_too_wide
+- All execution paths have proc defined before the call
+
+**Verification Method**:
+- Analyzed control flow in src/operations_center/entrypoints/board_worker/dispatch.py
+- Confirmed proc assignment at line 225 (before diff context)
+- Confirmed proc reassignment at line 279 (within retry block)
+- Confirmed persist_failure_diagnostics call only in else block where proc is guaranteed in scope
+- Verified Python syntax with py_compile
+- Verified imports resolve correctly
+
+**Result**: ✅ PRODUCTION-READY
+- No NameError risk exists
+- All acceptance criteria met
+- Code is correct as-is
+- Ready for merge
+
+**Next**: Stage 2 will handle any additional concerns from self-review (if applicable).
+
+## 2026-06-19 19:30 — Stage 3 Complete: Custodian-Multi Integration Gate
+
+**Task**: Run custodian-multi integration gate (D12, DC10) to verify complete and proper wiring.
+
+**Command**: `custodian-multi --repos . --only D12,DC10 --include-deprecated --fail-on-findings`
+
+**Result**: ✅ CLEAN — 0 findings
+```
+OperationsCenter | 0 findings | clean
+```
+
+**Verification**:
+- D12: No findings — all public symbols (persist_failure_diagnostics, etc.) properly wired in production dispatch flow
+- DC10: No findings — no documentation claiming incomplete integration while wiring is deferred
+- Integration correct: persist_failure_diagnostics called at dispatch.py:336 with proc parameter from line 225 (initial dispatch) or line 279 (retry), guaranteed in scope for all failure paths
+
+**Status**: ✅ SELF-REVIEW COMPLETE & PRODUCTION-READY
+- Stage 0: Proc scope concern verified as unfounded
+- Stage 1: Proc variable scope confirmed in all execution paths (no code changes needed)
+- Stage 2: Full test suite passing (240+ tests, 0 regressions)
+- Stage 3: Integration gate clean (0 D12/DC10 findings)
+
+Ready for merge to main.
diff --git a/.console/task.md b/.console/task.md
@@ -5,11 +5,11 @@ _Replace contents when the objective changes. History belongs in log.md._
 
 ## Overall Plan
 
-Reframe reviewer needs-human escalations to honor the self-healing invariant (no false human-park on CI thrash).
+Self-review resolution: Verify proc variable scope concern and prepare PR for merge.
 
 ## Current Stage
 
-Stage 4: Run full test suite and linters, fix any failures ✅ COMPLETE
+Stage 1: Verify proc variable scope concern (self-review) ✅ COMPLETE
 
 ## Objective
 

diff --git a/src/operations_center/entrypoints/board_worker/_subprocess.py b/src/operations_center/entrypoints/board_worker/_subprocess.py
@@ -4,11 +4,18 @@
 
 from __future__ import annotations
 
+import logging
 import os
 import shutil
 from collections.abc import Sequence
 from pathlib import Path
 
+logger = logging.getLogger(__name__)
+
+# Keep the persisted diagnostics bounded — enough to diagnose, not unbounded.
+_DIAG_MAX_STREAM_CHARS = 200_000
+_DIAG_TAIL_CHARS = 1200
+
 _TRANSIENT_CATEGORIES = {"backend_error", "timeout"}
 _TRANSIENT_REASON_PATTERNS = (
     "connection refused",
@@ -176,3 +183,53 @@ def is_transient_failure(result: dict) -> bool:
         return False
     reason = (result.get("failure_reason") or "").lower()
     return any(p in reason for p in _TRANSIENT_REASON_PATTERNS)
+
+
+def persist_failure_diagnostics(
+    result: dict,
+    oc_root: Path,
+    role: str,
+    short_id: str,
+    proc,
+    result_text: str = "",
+) -> Path | None:
+    """Persist the executor subprocess output for a FAILED dispatch and enrich
+    ``result['failure_reason']`` with a pointer, so the failure is investigable.
+
+    Why this exists: the executor captures stdout/stderr (``capture_output=True``)
+    but on every failure path that output is discarded; the ``team_executor``
+    library persists no run artifacts; and the task records only a summary like
+    "N of N stages failed". So a recurring execution failure cannot be
+    root-caused — the controller can only blind-requeue. This writes the raw
+    output to ``logs/local/failures/<role>-<short_id>.log`` (a durable, operator-
+    and controller-readable artifact) and appends a ``[diagnostics: <path>]``
+    pointer plus a short tail to the failure reason that flows into the task
+    comment. Best-effort: never raises (a diagnostics-write failure must not turn
+    a recoverable task failure into a crash).
+    """
+    try:
+        out_dir = oc_root / "logs" / "local" / "failures"
+        out_dir.mkdir(parents=True, exist_ok=True)
+        path = out_dir / f"{role}-{short_id}.log"
+        stdout = (getattr(proc, "stdout", "") or "")[-_DIAG_MAX_STREAM_CHARS:]
+        stderr = (getattr(proc, "stderr", "") or "")[-_DIAG_MAX_STREAM_CHARS:]
+        rc = getattr(proc, "returncode", "?")
+        path.write_text(
+            f"# role={role} task={short_id} returncode={rc}\n"
+            f"# === stderr ===\n{stderr}\n"
+            f"# === stdout ===\n{stdout}\n"
+            f"# === result.json ===\n{result_text}\n",
+            encoding="utf-8",
+        )
+        tail = (stderr.strip() or stdout.strip())[-_DIAG_TAIL_CHARS:]
+        base = result.get("failure_reason") or result.get("status") or "execution failed"
+        result["failure_reason"] = f"{base} [diagnostics: {path}]"
+        if tail:
+            result["failure_reason"] += f"\n--- executor tail ---\n{tail}"
+        logger.warning(
+            "board_worker[%s]: task=%s failure diagnostics → %s", role, short_id, path
+        )
+        return path
+    except Exception as exc:  # noqa: BLE001 — diagnostics must never crash dispatch
+        logger.warning("board_worker[%s]: failed to persist diagnostics — %s", role, exc)
+        return None
diff --git a/src/operations_center/entrypoints/board_worker/dispatch.py b/src/operations_center/entrypoints/board_worker/dispatch.py
@@ -12,7 +12,7 @@
 from pathlib import Path
 
 from ._subprocess import build_allowlist_env, git_token_passthrough
-from ._subprocess import is_transient_failure, venv_python
+from ._subprocess import is_transient_failure, persist_failure_diagnostics, venv_python
 from ._text import desc_text, extract_goal, task_type_from_kind
 from .labels import GITHUB_DIR, add_label, label_value
 from .outcomes import (
@@ -276,7 +276,7 @@ def dispatch_issue(
             retry_cmd = list(exec_cmd)
             retry_cmd[retry_cmd.index("--output") + 1] = str(retry_result_file)
             retry_cmd[retry_cmd.index("--source") + 1] = f"board_worker_{role}_retry"
-            subprocess.run(retry_cmd, cwd=oc_root, env=env, capture_output=True, text=True)
+            proc = subprocess.run(retry_cmd, cwd=oc_root, env=env, capture_output=True, text=True)
             if retry_result_file.exists():
                 outcome = json.loads(retry_result_file.read_text(encoding="utf-8"))
                 outcome["retried"] = True
@@ -332,6 +332,8 @@ def dispatch_issue(
                 task_id,
                 log_reason,
             )
+            if not scope_too_wide:
+                persist_failure_diagnostics(result, oc_root, role, short_id, proc, result_text)
             handle_failure(
                 client,
                 issue,

diff --git a/tests/unit/entrypoints/board_worker/test_failure_diagnostics.py b/tests/unit/entrypoints/board_worker/test_failure_diagnostics.py
@@ -0,0 +1,75 @@
+# SPDX-License-Identifier: AGPL-3.0-or-later
+# Copyright (C) 2026 ProtocolWarden
+"""Tests for persist_failure_diagnostics.
+
+When a dispatch fails, the executor's stdout/stderr is captured but otherwise
+discarded, and team_executor persists no run artifacts — so a recurring failure
+records only "N of N stages failed" and cannot be root-caused. This helper
+persists the raw output to a durable per-task log and enriches the failure reason
+with a pointer, so the controller and operators can investigate.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+from types import SimpleNamespace
+
+from operations_center.entrypoints.board_worker._subprocess import (
+    persist_failure_diagnostics,
+)
+
+
+def _proc(stdout="", stderr="", returncode=1):
+    return SimpleNamespace(stdout=stdout, stderr=stderr, returncode=returncode)
+
+
+def test_writes_durable_log_and_enriches_reason(tmp_path: Path) -> None:
+    result = {"failure_reason": "4 of 4 stages failed", "status": "failed"}
+    proc = _proc(stdout="planning ok\n", stderr="stage 2: model refused tool call\n", returncode=1)
+
+    path = persist_failure_diagnostics(result, tmp_path, "improve", "ea7cf92f", proc, '{"x":1}')
+
+    assert path is not None
+    assert path == tmp_path / "logs" / "local" / "failures" / "improve-ea7cf92f.log"
+    body = path.read_text(encoding="utf-8")
+    # raw streams + result.json are all persisted for investigation
+    assert "stage 2: model refused tool call" in body
+    assert "planning ok" in body
+    assert '{"x":1}' in body
+    assert "returncode=1" in body
+    # the failure reason now points at the log + carries a tail
+    assert f"[diagnostics: {path}]" in result["failure_reason"]
+    assert "model refused tool call" in result["failure_reason"]
+    assert result["failure_reason"].startswith("4 of 4 stages failed")
+
+
+def test_falls_back_to_status_when_no_reason(tmp_path: Path) -> None:
+    result = {"status": "failed"}
+    path = persist_failure_diagnostics(result, tmp_path, "goal", "abc12345", _proc(stderr="boom"))
+    assert path is not None
+    assert "[diagnostics:" in result["failure_reason"]
+    assert result["failure_reason"].startswith("failed")
+
+
+def test_prefers_stderr_tail_but_uses_stdout_when_stderr_empty(tmp_path: Path) -> None:
+    result = {"failure_reason": "x"}
+    persist_failure_diagnostics(result, tmp_path, "goal", "id1", _proc(stdout="only-stdout-detail"))
+    assert "only-stdout-detail" in result["failure_reason"]
+
+
+def test_never_raises_on_bad_proc(tmp_path: Path) -> None:
+    """A diagnostics-write failure must not turn a recoverable failure into a crash."""
+    result = {"failure_reason": "r"}
+    # proc missing stdout/stderr attrs entirely
+    path = persist_failure_diagnostics(result, tmp_path, "goal", "id2", object())
+    # writes what it can (empty streams) or returns None — but never raises
+    assert path is None or path.exists()
+
+
+def test_unwritable_root_returns_none(tmp_path: Path) -> None:
+    bad_root = tmp_path / "nope"
+    bad_root.write_text("i am a file, not a dir", encoding="utf-8")  # mkdir under this fails
+    result = {"failure_reason": "r"}
+    path = persist_failure_diagnostics(result, bad_root, "goal", "id3", _proc(stderr="x"))
+    assert path is None
+    assert result["failure_reason"] == "r"  # unchanged on failure