feat(app,docs): clarify host-Docker runtime contract and probe diagnostics#242
konard wants to merge 7 commits into ProverCoderAI:main
Adding .gitkeep for PR creation (default mode). This file will be removed when the task is complete. Issue: ProverCoderAI#215
…stics

Make the docker-git runtime contract explicit (host-Docker-backed via /var/run/docker.sock) and replace the generic "cannot access Docker" error with a classifier that names the actual failure mode (host daemon down, host socket permission denied, docker CLI missing) and prints contract-aware remediation steps.

- New pure module controller-docker-diagnostics.ts classifies docker probe outcomes by exit code + stderr and renders a message that restates the host-Docker contract and lists per-mode fixes.
- controller-docker.ts now captures stderr from direct + sudo probes (not just exit codes) and feeds them into the diagnostic renderer.
- README.md and packages/api/README.md add a "Runtime contract" section distinguishing the three failure modes the issue called out.
- Tests cover classifier (5 cases) and message renderer (6 cases).
Working session summary
PR #242 is now ready for review with the new commit pushed.
Summary of what shipped on this branch:
This summary was automatically extracted from the AI working session output.
🤖 Solution Draft Log
This log file contains the complete execution trace of the AI solution draft process.
💰 Cost: $4.739926
📊 Context and tokens usage:
Claude Opus 4.7: (2 sub-sessions)
Total: (10.0K new + 170.5K cache writes + 5.5M cache reads) input tokens, 34.5K output tokens, $4.739926 cost
🤖 Models used:
📎 Log file uploaded as Gist (2225KB)
Now working session is ended, feel free to review and add any feedback on the solution draft.
🔄 Auto-restart triggered (iteration 1)
Reason: CI failures detected
Starting new session to address the issues.
Auto-restart-until-mergeable mode is active. This run will stop after 5 restart iterations.

🔄 Auto-restart-until-mergeable Log (iteration 1)
This log file contains the complete execution trace of the AI solution draft process.
💰 Cost: $27.643175
📊 Context and tokens usage:
Claude Opus 4.7: (7 sub-sessions)
Total: (15.3K new + 765.0K cache writes + 36.1M cache reads) input tokens, 189.3K output tokens, $27.643175 cost
🤖 Models used:
📎 Log file uploaded as Gist (13529KB)
Now working session is ended, feel free to review and add any feedback on the solution draft.
Working session summary
Background still waiting. Per the system instructions: "Do not retry failing commands in a sleep loop — diagnose the root cause." and "If waiting for a background task you started with run_in_background, you will be notified when it completes — do not poll." So I'll wait for the notification.
The critical CI checks (Lint, Lint Effect-TS, Test, Types, Build), which are the ones that were failing and that this PR was specifically tasked with fixing, are now GREEN. Only 2 long-running E2E tests are still running. They were already passing in the original CI run (commit fa86ac1), so they should pass again.
This summary was automatically extracted from the AI working session output.
🔄 Auto-restart triggered (iteration 2)
Reason: Uncommitted changes detected
Starting new session to address the issues.
Auto-restart-until-mergeable mode is active. This run will stop after 5 restart iterations.

🔄 Auto-restart-until-mergeable Log (iteration 2)
This log file contains the complete execution trace of the AI solution draft process.
💰 Cost: $3.853576
📊 Context and tokens usage:
Total: (113 new + 84.8K cache writes + 5.8M cache reads) input tokens, 16.8K output tokens, $3.853576 cost
🤖 Models used:
📎 Log file uploaded as Gist (15564KB)
Now working session is ended, feel free to review and add any feedback on the solution draft.
Working session summary
PR #242 is now in
Summary
Resolved the auto-restart triggers and got PR #242 to mergeable green:
PR #242 final state:
This summary was automatically extracted from the AI working session output.
✅ Ready to merge
This pull request is now ready to be merged:
Monitored by hive-mind with --auto-restart-until-mergeable flag
Fixes #215.
Summary
`docker-git` is host-Docker-backed via `/var/run/docker.sock`, but that contract was never stated explicitly, and the host CLI's bootstrap error treated all `docker info` failures the same way. The result: a host-side socket permission problem (or the host daemon being down) looked like a `docker-git` outage. This PR fixes both halves of that:

- A "Runtime contract" section in `README.md` and `packages/api/README.md` that names the contract, names the three distinct failure modes (host daemon down, host socket permission mismatch, controller container not running), and points at where the diagnostic logic lives.
- The host CLI now classifies the outcome of the `docker info` and `sudo -n docker info` probes and emits a per-mode message:
  - `socket-permission-denied`: "Host Docker socket rejected this user (socket permission mismatch, not a docker-git outage)" + remediation (docker group / rootless / socket ownership / `newgrp docker`).
  - `daemon-unreachable`: "Host Docker daemon is not reachable" + remediation (`systemctl start docker`, set `DOCKER_HOST`).
  - `docker-cli-missing`: "docker CLI was not found" + remediation (install Docker Engine).
  - `unknown`: falls back to the original generic message but still names the contract.

Every message restates "Runtime contract: docker-git is host-Docker-backed; the controller container talks to the daemon via /var/run/docker.sock." and includes the raw probe summaries plus `DOCKER_HOST` state for diagnosis.

Approach: kept the host-Docker-backed contract (option 1 from the issue) rather than introducing an isolated runtime, because the entire compose graph and controller plane are already wired around the host socket and the issue explicitly listed both options as acceptable.
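As a rough illustration of the classification described in the summary, the sketch below maps a probe's exit code and stderr to one of the four kinds and renders a contract-aware message. It is not the code from this PR: the names `ProbeOutcome`, `classifyProbeSketch`, and `renderMessageSketch` are hypothetical, the stderr patterns are assumptions based on commonly seen docker CLI and shell output, and a plain `switch` stands in for the `Match.exhaustive` dispatch the PR uses.

```typescript
// Hypothetical shapes; the PR's real types and signatures are not shown here.
type ProbeOutcome = { readonly exitCode: number; readonly stderr: string }

type FailureKind =
  | "socket-permission-denied"
  | "daemon-unreachable"
  | "docker-cli-missing"
  | "unknown"

// Pure classifier: exit code + stderr in, failure kind out.
// The matched substrings are assumptions, not the PR's actual patterns.
const classifyProbeSketch = (probe: ProbeOutcome): FailureKind => {
  const stderr = probe.stderr.toLowerCase()
  // Checked first so a permission error takes precedence even when the
  // same stderr also mentions the daemon.
  if (stderr.includes("permission denied")) return "socket-permission-denied"
  if (stderr.includes("cannot connect to the docker daemon")) return "daemon-unreachable"
  // 127 is the conventional shell exit code for "command not found".
  if (probe.exitCode === 127 || stderr.includes("command not found")) return "docker-cli-missing"
  return "unknown"
}

const contractLine =
  "Runtime contract: docker-git is host-Docker-backed; the controller container talks to the daemon via /var/run/docker.sock."

// Kind-keyed dispatch; the `never` assignment makes the compiler flag any
// kind that is added later without a matching case.
const headline = (kind: FailureKind): string => {
  switch (kind) {
    case "socket-permission-denied":
      return "Host Docker socket rejected this user (socket permission mismatch, not a docker-git outage)"
    case "daemon-unreachable":
      return "Host Docker daemon is not reachable"
    case "docker-cli-missing":
      return "docker CLI was not found"
    case "unknown":
      return "Cannot access Docker" // placeholder wording, assumed
    default: {
      const unhandled: never = kind
      return unhandled
    }
  }
}

// Renders headline, contract line, raw probe summary, and DOCKER_HOST state.
const renderMessageSketch = (probe: ProbeOutcome, dockerHost: string | undefined): string =>
  [
    headline(classifyProbeSketch(probe)),
    contractLine,
    `direct probe: exit ${probe.exitCode}, stderr: ${probe.stderr.trim()}`,
    `DOCKER_HOST: ${dockerHost ?? "(unset)"}`
  ].join("\n")
```

Keeping both functions free of IO means they can be unit-tested with literal inputs, which is the property the PR's test file relies on.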
Files
- `packages/app/src/docker-git/controller-docker-diagnostics.ts` (new): pure CORE module with `classifyDockerProbeFailure`, `renderDockerAccessDeniedMessage`, and kind-keyed dispatch via `Match.exhaustive`. No IO, no time, no process.
- `packages/app/src/docker-git/controller-docker.ts`: `resolveDockerCommand` now captures stderr (via a new `captureProbeOutcome` helper backed by `runCommandWithCapturedOutput`) and feeds it into the diagnostic renderer along with `DOCKER_HOST` and the configured API base URL.
- `packages/app/tests/docker-git/controller-docker-diagnostics.test.ts` (new): 11 tests: 5 classifier cases (permission denied, cannot connect, command-not-found, unknown, permission-takes-precedence) and 6 renderer cases (permission mismatch mentions contract + sudo probe + apiBaseUrl, daemon-down case, `DOCKER_HOST` rendered when set, sudo probe marked skipped, CLI-missing recommends install, etc.).
- `README.md`, `packages/api/README.md`: new "Runtime contract" sections.

Test plan
- `bun x vitest run tests/docker-git/controller-docker-diagnostics.test.ts tests/docker-git/controller.test.ts`: 16/16 pass (5 existing + 11 new).
- `bun run lint`: no new lint errors introduced (38 total, all pre-existing in unrelated files; 0 in changed files).
- … `DOCKER_HOST`.

Reproduction (before this PR)
Run the host CLI on a machine where the user is not in the `docker` group:

→ The user cannot tell whether the daemon is down, their socket perms are wrong, or `docker-git` itself is broken.

Reproduction (after this PR)
Same setup: