fix: resolve follow-up issues from CodeQL sandbox recipe refactor (#50)#51
Conversation
- Fix Phase 1b/1c graceful completion artifact set swap - Rename REQUIRED_NOTES_1B to PHASE_1C_REQUIRED_NOTES - Fix prompts: 1b reading list, 1c summary path, old CodeQL paths - Fix Makefile and docstring stale phase references - Add missing _validate_codeql_plan_for_repair function - Expand COMPILED_LANGUAGES to include csharp/java-kotlin - Inject total_alerts into manifest before health computation - Fix write_summary undefined progress reference - Strengthen sandbox-recipe.yml validation with service/environment checks - Wire recipe validation into sandbox-validate - Use None instead of -1 for unknown extractor sentinel - Make completed-partial health classification reachable
📝 WalkthroughWalkthroughThis PR restructures Phase 1 of a CodeCome workflow by redefining subphase artifact ownership: Phase 1b now gates on sandbox bootstrap artifacts, Phase 1c gates on reconnaissance notes. It introduces CodeQL plan repair validation, improves CodeQL health classification with optional integer handling, expands sandbox recipe validation with docker-specific rules, and updates all related tests and documentation to reflect the new structure. ChangesPhase 1 Subphase Ownership and CodeQL Validation
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Coverage Report
Generated by pytest-cov on |
Greptile SummaryThis PR resolves 19 follow-up issues from the CodeQL sandbox recipe refactor (#50), primarily correcting swapped phase-ownership assignments and a number of related crash and classification bugs.
Confidence Score: 4/5Safe to merge after fixing the build_command path check in recipe validation. The newly added build_command path existence check in _validate_build_targets passes the full command string (including arguments) to Path(root) / build_command instead of extracting just the first token the way _command_looks_path_like does. Any path-like build command that includes arguments will be resolved to a path that can never exist, causing false-positive validation errors. All other changes in the PR are well-tested and look correct. tools/sandbox/recipe.py — the build_command path resolution in _validate_build_targets Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Phase 1a] -->|target-profile.md\nbuild-model.md\ncodeql-plan.yml| B[CodeQL Analysis]
B --> C[Phase 1b\nSandbox Bootstrap]
C -->|sandbox-plan.md\nsandbox-recipe.yml\nCODECOME-GENERATED.md| D[Phase 1c\nDetailed Recon]
D -->|attack-surface.md\nthreat-model.md\nfile-risk-index.yml\n...| E[Phase 2]
C --> V[sandbox-validate]
V --> RV{Recipe Valid?}
RV -->|Yes| VS[overall: passed]
RV -->|No| VF[overall: failed]
B --> H[compute_health]
H --> H1{total_alerts in manifest?}
H1 -->|None| CP[completed-partial]
H1 -->|0| CEV[completed-empty-valid]
H1 -->|gt 0| CC[completed]
Reviews (2): Last reviewed commit: "fix: address PR review feedback — plan v..." | Re-trigger Greptile |
There was a problem hiding this comment.
Pull request overview
This PR addresses follow-up correctness gaps after the CodeQL + sandbox-recipe refactor by realigning Phase 1b/1c artifact ownership, improving sandbox recipe validation and reporting, and tightening CodeQL run-health classification and manifest contents so downstream consumers can reliably interpret results.
Changes:
- Correct Phase 1b vs 1c artifact ownership in gates, graceful-completion logic, phase progress reporting, and prompts.
- Expand sandbox recipe validation (docker compose/service checks, build-target validation) and surface validation results in
sandbox-bootstrap validate. - Improve CodeQL health/summary correctness by injecting
total_alerts, makingcompleted-partialreachable, and refining compiled-language extraction checks.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/sandbox/recipe.py | Adds docker-specific recipe validation and deeper build target checks. |
| tools/sandbox-bootstrap.py | Wires recipe validation into validate output and JSON payload. |
| tools/phases/phase_1_gates.py | Renames Phase 1c required recon notes constant and updates gate logic usage. |
| tools/phases/completion.py | Realigns Phase 1 artifact ownership and graceful-completion checks per subphase. |
| tools/codeql/runner.py | Fixes write_summary error handling for alerts count rendering. |
| tools/codeql/pipeline.py | Injects total_alerts from normalized output before health computation. |
| tools/codeql/health.py | Expands compiled language set and improves classification for unknown extractor/alert counts. |
| tools/codecome/phase_1.py | Adds missing check-codeql-plan validation implementation and plan error collection helper. |
| tools/codecome.py | Updates phase progress reporting for new Phase 1b/1c semantics. |
| tests/test_sandbox_recipe.py | Updates sandbox recipe tests to satisfy new validation expectations. |
| tests/test_phases_completion.py | Updates Phase 1 completion tests for new notes root layout and ownership. |
| tests/test_phase_graceful_completion_subphases.py | Updates subphase graceful-completion tests for new artifact ownership. |
| tests/test_phase_1_gates_threat_model.py | Updates tests to use renamed Phase 1c notes constant. |
| prompts/phase-1c-recon.md | Fixes CodeQL signals path resolution via last-run-manifest.yml + run id. |
| prompts/phase-1b-sandbox.md | Removes Phase 1c artifacts from Phase 1b required reading; points to recipe template. |
| Makefile | Updates sandbox bootstrap phase references from 1c → 1b. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tools/codecome.py (1)
356-395: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy liftMove phase progress implementation out of
tools/codecome.pyand delegate from the wrapper.This changed logic keeps business implementation in a root
tools/*.pyscript, which should remain a thin delegator to a package module.♻️ Suggested direction
- def check_phase_progress() -> None: - ... + from codecome.status import check_phase_progress# tools/codecome/status.py def check_phase_progress(root: Path, yaml_module) -> None: ...As per coding guidelines, “Standalone scripts at the tools/ root ... should be thin wrappers that delegate to their respective packages. Implementation must live in the package, not the script.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/codecome.py` around lines 356 - 395, The phase-progress logic in tools/codecome.py (the block using ROOT, notes_dir, evidence_root, count_findings(), PHASE_1C_REQUIRED_NOTES, manifest_path and constructing rows for "Phase 1a"/"CodeQL"/"Phase 1b"/"Phase 1c") should be moved into a package helper function (e.g. tools/codecome/status.py:def check_phase_progress(root: Path, yaml_module) -> list[tuple[str,str,str]]). Implement check_phase_progress to accept the root Path and yaml module, perform the same checks (count_findings(), presence checks for target-profile.md/build-model.md/codeql-plan.yml, sandbox-plan.md, read run-manifest.yml with safe_load and status handling, evaluate PHASE_1C_REQUIRED_NOTES) and return the rows list; then replace the moved code in tools/codecome.py with a thin delegator that calls check_phase_progress(ROOT, yaml) and uses its returned rows. Ensure you reference PHASE_1C_REQUIRED_NOTES inside the new module by importing it and keep the same status/level mapping logic.Source: Coding guidelines
tools/sandbox-bootstrap.py (1)
1521-1571:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRecipe validation failures currently do not fail
validateexit status.
recipe_errorsonly affects output fields; if tiers pass, the command still returns success. That can greenlight an invalid recipe even though downstream CodeQL flow consumes recipe service/compose settings.Suggested fix
recipe_errors: list[str] = [] if SANDBOX_RECIPE_PATH.exists(): try: from sandbox.recipe import load_recipe, validate_recipe recipe = load_recipe(SANDBOX_RECIPE_PATH) recipe_errors = validate_recipe(recipe, root=ROOT) except Exception as exc: recipe_errors = [f"Failed to load recipe: {exc}"] elif not args.scripts_only: recipe_errors = ["sandbox-recipe.yml not found; run Phase 1b to generate it"] + + if recipe_errors: + overall_outcome = "failed"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/sandbox-bootstrap.py` around lines 1521 - 1571, The recipe validation errors are only used for output but don't affect the exit status; after validating and building recipe_errors (the block that computes recipe_errors from SANDBOX_RECIPE_PATH / args.scripts_only), if recipe_errors is non-empty set overall_outcome = "failed" (or set an explicit exit_code variable to non-zero) and ensure payload["recipe_valid"] is False before emitting/printing; keep the final return logic (return 0 if overall_outcome == "passed" else 1) so the validate command fails when recipe_errors exist.
🧹 Nitpick comments (1)
tools/codeql/runner.py (1)
660-671: 💤 Low valueError handling added for alerts.yml read failures.
The broad exception handler is acceptable here for summary generation resilience. The user gets explicit feedback rather than a crash or missing information.
Optional: Consider using
manifest["total_alerts"]instead of re-reading.
write_summary()currently re-parsesalerts.ymlto count alerts, duplicating work already done by_inject_total_alerts()in the pipeline. Using the manifest field would be more efficient and ensure consistency with health classification:total_alerts = manifest.get("total_alerts") if total_alerts is not None: lines.append(f"- **Total alerts**: {total_alerts}") lines.append("") elif (normalized_dir / "alerts.yml").is_file(): lines.append("- **Total alerts**: (error reading alerts.yml)") lines.append("")This change is out of scope for the current PR but could be considered for future cleanup.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/codeql/runner.py` around lines 660 - 671, The current write_summary() block re-parses alerts.yml using alerts_path and load_yaml_mapping and catches Exception; instead, use the already-populated manifest entry populated by _inject_total_alerts: read total_alerts = manifest.get("total_alerts") inside write_summary(), and if total_alerts is not None append the lines with that value; only fall back to checking alerts_path.is_file() to append the “(error reading alerts.yml)” message if manifest lacks the key. Update references to normalized_dir/alerts.yml and remove the duplicated load_yaml_mapping path to avoid re-parsing and keep behavior identical when manifest has the count.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tools/codecome/phase_1.py`:
- Around line 224-228: _summary: _collect_plan_errors() currently hard-fails
unsupported languages and empty language lists causing check-codeql-plan to
report failures that check_phase_1a() only blocks under fail_policy=hard._
Modify _collect_plan_errors to accept a fail_policy (or soft_fail boolean)
parameter and stop treating unsupported languages / empty analysis_units as hard
errors; instead emit non-blocking notices or warnings when fail_policy is soft,
and preserve hard errors when fail_policy is hard; update callers (e.g.,
check_phase_1a and any invocation of _collect_plan_errors) to pass the correct
fail_policy so behavior matches Gate 1a semantics and ensure the same change is
applied to the other similar blocks referenced (the checks around analysis_units
and language lists).
In `@tools/sandbox-bootstrap.py`:
- Around line 1521-1531: The recipe-loading and validation logic in
tools/sandbox-bootstrap.py (the block using SANDBOX_RECIPE_PATH, load_recipe and
validate_recipe and assigning recipe_errors) should be moved into a package
helper (e.g., a new function like
sandbox.recipe_helper.load_and_validate_recipe) and the script should call that
helper instead; implement a function in the sandbox package that accepts the
path, ROOT and args.scripts_only (or equivalent) and returns the recipe_errors
list (including catching exceptions and returning the failure message), then
replace the inline logic in tools/sandbox-bootstrap.py with a single call to
that helper and use its returned recipe_errors.
---
Outside diff comments:
In `@tools/codecome.py`:
- Around line 356-395: The phase-progress logic in tools/codecome.py (the block
using ROOT, notes_dir, evidence_root, count_findings(), PHASE_1C_REQUIRED_NOTES,
manifest_path and constructing rows for "Phase 1a"/"CodeQL"/"Phase 1b"/"Phase
1c") should be moved into a package helper function (e.g.
tools/codecome/status.py:def check_phase_progress(root: Path, yaml_module) ->
list[tuple[str,str,str]]). Implement check_phase_progress to accept the root
Path and yaml module, perform the same checks (count_findings(), presence checks
for target-profile.md/build-model.md/codeql-plan.yml, sandbox-plan.md, read
run-manifest.yml with safe_load and status handling, evaluate
PHASE_1C_REQUIRED_NOTES) and return the rows list; then replace the moved code
in tools/codecome.py with a thin delegator that calls check_phase_progress(ROOT,
yaml) and uses its returned rows. Ensure you reference PHASE_1C_REQUIRED_NOTES
inside the new module by importing it and keep the same status/level mapping
logic.
In `@tools/sandbox-bootstrap.py`:
- Around line 1521-1571: The recipe validation errors are only used for output
but don't affect the exit status; after validating and building recipe_errors
(the block that computes recipe_errors from SANDBOX_RECIPE_PATH /
args.scripts_only), if recipe_errors is non-empty set overall_outcome = "failed"
(or set an explicit exit_code variable to non-zero) and ensure
payload["recipe_valid"] is False before emitting/printing; keep the final return
logic (return 0 if overall_outcome == "passed" else 1) so the validate command
fails when recipe_errors exist.
---
Nitpick comments:
In `@tools/codeql/runner.py`:
- Around line 660-671: The current write_summary() block re-parses alerts.yml
using alerts_path and load_yaml_mapping and catches Exception; instead, use the
already-populated manifest entry populated by _inject_total_alerts: read
total_alerts = manifest.get("total_alerts") inside write_summary(), and if
total_alerts is not None append the lines with that value; only fall back to
checking alerts_path.is_file() to append the “(error reading alerts.yml)”
message if manifest lacks the key. Update references to
normalized_dir/alerts.yml and remove the duplicated load_yaml_mapping path to
avoid re-parsing and keep behavior identical when manifest has the count.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: b08ac110-238c-4f4f-8284-72b0e9d967c6
📒 Files selected for processing (16)
Makefileprompts/phase-1b-sandbox.mdprompts/phase-1c-recon.mdtests/test_phase_1_gates_threat_model.pytests/test_phase_graceful_completion_subphases.pytests/test_phases_completion.pytests/test_sandbox_recipe.pytools/codecome.pytools/codecome/phase_1.pytools/codeql/health.pytools/codeql/pipeline.pytools/codeql/runner.pytools/phases/completion.pytools/phases/phase_1_gates.pytools/sandbox-bootstrap.pytools/sandbox/recipe.py
…, alerts injection - _collect_plan_errors() now respects fail_policy (soft vs hard) for unsupported languages and empty language lists, matching Gate 1a - Removed confidence check from _collect_plan_errors (gate warns only) - Recipe validation failures now fail sandbox-validate exit status - _inject_total_alerts() appends warning instead of silently passing - build_command validation only path-checks when first token looks like a filesystem path (./ or /), allowing shell commands - Phase 1b progress check now requires both sandbox-plan.md and recipe - write_summary() uses manifest total_alerts instead of re-parsing
pruiz
left a comment
There was a problem hiding this comment.
Review follow-up on the three out-of-diff items from coderabbit:
1. tools/codecome.py phase progress → Deferred
Moving check_phase_progress() into a package module is an architectural cleanup (thin-wrapper guideline). tools/codecome.py already contains many non-thin functions (check_codeql_status, check_sandbox_status, etc.). Extracting just this one function is a good idea but out of scope for this bug-fix PR. Will address in a future refactor.
2. tools/sandbox-bootstrap.py recipe validation exit status → ✅ Addressed in eee7c6d
Recipe validation errors now set overall_outcome = "failed", which flows through the existing exit gate: return 0 if overall_outcome == "passed" else 1. The JSON payload already exposes recipe_valid: false and recipe_errors.
3. tools/codeql/runner.py write_summary use manifest total_alerts → ✅ Addressed in eee7c6d
write_summary() now reads manifest.get("total_alerts") first, and only falls back to re-parsing alerts.yml when the manifest lacks the key. This avoids the duplicated parse and keeps the summary consistent with the health classification.
Summary
Resolves the 19 follow-up issues identified in #50 after the CodeQL sandbox recipe refactor (PR #45), plus addresses all 8 PR review comments from Copilot, CodeRabbit, and Greptile.
Original fixes (commit 107e1b2)
Phase ownership fixes:
completion.pycodecome.pyphase status reportingREQUIRED_NOTES_1B→PHASE_1C_REQUIRED_NOTESPrompt fixes:
phase-1b-sandbox.md: removed Phase 1c artifacts from required readingphase-1c-recon.md: corrected run summary path and old CodeQL pathCodeQL health model fixes:
COMPILED_LANGUAGES, injectedtotal_alerts, madecompleted-partialreachable, switched unknown sentinel toNoneCrash fixes:
_validate_codeql_plan_for_repairfunctionprogressreference inwrite_summarySandbox recipe validation:
sandbox-validateStale docs: Makefile, docstring
Review feedback fixes (commit eee7c6d)
_inject_total_alertssilently discards exceptionsmanifest["warnings"]build_commandtreated as filesystem path (rejects shell commands)_command_looks_path_like()helper; only path-validates when first token starts with./or/sandbox-plan.md, missingsandbox-recipe.yml_collect_plan_errors()hard-fails unsupported languages even underfail_policy=softfail_policy; only errors underhard_collect_plan_errors()hard-fails emptylanguages:[]under softfail_policy=hard_collect_plan_errors()treats unexpected confidence as error (gate only warns)sandbox-validateexit statusrecipe_errorsnow setsoverall_outcome = "failed"write_summaryre-parsesalerts.ymlinstead of using manifestmanifest["total_alerts"]firsttools/codecome.pyalready has many non-thin functionsTests
775/775 pass.