Skip to content

CNTRLPLANE-3339: add eval-agents job for openshift/hypershift#78630

Open
enxebre wants to merge 1 commit into
openshift:mainfrom
enxebre:hypershift-eval-agents
Open

CNTRLPLANE-3339: add eval-agents job for openshift/hypershift#78630
enxebre wants to merge 1 commit into
openshift:mainfrom
enxebre:hypershift-eval-agents

Conversation

@enxebre

@enxebre enxebre commented Apr 30, 2026

Copy link
Copy Markdown
Member

Summary

  • Add hypershift-eval-agents step-registry workflow that runs promptfoo-based LLM judge evaluations for Claude Code agent definitions
  • Setup step installs Node.js/npm on the claude-ai-helpers image, run step invokes make eval-agents which uses npx promptfoo to execute eval scenarios
  • Pairs with openshift/hypershift@main...enxebre:hypershift:evals-promptfoo which adds the promptfoo config and Makefile target

Test plan

  • Trigger the eval-agents presubmit via gangway or /test eval-agents to verify Node.js installs and promptfoo runs successfully
  • Verify JUnit results are written to ${ARTIFACT_DIR}/results.xml

🤖 Generated with Claude Code

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 30, 2026
@openshift-ci-robot

openshift-ci-robot commented Apr 30, 2026

Copy link
Copy Markdown
Contributor

@enxebre: This pull request references CNTRLPLANE-3339 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Summary

  • Add an optional, on-demand Prow job (eval-agents) for openshift/hypershift
  • Tests Claude Code agent definitions and AGENTS.md conventions using an LLM judge framework
  • Uses claude-ai-helpers image with Vertex AI authentication
  • Trigger on PRs with: /test eval-agents

Files

  • ci-operator/step-registry/hypershift/eval-agents/ — workflow + setup/run refs
  • ci-operator/config/openshift/hypershift/openshift-hypershift-main.yaml — optional job config

Depends on

🤖 Generated with Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented Apr 30, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a new optional HyperShift CI workflow hypershift-eval-agents (setup + run steps), scripts, refs, metadata and OWNERS files, and wires an eval-agents test job that runs make eval-agents with optional EVAL_FOCUS, producing results in ARTIFACT_DIR.

Changes

eval-agents workflow + step-refs

Layer / File(s) Summary
Pipeline wiring
ci-operator/config/openshift/hypershift/openshift-hypershift-main.yaml
Adds optional eval-agents test job referencing hypershift-eval-agents with always_run: false, optional: true.
Workflow definition
ci-operator/step-registry/hypershift/eval-agents/hypershift-eval-agents-workflow.yaml
Adds hypershift-eval-agents workflow with prehypershift-eval-agents-setup and testhypershift-eval-agents-run; documents Node/npm + Claude CLI check and make eval-agents usage with EVAL_FOCUS.
Workflow metadata
ci-operator/step-registry/hypershift/eval-agents/hypershift-eval-agents-workflow.metadata.json
New metadata mapping to the workflow YAML and listing owners (approvers/reviewers).
Setup ref
ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.yaml
Adds hypershift-eval-agents-setup ref (from claude-ai-helpers) that runs the setup commands; defines Vertex/credential env defaults, resources, and secret mount.
Setup commands
ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-commands.sh
New strict-mode script: verifies claude --version (errors if missing), installs Node.js/npm (dnf with yum fallback), prints versions.
Setup metadata
ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.metadata.json
New metadata referencing the setup ref YAML; assigns owners (approvers/reviewers).
Run ref
ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-ref.yaml
Adds hypershift-eval-agents-run ref (from claude-ai-helpers) executing run commands; configures Vertex/Google auth env vars, EVAL_FOCUS and override, resources, and credential mount.
Run commands
ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh
New strict-mode script: cd into repo, build MAKE_ARGS including EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml, optionally append EVAL_FILTER=${EVAL_FOCUS}, run make eval-agents.
Run metadata
ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-ref.metadata.json
New metadata referencing the run ref YAML; assigns owners (approvers/reviewers).
OWNERS
ci-operator/step-registry/hypershift/eval-agents/OWNERS, .../setup/OWNERS, .../run/OWNERS
Adds/updates OWNERS files listing approvers and reviewers (bryan-cox, csrwng, celebdor, devguyio, enxebre, sjenning).
sequenceDiagram
    participant Prow as Prow CI
    participant Job as Job Container
    participant Repo as Hypershift Repo
    participant Vertex as Vertex AI / Claude Service
    Prow->>Job: schedule `hypershift-eval-agents`
    Job->>Repo: run setup script (verify `claude`, install node/npm)
    Job->>Vertex: authenticate using mounted credentials
    Job->>Repo: run eval (make eval-agents, write ${ARTIFACT_DIR}/results.xml)
    Job->>Prow: upload artifacts/results
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 12
✅ Passed checks (12 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: adding an eval-agents job for the openshift/hypershift repository, which is the primary objective across all modified/added files.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed PR does not add or modify any Ginkgo test files. Changes are purely CI/CD infrastructure (shell scripts, YAML configs, metadata JSON, OWNERS files). Stable test name check is not applicable.
Test Structure And Quality ✅ Passed Custom check is for Ginkgo test code review. This PR contains only CI/CD configuration (YAML, shell scripts, JSON metadata, OWNERS files) with zero Go/Ginkgo test files. Check is not applicable.
Microshift Test Compatibility ✅ Passed No Ginkgo e2e tests added. PR adds CI infrastructure (Prow job config, workflow YAML, scripts) only. Custom check applies only to new Ginkgo tests, hence not applicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed The PR does not add any Ginkgo e2e tests. The eval-agents job is a CI workflow that runs promptfoo LLM evaluations, not Ginkgo e2e tests. SNO compatibility check is not applicable.
Topology-Aware Scheduling Compatibility ✅ Passed PR adds CI test infrastructure only (Prow jobs, step-registry, shell scripts). No deployment manifests, operators, or controllers with scheduling constraints introduced.
Ote Binary Stdout Contract ✅ Passed PR adds Prow CI infrastructure and bash scripts, not OTE binaries. OTE Stdout Contract applies only to OpenShift test binaries, not auxiliary CI job scripts.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed This PR does not add any Ginkgo e2e tests. It only adds CI/CD pipeline configuration (YAML, shell scripts, metadata) to the openshift/release repository. The custom check does not apply.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci

openshift-ci Bot commented Apr 30, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enxebre

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 30, 2026
@openshift-ci openshift-ci Bot requested review from bryan-cox and jparrill April 30, 2026 11:40
@enxebre

enxebre commented Apr 30, 2026

Copy link
Copy Markdown
Member Author

/hold
on openshift/hypershift#8382

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 30, 2026
@openshift-merge-bot openshift-merge-bot Bot added the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label Apr 30, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh`:
- Around line 16-24: Add a precheck before invoking "make eval-agents" that when
Vertex mode is enabled (e.g., VERTEX_MODE=true) verifies the
GOOGLE_APPLICATION_CREDENTIALS env var is set and the referenced file is
readable; if the var is empty or the file is not readable, print an error and
exit 1 so the script fails fast instead of letting make eval-agents (and targets
driven by EVAL_FOCUS/EVAL_VERBOSE) fail later.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: dee9528e-8165-4b24-9e92-faed98366fee

📥 Commits

Reviewing files that changed from the base of the PR and between 2f4532c and 1714d74.

📒 Files selected for processing (10)
  • ci-operator/config/openshift/hypershift/openshift-hypershift-main.yaml
  • ci-operator/step-registry/hypershift/eval-agents/OWNERS
  • ci-operator/step-registry/hypershift/eval-agents/hypershift-eval-agents-workflow.metadata.json
  • ci-operator/step-registry/hypershift/eval-agents/hypershift-eval-agents-workflow.yaml
  • ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh
  • ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-ref.metadata.json
  • ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-ref.yaml
  • ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-commands.sh
  • ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.metadata.json
  • ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.yaml

Comment on lines +16 to +24
cd /go/src/github.com/openshift/hypershift

if [ -n "${EVAL_FOCUS:-}" ]; then
echo "Focus: ${EVAL_FOCUS}"
make eval-agents EVAL_FOCUS="${EVAL_FOCUS}" EVAL_VERBOSE=1
else
echo "Running all eval scenarios..."
make eval-agents EVAL_VERBOSE=1
fi

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fail fast on missing Vertex credentials before running evaluations.

When Vertex mode is enabled, missing/unreadable GOOGLE_APPLICATION_CREDENTIALS will fail later with less actionable output. Add an explicit precheck here.

Suggested patch
 cd /go/src/github.com/openshift/hypershift
 
+if [ "${CLAUDE_CODE_USE_VERTEX:-1}" = "1" ]; then
+    if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ] || [ ! -r "${GOOGLE_APPLICATION_CREDENTIALS}" ]; then
+        echo "ERROR: GOOGLE_APPLICATION_CREDENTIALS is unset or unreadable: ${GOOGLE_APPLICATION_CREDENTIALS:-<unset>}"
+        exit 1
+    fi
+fi
+
 if [ -n "${EVAL_FOCUS:-}" ]; then
     echo "Focus: ${EVAL_FOCUS}"
     make eval-agents EVAL_FOCUS="${EVAL_FOCUS}" EVAL_VERBOSE=1
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cd /go/src/github.com/openshift/hypershift
if [ -n "${EVAL_FOCUS:-}" ]; then
echo "Focus: ${EVAL_FOCUS}"
make eval-agents EVAL_FOCUS="${EVAL_FOCUS}" EVAL_VERBOSE=1
else
echo "Running all eval scenarios..."
make eval-agents EVAL_VERBOSE=1
fi
cd /go/src/github.com/openshift/hypershift
if [ "${CLAUDE_CODE_USE_VERTEX:-1}" = "1" ]; then
if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ] || [ ! -r "${GOOGLE_APPLICATION_CREDENTIALS}" ]; then
echo "ERROR: GOOGLE_APPLICATION_CREDENTIALS is unset or unreadable: ${GOOGLE_APPLICATION_CREDENTIALS:-<unset>}"
exit 1
fi
fi
if [ -n "${EVAL_FOCUS:-}" ]; then
echo "Focus: ${EVAL_FOCUS}"
make eval-agents EVAL_FOCUS="${EVAL_FOCUS}" EVAL_VERBOSE=1
else
echo "Running all eval scenarios..."
make eval-agents EVAL_VERBOSE=1
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh`
around lines 16 - 24, Add a precheck before invoking "make eval-agents" that
when Vertex mode is enabled (e.g., VERTEX_MODE=true) verifies the
GOOGLE_APPLICATION_CREDENTIALS env var is set and the referenced file is
readable; if the var is empty or the file is not readable, print an error and
exit 1 so the script fails fast instead of letting make eval-agents (and targets
driven by EVAL_FOCUS/EVAL_VERBOSE) fail later.

@enxebre enxebre force-pushed the hypershift-eval-agents branch from b174a9d to f8ff049 Compare May 5, 2026 11:50

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh`:
- Around line 8-16: The script builds MAKE_ARGS as a plain string so expanding
${MAKE_ARGS} unquoted when calling make can split values with spaces (e.g.,
EVAL_FOCUS) into multiple args; change MAKE_ARGS to an array: initialize
MAKE_ARGS=( "EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml" ), when EVAL_FOCUS is set
append MAKE_ARGS+=( "EVAL_FILTER=${EVAL_FOCUS}" ), and invoke make as: make
eval-agents "${MAKE_ARGS[@]}" so EVAL_FILTER is passed as a single argument;
update references to MAKE_ARGS, EVAL_FOCUS, EVAL_FILTER and the make eval-agents
call accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 15cbe2ee-52dd-49a8-b6dc-c9ad0fd4b8bd

📥 Commits

Reviewing files that changed from the base of the PR and between 1714d74 and b174a9d.

📒 Files selected for processing (10)
  • ci-operator/step-registry/hypershift/eval-agents/hypershift-eval-agents-workflow.metadata.json
  • ci-operator/step-registry/hypershift/eval-agents/hypershift-eval-agents-workflow.yaml
  • ci-operator/step-registry/hypershift/eval-agents/run/OWNERS
  • ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh
  • ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-ref.metadata.json
  • ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-ref.yaml
  • ci-operator/step-registry/hypershift/eval-agents/setup/OWNERS
  • ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-commands.sh
  • ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.metadata.json
  • ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.yaml
✅ Files skipped from review due to trivial changes (2)
  • ci-operator/step-registry/hypershift/eval-agents/run/OWNERS
  • ci-operator/step-registry/hypershift/eval-agents/setup/OWNERS
🚧 Files skipped from review as they are similar to previous changes (1)
  • ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.yaml

Comment on lines +8 to +16
MAKE_ARGS="EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml"

if [ -n "${EVAL_FOCUS:-}" ]; then
echo "Filter: ${EVAL_FOCUS}"
MAKE_ARGS="${MAKE_ARGS} EVAL_FILTER=${EVAL_FOCUS}"
fi

echo "Running eval-agents with promptfoo..."
make eval-agents ${MAKE_ARGS}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verifies shell splitting/globbing risk from unquoted scalar expansion.
set -euo pipefail
tmp="$(mktemp -d)"
cd "$tmp"
touch a b
EVAL_FOCUS='*'
MAKE_ARGS="EVAL_OUTPUT=/tmp/results.xml EVAL_FILTER=${EVAL_FOCUS}"
echo "Args produced by unquoted expansion:"
for arg in ${MAKE_ARGS}; do printf '  [%s]\n' "$arg"; done

Repository: openshift/release

Length of output: 149


🏁 Script executed:

# Check if file exists and read the relevant section
git ls-files | grep -i "hypershift-eval-agents-run-commands.sh"

Repository: openshift/release

Length of output: 153


🏁 Script executed:

# Read the file and verify lines 8-16
file_path="ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh"
if [ -f "$file_path" ]; then
  echo "=== File found. Lines 8-16: ==="
  sed -n '8,16p' "$file_path"
else
  echo "File not found at: $file_path"
fi

Repository: openshift/release

Length of output: 347


🏁 Script executed:

# More comprehensive test with spaces and special characters
#!/bin/bash
set -euo pipefail
tmp="$(mktemp -d)"
cd "$tmp"
touch "file with spaces" "another*file"

# Test 1: EVAL_FOCUS with spaces
echo "=== Test 1: EVAL_FOCUS with spaces ==="
EVAL_FOCUS='value with spaces'
MAKE_ARGS="EVAL_OUTPUT=/tmp/results.xml EVAL_FILTER=${EVAL_FOCUS}"
echo "Unquoted expansion (current approach):"
for arg in ${MAKE_ARGS}; do printf '  [%s]\n' "$arg"; done
echo "Expected: 2 args, Got: $(echo ${MAKE_ARGS} | wc -w) args"

# Test 2: Array-based approach (proposed fix)
echo ""
echo "=== Test 2: Array-based approach (proposed fix) ==="
MAKE_ARGS=("EVAL_OUTPUT=/tmp/results.xml")
EVAL_FOCUS='value with spaces'
MAKE_ARGS+=("EVAL_FILTER=${EVAL_FOCUS}")
echo "Quoted array expansion:"
for arg in "${MAKE_ARGS[@]}"; do printf '  [%s]\n' "$arg"; done
echo "Expected: 2 args, Got: ${`#MAKE_ARGS`[@]} args"

Repository: openshift/release

Length of output: 418


Use an argument array for make vars to avoid shell word splitting.

On line 16, ${MAKE_ARGS} is expanded unquoted. If EVAL_FOCUS contains spaces, EVAL_FILTER gets split into multiple arguments before reaching make. For example, EVAL_FOCUS='value with spaces' produces EVAL_FILTER=value with spaces as three separate arguments instead of one.

Proposed patch
-MAKE_ARGS="EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml"
+MAKE_ARGS=("EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml")
 
 if [ -n "${EVAL_FOCUS:-}" ]; then
     echo "Filter: ${EVAL_FOCUS}"
-    MAKE_ARGS="${MAKE_ARGS} EVAL_FILTER=${EVAL_FOCUS}"
+    MAKE_ARGS+=("EVAL_FILTER=${EVAL_FOCUS}")
 fi
 
 echo "Running eval-agents with promptfoo..."
-make eval-agents ${MAKE_ARGS}
+make eval-agents "${MAKE_ARGS[@]}"
🧰 Tools
🪛 Shellcheck (0.11.0)

[info] 16-16: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh`
around lines 8 - 16, The script builds MAKE_ARGS as a plain string so expanding
${MAKE_ARGS} unquoted when calling make can split values with spaces (e.g.,
EVAL_FOCUS) into multiple args; change MAKE_ARGS to an array: initialize
MAKE_ARGS=( "EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml" ), when EVAL_FOCUS is set
append MAKE_ARGS+=( "EVAL_FILTER=${EVAL_FOCUS}" ), and invoke make as: make
eval-agents "${MAKE_ARGS[@]}" so EVAL_FILTER is passed as a single argument;
update references to MAKE_ARGS, EVAL_FOCUS, EVAL_FILTER and the make eval-agents
call accordingly.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh (2)

8-16: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use an argument array for make variables to avoid shell word splitting.

This issue was previously identified. The unquoted expansion ${MAKE_ARGS} on line 16 will split EVAL_FOCUS values containing spaces into multiple arguments. For example, EVAL_FOCUS="agent validation" produces three separate arguments (EVAL_FILTER=agent, validation) instead of EVAL_FILTER="agent validation".

🔧 Array-based fix
-MAKE_ARGS="EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml"
+MAKE_ARGS=("EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml")
 
 if [ -n "${EVAL_FOCUS:-}" ]; then
     echo "Filter: ${EVAL_FOCUS}"
-    MAKE_ARGS="${MAKE_ARGS} EVAL_FILTER=${EVAL_FOCUS}"
+    MAKE_ARGS+=("EVAL_FILTER=${EVAL_FOCUS}")
 fi
 
 echo "Running eval-agents with promptfoo..."
-make eval-agents ${MAKE_ARGS}
+make eval-agents "${MAKE_ARGS[@]}"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh`
around lines 8 - 16, The make invocation uses an unquoted string variable
MAKE_ARGS causing word-splitting of values like EVAL_FOCUS; replace the string
accumulation with an array (e.g., MAKE_ARGS_ARRAY) and push elements such as
"EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml" and, when EVAL_FOCUS is set,
"EVAL_FILTER=${EVAL_FOCUS}" into that array, then invoke make with make
eval-agents "${MAKE_ARGS_ARRAY[@]}" so space-containing values in EVAL_FOCUS are
passed as a single argument; update references to MAKE_ARGS to use the new array
name and ensure expansions are quoted where used.

6-7: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fail fast on missing Vertex credentials before running evaluations.

This issue was previously identified. When Vertex mode is enabled (likely the default), missing or unreadable GOOGLE_APPLICATION_CREDENTIALS will cause make eval-agents to fail later with less actionable output. Adding an explicit precheck after line 6 would improve the debugging experience.

🛡️ Suggested precheck
 cd /go/src/github.com/openshift/hypershift
 
+if [ "${CLAUDE_CODE_USE_VERTEX:-1}" = "1" ]; then
+    if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ] || [ ! -r "${GOOGLE_APPLICATION_CREDENTIALS}" ]; then
+        echo "ERROR: GOOGLE_APPLICATION_CREDENTIALS is unset or unreadable: ${GOOGLE_APPLICATION_CREDENTIALS:-<unset>}"
+        exit 1
+    fi
+fi
+
 MAKE_ARGS="EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh`
around lines 6 - 7, Add a fail-fast precheck in
hypershift-eval-agents-run-commands.sh to validate
GOOGLE_APPLICATION_CREDENTIALS before invoking make eval-agents: detect if
Vertex mode is enabled (e.g., via the existing env flag or by checking
$VERTEX_MODE/$USE_VERTEX variable) and then verify the
GOOGLE_APPLICATION_CREDENTIALS env var is set and points to a readable file; if
not, print a clear error to stderr and exit non‑zero so the job fails early with
actionable output.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In
`@ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh`:
- Around line 8-16: The make invocation uses an unquoted string variable
MAKE_ARGS causing word-splitting of values like EVAL_FOCUS; replace the string
accumulation with an array (e.g., MAKE_ARGS_ARRAY) and push elements such as
"EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml" and, when EVAL_FOCUS is set,
"EVAL_FILTER=${EVAL_FOCUS}" into that array, then invoke make with make
eval-agents "${MAKE_ARGS_ARRAY[@]}" so space-containing values in EVAL_FOCUS are
passed as a single argument; update references to MAKE_ARGS to use the new array
name and ensure expansions are quoted where used.
- Around line 6-7: Add a fail-fast precheck in
hypershift-eval-agents-run-commands.sh to validate
GOOGLE_APPLICATION_CREDENTIALS before invoking make eval-agents: detect if
Vertex mode is enabled (e.g., via the existing env flag or by checking
$VERTEX_MODE/$USE_VERTEX variable) and then verify the
GOOGLE_APPLICATION_CREDENTIALS env var is set and points to a readable file; if
not, print a clear error to stderr and exit non‑zero so the job fails early with
actionable output.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: c07bd22c-54f6-4702-9903-34a9d785dc0a

📥 Commits

Reviewing files that changed from the base of the PR and between b174a9d and f8ff049.

📒 Files selected for processing (12)
  • ci-operator/config/openshift/hypershift/openshift-hypershift-main.yaml
  • ci-operator/step-registry/hypershift/eval-agents/OWNERS
  • ci-operator/step-registry/hypershift/eval-agents/hypershift-eval-agents-workflow.metadata.json
  • ci-operator/step-registry/hypershift/eval-agents/hypershift-eval-agents-workflow.yaml
  • ci-operator/step-registry/hypershift/eval-agents/run/OWNERS
  • ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh
  • ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-ref.metadata.json
  • ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-ref.yaml
  • ci-operator/step-registry/hypershift/eval-agents/setup/OWNERS
  • ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-commands.sh
  • ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.metadata.json
  • ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.yaml
✅ Files skipped from review due to trivial changes (10)
  • ci-operator/step-registry/hypershift/eval-agents/OWNERS
  • ci-operator/step-registry/hypershift/eval-agents/run/OWNERS
  • ci-operator/step-registry/hypershift/eval-agents/setup/OWNERS
  • ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-commands.sh
  • ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.metadata.json
  • ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-ref.metadata.json
  • ci-operator/step-registry/hypershift/eval-agents/hypershift-eval-agents-workflow.metadata.json
  • ci-operator/step-registry/hypershift/eval-agents/hypershift-eval-agents-workflow.yaml
  • ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-ref.yaml
  • ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • ci-operator/config/openshift/hypershift/openshift-hypershift-main.yaml

@enxebre enxebre force-pushed the hypershift-eval-agents branch 3 times, most recently from 5147370 to b100381 Compare May 5, 2026 12:09
Add a promptfoo-based eval workflow that tests Claude Code agent
definitions using LLM judge assertions. The setup step installs
Node.js/npm and the run step invokes make eval-agents via npx promptfoo.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@enxebre enxebre force-pushed the hypershift-eval-agents branch from b100381 to 52bfa65 Compare May 5, 2026 12:12
@openshift-merge-bot

Copy link
Copy Markdown
Contributor

[REHEARSALNOTIFIER]
@enxebre: no rehearsable tests are affected by this change

Note: If this PR includes changes to step registry files (ci-operator/step-registry/) and you expected jobs to be found, try rebasing your PR onto the base branch. This helps pj-rehearse accurately detect changes when the base branch has moved forward.

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@enxebre

enxebre commented May 5, 2026

Copy link
Copy Markdown
Member Author

/hold
on openshift/hypershift#8419

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh (1)

8-16: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use an argument array to prevent make arg splitting.

On Line 16, unquoted ${MAKE_ARGS} can split/glob values introduced on Line 12 (for example, EVAL_FOCUS='foo bar'), so EVAL_FILTER may be passed incorrectly.

Proposed patch
-MAKE_ARGS="EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml"
+MAKE_ARGS=("EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml")
 
 if [ -n "${EVAL_FOCUS:-}" ]; then
     echo "Filter: ${EVAL_FOCUS}"
-    MAKE_ARGS="${MAKE_ARGS} EVAL_FILTER=${EVAL_FOCUS}"
+    MAKE_ARGS+=("EVAL_FILTER=${EVAL_FOCUS}")
 fi
 
 echo "Running eval-agents with promptfoo..."
-make eval-agents ${MAKE_ARGS}
+make eval-agents "${MAKE_ARGS[@]}"
#!/bin/bash
# Read-only verification for arg-splitting risk in current implementation.
set -euo pipefail

file="$(fd -p 'hypershift-eval-agents-run-commands.sh' | head -n1)"
echo "Inspecting: ${file}"
sed -n '8,18p' "${file}"

tmp="$(mktemp -d)"
cd "${tmp}"
touch a b

EVAL_FOCUS='value with spaces'
MAKE_ARGS="EVAL_OUTPUT=/tmp/results.xml EVAL_FILTER=${EVAL_FOCUS}"

echo "Unquoted expansion arguments:"
for arg in ${MAKE_ARGS}; do
  printf '  [%s]\n' "${arg}"
done

echo "Expected behavior after array fix: EVAL_FILTER stays one argument."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh`
around lines 8 - 16, The current script builds a space-joined string in
MAKE_ARGS and then expands it unquoted in the make call, which will split/glob
values like EVAL_FOCUS='foo bar'; change to build an array (e.g.,
MAKE_ARGS_ARR=() ; MAKE_ARGS_ARR+=( "EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml" )
and if EVAL_FOCUS is set then MAKE_ARGS_ARR+=( "EVAL_FILTER=${EVAL_FOCUS}" ) )
and pass it to make as: make eval-agents "${MAKE_ARGS_ARR[@]}", updating the
MAKE_ARGS variable usage and the final make invocation to use the array form to
preserve arguments with spaces.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In
`@ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh`:
- Around line 8-16: The current script builds a space-joined string in MAKE_ARGS
and then expands it unquoted in the make call, which will split/glob values like
EVAL_FOCUS='foo bar'; change to build an array (e.g., MAKE_ARGS_ARR=() ;
MAKE_ARGS_ARR+=( "EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml" ) and if EVAL_FOCUS
is set then MAKE_ARGS_ARR+=( "EVAL_FILTER=${EVAL_FOCUS}" ) ) and pass it to make
as: make eval-agents "${MAKE_ARGS_ARR[@]}", updating the MAKE_ARGS variable
usage and the final make invocation to use the array form to preserve arguments
with spaces.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: cbf05b7f-1299-4de9-93c7-f01bf4c9396b

📥 Commits

Reviewing files that changed from the base of the PR and between 6ea6695 and b100381.

📒 Files selected for processing (12)
  • ci-operator/config/openshift/hypershift/openshift-hypershift-main.yaml
  • ci-operator/step-registry/hypershift/eval-agents/OWNERS
  • ci-operator/step-registry/hypershift/eval-agents/hypershift-eval-agents-workflow.metadata.json
  • ci-operator/step-registry/hypershift/eval-agents/hypershift-eval-agents-workflow.yaml
  • ci-operator/step-registry/hypershift/eval-agents/run/OWNERS
  • ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh
  • ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-ref.metadata.json
  • ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-ref.yaml
  • ci-operator/step-registry/hypershift/eval-agents/setup/OWNERS
  • ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-commands.sh
  • ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.metadata.json
  • ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.yaml
✅ Files skipped from review due to trivial changes (6)
  • ci-operator/step-registry/hypershift/eval-agents/setup/OWNERS
  • ci-operator/step-registry/hypershift/eval-agents/hypershift-eval-agents-workflow.yaml
  • ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-ref.metadata.json
  • ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.metadata.json
  • ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-ref.yaml
  • ci-operator/step-registry/hypershift/eval-agents/hypershift-eval-agents-workflow.metadata.json
🚧 Files skipped from review as they are similar to previous changes (2)
  • ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-commands.sh
  • ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.yaml

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

[REHEARSALNOTIFIER]
@enxebre: no rehearsable tests are affected by this change

Note: If this PR includes changes to step registry files (ci-operator/step-registry/) and you expected jobs to be found, try rebasing your PR onto the base branch. This helps pj-rehearse accurately detect changes when the base branch has moved forward.

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@openshift-ci

openshift-ci Bot commented May 5, 2026

Copy link
Copy Markdown
Contributor

@enxebre: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. rehearsals-ack Signifies that rehearsal jobs have been acknowledged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants