CNTRLPLANE-3339: add eval-agents job for openshift/hypershift by enxebre · Pull Request #78630 · openshift/release

enxebre · 2026-04-30T11:39:31Z

Summary

Add hypershift-eval-agents step-registry workflow that runs promptfoo-based LLM judge evaluations for Claude Code agent definitions
Setup step installs Node.js/npm on the claude-ai-helpers image, run step invokes make eval-agents which uses npx promptfoo to execute eval scenarios
Pairs with openshift/hypershift@main...enxebre:hypershift:evals-promptfoo which adds the promptfoo config and Makefile target

Test plan

Trigger the eval-agents presubmit via gangway or /test eval-agents to verify Node.js installs and promptfoo runs successfully
Verify JUnit results are written to ${ARTIFACT_DIR}/results.xml

🤖 Generated with Claude Code

openshift-ci-robot · 2026-04-30T11:39:36Z

@enxebre: This pull request references CNTRLPLANE-3339 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Summary

Add an optional, on-demand Prow job (eval-agents) for openshift/hypershift

Tests Claude Code agent definitions and AGENTS.md conventions using an LLM judge framework

Uses claude-ai-helpers image with Vertex AI authentication

Trigger on PRs with: /test eval-agents

Files

ci-operator/step-registry/hypershift/eval-agents/ — workflow + setup/run refs

ci-operator/config/openshift/hypershift/openshift-hypershift-main.yaml — optional job config

Depends on

CNTRLPLANE-3339: add agent and convention eval framework hypershift#8382 (adds make eval-agents target)

🤖 Generated with Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

coderabbitai · 2026-04-30T11:40:02Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

Adds a new optional HyperShift CI workflow hypershift-eval-agents (setup + run steps), scripts, refs, metadata and OWNERS files, and wires an eval-agents test job that runs make eval-agents with optional EVAL_FOCUS, producing results in ARTIFACT_DIR.

Changes

eval-agents workflow + step-refs

Layer / File(s)	Summary
Pipeline wiring `ci-operator/config/openshift/hypershift/openshift-hypershift-main.yaml`	Adds optional `eval-agents` test job referencing `hypershift-eval-agents` with `always_run: false`, `optional: true`.
Workflow definition `ci-operator/step-registry/hypershift/eval-agents/hypershift-eval-agents-workflow.yaml`	Adds `hypershift-eval-agents` workflow with `pre`→`hypershift-eval-agents-setup` and `test`→`hypershift-eval-agents-run`; documents Node/npm + Claude CLI check and `make eval-agents` usage with `EVAL_FOCUS`.
Workflow metadata `ci-operator/step-registry/hypershift/eval-agents/hypershift-eval-agents-workflow.metadata.json`	New metadata mapping to the workflow YAML and listing owners (approvers/reviewers).
Setup ref `ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.yaml`	Adds `hypershift-eval-agents-setup` ref (from `claude-ai-helpers`) that runs the setup commands; defines Vertex/credential env defaults, resources, and secret mount.
Setup commands `ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-commands.sh`	New strict-mode script: verifies `claude --version` (errors if missing), installs Node.js/npm (dnf with yum fallback), prints versions.
Setup metadata `ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.metadata.json`	New metadata referencing the setup ref YAML; assigns owners (approvers/reviewers).
Run ref `ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-ref.yaml`	Adds `hypershift-eval-agents-run` ref (from `claude-ai-helpers`) executing run commands; configures Vertex/Google auth env vars, `EVAL_FOCUS` and override, resources, and credential mount.
Run commands `ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh`	New strict-mode script: cd into repo, build `MAKE_ARGS` including `EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml`, optionally append `EVAL_FILTER=${EVAL_FOCUS}`, run `make eval-agents`.
Run metadata `ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-ref.metadata.json`	New metadata referencing the run ref YAML; assigns owners (approvers/reviewers).
OWNERS `ci-operator/step-registry/hypershift/eval-agents/OWNERS`, `.../setup/OWNERS`, `.../run/OWNERS`	Adds/updates OWNERS files listing approvers and reviewers (`bryan-cox`, `csrwng`, `celebdor`, `devguyio`, `enxebre`, `sjenning`).

sequenceDiagram
    participant Prow as Prow CI
    participant Job as Job Container
    participant Repo as Hypershift Repo
    participant Vertex as Vertex AI / Claude Service
    Prow->>Job: schedule `hypershift-eval-agents`
    Job->>Repo: run setup script (verify `claude`, install node/npm)
    Job->>Vertex: authenticate using mounted credentials
    Job->>Repo: run eval (make eval-agents, write ${ARTIFACT_DIR}/results.xml)
    Job->>Prow: upload artifacts/results

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 12

✅ Passed checks (12 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: adding an eval-agents job for the openshift/hypershift repository, which is the primary objective across all modified/added files.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	PR does not add or modify any Ginkgo test files. Changes are purely CI/CD infrastructure (shell scripts, YAML configs, metadata JSON, OWNERS files). Stable test name check is not applicable.
Test Structure And Quality	✅ Passed	Custom check is for Ginkgo test code review. This PR contains only CI/CD configuration (YAML, shell scripts, JSON metadata, OWNERS files) with zero Go/Ginkgo test files. Check is not applicable.
Microshift Test Compatibility	✅ Passed	No Ginkgo e2e tests added. PR adds CI infrastructure (Prow job config, workflow YAML, scripts) only. Custom check applies only to new Ginkgo tests, hence not applicable.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	The PR does not add any Ginkgo e2e tests. The eval-agents job is a CI workflow that runs promptfoo LLM evaluations, not Ginkgo e2e tests. SNO compatibility check is not applicable.
Topology-Aware Scheduling Compatibility	✅ Passed	PR adds CI test infrastructure only (Prow jobs, step-registry, shell scripts). No deployment manifests, operators, or controllers with scheduling constraints introduced.
Ote Binary Stdout Contract	✅ Passed	PR adds Prow CI infrastructure and bash scripts, not OTE binaries. OTE Stdout Contract applies only to OpenShift test binaries, not auxiliary CI job scripts.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	This PR does not add any Ginkgo e2e tests. It only adds CI/CD pipeline configuration (YAML, shell scripts, metadata) to the openshift/release repository. The custom check does not apply.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

openshift-ci · 2026-04-30T11:40:05Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enxebre

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~ci-operator/config/openshift/hypershift/OWNERS~~ [enxebre]
~~ci-operator/step-registry/hypershift/OWNERS~~ [enxebre]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

enxebre · 2026-04-30T11:42:19Z

/hold
on openshift/hypershift#8382

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh`:
- Around line 16-24: Add a precheck before invoking "make eval-agents" that when
Vertex mode is enabled (e.g., VERTEX_MODE=true) verifies the
GOOGLE_APPLICATION_CREDENTIALS env var is set and the referenced file is
readable; if the var is empty or the file is not readable, print an error and
exit 1 so the script fails fast instead of letting make eval-agents (and targets
driven by EVAL_FOCUS/EVAL_VERBOSE) fail later.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: dee9528e-8165-4b24-9e92-faed98366fee

📥 Commits

Reviewing files that changed from the base of the PR and between 2f4532c and 1714d74.

📒 Files selected for processing (10)

ci-operator/config/openshift/hypershift/openshift-hypershift-main.yaml
ci-operator/step-registry/hypershift/eval-agents/OWNERS
ci-operator/step-registry/hypershift/eval-agents/hypershift-eval-agents-workflow.metadata.json
ci-operator/step-registry/hypershift/eval-agents/hypershift-eval-agents-workflow.yaml
ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh
ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-ref.metadata.json
ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-ref.yaml
ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-commands.sh
ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.metadata.json
ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.yaml

coderabbitai · 2026-04-30T11:44:44Z

+cd /go/src/github.com/openshift/hypershift
+
+if [ -n "${EVAL_FOCUS:-}" ]; then
+    echo "Focus: ${EVAL_FOCUS}"
+    make eval-agents EVAL_FOCUS="${EVAL_FOCUS}" EVAL_VERBOSE=1
+else
+    echo "Running all eval scenarios..."
+    make eval-agents EVAL_VERBOSE=1
+fi


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fail fast on missing Vertex credentials before running evaluations.

When Vertex mode is enabled, missing/unreadable GOOGLE_APPLICATION_CREDENTIALS will fail later with less actionable output. Add an explicit precheck here.

Suggested patch

cd /go/src/github.com/openshift/hypershift +if [ "${CLAUDE_CODE_USE_VERTEX:-1}" = "1" ]; then + if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ] || [ ! -r "${GOOGLE_APPLICATION_CREDENTIALS}" ]; then + echo "ERROR: GOOGLE_APPLICATION_CREDENTIALS is unset or unreadable: ${GOOGLE_APPLICATION_CREDENTIALS:-<unset>}" + exit 1 + fi +fi + if [ -n "${EVAL_FOCUS:-}" ]; then echo "Focus: ${EVAL_FOCUS}" make eval-agents EVAL_FOCUS="${EVAL_FOCUS}" EVAL_VERBOSE=1

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

cd /go/src/github.com/openshift/hypershift

if [ -n "${EVAL_FOCUS:-}" ]; then

echo "Focus: ${EVAL_FOCUS}"

make eval-agents EVAL_FOCUS="${EVAL_FOCUS}" EVAL_VERBOSE=1

else

echo "Running all eval scenarios..."

make eval-agents EVAL_VERBOSE=1

fi

cd /go/src/github.com/openshift/hypershift

if [ "${CLAUDE_CODE_USE_VERTEX:-1}" = "1" ]; then

if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ] || [ ! -r "${GOOGLE_APPLICATION_CREDENTIALS}" ]; then

echo "ERROR: GOOGLE_APPLICATION_CREDENTIALS is unset or unreadable: ${GOOGLE_APPLICATION_CREDENTIALS:-<unset>}"

exit 1

fi

fi

if [ -n "${EVAL_FOCUS:-}" ]; then

echo "Focus: ${EVAL_FOCUS}"

make eval-agents EVAL_FOCUS="${EVAL_FOCUS}" EVAL_VERBOSE=1

else

echo "Running all eval scenarios..."

make eval-agents EVAL_VERBOSE=1

fi

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh` around lines 16 - 24, Add a precheck before invoking "make eval-agents" that when Vertex mode is enabled (e.g., VERTEX_MODE=true) verifies the GOOGLE_APPLICATION_CREDENTIALS env var is set and the referenced file is readable; if the var is empty or the file is not readable, print an error and exit 1 so the script fails fast instead of letting make eval-agents (and targets driven by EVAL_FOCUS/EVAL_VERBOSE) fail later.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh`:
- Around line 8-16: The script builds MAKE_ARGS as a plain string so expanding
${MAKE_ARGS} unquoted when calling make can split values with spaces (e.g.,
EVAL_FOCUS) into multiple args; change MAKE_ARGS to an array: initialize
MAKE_ARGS=( "EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml" ), when EVAL_FOCUS is set
append MAKE_ARGS+=( "EVAL_FILTER=${EVAL_FOCUS}" ), and invoke make as: make
eval-agents "${MAKE_ARGS[@]}" so EVAL_FILTER is passed as a single argument;
update references to MAKE_ARGS, EVAL_FOCUS, EVAL_FILTER and the make eval-agents
call accordingly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 15cbe2ee-52dd-49a8-b6dc-c9ad0fd4b8bd

📥 Commits

Reviewing files that changed from the base of the PR and between 1714d74 and b174a9d.

📒 Files selected for processing (10)

ci-operator/step-registry/hypershift/eval-agents/hypershift-eval-agents-workflow.metadata.json
ci-operator/step-registry/hypershift/eval-agents/hypershift-eval-agents-workflow.yaml
ci-operator/step-registry/hypershift/eval-agents/run/OWNERS
ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh
ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-ref.metadata.json
ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-ref.yaml
ci-operator/step-registry/hypershift/eval-agents/setup/OWNERS
ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-commands.sh
ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.metadata.json
ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.yaml

✅ Files skipped from review due to trivial changes (2)

ci-operator/step-registry/hypershift/eval-agents/run/OWNERS
ci-operator/step-registry/hypershift/eval-agents/setup/OWNERS

🚧 Files skipped from review as they are similar to previous changes (1)

ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.yaml

coderabbitai · 2026-05-05T11:54:37Z

+MAKE_ARGS="EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml"
+
+if [ -n "${EVAL_FOCUS:-}" ]; then
+    echo "Filter: ${EVAL_FOCUS}"
+    MAKE_ARGS="${MAKE_ARGS} EVAL_FILTER=${EVAL_FOCUS}"
+fi
+
+echo "Running eval-agents with promptfoo..."
+make eval-agents ${MAKE_ARGS}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Verifies shell splitting/globbing risk from unquoted scalar expansion. set -euo pipefail tmp="$(mktemp -d)" cd "$tmp" touch a b EVAL_FOCUS='*' MAKE_ARGS="EVAL_OUTPUT=/tmp/results.xml EVAL_FILTER=${EVAL_FOCUS}" echo "Args produced by unquoted expansion:" for arg in ${MAKE_ARGS}; do printf ' [%s]\n' "$arg"; done

Repository: openshift/release

Length of output: 149

🏁 Script executed:

# Check if file exists and read the relevant section git ls-files | grep -i "hypershift-eval-agents-run-commands.sh"

Repository: openshift/release

Length of output: 153

🏁 Script executed:

# Read the file and verify lines 8-16 file_path="ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh" if [ -f "$file_path" ]; then echo "=== File found. Lines 8-16: ===" sed -n '8,16p' "$file_path" else echo "File not found at: $file_path" fi

Repository: openshift/release

Length of output: 347

🏁 Script executed:

# More comprehensive test with spaces and special characters #!/bin/bash set -euo pipefail tmp="$(mktemp -d)" cd "$tmp" touch "file with spaces" "another*file" # Test 1: EVAL_FOCUS with spaces echo "=== Test 1: EVAL_FOCUS with spaces ===" EVAL_FOCUS='value with spaces' MAKE_ARGS="EVAL_OUTPUT=/tmp/results.xml EVAL_FILTER=${EVAL_FOCUS}" echo "Unquoted expansion (current approach):" for arg in ${MAKE_ARGS}; do printf ' [%s]\n' "$arg"; done echo "Expected: 2 args, Got: $(echo ${MAKE_ARGS} | wc -w) args" # Test 2: Array-based approach (proposed fix) echo "" echo "=== Test 2: Array-based approach (proposed fix) ===" MAKE_ARGS=("EVAL_OUTPUT=/tmp/results.xml") EVAL_FOCUS='value with spaces' MAKE_ARGS+=("EVAL_FILTER=${EVAL_FOCUS}") echo "Quoted array expansion:" for arg in "${MAKE_ARGS[@]}"; do printf ' [%s]\n' "$arg"; done echo "Expected: 2 args, Got: ${`#MAKE_ARGS`[@]} args"

Repository: openshift/release

Length of output: 418

Use an argument array for make vars to avoid shell word splitting.

On line 16, ${MAKE_ARGS} is expanded unquoted. If EVAL_FOCUS contains spaces, EVAL_FILTER gets split into multiple arguments before reaching make. For example, EVAL_FOCUS='value with spaces' produces EVAL_FILTER=value with spaces as three separate arguments instead of one.

Proposed patch

-MAKE_ARGS="EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml" +MAKE_ARGS=("EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml") if [ -n "${EVAL_FOCUS:-}" ]; then echo "Filter: ${EVAL_FOCUS}" - MAKE_ARGS="${MAKE_ARGS} EVAL_FILTER=${EVAL_FOCUS}" + MAKE_ARGS+=("EVAL_FILTER=${EVAL_FOCUS}") fi echo "Running eval-agents with promptfoo..." -make eval-agents ${MAKE_ARGS} +make eval-agents "${MAKE_ARGS[@]}"

🧰 Tools

🪛 Shellcheck (0.11.0)

[info] 16-16: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh` around lines 8 - 16, The script builds MAKE_ARGS as a plain string so expanding ${MAKE_ARGS} unquoted when calling make can split values with spaces (e.g., EVAL_FOCUS) into multiple args; change MAKE_ARGS to an array: initialize MAKE_ARGS=( "EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml" ), when EVAL_FOCUS is set append MAKE_ARGS+=( "EVAL_FILTER=${EVAL_FOCUS}" ), and invoke make as: make eval-agents "${MAKE_ARGS[@]}" so EVAL_FILTER is passed as a single argument; update references to MAKE_ARGS, EVAL_FOCUS, EVAL_FILTER and the make eval-agents call accordingly.

coderabbitai

♻️ Duplicate comments (2)

ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh (2)

8-16: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use an argument array for make variables to avoid shell word splitting.

This issue was previously identified. The unquoted expansion ${MAKE_ARGS} on line 16 will split EVAL_FOCUS values containing spaces into multiple arguments. For example, EVAL_FOCUS="agent validation" produces three separate arguments (EVAL_FILTER=agent, validation) instead of EVAL_FILTER="agent validation".

🔧 Array-based fix

-MAKE_ARGS="EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml"
+MAKE_ARGS=("EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml")
 
 if [ -n "${EVAL_FOCUS:-}" ]; then
     echo "Filter: ${EVAL_FOCUS}"
-    MAKE_ARGS="${MAKE_ARGS} EVAL_FILTER=${EVAL_FOCUS}"
+    MAKE_ARGS+=("EVAL_FILTER=${EVAL_FOCUS}")
 fi
 
 echo "Running eval-agents with promptfoo..."
-make eval-agents ${MAKE_ARGS}
+make eval-agents "${MAKE_ARGS[@]}"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh`
around lines 8 - 16, The make invocation uses an unquoted string variable
MAKE_ARGS causing word-splitting of values like EVAL_FOCUS; replace the string
accumulation with an array (e.g., MAKE_ARGS_ARRAY) and push elements such as
"EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml" and, when EVAL_FOCUS is set,
"EVAL_FILTER=${EVAL_FOCUS}" into that array, then invoke make with make
eval-agents "${MAKE_ARGS_ARRAY[@]}" so space-containing values in EVAL_FOCUS are
passed as a single argument; update references to MAKE_ARGS to use the new array
name and ensure expansions are quoted where used.

6-7: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fail fast on missing Vertex credentials before running evaluations.

This issue was previously identified. When Vertex mode is enabled (likely the default), missing or unreadable GOOGLE_APPLICATION_CREDENTIALS will cause make eval-agents to fail later with less actionable output. Adding an explicit precheck after line 6 would improve the debugging experience.

🛡️ Suggested precheck

 cd /go/src/github.com/openshift/hypershift
 
+if [ "${CLAUDE_CODE_USE_VERTEX:-1}" = "1" ]; then
+    if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ] || [ ! -r "${GOOGLE_APPLICATION_CREDENTIALS}" ]; then
+        echo "ERROR: GOOGLE_APPLICATION_CREDENTIALS is unset or unreadable: ${GOOGLE_APPLICATION_CREDENTIALS:-<unset>}"
+        exit 1
+    fi
+fi
+
 MAKE_ARGS="EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh`
around lines 6 - 7, Add a fail-fast precheck in
hypershift-eval-agents-run-commands.sh to validate
GOOGLE_APPLICATION_CREDENTIALS before invoking make eval-agents: detect if
Vertex mode is enabled (e.g., via the existing env flag or by checking
$VERTEX_MODE/$USE_VERTEX variable) and then verify the
GOOGLE_APPLICATION_CREDENTIALS env var is set and points to a readable file; if
not, print a clear error to stderr and exit non‑zero so the job fails early with
actionable output.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In
`@ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh`:
- Around line 8-16: The make invocation uses an unquoted string variable
MAKE_ARGS causing word-splitting of values like EVAL_FOCUS; replace the string
accumulation with an array (e.g., MAKE_ARGS_ARRAY) and push elements such as
"EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml" and, when EVAL_FOCUS is set,
"EVAL_FILTER=${EVAL_FOCUS}" into that array, then invoke make with make
eval-agents "${MAKE_ARGS_ARRAY[@]}" so space-containing values in EVAL_FOCUS are
passed as a single argument; update references to MAKE_ARGS to use the new array
name and ensure expansions are quoted where used.
- Around line 6-7: Add a fail-fast precheck in
hypershift-eval-agents-run-commands.sh to validate
GOOGLE_APPLICATION_CREDENTIALS before invoking make eval-agents: detect if
Vertex mode is enabled (e.g., via the existing env flag or by checking
$VERTEX_MODE/$USE_VERTEX variable) and then verify the
GOOGLE_APPLICATION_CREDENTIALS env var is set and points to a readable file; if
not, print a clear error to stderr and exit non‑zero so the job fails early with
actionable output.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: c07bd22c-54f6-4702-9903-34a9d785dc0a

📥 Commits

Reviewing files that changed from the base of the PR and between b174a9d and f8ff049.

📒 Files selected for processing (12)

ci-operator/config/openshift/hypershift/openshift-hypershift-main.yaml
ci-operator/step-registry/hypershift/eval-agents/OWNERS
ci-operator/step-registry/hypershift/eval-agents/hypershift-eval-agents-workflow.metadata.json
ci-operator/step-registry/hypershift/eval-agents/hypershift-eval-agents-workflow.yaml
ci-operator/step-registry/hypershift/eval-agents/run/OWNERS
ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh
ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-ref.metadata.json
ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-ref.yaml
ci-operator/step-registry/hypershift/eval-agents/setup/OWNERS
ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-commands.sh
ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.metadata.json
ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.yaml

✅ Files skipped from review due to trivial changes (10)

ci-operator/step-registry/hypershift/eval-agents/OWNERS
ci-operator/step-registry/hypershift/eval-agents/run/OWNERS
ci-operator/step-registry/hypershift/eval-agents/setup/OWNERS
ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-commands.sh
ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.metadata.json
ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-ref.metadata.json
ci-operator/step-registry/hypershift/eval-agents/hypershift-eval-agents-workflow.metadata.json
ci-operator/step-registry/hypershift/eval-agents/hypershift-eval-agents-workflow.yaml
ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-ref.yaml
ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.yaml

🚧 Files skipped from review as they are similar to previous changes (1)

ci-operator/config/openshift/hypershift/openshift-hypershift-main.yaml

Add a promptfoo-based eval workflow that tests Claude Code agent definitions using LLM judge assertions. The setup step installs Node.js/npm and the run step invokes make eval-agents via npx promptfoo. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

openshift-merge-bot · 2026-05-05T12:12:44Z

[REHEARSALNOTIFIER]
@enxebre: no rehearsable tests are affected by this change

Note: If this PR includes changes to step registry files (ci-operator/step-registry/) and you expected jobs to be found, try rebasing your PR onto the base branch. This helps pj-rehearse accurately detect changes when the base branch has moved forward.

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

enxebre · 2026-05-05T12:15:36Z

/hold
on openshift/hypershift#8419

coderabbitai

♻️ Duplicate comments (1)

ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh (1)

8-16: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use an argument array to prevent make arg splitting.

On Line 16, unquoted ${MAKE_ARGS} can split/glob values introduced on Line 12 (for example, EVAL_FOCUS='foo bar'), so EVAL_FILTER may be passed incorrectly.

Proposed patch

-MAKE_ARGS="EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml"
+MAKE_ARGS=("EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml")
 
 if [ -n "${EVAL_FOCUS:-}" ]; then
     echo "Filter: ${EVAL_FOCUS}"
-    MAKE_ARGS="${MAKE_ARGS} EVAL_FILTER=${EVAL_FOCUS}"
+    MAKE_ARGS+=("EVAL_FILTER=${EVAL_FOCUS}")
 fi
 
 echo "Running eval-agents with promptfoo..."
-make eval-agents ${MAKE_ARGS}
+make eval-agents "${MAKE_ARGS[@]}"

#!/bin/bash
# Read-only verification for arg-splitting risk in current implementation.
set -euo pipefail

file="$(fd -p 'hypershift-eval-agents-run-commands.sh' | head -n1)"
echo "Inspecting: ${file}"
sed -n '8,18p' "${file}"

tmp="$(mktemp -d)"
cd "${tmp}"
touch a b

EVAL_FOCUS='value with spaces'
MAKE_ARGS="EVAL_OUTPUT=/tmp/results.xml EVAL_FILTER=${EVAL_FOCUS}"

echo "Unquoted expansion arguments:"
for arg in ${MAKE_ARGS}; do
  printf '  [%s]\n' "${arg}"
done

echo "Expected behavior after array fix: EVAL_FILTER stays one argument."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh`
around lines 8 - 16, The current script builds a space-joined string in
MAKE_ARGS and then expands it unquoted in the make call, which will split/glob
values like EVAL_FOCUS='foo bar'; change to build an array (e.g.,
MAKE_ARGS_ARR=() ; MAKE_ARGS_ARR+=( "EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml" )
and if EVAL_FOCUS is set then MAKE_ARGS_ARR+=( "EVAL_FILTER=${EVAL_FOCUS}" ) )
and pass it to make as: make eval-agents "${MAKE_ARGS_ARR[@]}", updating the
MAKE_ARGS variable usage and the final make invocation to use the array form to
preserve arguments with spaces.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In
`@ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh`:
- Around line 8-16: The current script builds a space-joined string in MAKE_ARGS
and then expands it unquoted in the make call, which will split/glob values like
EVAL_FOCUS='foo bar'; change to build an array (e.g., MAKE_ARGS_ARR=() ;
MAKE_ARGS_ARR+=( "EVAL_OUTPUT=${ARTIFACT_DIR}/results.xml" ) and if EVAL_FOCUS
is set then MAKE_ARGS_ARR+=( "EVAL_FILTER=${EVAL_FOCUS}" ) ) and pass it to make
as: make eval-agents "${MAKE_ARGS_ARR[@]}", updating the MAKE_ARGS variable
usage and the final make invocation to use the array form to preserve arguments
with spaces.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: cbf05b7f-1299-4de9-93c7-f01bf4c9396b

📥 Commits

Reviewing files that changed from the base of the PR and between 6ea6695 and b100381.

📒 Files selected for processing (12)

ci-operator/config/openshift/hypershift/openshift-hypershift-main.yaml
ci-operator/step-registry/hypershift/eval-agents/OWNERS
ci-operator/step-registry/hypershift/eval-agents/hypershift-eval-agents-workflow.metadata.json
ci-operator/step-registry/hypershift/eval-agents/hypershift-eval-agents-workflow.yaml
ci-operator/step-registry/hypershift/eval-agents/run/OWNERS
ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-commands.sh
ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-ref.metadata.json
ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-ref.yaml
ci-operator/step-registry/hypershift/eval-agents/setup/OWNERS
ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-commands.sh
ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.metadata.json
ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.yaml

✅ Files skipped from review due to trivial changes (6)

ci-operator/step-registry/hypershift/eval-agents/setup/OWNERS
ci-operator/step-registry/hypershift/eval-agents/hypershift-eval-agents-workflow.yaml
ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-ref.metadata.json
ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.metadata.json
ci-operator/step-registry/hypershift/eval-agents/run/hypershift-eval-agents-run-ref.yaml
ci-operator/step-registry/hypershift/eval-agents/hypershift-eval-agents-workflow.metadata.json

🚧 Files skipped from review as they are similar to previous changes (2)

ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-commands.sh
ci-operator/step-registry/hypershift/eval-agents/setup/hypershift-eval-agents-setup-ref.yaml

openshift-merge-bot · 2026-05-05T12:16:25Z

[REHEARSALNOTIFIER]
@enxebre: no rehearsable tests are affected by this change

Note: If this PR includes changes to step registry files (ci-operator/step-registry/) and you expected jobs to be found, try rebasing your PR onto the base branch. This helps pj-rehearse accurately detect changes when the base branch has moved forward.

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

openshift-ci · 2026-05-05T12:42:46Z

@enxebre: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 30, 2026

openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 30, 2026

openshift-ci Bot requested review from bryan-cox and jparrill April 30, 2026 11:40

openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 30, 2026

openshift-merge-bot Bot added the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label Apr 30, 2026

coderabbitai Bot reviewed Apr 30, 2026

View reviewed changes

enxebre force-pushed the hypershift-eval-agents branch from b174a9d to f8ff049 Compare May 5, 2026 11:50

coderabbitai Bot reviewed May 5, 2026

View reviewed changes

enxebre force-pushed the hypershift-eval-agents branch 3 times, most recently from 5147370 to b100381 Compare May 5, 2026 12:09

enxebre force-pushed the hypershift-eval-agents branch from b100381 to 52bfa65 Compare May 5, 2026 12:12

coderabbitai Bot reviewed May 5, 2026

View reviewed changes

Conversation

enxebre commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

openshift-ci-robot commented Apr 30, 2026 • edited by atlassian Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Files

Depends on

Uh oh!

coderabbitai Bot commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Uh oh!

openshift-ci Bot commented Apr 30, 2026

Uh oh!

enxebre commented Apr 30, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

openshift-merge-bot Bot commented May 5, 2026

Uh oh!

enxebre commented May 5, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

openshift-merge-bot Bot commented May 5, 2026

Uh oh!

openshift-ci Bot commented May 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

enxebre commented Apr 30, 2026 •

edited

Loading

openshift-ci-robot commented Apr 30, 2026 •

edited by atlassian Bot

Loading

coderabbitai Bot commented Apr 30, 2026 •

edited

Loading