Skip to content

Kestrel Operator Onprem#17

Open
ramanv0 wants to merge 6 commits into
mainfrom
kestrel-operator-onprem-branch
Open

Kestrel Operator Onprem#17
ramanv0 wants to merge 6 commits into
mainfrom
kestrel-operator-onprem-branch

Conversation

@ramanv0
Copy link
Copy Markdown
Collaborator

@ramanv0 ramanv0 commented Jan 27, 2026

Implement support for Kestrel Operators deployed on-prem.

Summary by CodeRabbit

  • New Features

    • Configurable option to skip TLS certificate verification (Helm value & env var; default: disabled).
    • gRPC message size limit increased to 10MB.
  • Bug Fixes

    • Skip JSON validation for log, exec, and proxy endpoints to avoid unwanted parsing.
  • Chores

    • CI workflows accept optional image and chart name inputs.
    • Container tooling install now pins a specific kubectl version for stability.

✏️ Tip: You can customize this high-level summary in your review settings.

Evanj80 and others added 3 commits November 5, 2025 11:18
…erify for on-prem

Server changes:
- Add GetHelmChartVersionForChart() to fetch versions for specific charts
- handleOnboardCluster returns chart_name based on ONPREM_MODE
- Remove dead code: fetchLatestHelmVersion, helmCache, ServerCertificateBundle

Operator changes:
- Add TLS_INSECURE_SKIP_VERIFY env var support in deployment.yaml
- Add server.tls.insecureSkipVerify option in values.yaml (default: false)
- stream_client.go reads TLS_INSECURE_SKIP_VERIFY and applies to TLS config

Frontend changes:
- Add chartName to TokenModalData interface and API response type
- Use dynamic chart name in helm install commands
- Generate port: 443 and insecureSkipVerify: true for on-prem operators

This enables on-prem operators to connect via NLB without server cert
hostname verification, while keeping TLS encryption enabled.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jan 27, 2026

📝 Walkthrough

Walkthrough

Adds a Helm value and Deployment env var to control TLS insecure-skip, wires that through the StreamClient tls.Config, increases gRPC default message sizes to 10MB, and skips JSON unmarshalling for log/exec/proxy k8s executor endpoints.

Changes

Cohort / File(s) Summary
Helm TLS configuration
helm/kestrel-operator/templates/deployment.yaml, helm/kestrel-operator/values.yaml
Add TLS_INSECURE_SKIP_VERIFY env var in Deployment sourced from values.server.tls.insecureSkipVerify (default: false); add server.tls.insecureSkipVerify to values.
Client TLS & gRPC
pkg/client/stream_client.go
Add InsecureSkipVerify bool to ServerConfig; set tls.Config.InsecureSkipVerify from config; add gRPC default options MaxCallRecvMsgSize/MaxCallSendMsgSize = 10MB; update related logging to reflect insecure vs mTLS modes.
k8s API executor JSON handling
pkg/k8s_api/k8s_api_executor.go
Skip JSON unmarshalling for log, exec, and proxy endpoints via endpoint-type checks; include is_log_endpoint flag in success logs.
CI workflow naming inputs
.github/workflows/build-image.yml, .github/workflows/push-helm.yml
Add workflow_dispatch inputs (image_name, chart_name); update image/chart name resolution and packaging/push steps to use dynamic names.
Tooling pinning
Dockerfile
Pin kubectl install to specific version v1.31.0 instead of querying latest stable.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant HelmValues as Helm Values
participant Deployment as Deployment Pod
participant StreamClient as Stream Client
participant TLS as tls.Config
participant GRPC as gRPC Server

HelmValues->>Deployment: render env `TLS_INSECURE_SKIP_VERIFY`
Deployment->>StreamClient: container starts, read env
StreamClient->>TLS: build tls.Config (InsecureSkipVerify = env)
StreamClient->>GRPC: dial with TLS and 10MB call sizes
GRPC-->>StreamClient: respond to RPCs
StreamClient->>Deployment: log connection mode (insecure vs mTLS)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I tucked a flag beneath the pod's hat,

so TLS can nap while packets chat.
I widened lanes for messages to stream,
and told some logs to skip their JSON dream.
A tiny hop — the operator's gleam.

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'Kestrel Operator Onprem' is partially related to the changeset. While the PR does implement on-prem support features, the title is overly vague and doesn't clearly summarize the primary technical changes made (TLS configuration, gRPC message size limits, endpoint-type validation, and workflow customization). Consider a more specific title that captures the main technical accomplishments, such as 'Add on-prem TLS configuration and gRPC message size support' or similar.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@pkg/k8s_api/k8s_api_executor.go`:
- Around line 99-118: The current substring checks
(isLogEndpoint/isExecEndpoint/isProxyEndpoint) on apiPath are too broad and can
skip JSON validation erroneously; replace them with a helper (e.g.,
isNonJSONEndpoint(apiPath, suffix)) that strips query params, checks path
segments and suffixes (use strings.Split(apiPath, "?")[0], strings.HasSuffix for
exact suffixes, and for proxy allow "/proxy/" or suffix "/proxy" matches) and
then use that helper for "/log", "/exec", and "/proxy" when deciding whether to
run json.Unmarshal on resultBytes; update the condition around json.Unmarshal,
keep setting result fields and e.Logger.Error behavior unchanged.
🧹 Nitpick comments (3)
pkg/k8s_api/k8s_api_executor.go (1)

125-128: Logging is incomplete for debugging non-JSON endpoints.

Only is_log_endpoint is logged, but is_exec_endpoint and is_proxy_endpoint are also checked. For debugging purposes, consider logging which specific non-JSON endpoint type was matched.

♻️ Suggested improvement
 	e.Logger.Debug("API call completed successfully",
 		zap.String("api_path", apiPath),
-		zap.Int("response_size", len(resultBytes)),
-		zap.Bool("is_log_endpoint", isLogEndpoint))
+		zap.Int("response_size", len(resultBytes)),
+		zap.Bool("is_non_json_endpoint", isLogEndpoint || isExecEndpoint || isProxyEndpoint))

Or, if more granularity is needed:

 	e.Logger.Debug("API call completed successfully",
 		zap.String("api_path", apiPath),
-		zap.Int("response_size", len(resultBytes)),
-		zap.Bool("is_log_endpoint", isLogEndpoint))
+		zap.Int("response_size", len(resultBytes)),
+		zap.Bool("skip_json_validation", isLogEndpoint || isExecEndpoint || isProxyEndpoint))
pkg/client/stream_client.go (1)

364-368: Consider extracting the message size as a named constant.

The 10MB limit is reasonable for streaming large payloads like pod logs. For better maintainability and discoverability, consider defining this as a named constant alongside other constants in this file.

♻️ Suggested refactor
 const (
 	runtimeSecretName    = "kestrel-operator-jwt-runtime"
 	tokenRenewalInterval = 3 * time.Hour // Based on 24 hour JWT TTL

 	defaultHubbleRelayNamespace     = "kube-system"
 	dataplaneV2HubbleRelayNamespace = "gke-managed-dpv2-observability"

 	// Constants for liveness probe
 	minEOFErrorsForLiveness  = 3                // Minimum EOF errors for liveness failure
 	minUnhealthyForLiveness  = 1 * time.Minute  // Minimum unhealthy duration for liveness failure
 	maxHealthyGapForLiveness = 30 * time.Second // Max gap between EOFs for liveness failure
+
+	// gRPC message size limit (10MB for large payloads like pod logs)
+	maxGRPCMessageSize = 10 * 1024 * 1024
 )

Then use maxGRPCMessageSize in the dial options.

helm/kestrel-operator/templates/deployment.yaml (1)

35-38: Add nil-safety for nested value access.

If a user has an older values.yaml without the tls section, accessing .Values.server.tls.insecureSkipVerify could fail. Consider using a more defensive template pattern.

♻️ Suggested fix for nil-safety
             # TLS configuration
             - name: TLS_INSECURE_SKIP_VERIFY
-              value: "{{ .Values.server.tls.insecureSkipVerify | default false }}"
+              value: "{{ ((.Values.server).tls).insecureSkipVerify | default false }}"

This uses Helm's parenthetical nil-safe access pattern to prevent template errors when intermediate keys are missing.

Comment on lines +99 to 118
// Check if this is a log endpoint or other non-JSON endpoint
// Pod logs return plain text, not JSON
isLogEndpoint := strings.Contains(apiPath, "/log")
isExecEndpoint := strings.Contains(apiPath, "/exec")
isProxyEndpoint := strings.Contains(apiPath, "/proxy")

// Only validate JSON for non-log/exec/proxy endpoints
if !isLogEndpoint && !isExecEndpoint && !isProxyEndpoint {
var jsonValidation interface{}
if err := json.Unmarshal(resultBytes, &jsonValidation); err != nil {
result.Success = false
result.ErrorMessage = fmt.Sprintf("Invalid JSON response: %v", err)
result.StatusCode = http.StatusInternalServerError

e.Logger.Error("Invalid JSON response from Kubernetes API",
zap.String("api_path", apiPath),
zap.Error(err))
return result
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Substring matching for endpoint detection is too broad and may cause false positives.

Using strings.Contains(apiPath, "/log") will match any path containing that substring, including resources with "log" in their name (e.g., /api/v1/namespaces/logging/pods/my-pod, /apis/apps/v1/deployments/my-logger-app). This could incorrectly skip JSON validation for standard JSON-returning endpoints.

Consider using more precise path matching that checks for these as terminal path segments:

🔧 Proposed fix using suffix/segment matching
-	// Check if this is a log endpoint or other non-JSON endpoint
-	// Pod logs return plain text, not JSON
-	isLogEndpoint := strings.Contains(apiPath, "/log")
-	isExecEndpoint := strings.Contains(apiPath, "/exec")
-	isProxyEndpoint := strings.Contains(apiPath, "/proxy")
+	// Check if this is a log endpoint or other non-JSON endpoint
+	// Pod logs return plain text, not JSON
+	// Match paths ending with /log, /exec, /proxy or having them as segments before query params
+	isLogEndpoint := isNonJSONEndpoint(apiPath, "/log")
+	isExecEndpoint := isNonJSONEndpoint(apiPath, "/exec")
+	isProxyEndpoint := isNonJSONEndpoint(apiPath, "/proxy")

Add a helper function:

// isNonJSONEndpoint checks if the API path ends with the given suffix
// or has it as a path segment (possibly followed by query params or sub-paths for proxy)
func isNonJSONEndpoint(apiPath, suffix string) bool {
	// Remove query params for checking
	path := strings.Split(apiPath, "?")[0]
	
	// Check if path ends with the suffix
	if strings.HasSuffix(path, suffix) {
		return true
	}
	
	// For proxy endpoints, check if suffix appears as a path segment (proxy can have sub-paths)
	if suffix == "/proxy" && strings.Contains(path, "/proxy/") {
		return true
	}
	
	return false
}
🤖 Prompt for AI Agents
In `@pkg/k8s_api/k8s_api_executor.go` around lines 99 - 118, The current substring
checks (isLogEndpoint/isExecEndpoint/isProxyEndpoint) on apiPath are too broad
and can skip JSON validation erroneously; replace them with a helper (e.g.,
isNonJSONEndpoint(apiPath, suffix)) that strips query params, checks path
segments and suffixes (use strings.Split(apiPath, "?")[0], strings.HasSuffix for
exact suffixes, and for proxy allow "/proxy/" or suffix "/proxy" matches) and
then use that helper for "/log", "/exec", and "/proxy" when deciding whether to
run json.Unmarshal on resultBytes; update the condition around json.Unmarshal,
keep setting result fields and e.Logger.Error behavior unchanged.

Allows building kestrel-operator-onprem image and chart by specifying:
- chart_name: kestrel-operator-onprem (for push-helm.yml)
- image_name: kestrel-operator-onprem (for build-image.yml)

Default values remain kestrel-operator for backward compatibility.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/workflows/build-image.yml:
- Line 50: The images path uses github.repository_owner raw which can contain
uppercase letters; change it to a lowercase owner to satisfy GHCR. Either inline
lowercase the owner in the images expression (use the toLower(...) expression
around github.repository_owner) or add a prior step that sets an env var
REPO_OWNER to a lowercased github.repository_owner (e.g., using tr) and then use
${ env.REPO_OWNER } in the images value; update the images line that references
env.REGISTRY and github.event.inputs.image_name || 'kestrel-operator' to use the
lowercased owner reference instead.

uses: docker/metadata-action@v5
with:
images: ${{ env.REGISTRY }}/${{ github.repository }}
images: ${{ env.REGISTRY }}/${{ github.repository_owner }}/${{ github.event.inputs.image_name || 'kestrel-operator' }}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Lowercase github.repository_owner for GHCR compatibility.

GHCR requires lowercase image paths. Unlike push-helm.yml (line 61) which explicitly lowercases REPO_OWNER, this workflow uses github.repository_owner directly. If the owner contains uppercase characters (e.g., "KestrelAI"), this could cause push failures.

🔧 Proposed fix

You can lowercase the owner inline. One approach:

-        images: ${{ env.REGISTRY }}/${{ github.repository_owner }}/${{ github.event.inputs.image_name || 'kestrel-operator' }}
+        images: ${{ env.REGISTRY }}/${{ format('{0}', github.repository_owner) | lower }}/${{ github.event.inputs.image_name || 'kestrel-operator' }}

Alternatively, add a step to set the lowercased owner as an environment variable (similar to push-helm.yml):

    - name: Set lowercase owner
      run: echo "REPO_OWNER=$(echo '${{ github.repository_owner }}' | tr '[:upper:]' '[:lower:]')" >> $GITHUB_ENV

    - name: Docker meta for operator
      id: meta-operator
      uses: docker/metadata-action@v5
      with:
        images: ${{ env.REGISTRY }}/${{ env.REPO_OWNER }}/${{ github.event.inputs.image_name || 'kestrel-operator' }}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
images: ${{ env.REGISTRY }}/${{ github.repository_owner }}/${{ github.event.inputs.image_name || 'kestrel-operator' }}
images: ${{ env.REGISTRY }}/${{ format('{0}', github.repository_owner) | lower }}/${{ github.event.inputs.image_name || 'kestrel-operator' }}
🤖 Prompt for AI Agents
In @.github/workflows/build-image.yml at line 50, The images path uses
github.repository_owner raw which can contain uppercase letters; change it to a
lowercase owner to satisfy GHCR. Either inline lowercase the owner in the images
expression (use the toLower(...) expression around github.repository_owner) or
add a prior step that sets an env var REPO_OWNER to a lowercased
github.repository_owner (e.g., using tr) and then use ${ env.REPO_OWNER } in the
images value; update the images line that references env.REGISTRY and
github.event.inputs.image_name || 'kestrel-operator' to use the lowercased owner
reference instead.

The nested curl command to fetch the latest stable version was failing
in GitHub Actions. Pinning to a specific version for reliability.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@Dockerfile`:
- Around line 25-30: The RUN block that downloads the pinned kubectl binary
(uses ARCH variable, curl to
"https://dl.k8s.io/release/v1.31.0/bin/linux/${ARCH}/kubectl" and moves kubectl
to /usr/local/bin/) lacks checksum verification; update the RUN sequence to also
download the matching .sha256 (e.g., curl -LO
"https://dl.k8s.io/release/v1.31.0/bin/linux/${ARCH}/kubectl.sha256"), verify
the binary with sha256sum -c or by comparing the file's sha256 to the downloaded
value before chmod +x and mv, and fail the build if the check does not pass so
kubectl is only installed when the checksum matches.

Comment thread Dockerfile
Comment on lines +25 to 30
# Install kubectl (multi-arch) - pinned to v1.31.0 for reliability
RUN ARCH=$(uname -m) && \
if [ "$ARCH" = "x86_64" ]; then ARCH="amd64"; elif [ "$ARCH" = "aarch64" ]; then ARCH="arm64"; fi && \
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/${ARCH}/kubectl" && \
curl -LO "https://dl.k8s.io/release/v1.31.0/bin/linux/${ARCH}/kubectl" && \
chmod +x kubectl && \
mv kubectl /usr/local/bin/
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Kubernetes kubectl official installation checksum verification v1.31.0 site:kubernetes.io

💡 Result:

Official Kubernetes docs show verifying a kubectl download by fetching the matching .sha256 file and checking it locally. Here are the equivalent commands for kubectl v1.31.0 (adjust amd64/arm64 as needed):

Linux (example: amd64)

curl -LO "https://dl.k8s.io/release/v1.31.0/bin/linux/amd64/kubectl"
curl -LO "https://dl.k8s.io/release/v1.31.0/bin/linux/amd64/kubectl.sha256"
echo "$(cat kubectl.sha256)  kubectl" | sha256sum --check

[1]

macOS (example: Intel amd64)

curl -LO "https://dl.k8s.io/release/v1.31.0/bin/darwin/amd64/kubectl"
curl -LO "https://dl.k8s.io/release/v1.31.0/bin/darwin/amd64/kubectl.sha256"
echo "$(cat kubectl.sha256)  kubectl" | shasum -a 256 --check

[2]

Windows (example: amd64)

curl.exe -LO "https://dl.k8s.io/release/v1.31.0/bin/windows/amd64/kubectl.exe"
curl.exe -LO "https://dl.k8s.io/v1.31.0/bin/windows/amd64/kubectl.exe.sha256"

# Command Prompt manual compare:
CertUtil -hashfile kubectl.exe SHA256
type kubectl.exe.sha256

# PowerShell boolean check:
$(Get-FileHash -Algorithm SHA256 .\kubectl.exe).Hash -eq $(Get-Content .\kubectl.exe.sha256)

[3]

Sources: [1] Linux install / checksum verification docs, [2] macOS install / checksum verification docs, [3] Windows install / checksum verification docs.


🌐 Web query:

dl.k8s.io release v1.31.0 kubectl.sha256 checksum download

💡 Result:

kubectl.sha256 is published per OS/arch under the dl.k8s.io/release/v1.31.0/bin/... path. Download it like this:

# Linux x86_64
curl -LO "https://dl.k8s.io/release/v1.31.0/bin/linux/amd64/kubectl.sha256"

# macOS Intel
curl -LO "https://dl.k8s.io/release/v1.31.0/bin/darwin/amd64/kubectl.sha256"

# macOS Apple Silicon
curl -LO "https://dl.k8s.io/release/v1.31.0/bin/darwin/arm64/kubectl.sha256"

(These URLs follow the official Kubernetes install docs’ download pattern for kubectl.sha256.) [1], [2]

Sources: [1] [2]


Add checksum verification for the pinned kubectl binary.

Pinning helps reproducibility, but downloading without integrity verification leaves a supply-chain gap. Fetch and verify the official .sha256 before installing—this follows Kubernetes' official installation guidance and is supported for v1.31.0.

🔐 Suggested patch (checksum verification)
-RUN ARCH=$(uname -m) && \
-    if [ "$ARCH" = "x86_64" ]; then ARCH="amd64"; elif [ "$ARCH" = "aarch64" ]; then ARCH="arm64"; fi && \
-    curl -LO "https://dl.k8s.io/release/v1.31.0/bin/linux/${ARCH}/kubectl" && \
-    chmod +x kubectl && \
-    mv kubectl /usr/local/bin/
+RUN ARCH=$(uname -m) && \
+    if [ "$ARCH" = "x86_64" ]; then ARCH="amd64"; elif [ "$ARCH" = "aarch64" ]; then ARCH="arm64"; fi && \
+    KUBECTL_VERSION="v1.31.0" && \
+    curl -fL -O "https://dl.k8s.io/release/${KUBECTL_VERSION}/bin/linux/${ARCH}/kubectl" && \
+    curl -fL -O "https://dl.k8s.io/release/${KUBECTL_VERSION}/bin/linux/${ARCH}/kubectl.sha256" && \
+    echo "$(cat kubectl.sha256)  kubectl" | sha256sum -c - && \
+    chmod +x kubectl && \
+    mv kubectl /usr/local/bin/ && \
+    rm kubectl.sha256
🤖 Prompt for AI Agents
In `@Dockerfile` around lines 25 - 30, The RUN block that downloads the pinned
kubectl binary (uses ARCH variable, curl to
"https://dl.k8s.io/release/v1.31.0/bin/linux/${ARCH}/kubectl" and moves kubectl
to /usr/local/bin/) lacks checksum verification; update the RUN sequence to also
download the matching .sha256 (e.g., curl -LO
"https://dl.k8s.io/release/v1.31.0/bin/linux/${ARCH}/kubectl.sha256"), verify
the binary with sha256sum -c or by comparing the file's sha256 to the downloaded
value before chmod +x and mv, and fail the build if the check does not pass so
kubectl is only installed when the checksum matches.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants