The chart has no way to inject environment variables into the containers
it manages. Users who need to set chart-wide env — most commonly
HTTP_PROXY / HTTPS_PROXY / NO_PROXY for clusters behind a corporate
egress proxy, but also any other binary-level configuration — have to
fork the chart or maintain external overlays that patch every
Deployment, Job, DaemonSet, and sidecar by hand.
Implementation Approach:
The fix is a generic `defaults.env` value: a list of EnvVar entries that
is merged into the env block of every chart-managed container. Every binary
the chart deploys resolves its proxy via `http.ProxyFromEnvironment`, so the
standard proxy env vars are sufficient for the proxy use case, and the same
field covers any other chart-wide env need without a dedicated API per use
case.
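As an illustration of the proxy use case, a values override might look like
the sketch below. The file name, proxy URL, and NO_PROXY entries are
placeholders assumed for the example, not values taken from the chart;
substitute the ones that apply to your cluster.

```yaml
# values-proxy.yaml (hypothetical file name): chart-wide proxy settings.
defaults:
  env:
    - name: HTTP_PROXY
      value: "http://proxy.example.internal:3128"   # placeholder proxy URL
    - name: HTTPS_PROXY
      value: "http://proxy.example.internal:3128"   # placeholder proxy URL
    - name: NO_PROXY
      # Placeholder list: loopback, in-cluster Service DNS suffixes,
      # a link-local metadata IP, and an example cluster CIDR.
      value: "localhost,127.0.0.1,.svc,.cluster.local,169.254.169.254,10.0.0.0/8"
```

A file like this would be passed to Helm with `-f values-proxy.yaml` alongside
any other values files already in use.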
A new `cloudzero-agent.generateEnv` helper in `helm/templates/_helpers.tpl`
performs the merge. It follows the same precedence pattern as the existing
`generateLabels` and `generateAnnotations` helpers: sources are passed as an
ordered list and merged by `name`, the result is emitted as an `env:` block,
and nothing renders when the merged result is empty. On a name collision the
entry from the last source wins, but it keeps the list position of the first
occurrence, so overrides do not reorder the output. The helpers are analogous
in precedence but not identical in mechanism: `generateLabels` and
`generateAnnotations` merge dicts via `mergeOverwrite`, while `generateEnv`
merges lists keyed by `name`.
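The merge semantics can be traced on a small worked example. The keys
`source_1` and `source_2` and all variable names and values below are
invented for illustration; only the ordering and collision rules come from
the description above.

```yaml
# Sources, in the order passed to generateEnv (earlier = lower priority).
source_1:              # hypothetical lower-priority list
  - name: LOG_LEVEL
    value: info
  - name: HTTP_PROXY
    value: http://proxy.example.internal:3128
source_2:              # hypothetical higher-priority list
  - name: REGION
    value: us-east-1
  - name: LOG_LEVEL
    value: debug

# Merged result expected under the rules described above.
env:
  - name: LOG_LEVEL    # keeps its first-seen position (index 0)
    value: debug       # but carries the entry from the last source
  - name: HTTP_PROXY
    value: http://proxy.example.internal:3128
  - name: REGION       # names not seen before are appended in order
    value: us-east-1
```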
Precedence (lowest → highest priority) inside every container's call:
1. `.Values.defaults.env` — chart-wide user override; lowest priority.
2. Component-specific user env (e.g. `.Values.server.env` on the
Prometheus container) — wins over the chart-wide override.
3. Chart-emitted helper output (e.g. `validatorEnv`).
4. Chart-emitted hardcoded literals (`SERVER_PORT`, `NODE_NAME`,
`HOSTNAME` fieldRefs) — highest priority. These are load-bearing
for chart correctness and must not be overridable from values.
To override a chart-emitted entry on a single component (for example, to
tweak `K8S_NAMESPACE` on the Prometheus server only), use that
component's env value rather than `defaults.env`.
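The first two tiers of the precedence list can be sketched with a values
fragment. The NO_PROXY contents and the `metrics.example.internal` host are
placeholders assumed for the example.

```yaml
defaults:
  env:
    - name: NO_PROXY                 # tier 1: applies chart-wide
      value: "localhost,.svc,.cluster.local"
server:
  env:
    - name: NO_PROXY                 # tier 2: Prometheus container only
      value: "localhost,.svc,.cluster.local,metrics.example.internal"
```

Under the stated precedence, the Prometheus container would render the
`server.env` value for `NO_PROXY`, while every other chart-managed container
would keep the `defaults.env` value.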
Functional Requirements:
1. A user must be able to set environment variables on every container
the chart manages with a single values change.
Added `defaults.env` (typed as a list of K8s EnvVar entries) to
`helm/values.yaml` and `helm/values.schema.yaml`. Every container in
`helm/templates/*.yaml` now calls `generateEnv` with
`.Values.defaults.env` as the LOWEST-priority source — aggregator
collector and shipper, agent prometheus-server, agent alloy, both
validator init containers, the configmap-reload sidecars, the agent
daemonset's config-subst init and prometheus-server, the webhook
server, the backfill init-scrape, the config-loader run-validator,
helmless, and init-cert. The kube-state-metrics subchart is
intentionally untouched (no outbound traffic; users can set
`kubeStateMetrics.env` directly if needed).
2. Chart-emitted env entries (validatorEnv, hardcoded SERVER_PORT /
NODE_NAME / HOSTNAME) must continue to render and win over
`defaults.env` on name collision, and `.Values.server.env` must
continue to apply to the Prometheus container as a middle-tier
override.
`generateEnv` takes a list of env-entry lists and merges them in
order, with later sources overriding earlier sources on `name`
collision. Each call site places `.Values.defaults.env` first
(lowest priority), then `.Values.server.env` where applicable
(middle), then chart-emitted helpers / literals last (highest).
3. The values schema must enforce the shape of `defaults.env` at chart-
render time, and the enforcement must be regression-tested.
`defaults.env` references the K8s `io.k8s.api.core.v1.EnvVar` `$ref`
in `helm/values.schema.yaml`, so malformed entries (wrong type,
missing `name`, bad `valueFrom`) are rejected before the template
engine sees them. Added `tests/helm/schema/defaults.env.valid.pass.yaml`
and `tests/helm/schema/defaults.env.invalid.fail.yaml` so
`make helm-test-schema` exercises a typical value list (with `value`,
`secretKeyRef`, and `fieldRef` shapes) against the valid path and an
`env: "not-an-array"` string against the fail path.
4. The values.yaml comment for `defaults.env` must accurately describe
what the field does and what NO_PROXY needs to cover.
The comment states the precedence rule (defaults.env is lowest;
chart-emitted entries win), explains how to override a chart-emitted
entry on a specific component, and includes a worked NO_PROXY example
with the explanation placed above the entry. It documents the
cluster-specific entries the user must supply (pod CIDR, service
CIDR, kube-apiserver IP) on top of the standard in-cluster Service
DNS suffixes and cloud-provider instance metadata IPs.
5. The helper's merge contract must be unit-tested.
`helm/tests/defaults_env_test.yaml` covers propagation to every
chart-managed container, valueFrom preservation as a deep dict, the
empty-input case (no `env:` block rendered), the precedence rules
(defaults.env does NOT override chart-emitted SERVER_PORT or
validatorEnv; `.Values.server.env` DOES override defaults.env on the
Prometheus container; first-seen position is preserved when a later
source overrides by name), and the backfill CronJob's
`spec.jobTemplate.spec.template.spec.containers[0].env` path on
documentIndex 0 alongside the Job's flat path on documentIndex 1.
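For readers unfamiliar with the test format, a propagation case in
helm-unittest might be shaped roughly like the sketch below. This is not the
content of `helm/tests/defaults_env_test.yaml`; the suite name, template
path, and proxy URL are hypothetical.

```yaml
# Illustrative helm-unittest suite; the template path is a placeholder.
suite: defaults.env propagation (sketch)
templates:
  - templates/example-deploy.yaml    # hypothetical template name
tests:
  - it: renders a defaults.env entry on the first container
    set:
      defaults:
        env:
          - name: HTTP_PROXY
            value: http://proxy.example.internal:3128
    asserts:
      - contains:
          path: spec.template.spec.containers[0].env
          content:
            name: HTTP_PROXY
            value: http://proxy.example.internal:3128
```

For a CronJob template the same assertion would target the longer
`spec.jobTemplate.spec.template.spec.containers[0].env` path mentioned above.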
Validation:
- `make helm-test` clean (569/569 helm-unittest cases pass plus the new
schema validation tests; `helm-test-template` regenerates the goldens
with the new env blocks; `helm-lint` clean).
- Deployed to a GKE cluster with a default-deny egress NetworkPolicy
applied to the target namespace, using an HTTP proxy as the only
allowed egress route. Confirmed every chart-managed container picks up
the env vars from `defaults.env` via a `kubectl get pod -o jsonpath`
sweep across all chart containers and both validator init containers.
Confirmed the proxy's access log shows only the intended outbound
destination (api.cloudzero.com) and no in-cluster Service hostnames or
cloud metadata endpoints. Confirmed the config-loader job reaches
api.cloudzero.com and receives HTTP 403 from the upstream (the expected
result for a fake API key).
- Two pre-existing inconsistencies the refactor incidentally fixes:
`validatorEnv` is now always emitted on the Prometheus server containers
(was previously skipped when `.Values.server.env` was unset, leaving the
validator lifecycle hooks without `K8S_NAMESPACE` / `K8S_POD_NAME`); and
`.Values.server.env` can now override `defaults.env` on a per-container
basis instead of blindly appending.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>