Skip to content

[Feature] Add shared OpenTelemetry producer config#297

Merged
royischoss merged 8 commits into
mlrun:developmentfrom
royischoss:ceml-708
Jun 11, 2026
Merged

[Feature] Add shared OpenTelemetry producer config#297
royischoss merged 8 commits into
mlrun:developmentfrom
royischoss:ceml-708

Conversation

@royischoss

@royischoss royischoss commented May 25, 2026

Copy link
Copy Markdown
Collaborator

📝 Description

Adds a top-level telemetry: block to the umbrella chart so any service in the bundle can read the same OpenTelemetry producer-side config. Today it wires MLRUN_TELEMETRY__* env vars into mlrun-api via the existing
mlrun-common-env ConfigMap; Nuclio and other services are left for follow-up.

Mirrors the producer side of the MLRun PR (mlrun/mlrun#9668) and the analogous mlefi PR (IG4-2293, #532) on the IG4 side.

Resolution table

How the MLRUN_TELEMETRY__* env vars resolve at render time. All four telemetry.* knobs default to "" (fall back to MLRun's own defaults).

Path ENABLED ENDPOINT INSECURE
Zero config false
Collector on (in-cluster) true otel-collector.<ns>.svc.cluster.local:4317 omit → MLRun default true (plaintext, correct for in-cluster)
User endpoint (external, collector off) true user value omit → MLRun default true (user must set false for TLS)
User endpoint + collector on true user value (wins) omit → MLRun default true
User endpoint + explicit insecure=false true user value false (TLS)
enabled=true with no collector and no endpoint false (safety)

MLRUN_TELEMETRY__INSECURE and MLRUN_TELEMETRY__HEADERS_SECRET_NAME are only emitted when the user supplies a value; otherwise mlrun-api falls back to its own defaults (true and "" respectively). The README includes a TLS callout for the external-endpoint path.


🛠️ Changes Made

File Change
values.yaml Add top-level telemetry: block (enabled / otlpEndpoint / insecure / headersSecretName), sibling to mlrun: / nuclio:
templates/config/mlrun-env-configmap.yaml Append MLRUN_TELEMETRY__* keys with resolution + safety logic
README.md Document the opt-in knobs + truth table + external-endpoint example
Chart.yaml Bump to 0.12.0-rc.1

✅ Checklist

  • I have tested the changes in this PR
  • I confirmed whether my changes require a change in documentation and if so, I created another PR in MLRun for the relevant documentation.
  • I confirmed whether my changes require a changes in QA tests, for example: credentials changes, resources naming change and if so, I updated the relevant Jira ticket for QA.
  • I increased the Chart version in charts/mlrun-ce/Chart.yaml.
  • I confirmed that the installation works both on a local Docker Desktop environment and on a real cluster when using the required prerequisites.
    • If installation issues were found, I updated the relevant Jira ticket with the issue and steps to reproduce, or updated the prerequisites documentation if the issue is related to missing or outdated prerequisites.
  • If needed, update https://github.com/mlrun/ce/blob/development/charts/mlrun-ce/README.md with the relevant installation instructions and version Matrix.
  • If needed, update the following values files for multi namespace support:

🧪 Testing

Live cluster verification on app1.vmdev232ig4.lab.iguaz.io:
- Test 1 (no OTel): mlrun-common-env ConfigMap shows MLRUN_TELEMETRY__ENABLED: "false"
- Test 2 (OTel enabled): ConfigMap shows ENABLED: "true", OTLP_ENDPOINT: otel-collector.mlrun.svc.cluster.local:4317
- Probe pod with same envFrom: configMapRef: mlrun-common-env receives all three env vars at runtime


🔗 References


🚨 Breaking Changes?

  • Yes (explain below)
  • No

🔍️ Additional Notes

  - values.yaml — top-level telemetry: block with "" default for enabled (inherits collector state)
  - templates/config/mlrun-env-configmap.yaml — resolution logic + safety override + endpoint derivation
  - README.md — opt-in docs with truth table and external-endpoint example
  - Chart.yaml — bumped to 0.11.0-rc.38

@davesh0812 davesh0812 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good overall.
please add some unit tests if you want you can use those tests I added in mlefi :

  - test_telemetry_default_inherits_collector_disabled — default values, expect MLRUN_TELEMETRY__ENABLED: "false", no endpoint key.
  - test_telemetry_inherits_collector_enabled — --set opentelemetry.collector.enabled=true, expect ENABLED: "true" + in-cluster endpoint.
  - test_telemetry_external_endpoint — --set telemetry.enabled=true --set telemetry.otlpEndpoint=external.com:4317 --set opentelemetry.collector.enabled=false, expect endpoint passes through.
  - test_telemetry_safety_force_disable — --set telemetry.enabled=true with collector off and no endpoint, expect ENABLED: "false".
  - test_telemetry_headers_secret_emitted_only_when_enabled — headersSecretName set but enabled=false, expect no HEADERS_SECRET_NAME key.

Comment thread CLAUDE.md Outdated
  Review feedback from davesh0812 on PR mlrun#297:

  - Add 5 helm-template-test.sh functions covering the telemetry
    resolution matrix (14 asserts):
      * test_telemetry_default_inherits_collector_disabled
      * test_telemetry_inherits_collector_enabled
      * test_telemetry_external_endpoint
      * test_telemetry_safety_force_disable
      * test_telemetry_headers_secret_emitted_only_when_enabled
  - Apply suggested fix to CLAUDE.md line 7 (drop stray "-it ").
  - Bump chart version to 0.11.0-rc.39.
Comment thread charts/mlrun-ce/README.md Outdated
Comment thread charts/mlrun-ce/templates/config/mlrun-env-configmap.yaml Outdated
Comment thread charts/mlrun-ce/values.yaml
@royischoss royischoss requested a review from GiladShapira94 June 9, 2026 11:48
@royischoss royischoss merged commit 12bcfee into mlrun:development Jun 11, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants