
Pixie Vizier NATS healthcheck failed with KubeDNS #2366

@nyadav4-ai

Describe the bug
While installing Pixie, Vizier can fail its healthcheck even when the pl-nats pod is running and the NATS monitor endpoint is healthy.
The failure appears to happen when Pixie checks the NATS monitor endpoint using the generated pod DNS name:
<pod-ip-with-dashes>.<namespace>.pod.cluster.local:8222
In our cluster, this DNS name was not resolvable from the component performing the healthcheck. As a result, Pixie marked the pl-nats-0 pod as unhealthy, the Vizier operator deleted the pod, and the installation did not complete successfully.
We temporarily worked around this by adding hostAliases entries so that the generated pod DNS name resolves to the NATS pod IP. However, this is fragile: pod IPs can change, and the workaround has to be maintained outside Pixie.
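For reference, the generated name in the format above is the pod IP with its dots replaced by dashes, followed by the namespace and the pod.cluster.local suffix. A minimal Go sketch, assuming an IPv4 pod IP and the default cluster.local cluster domain; podDNSName is a hypothetical helper (not Pixie's code), and the IP and the pl namespace are example values:

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// podDNSName builds the Kubernetes pod record name
// <pod-ip-with-dashes>.<namespace>.pod.<clusterDomain> for an IPv4 pod IP.
// Hypothetical helper for illustration only; not Pixie's actual code.
func podDNSName(podIP, namespace, clusterDomain string) (string, error) {
	ip := net.ParseIP(podIP)
	if ip == nil || ip.To4() == nil {
		return "", fmt.Errorf("not an IPv4 address: %q", podIP)
	}
	// Dots in the IPv4 address become dashes in the DNS label.
	dashed := strings.ReplaceAll(ip.To4().String(), ".", "-")
	return fmt.Sprintf("%s.%s.pod.%s", dashed, namespace, clusterDomain), nil
}

func main() {
	// Example values: pod IP 10.8.1.23 in namespace "pl".
	name, err := podDNSName("10.8.1.23", "pl", "cluster.local")
	if err != nil {
		panic(err)
	}
	fmt.Println(name + ":8222") // 10-8-1-23.pl.pod.cluster.local:8222
}
```

Whether such a name resolves depends on the cluster DNS serving pod records; in our cluster it did not.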

Observed Behavior

  • The Pixie install waits for the Vizier healthcheck.
  • The pl-nats pod is running.
  • The NATS monitor endpoint on port 8222 is reachable by pod IP.
  • The healthcheck fails because the generated pod DNS name cannot be resolved.
  • Adding hostAliases for the generated pod DNS name allows the healthcheck to pass.

Example failing endpoint format:
http://<pod-ip-with-dashes>.<namespace>.pod.cluster.local:8222

Expected behavior
Pixie’s NATS healthcheck should succeed when the NATS monitor endpoint is reachable, even if pod DNS names such as *.pod.cluster.local are not resolvable in the cluster.

For the NATS monitor endpoint specifically, Pixie should avoid relying on pod DNS resolution where TLS hostname validation is not required.

App information:

  • Pixie version: release/cloud/v0.1.9
  • K8s cluster version: v1.33.9-gke.1060000
  • Node Kernel version: 6.6.122+

Additional context
The GKE cluster runs kube-dns (schema "1.0.0") with NodeLocal DNSCache enabled.

Recommendation
For the StatefulSet and its headless Service, could Pixie use the pod's hostname and subdomain to build the address used for the pod status check?
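The recommendation could look roughly like this: when a pod has both spec.hostname and spec.subdomain set (the StatefulSet controller sets them to the pod name and the StatefulSet's serviceName), probe the stable <hostname>.<subdomain>.<namespace>.svc.<cluster-domain> name, which is backed by the headless Service's DNS records rather than a pod record; otherwise fall back to the raw pod IP. A sketch only: podInfo and statusHost are hypothetical, and the pl-nats Service name and pl namespace are example values:

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// podInfo holds the few pod fields this sketch needs (hypothetical type;
// a real operator would read these from the Pod object).
type podInfo struct {
	Namespace string
	Hostname  string // spec.hostname; for StatefulSet pods, the pod name (e.g. pl-nats-0)
	Subdomain string // spec.subdomain; the governing headless Service name
	PodIP     string // status.podIP
}

// statusHost returns the host to probe: the stable
// <hostname>.<subdomain>.<namespace>.svc.<clusterDomain> name when both
// hostname and subdomain are set, otherwise the raw pod IP (no DNS needed).
func statusHost(p podInfo, clusterDomain string) string {
	if p.Hostname != "" && p.Subdomain != "" {
		return fmt.Sprintf("%s.%s.%s.svc.%s", p.Hostname, p.Subdomain, p.Namespace, clusterDomain)
	}
	return p.PodIP
}

func main() {
	// Example StatefulSet pod; "pl-nats" as the headless Service name is hypothetical.
	nats := podInfo{Namespace: "pl", Hostname: "pl-nats-0", Subdomain: "pl-nats", PodIP: "10.8.1.23"}
	fmt.Println(net.JoinHostPort(statusHost(nats, "cluster.local"), strconv.Itoa(8222)))
	// pl-nats-0.pl-nats.pl.svc.cluster.local:8222

	// Pod without hostname/subdomain: fall back to the pod IP.
	bare := podInfo{Namespace: "pl", PodIP: "10.8.1.23"}
	fmt.Println(net.JoinHostPort(statusHost(bare, "cluster.local"), strconv.Itoa(8222)))
	// 10.8.1.23:8222
}
```

The headless Service must exist under that name in the same namespace for the stable name to resolve; the IP fallback keeps the check working even when it does not.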

Existing references:
#1544
#1581
