Add nvidia.com/gpu toleration to Ray Serve GPU workloads by tylerpotts · Pull Request #18 · nebari-dev/rayserve-pack

tylerpotts · 2026-06-15T18:53:51Z

Summary

nebari-infrastructure-core auto-taints AWS GPU node groups with nvidia.com/gpu=true:NoSchedule (nebari-dev/nebari-infrastructure-core#370), and EKS has no admission controller to inject a matching toleration. Without one, GPU-requesting Ray Serve head/worker pods stop scheduling onto GPU nodes once the taint lands.

This injects the toleration onto a Ray group's pod template when its resources request nvidia.com/gpu:

tolerations:
- key: "nvidia.com/gpu"
  operator: "Exists"
  effect: "NoSchedule"

operator: Exists matches any taint value, so it works with NIC's value: "true" and any other.
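To make the matching rule concrete, here is a minimal Python model of how a toleration is compared against a taint. It is a sketch of the documented Kubernetes semantics, not scheduler code; the `Taint` and `Toleration` classes and the `tolerates` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # Kubernetes treats a missing operator as Equal
    value: str = ""
    effect: str = ""         # an empty effect matches every taint effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if the toleration matches the taint (simplified model)."""
    # The effect must match unless the toleration leaves it empty.
    if tol.effect and tol.effect != taint.effect:
        return False
    # The key must match unless the toleration leaves it empty
    # (an empty key is only valid together with operator Exists).
    if tol.key and tol.key != taint.key:
        return False
    # Exists ignores the taint value entirely.
    if tol.operator == "Exists":
        return True
    # Equal requires the values to be identical.
    return tol.operator == "Equal" and tol.value == taint.value


gpu_toleration = Toleration(key="nvidia.com/gpu", operator="Exists", effect="NoSchedule")
nic_taint = Taint(key="nvidia.com/gpu", value="true", effect="NoSchedule")
print(tolerates(gpu_toleration, nic_taint))
```

Because the `Exists` branch never looks at `taint.value`, the same toleration would keep matching if the taint value were changed from `"true"` to something else.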

Changes

chart/templates/_helpers.tpl — new nebari-rayserve.tolerations helper. Given a group config (.Values.head/.Values.worker), it appends the nvidia.com/gpu toleration when resources.limits/resources.requests contains nvidia.com/gpu, plus any explicit tolerations from the group config. Renders nothing otherwise.
chart/templates/rayservice.yaml — wired the helper into the head and worker pod templates (only emits a tolerations: block when non-empty).
chart/values.yaml — added documented tolerations: [] to head and worker.
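The helper's decision logic, as described in the list above, can be modeled in a few lines of Python. This is a sketch of the described behavior, not the Helm template itself: the function names are hypothetical, and placing explicit tolerations before the injected one is an assumption based on the "explicit + nvidia.com/gpu" wording in the test table.

```python
GPU_KEY = "nvidia.com/gpu"
GPU_TOLERATION = {"key": GPU_KEY, "operator": "Exists", "effect": "NoSchedule"}


def requests_gpu(group: dict) -> bool:
    """True if resources.limits or resources.requests mentions nvidia.com/gpu."""
    resources = group.get("resources") or {}
    return any(
        GPU_KEY in (resources.get(section) or {})
        for section in ("limits", "requests")
    )


def render_tolerations(group: dict) -> list:
    """Explicit tolerations from the group config, plus the GPU toleration if needed.

    Assumption: explicit entries come first. An empty result corresponds to
    the chart rendering no tolerations block at all.
    """
    tolerations = [dict(t) for t in (group.get("tolerations") or [])]
    if requests_gpu(group):
        tolerations.append(dict(GPU_TOLERATION))
    return tolerations


# Hypothetical group configs mirroring .Values.head / .Values.worker.
cpu_head = {"resources": {"limits": {"cpu": "2"}}, "tolerations": []}
gpu_worker = {"resources": {"limits": {GPU_KEY: 1}}, "tolerations": []}

print(render_tolerations(cpu_head))
print(render_tolerations(gpu_worker))
```

The first call models the default case, where nothing is rendered; the second models a worker that requests a GPU and receives the injected toleration.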

Acceptance criteria

A worker group that requests a GPU gets the toleration (verified via helm template).
Ray pods that do not request a GPU render no tolerations and are unchanged.

Testing

helm lint passes. helm template checked across three cases (output parsed with PyYAML to confirm valid YAML):

| Case | Head | Worker |
| --- | --- | --- |
| Default (no GPU) | none | none |
| Worker requests GPU | none | `nvidia.com/gpu` |
| Head GPU + explicit worker toleration | `nvidia.com/gpu` | explicit + `nvidia.com/gpu` |

The acceptance criterion that a GPU pod actually schedules onto a tainted node and serves can't be exercised by the GPU-less CI kind cluster. It can be validated on kind without real GPU hardware by tainting a node nvidia.com/gpu=true:NoSchedule and advertising fake nvidia.com/gpu capacity on it, or end-to-end on real GPU infra — worth confirming before the NIC taint rolls out.
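For the kind-based validation described above, the two inputs are the taint string and a JSON Patch that advertises fake capacity. The Python sketch below only builds those payloads; the helper names are hypothetical, and applying them to a cluster is left out. The patch format follows the approach Kubernetes documents for advertising extended resources on a node, where the `/` in the resource name has to be escaped as `~1` inside the JSON Pointer path.

```python
import json


def escape_json_pointer(segment: str) -> str:
    """Escape one JSON Pointer segment (RFC 6901): '~' becomes '~0', '/' becomes '~1'."""
    return segment.replace("~", "~0").replace("/", "~1")


def fake_capacity_patch(resource: str, count: int) -> str:
    """JSON Patch body that adds `count` units of `resource` to a node's status.capacity."""
    path = "/status/capacity/" + escape_json_pointer(resource)
    return json.dumps([{"op": "add", "path": path, "value": str(count)}])


def taint_argument(key: str, value: str, effect: str) -> str:
    """Taint in the key=value:effect form accepted by `kubectl taint nodes <node> ...`."""
    return f"{key}={value}:{effect}"


print(taint_argument("nvidia.com/gpu", "true", "NoSchedule"))
print(fake_capacity_patch("nvidia.com/gpu", 1))
```

The patch targets the node's status subresource. A pod scheduled this way proves the toleration and resource request line up, but it does not prove the workload can use a GPU, so real GPU infrastructure is still needed for the serving half of the criterion.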

Closes #14

nebari-infrastructure-core auto-taints AWS GPU node groups with nvidia.com/gpu=true:NoSchedule, and EKS has no admission controller to inject a matching toleration. Without one, GPU-requesting Ray Serve head/worker pods stop scheduling onto GPU nodes once the taint lands. Inject the nvidia.com/gpu toleration (operator: Exists, so it matches any taint value) on a Ray group's pod template when its resources request nvidia.com/gpu, via a new nebari-rayserve.tolerations helper. Pods that don't request a GPU render no tolerations and are unchanged. Explicit head.tolerations / worker.tolerations are appended. Closes #14

marcelovilla

LGTM @tylerpotts 🚀 !

Tested this on a live EKS cluster deployed via nebari-dev/nebari-infrastructure-core#370.

I deployed the Helm chart from this branch and added a GPU worker. Can confirm that the worker (requesting GPU resources) got scheduled on a GPU node, while the head (which does not request GPU resources) did not get scheduled on the GPU node.

I left a small comment but I'm approving. Up to you if you want to address it.

Only inject the nvidia.com/gpu toleration when the group config does not already define a toleration for that key, so a user-provided toleration acts as an intentional override rather than producing a duplicate entry.

oren-openteams · 2026-06-26T19:34:06Z

Late drive-by feedback after verifying #14 against an EKS 1.34 cluster running v0.3.1 of this chart — implementation passes all three rendering scenarios (no GPU / GPU implicit / GPU + explicit user toleration). Full verification posted on the issue: #14 comment.

Two observations worth recording for context:

The premise about EKS turns out to be incorrect

The PR body says "EKS has no admission controller to inject a matching toleration". That isn't what I observed. Reproduction on stock EKS 1.34:

# kubectl apply -- raw pod, no kubespawner, no manual toleration
apiVersion: v1
kind: Pod
metadata: {name: gpu-no-toleration, namespace: default}
spec:
  containers:
  - name: c
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources: {limits: {nvidia.com/gpu: 1}}

Read back from the apiserver, the persisted pod has {key: nvidia.com/gpu, operator: Exists, effect: NoSchedule} auto-injected. Ruled out all mutating webhooks on the cluster — by elimination it's the ExtendedResourceToleration admission controller, which AWS appears to enable silently (it isn't in their published admission-plugin list, which is probably what led the source issue to the opposite conclusion).

So on managed EKS / GKE / AKS, the chart-side auto-injection here is technically redundant — ERT would do the same job at admission time. On vanilla / kubeadm / on-prem clusters where ERT isn't enabled, this PR is the only thing keeping GPU Ray workloads scheduling.

Why this PR is still correct on EKS

The implementation cleanly composes with ERT for two reasons:

Conditional — only fires when resources.limits or resources.requests contains nvidia.com/gpu. Non-GPU Ray workloads stay clean.
Append + dedupe-on-key — $hasGpuToleration check skips auto-injection when the user already provided one for the nvidia.com/gpu key, and explicit user tolerations are appended rather than replaced.

The combination means on EKS the chart-injected toleration matches what ERT would inject, ERT's idempotency skips the duplicate during admission, and net pod spec is identical with or without this PR. On non-managed clusters it's load-bearing. No false-positive injection, no override of user intent — exactly the shape one wants.
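The composition argument can be checked with a small Python model. Both functions are simplified, hypothetical stand-ins: `chart_tolerations` mirrors the dedupe-on-key behavior described above, and `ert_admit` approximates the admission controller by adding the toleration only when an identical entry is absent. The real plugin's merge logic may differ in edge cases, such as a user toleration that overlaps without being identical.

```python
GPU_KEY = "nvidia.com/gpu"
GPU_TOLERATION = {"key": GPU_KEY, "operator": "Exists", "effect": "NoSchedule"}


def chart_tolerations(user_tolerations: list, requests_gpu: bool) -> list:
    """Chart side: keep user entries, inject only if no entry uses the GPU key."""
    result = [dict(t) for t in user_tolerations]
    has_gpu_toleration = any(t.get("key") == GPU_KEY for t in result)
    if requests_gpu and not has_gpu_toleration:
        result.append(dict(GPU_TOLERATION))
    return result


def ert_admit(pod_tolerations: list, requests_gpu: bool) -> list:
    """Admission side (simplified assumption): add the toleration if not already there."""
    result = [dict(t) for t in pod_tolerations]
    if requests_gpu and GPU_TOLERATION not in result:
        result.append(dict(GPU_TOLERATION))
    return result


# GPU pod on a cluster with the admission controller enabled.
with_chart = ert_admit(chart_tolerations([], requests_gpu=True), requests_gpu=True)
without_chart = ert_admit([], requests_gpu=True)
print(with_chart == without_chart)

# GPU pod on a cluster without it: only the chart supplies the toleration.
print(chart_tolerations([], requests_gpu=True))
```

In this model the admitted pod is the same whether or not the chart injects first, while on a cluster with no admission-time injection the chart's output is the only source of the toleration.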

Pattern worth replicating

The analogous data-science-pack docs (PR nebari-dev/data-science-pack#139) recommend a kubespawner_override.tolerations: [...] recipe that I demonstrated silently wipes any global c.KubeSpawner.tolerations configured by the chart (e.g. z2jh's hub.jupyter.org/dedicated=user) for the affected profile, because kubespawner_override overrides rather than merges. The implementation pattern in this PR — conditional auto-injection in chart-rendered pod templates with explicit-overrides-win — is exactly what dsp issue #117 needs. Worth borrowing.
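The override-versus-merge difference described above can be illustrated with plain dictionaries. This is a model of the semantics only, not KubeSpawner's code: `apply_override` and `apply_merge` are hypothetical helpers, and the `NoSchedule` effect on the dedicated-user toleration is an assumption.

```python
DEDICATED_USER = {
    "key": "hub.jupyter.org/dedicated",
    "operator": "Equal",
    "value": "user",
    "effect": "NoSchedule",  # assumed effect for illustration
}
GPU_TOLERATION = {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}


def apply_override(global_config: dict, override: dict) -> dict:
    """Override semantics: each overridden key replaces the global value outright."""
    result = dict(global_config)
    result.update(override)
    return result


def apply_merge(global_config: dict, override: dict) -> dict:
    """Merge semantics for tolerations: keep global entries, append new ones."""
    result = dict(global_config)
    for key, value in override.items():
        if key == "tolerations":
            existing = list(result.get("tolerations", []))
            result[key] = existing + [t for t in value if t not in existing]
        else:
            result[key] = value
    return result


global_config = {"tolerations": [DEDICATED_USER]}
profile_override = {"tolerations": [GPU_TOLERATION]}

print(apply_override(global_config, profile_override)["tolerations"])
print(apply_merge(global_config, profile_override)["tolerations"])
```

Under override semantics the profile ends up with only the GPU toleration and loses the dedicated-user one; under merge semantics it keeps both, which is the outcome the chart-side pattern in this PR achieves for Ray groups.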

Nice work on the helper logic — leaving the issue verified.

tylerpotts marked this pull request as ready for review June 17, 2026 20:18

marcelovilla self-requested a review June 18, 2026 15:34

marcelovilla approved these changes Jun 18, 2026

Comment thread chart/templates/_helpers.tpl

fix: avoid duplicate nvidia.com/gpu toleration when user defines one

e4514c7

Only inject the nvidia.com/gpu toleration when the group config does not already define a toleration for that key, so a user-provided toleration acts as an intentional override rather than producing a duplicate entry.

tylerpotts merged commit b6b54fc into main Jun 18, 2026
2 of 3 checks passed

oren-openteams mentioned this pull request Jun 26, 2026

Add nvidia.com/gpu toleration to Ray Serve GPU workloads #14

Closed

tylerpotts merged 2 commits into main from feat/gpu-toleration

4 participants
